Schema Editing Guide


This guide outlines the steps required to export the schema information for a file to Excel for editing and to import the edited information into Kourier for further processing.

Assumptions made in this Guide

  1. Microsoft Excel is installed on the user's workstation
  2. Microsoft SQL Server Management Studio is installed on the user's workstation

Procedure

  1. In the Kourier GUI, use the Quickstart Workbench to analyze the data file.
  1. On the Quickstart Workbench page, after the analysis is complete, click the Check Out button to view the resulting Schema in Microsoft Excel
  1. The following table describes the columns in the Excel spreadsheet, their meaning and how they are used by the Quick-Start tools. Many of the columns are informational only (highlighted as No in the table below, and shaded grey in the generated Schema workbook). The other columns highlighted as Yes  can be changed as noted by the "Change" column. The data in the Excel spreadsheet will be used when building the data export specifications and services when using the Generate Exports button in the Quickstart Workbench.
Column Description - Usage - Notes

Change

ItemID The item-id for each record exported.  It indicates the source file and the corresponding attribute number.  This column should not be changed, except when splitting a single delimited field into separate columns in SQL as described here. No *
Type The source dictionary type code.  This column should not be changed. No
Name The name of the source dictionary that was used to determine the contents of the Type, Conversion, Column_Heading, Format, Structure and Association columns. This column should not be changed. No
Conversion The conversion code found in the source dictionary.  Change this field as required. Any legal Multivalue conversion which is supported by the database may be used, as well as special Kourier conversions as described here. Yes
Column_Heading The column heading found in the source dictionary with special characters and spaces removed.  Edit this column to reflect the naming conventions used in your target SQL database (i.e., title casing, standard abbreviations, etc.). Yes
Format The format code found in the source dictionary.  This column should not be changed. No
Structure The structure code found in the source dictionary.  This column should not be changed. No
Association The association found in the source dictionary.  Edit this column to indicate if two or more multi-valued fields are associated with each other and should be grouped into a single "child" SQL table. All fields with the same Association will be placed into a single export, together with the primary key. Yes
DataType The data type field is derived from the conversion code found in the source dictionary.  Possible values are: CHARACTER, INTEGER, NUMERIC, CURRENCY, DATE and TIME.

Note: normally you do not need to change this field. If you need to change it, understand that the data type may be incompatible with the actual data in the file and if so the data might not load into the MS SQL Server database.

Yes
MVType The MVType field is derived from the actual structure of an attribute and will only contain a value if the analyzed value conflicts with structure code in the source dictionary.  Edit this column if you would like to override the structure code of the source dictionary. Valid entries are S (Single Valued), MV (MultiValued) and MS (MultiSubvalued). Yes
DataLen The data length field is derived from the actual length of the data in an attribute, value or sub-value.  This column will only contain a value if the actual data length of an attribute, value or sub-value exceeds the length specified in the format code in the source dictionary.  Edit this column if you would like to override the data length of the source dictionary when creating the associated SQL Column. Yes
Xclud The exclude field is derived from the actual data in an attribute and will contain a value of "Y" if all records analyzed contain no values for this attribute.  Edit this column if you would like to exclude additional fields, or if you would like to include fields which did not have any values at the time the analysis was performed. Yes
No_AM The number of records that contain a non-null attribute. No
No_VM The number of records that contain values for this attribute. No
No_SVM The number of records that contain sub-values for this attribute. No
No_NNUM The number of records that contain non-numeric data for this attribute.  This column will only contain a value if non-numeric data was found for a field with a data type of either TIME, DATE, CURRENCY, NUMERIC or INTEGER. No
!! Indicates that this row contains non-conforming data in either the MVType, DataLen or No_NNUM columns. No
  1. Once all edits are complete, save the Excel workbook in the ...\ODS\Schema folder, and then close Excel. (The workbench will not be able to upload the Schema if the Excel workbook is open.)
  1. On the Quickstart Workbench page of the Kourier GUI, click on the Check In button. In the resulting dialog box, either enter the full path to the edited and saved Excel workbook, or use the Browse button to locate the file. Then, click on the Continue button to upload the Schema to Kourier.

Handling Multi-Line Text or Comment fields

Text and Comment fields are often set up as multiple lines of free-form text in a single Multivalue attribute, with value marks, subvalue marks or text marks separating the lines. In most cases, these fields should be brought over to SQL as a single text field, rather than as multple lines of text. This may be accomplished using the Kourier @TEXT conversion. To accomplish this, change the MVType value to S, and the DataLen to 999. In the Conversion field, enter '@TEXT (the single quote preceding the @ sign prevents Excel from treating this entry as an invalid formula).

One variation on this scenario is when Text or Comments fields are part of an associated set of multvalued attributes, and the lines of text are delimited by value marks or text marks. In these cases, you would change the MVType to MV, make sure that the Association is set to group the field with the other associated fields, and use either @SVM2SPC or @TM2SPC conversions as required.

Splitting delimited fields to multiple SQL Columns - a special case

In many Multivalue applications, data is stored in an attribute or an Item ID as multiple segments delimited by a special character (such as an asterisk *, a period . or an exclamation point !. It is often desirable to split these fields into separate SQL columns to make the data more easily accessed in SQL.

For example a common method of storing a record's last update information is as a single "stamp" attribute, in which the user ID, the date and the time are separated by asterisks. If User JOHNDOE updated a record on January 31, 2011 at 1:35:42pm, the stamp would look like this:

JOHNDOE*15737*48942

This would not be particularly useful in SQL Server. So, in the Excel workbook, the following procedure may be used to split this data up into separate SQL columns for the Update User ID and the Update DateTime:

  1. Select the entire Excel row, and click the Copy button (or right-click and select Copy, or type <Ctrl>-C)
  2. Right-click on the selected row, and click on Insert Copied Cells. The result is two identical copies of the row.
  3. On the first of the two rows, make the following changes:
    1. Add .1 at the end of the Filename*AMC cell. In  other words, change FILENAME*0 to FILENAME*0.1, or change FILENAME*23 to FILENAME*23.1
    2. Change the Column Heading to an appropriate value (such as "LastUpdateUser" in our example).
    3. Add a G conversion to return just the 1st segment of the delimited value. In our example, the conversion would be G0*1. See the Conversion Codes page for more details.
    4. If any other conversions are required, add a pipe symbol "|" and the secondary conversion(s).
  4. On the second of the two rows, make the following very similar changes:
    1. Add .2 at the end of the Filename*AMC cell. In  other words, change FILENAME*0 to FILENAME*0.2, or change FILENAME*23 to FILENAME*23.2
    2. Change the Column Heading to an appropriate value (such as "LastUpdate" in our example).
    3. Add a G conversion to return the desired delimited segments. In our example, we would take advantage of another Kourier special conversion and use the conversion code G1*2. This will skip 1 element, and then return the next two elements (i.e. "15737*48942" in our example).
    4. If any other conversions are required, again add a pipe symbol and the conversions. In our example, we would add USQLDATETIME, which converts InternalDate*InternalTime into a SQL-compliant date format. So, on our second line the conversion would be "G1*2|USQLDATETIME".

Using this technique, a delimited field may be split into as many SQL columns as necessary.


User Guide - Table of Contents