Quick Start Workbench Schema Editing Guide.htm

Kourier Integrator Online Help

Quick Start Workbench Schema Editing Guide

While working with the Data Warehouse Quick Start Workbench or ODBC Quick Start Workbench, you will analyze a U2 or ODBC file in order to profile the data and then generate a schema for that file that will be later used to generate Export Specifications for that file.

This guide outlines the steps required to export the schema information for a file to Excel for editing, how to modify the schema, and then how to import the edited schema information into Kourier for further processing.

Assumptions made in this Guide

Microsoft Excel is installed on the user's workstation
Microsoft SQL Server Management Studio is installed on the user's workstation

Procedure

In the Kourier GUI, use the Quick Start Workbench or ODBC Quick Start Workbench to analyze the data file.
On the Quick Start Workbench page, after the analysis is complete, click the Check Out button to view the resulting Schema in Microsoft Excel.
The following table describes the columns in the Excel spreadsheet, their meaning and how they are used by the Quick-Start tools.
The columns highlighted as No in the table below are informational only, and are shaded grey in the generated Schema workbook.
The columns highlighted as Yes can be changed as noted by the "Change" column. The data in the Excel spreadsheet will be used when building the data export specifications and services when using the Generate Exports button in the Quick Start Workbench.

Column	Description - Usage - Notes	Change
Position	The item-id for each record exported.	Yes
Type	The source dictionary type code. This column should not be changed.	No
Name	The name of the source dictionary that was used to determine the contents of the Type, Conversion, Column_Heading, Format, Structure and Association columns. This column should not be changed.	No
Conversion	Enter any valid dictionary conversion code, substitution parameter, or a literal string. Multiple conversion may be separated by the \| (pipe) character.	Yes
Column Heading	The column heading found in the source dictionary with special characters and spaces removed. Edit this column to reflect the naming conventions used in your target SQL database (i.e., title casing, standard abbreviations, etc.).	Yes
Format	The format code found in the source dictionary. This column should not be changed.	No
Structure	The structure code found in the source dictionary. This column should not be changed.	No
Association	The association found in the source dictionary. Edit this column to indicate if two or more multi-valued fields are associated with each other and should be grouped into a single "child" SQL table. This also controls how export names (item-ids) will be generated. All fields with the same Association will be placed into a single export, together with the primary key.	Yes
Data Type	The data type field is derived from the conversion code found in the source dictionary. Edit this column if you would like to override the data length or data type of the source dictionary when creating the associated SQL Column. If you change it make sure it's one of the valid SQL Data Types. Normally you do not need to change this field. If you need to change it, understand that the data type may be incompatible with the actual data in the file and if so the data might not load into the MS SQL Server database.	Yes
MVType	The MVType field is derived from the actual structure of an attribute and will only contain a value if the analyzed value conflicts with structure code in the source dictionary. Edit this column if you would like to override the structure code of the source dictionary. Valid entries are S (Single Valued), MV (MultiValued) and MS (Multi-Subvalued).	Yes
Ignore	The Ignore field is derived from the actual data in an attribute and will contain a value of "Y" if all records analyzed contain no values for this attribute. Edit this column if you would like to Ignore additional fields, or if you would like to include fields which did not have any values at the time the analysis was performed.	Yes
No AM	The number of records that contain a non-null attribute.	No
No VM	The number of records that contain values for this attribute.	No
No SVM	The number of records that contain sub-values for this attribute.	No
Max DataLen	The data length field is derived from the actual length of the data in an attribute, value or sub-value. This column will only contain a value if the actual data length of an attribute, value or sub-value exceeds the length specified in the format code in the source dictionary	No
Non Numeric	The number of records that contain non-numeric data for this attribute. This column will only contain a value if non-numeric data was found for a field with a data type of either TIME, DATE, CURRENCY, NUMERIC or INTEGER.	No
Analysis Exceptions	Indicates what is wrong with the field (e.g. Structure, DataLen, DupColumn, and so on).	No
Comparison Exceptions	The Compare Exceptions field is linked to a new command button option called “Compare”. It compares information in all exports for an integration and selected schema name and reports back information such as NotUsed and DiffDataType.	No

Once all edits are complete, save the Excel workbook in the ...\ODS\Schema folder, and then close Excel. (The workbench will not be able to upload the Schema if the Excel workbook is open.)
On the Quick Start Workbench page of the Kourier GUI, click on the Check In button. In the resulting dialog box, either enter the full path to the edited and saved Excel workbook, or use the Browse button to locate the file. Then, click on the Continue button to upload the Schema to Kourier.

Handling Multi-Line Text or Comment fields

Text and Comment fields are often set up as multiple lines of free-form text in an attribute, with value marks, subvalue marks or text marks separating the lines. In most cases, these fields should be brought over to SQL as a single text field, rather than as multiple lines of text.

Easily combine multiple lines of MultiValue text into one SQL text field using the Kourier @TEXT conversion. To accomplish this, change the MVType value to S. In the Conversion field, enter '@TEXT (the single quote preceding the @ sign prevents Excel from treating this entry as an invalid formula).

One variation on this scenario is when Text or Comments fields are part of an associated set of MultiValue attributes, and the lines of text are delimited by value marks or text marks. In these cases, you would change the MVType to MV, make sure that the Association is set to group the field with the other associated fields, and use either @SVM2SPC or @TM2SPC conversions as required.

Splitting delimited fields to multiple SQL Columns

In many MultiValue applications, data is stored in an attribute or an Item ID as multiple segments delimited by a special character (such as an asterisk *, a period . or an exclamation point !. It is often desirable to split these fields into separate SQL columns to make the data more easily accessed in SQL.

For example a common method of storing a record's last update information is as a single "stamp" attribute, in which the user ID, the date and the time are separated by asterisks. If User JOHNDOE updated a record on January 31, 2011 at 1:35:42pm, the stamp would look like this:

JOHNDOE*15737*48942

This would not be particularly useful in SQL Server. So, in the Excel workbook, the following procedure may be used to split this data up into separate SQL columns for the Update User ID and the Update DateTime:

Select the entire Excel row, and click the Copy button (or right-click and select Copy, or type <Ctrl>-C)
Right-click on the selected row, and click on Insert Copied Cells. The result is two identical copies of the row.
On the first of the two rows, make the following changes:
1. Add .1 at the end of the Filename*AMC cell. In other words, change FILENAME*0 to FILENAME*0.1, or change FILENAME*23 to FILENAME*23.1
2. Change the Column Heading to an appropriate value (such as "LastUpdateUser" in our example).
3. Add a G conversion to return just the 1st segment of the delimited value. In our example, the conversion would be G0*1. See the Conversion Codes page for more details.
4. If any other conversions are required, add a pipe symbol "|" and the secondary conversion(s).
On the second of the two rows, make the following very similar changes:
1. Add .2 at the end of the Filename*AMC cell. In other words, change FILENAME*0 to FILENAME*0.2, or change FILENAME*23 to FILENAME*23.2
2. Change the Column Heading to an appropriate value (such as "LastUpdate" in our example).
3. Add a G conversion to return the desired delimited segments. In our example, we would take advantage of another Kourier special conversion and use the conversion code G1*2. This will skip 1 element, and then return the next two elements (i.e. "15737*48942" in our example).
4. If any other conversions are required, again add a pipe symbol and the conversions. In our example, we would add USQLDATETIME, which converts InternalDate*InternalTime into a SQL-compliant date format. So, on our second line the conversion would be "G1*2|USQLDATETIME".

Using this technique, a delimited field may be split into as many SQL columns as necessary.