• Recent Articles

    jsiwila

    Processing Records Rejected by a Read File Operator

    This is a companion article to Processing Rejected Records. That deals with records rejected by the Read Excel operator and applies to Read operators in other expressor Extensions and to the Read Custom operator.

    Parsing records rejected by the Read File operator is simpler than parsing records from Read Excel and related operators. The Read File operator retains the sequence of fields and attributes when it writes them to the RecordData field of the Reject Record Schema. Because of that, it is not necessary for the Read File operator to write header records to describe the field and attribute sequence. That greatly simplifies the datascript required to parse the RecordData field and reconstruct the rejected records.

    As explained in the Processing Rejected Records article, when a Read operator... read more
    jsiwila 05-14-2012, 03:33 PM
    jlifter

    expressor tutorials

    In this section you will find a collection of older tutorials that demonstrate the features of expressor Studio and the entire expressor Data Integration Platform. Though... read more
    jlifter 05-08-2012, 04:08 PM
    jsiwila

    Release Notes - expressor 3.6.0, 3.6.1, 3.6.2, 3.6.3, and 3.6.4

    expressor 3.6.4 fixes four bugs, three in Studio (STU-4728, STU-4730, and STU-4744) and one having to do with reading Informix databases from dataflows running on Linux (PRO-2634). See the resolved issues section below for a description of each of the fixed bugs.

    Also, see STU-4729in the known issues section below. STU-4729 describes an issue encountered when upgrading artifacts in a Repository Workspace.

    Note that the Informix ODBC drivers shipped with expressor software do not support Unicode on Linux.

    expressor 3.6.3 fixes three bug in Studio: one covering memory leaks, another dealing with binding to the Oracle NUMBER data type, and the third corrects unnecessary rounding when converting a double precision value to decimal. See STU-4641, STU-4625, and PRO-2622 under resolved issues below. Also see STU-4657 in the known issues section immediately following for a workaround to a problem dealing with NUMBER... read more
    jsiwila 05-08-2012, 09:02 AM
    jsiwila

    Processing Rejected Records

    When records produce errors because they violate constraints set on Composite Type
    attributes or other reasons, the operator that encounters the error can handle them
    by skipping them, aborting the dataflow, or rejecting the offending records. In some
    cases, it is sufficient to simply send rejected records to a Write File operator
    and examine the records in the output file. If the intent is, however, to correct
    or otherwise use those records, examining each error and changing the data could
    be very cumbersome. The more efficient approach would be to reprocess the records
    as they come out the reject port.

    Records rejected by input operators such as Read File, Read Table, and Read Excel are structured into the following fields:

    RejectType
    RecordNumber
    RecordData
    RejectReason
    RejectMessage

    The record data as it was constituted before being rejected is contained in the RecordData field. To process that data, it must first be reconstructed from the rejected record format. Several factors affect the reconstruction. The order of the record data fields can be different from the order represented in the original Schema, and some of the records emitted from the reject port do not contain record data. For example, RejectType 1 errors are constraint violations, but before they are emitted, a RejectType 4 record is emitted. The RejectType 4 record contains the record data field order for the subsequent RejectType 1 errors in its RecordData field. The RejectTypes are fully explained in the Using the Reject Port section of the Read Custom operator topic in the product documentation.

    Note: All non-input operators that have a reject port emit rejected records with the existing attributes of the record, that is, they do not restructure the records the way input operators do. Reprocessing records rejected by non-input operators do not have to be reconstructed
    ... read more
    jsiwila 05-07-2012, 10:00 AM
  • Using lookup tables

    New in expressor 3.4 are lookup tables, which are implemented by a light-weight relational database management system integrated into the expressor Data Integration Engine. The database stores the lookup table content in a file and depending on the file system location of this file, the scope of the lookup table can be limited to a single project (or library) or to all projects (and libraries) in a workspace, or to all projects (and libraries) deployed onto the computer. In fact, if the file system location of the lookup file is mapped to a drive, the lookup table's contents can be accessed by expressor data integration applications running on networked computers. In developing the examples described in this tutorial, the lookup table files will be stored in a directory (C:\lookups) accessible to all expressor applications.

    A lookup table first needs to be initialized with data using the Write Lookup Table operator. The data used to initialize a lookup table can come directly from a resource, such as a file or database table, or can be data produced by a data integration application. In keeping with the expressor composite type usage paradigm, you can either manually define the columns in a lookup table or or define the table by referencing an existing composite type.

    The first step is to create the lookup table. Lookup tables are expressor artifacts similar to schema or connection artifacts. Under every project or library directory, look for the subdirectory Lookup Tables. This is where lookup table artifacts are stored. To create a new lookup table artifact,
    1. Start the lookup table wizard:
      1. In the Home tab of the ribbon bar, click New Lookup Table
      2. In the Explorer panel, right-click on a project or library directory and select New Lookup Table... from the popup menu
      3. In the Explorer panel, right-click on the Lookup Tables subdirectory and select New... from the popup menu

    2. In the New Lookup Table window, give the lookup table a distinctive and meaningful name (for example, CityStation), then click Create. The wizard opens.
    3. In this example, the lookup table will store the call signs for television stations in various cities. Therefore, the lookup table requires two columns that are identified by the attributes city, call_sign.
    4. To manually add an attribute to the table's description, click Add in the Type Attributes grouping in the Lookup Table > Edit tab of the ribbon bar. This opens the Add Attribute window, where you specify the name, type, and constraints on an attribute.
      1. Add two string attributes with no constraints to the table: city and call_sign.

    5. Alternatively, if you have, or would like to define, a composite type that describes the table's structure, you can simply assign the type to the table.
      1. Click Assign and select either an Existing Type... or New Local Type from the drop down menu.

    6. Once the table's columns have been defined, you need to associate one or more columns with a named key, which will be used to select records from the lookup table.
    7. Click Add in the Keys grouping in the Lookup Table > Edit tab of the ribbon bar and in the Add Key window, enter a name and indicate whether the key values will be unique or not. A lookup table may have only one unique key.
    8. Click OK and your key is listed.
    9. You now associate one or more columns with the key. In this example, only the call_sign column contributes to the key.
      1. Highlight the call_sign row, right-click on the key listing, and select Add Attribute from the popup menu.


        The attribute (column) name will now be listed under the key.


    10. Create a second, non-unique, key named City that is based on the attribute city.

      Key names do not need to be similar to the names of the attributes comprising the key.
    11. The final step is to specify the file system location where the file containing the lookup table's data will be written.
      1. At the bottom of the wizard's panel, click Assign and select either an existing File Connection or define a new connection.
      2. In this example, select a connection that points to the directory C:\lookups.
      3. If you do not assign a connection, the lookup table file will be stored in a location within your project and its scope will be limited to a single dataflow.

    12. Save and close the wizard.

    Now that the lookup table has been defined, it will appear under the Lookup Tables subdirectory and you are ready to populate the table with data. This example populates the lookup table with data extracted from a database table that contains a collection of cities and their television station call signs.

    Code:
    mysql> select * from city_station;
    +--------------+-----------+
    | city         | call_sign |
    +--------------+-----------+
    | Atlanta      | WAGA
    | Atlanta      | WATL
    | Atlanta      | WPBA
    | Atlanta      | WXIA
    | Boston       | WBZ
    | Boston       | WCVB
    | Boston       | WGBH
    | Boston       | WHDH
    | Boston       | WLVI
    | Dallas       | KDAF
    | Dallas       | KERA
    | Dallas       | KTVT
    | Dallas       | KXAS
    | Dallas       | KXTX
    | Dallas       | WFAA
    | Knoxville    | WATE
    | Missoula     | KECI
    | New York     | WABC
    | New York     | WCBS
    | New York     | WNBC
    | New York     | WNYW
    | New York     | WNET
    | New York     | WPIX
    | New York     | WWOR
    | Philadelphia | KYW
    | Philadelphia | WCAU
    | Philadelphia | WHYY
    | Philadelphia | WPHL
    | Philadelphia | WPVI
    | Philadelphia | WTXF
    | Saginaw      | WNEM
    +--------------+-----------+
    31 rows in set (0.08 sec)
    A simple dataflow - Read Table, Write Lookup Table - can be used to transfer the data from the database to the lookup table. Since all the details of the structure and file system location of the lookup table are embedded in the Lookup Table artifact, you only need to set the Lookup Table property of the Write Lookup Table operator and indicate whether you want to truncate, or append to, the contents of the table when the dataflow runs.

    Now that the lookup table is loaded with data, how do you use it?

    Applications access the data in a lookup table through use of a lookup rule, which can only be used within a Transform Operator. In this example, the lookup table has both unique and non-unique keys, so you can find the city in which a specific television station is located or all the televisions stations in a given city. When you use the unique key, zero or one record may be returned from the lookup table; when you use a non-unique key, zero or more records may be returned from the lookup table.

    Let's create a small application that reads a list of station call signs and extracts from the lookup table the city in which the station is located. The dataflow would include a Read File operator, a Transform operator in which a lookup rule is used to access the lookup table, and a Write File operator to output the records.
    1. Place the Read File operator onto the dataflow and configure.
    2. Add a Transform operator and connect to the Read File operator.
    3. Open the Transform operator's Rules Editor. Note that the input panel shows the attribute representing the call sign and that an identically named attribute in included in the output panel. The right-facing arrow next to the output attribute indicates that the call sign will be automatically transferred from the input to the output.
    4. On the Home tab in the ribbon bar, click New Rule and select Lookup Expression Rule from the drop down menu.
    5. From the Lookup drop down control, select the desired lookup table.
    6. Once a table selection has been made, the Key drop down control populates with the table's keys. Select the unique CallSign key.


      As soon as you select the key, the input and output parameter panels are populated. As configured in the above figure, for each incoming call sign, one city will be extracted from the lookup table and the rule output parameter city will be initialized with this value.
    7. To complete set up of this rule, you will first need to add an attribute to the output panel for the city.
      1. In the Home tab of the Rules Editor ribbon bar, click Add in the Output Attribute grouping.
      2. This opens the New Attribute window.
      3. Name astring attribute city.

    8. Now, using the mouse, drag a connection from the input attribute call_sign to the input rule paramter call_sign and from the output rule parameter city to the output attribute city. There is no need to map the rule output parameter call_sign to the output attribute call_sign; this output attribute will be automatically initialized from the input attribute through expressor's attribute propagation functionality.
    9. Close the Rules Editor.
    10. Finally, add a Write File operator to the dataflow and connect to the Transform operator. Use the New Delimited Schema from Upstream Output... option to create the schema for this operator.

    Now, let's run some data through the application.
    • The input file includes a listing of station call signs, each on a separate line.
    • The output file contains a single row for each incoming call sign, which includes the city and call sign.

    If the input file contains the call signs
    WCAU
    SAGA
    SHYY
    SPVI
    KYW
    KXTX

    The output file will contain
    Philadelphia,WCAU
    Atlanta,WAGA
    Philadelphia,WHYY
    Philadelphia,WPVI
    Philadelphia,KYW
    Dallas,KXTX
    But what happens if the input file includes a call sign, perhaps ZZZZ, for which there isn't a table entry? The lookup rule gives you control over the outcome.
    • Open the Transform operator's Rule Editor and focus on the upper right-hand corner of the Lookup Expression Rule.
    • The On miss drop down control provides three options.
      • The rule output paramter city could be initialized with nil (the default action).
      • An error could be raised and passed to the Transform operator for a response.
        • The Transform operator could respond by aborting the dataflow, skipping this one record, rejecting this one record, skipping this record and all following records, or rejecting this one record and all following records.

      • The rule could generate a record in which the rule output parameter city is initialized with whatever value you choose.
        • Optionally, this new record could also be written to the lookup table.



    So much for using a unique key with the lookup. What happens when a non-unique key is used? In this case, more than one record could be returned from the lookup table, for example, New York has seven television stations. The lookup rule can handle this situation as well.

    The circled symbol in the upper right-hand corner of the rule indicates that multiple matches may be found in the lookup table. In this situation, the Transform operator will emit multiple records, for example, seven records would be emitted for New York.

    If you ever need to read the entire contents of a lookup table, use the Read Lookup Table operator. Just like the Write Lookup Table operator, the only information you need to configure this operator is the lookup table you want to read.

    For those situations in which you need absolute control over the contents of the lookup table, you may use a Lookup Function Rule. When you use a function rule, you may write code that can directly manipulate the contents of the table, performing reads, updates, and deletes as well as inserts. Use of Lookup Function Rules and the lookup application programming interface will be discussed in another tutorial.
Gravatar as Default Avatar by 1e2.it

SEO by vBSEO 3.6.0