Talend Components

Reference Guide

7.3.1
Last updated: 2020-02-23
Contents

Copyleft........................................................................................................................ 77

tAccessBulkExec.......................................................................................................... 79
tAccessBulkExec Standard properties.......................................................................................................................79
Related scenarios............................................................................................................................................................. 81

tAccessClose................................................................................................................ 82
tAccessClose Standard properties..............................................................................................................................82
Related scenarios............................................................................................................................................................. 83

tAccessCommit............................................................................................................ 84
tAccessCommit Standard properties......................................................................................................................... 84
Related scenario............................................................................................................................................................... 85

tAccessConnection...................................................................................................... 86
tAccessConnection Standard properties.................................................................................................................. 86
Inserting data in parent/child tables........................................................................................................................87

tAccessInput.................................................................................................................91
tAccessInput Standard properties.............................................................................................................................. 91
Related scenarios............................................................................................................................................................. 94

tAccessOutput..............................................................................................................95
tAccessOutput Standard properties...........................................................................................................................95
Related scenarios...........................................................................................................................................................100

tAccessOutputBulk....................................................................................................101
tAccessOutputBulk Standard properties................................................................................................................101
Related scenarios...........................................................................................................................................................103

tAccessOutputBulkExec............................................................................................104
tAccessOutputBulkExec Standard properties...................................................................................................... 104
Related scenarios...........................................................................................................................................................107

tAccessRollback.........................................................................................................108
tAccessRollback Standard properties..................................................................................................................... 108
Related scenarios...........................................................................................................................................................109

tAccessRow................................................................................................................ 110
tAccessRow Standard properties..............................................................................................................................110
Related scenarios...........................................................................................................................113

tAddCRCRow..............................................................................................114
tAddCRCRow Standard properties...........................................................................................................................114
Adding a surrogate key to a file..............................................................................................................................115

tAddLocationFromIP.................................................................................................118
tAddLocationFromIP Standard properties............................................................................................................ 118
Identifying the real-world geographic location of an IP address.................................................................... 119

tAdvancedFileOutputXML........................................................................................122
tAdvancedFileOutputXML Standard properties.................................................................................................. 122
Defining the XML tree.................................................................................................................................................125
Mapping XML data........................................................................................................................................................127
Defining the node status............................................................................................................................................ 127
Creating an XML file using a loop......................................................................................................................... 128

tAggregateRow..........................................................................................................133
tAggregateRow Standard properties...................................................................................................................... 133
Aggregating values and sorting data.....................................................................................................................135

tAggregateSortedRow.............................................................................................. 139
tAggregateSortedRow Standard properties......................................................................................................... 139
Sorting and aggregating the input data...............................................................................................................141

tAmazonAuroraClose................................................................................................ 146
tAmazonAuroraClose Standard properties........................................................................................................... 146
Related scenario.............................................................................................................................................................147

tAmazonAuroraCommit............................................................................................148
tAmazonAuroraCommit Standard properties.......................................................................................................148
Related scenario.............................................................................................................................................................149

tAmazonAuroraConnection......................................................................................150
tAmazonAuroraConnection Standard properties................................................................................................150
Related scenario.............................................................................................................................................................152

tAmazonAuroraInput................................................................................................ 153
tAmazonAuroraInput Standard properties............................................................................................................153
Handling data with Amazon Aurora....................................................................................................................... 156

tAmazonAuroraOutput............................................................................................. 163
tAmazonAuroraOutput Standard properties........................................................................................................ 163
Related scenario.............................................................................................................................................................169

tAmazonAuroraRollback.......................................................................................... 170
tAmazonAuroraRollback Standard properties..................................................................................................... 170
Related scenario............................................................................................................................ 171

tAmazonEMRListInstances.......................................................................................172
tAmazonEMRListInstances Standard properties.................................................................................................172
Related scenario.............................................................................................................................................................173

tAmazonEMRManage................................................................................................174
tAmazonEMRManage Standard properties...........................................................................................................174
Managing an Amazon EMR cluster.........................................................................................................................178

tAmazonEMRResize.................................................................................................. 182
tAmazonEMRResize Standard properties..............................................................................................................182
Related scenario.............................................................................................................................................................184

tAmazonMysqlClose................................................................................................. 185
tAmazonMysqlClose Standard properties............................................................................................................. 185
Related scenarios...........................................................................................................................................................186

tAmazonMysqlCommit............................................................................................. 187
tAmazonMysqlCommit Standard properties........................................................................................................ 187
Related scenario.............................................................................................................................................................188

tAmazonMysqlConnection....................................................................................... 189
tAmazonMysqlConnection Standard properties................................................................................................. 189
Related scenario.............................................................................................................................................................191

tAmazonMysqlInput..................................................................................................192
tAmazonMysqlInput Standard properties............................................................................................................. 192
Related scenarios...........................................................................................................................................................194

tAmazonMysqlOutput...............................................................................................195
tAmazonMysqlOutput Standard properties.......................................................................................................... 195
Related scenarios...........................................................................................................................................................200

tAmazonMysqlRollback............................................................................................201
tAmazonMysqlRollback Standard properties.......................................................................................................201
Related scenario.............................................................................................................................................................202

tAmazonMysqlRow................................................................................................... 203
tAmazonMysqlRow Standard properties............................................................................................................... 203
Related scenario.............................................................................................................................................................206

tAmazonOracleClose................................................................................................ 207
tAmazonOracleClose Standard properties............................................................................................................207
Related scenario.............................................................................................................................208

tAmazonOracleCommit............................................................................ 209
tAmazonOracleCommit Standard properties....................................................................................................... 209
Related scenario.............................................................................................................................................................210

tAmazonOracleConnection...................................................................................... 211
tAmazonOracleConnection Standard properties................................................................................................ 211
Related scenario.............................................................................................................................................................213

tAmazonOracleInput.................................................................................................214
tAmazonOracleInput Standard properties............................................................................................................ 214
Related scenarios...........................................................................................................................................................217

tAmazonOracleOutput..............................................................................................218
tAmazonOracleOutput Standard properties.........................................................................................................218
Related scenarios...........................................................................................................................................................223

tAmazonOracleRollback........................................................................................... 224
tAmazonOracleRollback Standard properties......................................................................................................224
Related scenario.............................................................................................................................................................225

tAmazonOracleRow.................................................................................................. 226
tAmazonOracleRow Standard properties..............................................................................................................226
Related scenarios...........................................................................................................................................................229

tAmazonRedshiftManage.........................................................................................230
tAmazonRedshiftManage Standard properties................................................................................................... 230
Related scenario.............................................................................................................................................................233

tApacheLogInput.......................................................................................................234
tApacheLogInput Standard properties...................................................................................................................234
Reading an Apache access-log file.........................................................................................................................235

tAS400Close.............................................................................................................. 237
tAS400Close Standard properties............................................................................................................................237
Related scenario.............................................................................................................................................................238

tAS400Commit.......................................................................................................... 239
tAS400Commit Standard properties....................................................................................................................... 239
Related scenario.............................................................................................................................................................240

tAS400Connection.................................................................................................... 241
tAS400Connection Standard properties................................................................................................................ 241
Related scenario.............................................................................................................................................................242

tAS400Input.............................................................................................................. 243
tAS400Input Standard properties............................................................................................................................ 243
Handling data with AS/400....................................................................................................................................... 245
Related scenarios...........................................................................................................................................................249

tAS400LastInsertId................................................................................................... 250
tAS400LastInsertId Standard properties............................................................................................................... 250
Related scenario.............................................................................................................................................................251

tAS400Output........................................................................................................... 252
tAS400Output Standard properties.........................................................................................................................252
Related scenarios...........................................................................................................................................................256

tAS400Rollback.........................................................................................................257
tAS400Rollback Standard properties..................................................................................................................... 257
Related scenarios...........................................................................................................................................................258

tAS400Row................................................................................................................ 259
tAS400Row Standard properties..............................................................................................................................259
Related scenarios...........................................................................................................................................................262

tAssert........................................................................................................................ 263
tAssert Standard properties....................................................................................................................................... 263
Viewing product order status (on a daily basis) against a benchmark number....................................264
Setting up the assertive condition for a Job execution.................................................................................. 267

tAssertCatcher........................................................................................................... 273
tAssertCatcher Standard properties........................................................................................................................ 273
Related scenarios...........................................................................................................................................................274

tAzureAdlsGen2Input............................................................................................... 275
tAzureAdlsGen2Input Standard properties...........................................................................................................275
Related scenario.............................................................................................................................................................277

tAzureAdlsGen2Output............................................................................................ 278
tAzureAdlsGen2Output Standard properties....................................................................................................... 278
Accessing Azure ADLS Gen2 storage..................................................................................................................... 280

tAzureStorageConnection........................................................................................ 283
tAzureStorageConnection Standard properties.................................................................................................. 283
Related scenario.............................................................................................................................................................284

tAzureStorageContainerCreate............................................................................... 285
tAzureStorageContainerCreate Standard properties.........................................................................................285
Creating a container in Azure Storage.................................................................................................................. 286

tAzureStorageContainerDelete............................................................................... 291
tAzureStorageContainerDelete Standard properties.........................................................................................291
Related scenarios...........................................................................................................................................................292

tAzureStorageContainerExist.................................................................................. 293
tAzureStorageContainerExist Standard properties............................................................................................293
Related scenario.............................................................................................................................................................294

tAzureStorageContainerList.................................................................................... 295
tAzureStorageContainerList Standard properties.............................................................................................. 295
Related scenario.............................................................................................................................................................297

tAzureStorageDelete................................................................................................ 298
tAzureStorageDelete Standard properties............................................................................................................298
Related scenarios...........................................................................................................................................................300

tAzureStorageGet......................................................................................................301
tAzureStorageGet Standard properties.................................................................................................................. 301
Retrieving files from an Azure Storage container............................................................................... 303

tAzureStorageInputTable.........................................................................................310
tAzureStorageInputTable Standard properties................................................................................................... 310
Handling data with Microsoft Azure Table storage..........................................................................................313

tAzureStorageList..................................................................................................... 320
tAzureStorageList Standard properties..................................................................................................................320
Related scenario.............................................................................................................................................................322

tAzureStorageOutputTable......................................................................................323
tAzureStorageOutputTable Standard properties................................................................................................323
Related scenario.............................................................................................................................................................326

tAzureStoragePut......................................................................................................327
tAzureStoragePut Standard properties.................................................................................................................. 327
Related scenario.............................................................................................................................................................329

tAzureStorageQueueCreate..................................................................................... 330
tAzureStorageQueueCreate Standard properties............................................................................................... 330
Related scenario.............................................................................................................................................................331

tAzureStorageQueueDelete..................................................................................... 332
tAzureStorageQueueDelete Standard properties...............................................................................................332
Related scenario.............................................................................................................................................................333

tAzureStorageQueueInput....................................................................................... 334
tAzureStorageQueueInput Standard properties................................................................................................. 334
Related scenario.............................................................................................................................336

tAzureStorageQueueInputLoop...............................................................337
tAzureStorageQueueInputLoop Standard properties........................................................................................337
Related scenario.............................................................................................................................................................339

tAzureStorageQueueList.......................................................................................... 340
tAzureStorageQueueList Standard properties.....................................................................................................340
Related scenario.............................................................................................................................................................342

tAzureStorageQueueOutput.................................................................................... 343
tAzureStorageQueueOutput Standard properties.............................................................................................. 343
Related scenario.............................................................................................................................................................345

tAzureStorageQueuePurge...................................................................................... 346
tAzureStorageQueuePurge Standard properties................................................................................................ 346
Related scenario.............................................................................................................................................................347

tBarChart....................................................................................................................348
tBarChart Standard properties.................................................................................................................................. 348
Creating a bar chart from the input data.............................................................................................................350

tBigQueryBulkExec................................................................................................... 357
tBigQueryBulkExec Standard properties............................................................................................................... 357
Related scenario............................................................................................................................ 360

tBigQueryInput..........................................................................................................361
tBigQueryInput Standard properties.......................................................................................................................361
Performing a query in Google BigQuery.............................................................................................................. 364

tBigQueryOutput....................................................................................................... 368
tBigQueryOutput Standard properties................................................................................................................... 368
Writing data in Google BigQuery............................................................................................................................ 371

tBigQueryOutputBulk............................................................................................... 379
tBigQueryOutputBulk Standard properties.......................................................................................................... 379
Related scenario............................................................................................................................ 381

tBigQuerySQLRow.....................................................................................................382
tBigQuerySQLRow Standard properties................................................................................................................ 382

tBonitaDeploy............................................................................................................385
tBonitaDeploy Standard properties.........................................................................................................................385
Related scenario............................................................................................................................ 386

tBonitaInstantiateProcess........................................................................................387
tBonitaInstantiateProcess Standard properties.................................................................................................. 387
Executing a Bonita process via a Talend Job..................................................................................................... 390
Outputting the process instance UUID over the Row > Main link.............................................................. 395

tBoxConnection.........................................................................................................398
tBoxConnection Standard properties..................................................................................................................... 398
Related scenario.............................................................................................................................................................399

tBoxCopy....................................................................................................................400
tBoxCopy Standard properties..................................................................................................................................400
Related scenarios...........................................................................................................................................................402

tBoxDelete................................................................................................................. 403
tBoxDelete Standard properties...............................................................................................................................403
Related scenarios...........................................................................................................................................................404

tBoxGet...................................................................................................................... 405
tBoxGet Standard properties..................................................................................................................................... 405
Related scenario.............................................................................................................................................................406

tBoxList...................................................................................................................... 407
tBoxList Standard properties.....................................................................................................................................407
Related scenarios...........................................................................................................................................................408

tBoxPut...................................................................................................................... 409
tBoxPut Standard properties..................................................................................................................................... 409
Uploading and downloading files from Box....................................................................................................... 411

tBufferInput............................................................................................................... 414
tBufferInput Standard properties.............................................................................................................................414
Retrieving buffered data..........................................................................................................415

tBufferOutput............................................................................................................ 417
tBufferOutput Standard properties......................................................................................................................... 417
Buffering data..................................................................................................................................................................418
Buffering data to be used as a source system...................................................................................................420
Buffering output data on the webapp server..................................................................................................... 421
Calling a Job with context variables from a browser...................................................................................... 424
Calling a Job exported as a Web service in another Job..................................................................... 426

tCassandraBulkExec..................................................................................................429
tCassandraBulkExec Standard properties............................................................................................................. 429
Related scenarios...........................................................................................................................................................430

tCassandraClose........................................................................................................ 431
tCassandraClose Standard properties.................................................................................................................... 431
Related scenario............................................................................................................................ 431

tCassandraConnection..............................................432
tCassandraConnection Standard properties.........................................................................................................432
Related scenario.............................................................................................................................................................433

tCassandraInput........................................................................................................ 434
Mapping tables between Cassandra types and Talend data types.................................................434
tCassandraInput Standard properties.....................................................................................................................435
Handling data with Cassandra..................................................................................................................................439

tCassandraOutput..................................................................................................... 445
tCassandraOutput Standard properties................................................................................................................. 445
Related scenario............................................................................................................................ 450

tCassandraOutputBulk..............................................................................................451
tCassandraOutputBulk Standard properties.........................................................................................................451
Related scenarios...........................................................................................................................................................454

tCassandraOutputBulkExec......................................................................................455
tCassandraOutputBulkExec Standard properties............................................................................................... 455
Related scenarios...........................................................................................................................................................458

tCassandraRow.......................................................................................................... 459
tCassandraRow Standard properties.......................................................................................................................459
Related scenario.............................................................................................................................................................460

tChangeFileEncoding................................................................................................462
tChangeFileEncoding Standard properties...........................................................................................................462
Transforming the character encoding of a file.................................................................................................. 463

tChronometerStart.................................................................................................... 465
tChronometerStart Standard properties................................................................................................................465
Related scenario.............................................................................................................................................................465

tChronometerStop.................................................................................................... 466
tChronometerStop Standard properties................................................................................................................ 466
Measuring the processing time of a subJob and part of a subJob.............................................................. 467

tCloudStart.................................................................................................................471
tCloudStart Standard properties.............................................................................................................................. 471
Related scenarios...........................................................................................................................................................473

tCloudStop................................................................................................................. 474
tCloudStop Standard properties...............................................................................................................................474
Related scenarios...........................................................................................................................475

tCombinedSQLAggregate.........................................................................476
tCombinedSQLAggregate Standard properties...................................................................................................476
Filtering and aggregating table columns directly on the DBMS................................................................. 478

tCombinedSQLFilter................................................................................................. 488
tCombinedSQLFilter Standard properties.............................................................................................................488
Related scenario............................................................................................................................ 489

tCombinedSQLInput................................................................................................. 490
tCombinedSQLInput Standard properties.............................................................................................................490
Related scenario.............................................................................................................................................................491

tCombinedSQLOutput...............................................................................................492
tCombinedSQLOutput Standard properties......................................................................................................... 492
Related scenario.............................................................................................................................................................493

tContextDump........................................................................................................... 494
tContextDump Standard properties........................................................................................................................494
Related scenarios...........................................................................................................................................................495

tContextLoad.............................................................................................................496
tContextLoad Standard properties.......................................................................................................................... 496
Reading data from different MySQL databases using dynamically loaded connection parameters..497

tConvertType............................................................................................................. 504
tConvertType Standard properties.......................................................................................................................... 504
Converting Java types.................................................................................................................. 505

tCosmosDBBulkLoad................................................................................................ 510
tCosmosDBBulkLoad Standard properties............................................................................................................510

tCosmosDBConnection............................................................................................. 513
tCosmosDBConnection Standard properties........................................................................................................513

tCosmosDBInput........................................................................................................515
tCosmosDBInput Standard properties....................................................................................................................515

tCosmosDBOutput.....................................................................................................519
tCosmosDBOutput Standard properties................................................................................................................ 519

tCosmosDBRow......................................................................................................... 524
tCosmosDBRow Standard properties......................................................................................................................524

tCouchbaseDCPInput................................................................................................ 527
tCouchbaseDCPInput Standard properties........................................................................................................... 527

tCouchbaseDCPOutput............................................................................................. 529
tCouchbaseDCPOutput Standard properties........................................................................................................529

tCouchbaseInput....................................................................................................... 532
tCouchbaseInput Standard properties................................................................................................................... 532

tCouchbaseOutput.................................................................................................... 537
tCouchbaseOutput Standard properties................................................................................................................ 537

tCreateTable.............................................................................................................. 540
tCreateTable Standard properties........................................................................................................................... 540
Creating a new table in a MySQL database............................................................................................. 544

tCreateTemporaryFile...............................................................................................546
tCreateTemporaryFile Standard properties..........................................................................................................546
Creating a temporary file and writing data into it........................................................................................... 547

tDB2BulkExec............................................................................................................553
tDB2BulkExec Standard properties.........................................................................................................................553
Related scenarios...........................................................................................................................................................558

tDB2Close.................................................................................................................. 559
tDB2Close Standard properties................................................................................................................................ 559
Related scenarios...........................................................................................................................................................560

tDB2Commit.............................................................................................................. 561
tDB2Commit Standard properties............................................................................................................................561
Related scenario.............................................................................................................................................................562

tDB2Connection........................................................................................................ 563
tDB2Connection Standard properties.....................................................................................................................563
Related scenarios...........................................................................................................................................................565

tDB2Input...................................................................................................................566
tDB2Input Standard properties.................................................................................................................................566
Related scenarios...........................................................................................................................................................569

tDB2Output................................................................................................................570
tDB2Output Standard properties............................................................................................................................. 570
Related scenarios...........................................................................................................................................................575

tDB2Rollback.............................................................................................................576
tDB2Rollback Standard properties.......................................................................................................................... 576
Related scenarios...........................................................................................................................577

tDB2Row.................................................................................................... 578
tDB2Row Standard properties.................................................................................................................................. 578
Related scenarios...........................................................................................................................................................581

tDB2SCD.....................................................................................................................582
tDB2SCD Standard properties...................................................................................................................................582
Related scenarios...........................................................................................................................................................585

tDB2SCDELT.............................................................................................................. 586
tDB2SCDELT Standard properties........................................................................................................................... 586
Related scenarios.......................................................................................................................... 590

tDB2SP....................................................................................................................... 591
tDB2SP Standard properties...................................................................................................................................... 591
Related scenarios...........................................................................................................................................................593

Dynamic database components.............................................................................. 595

tDBBulkExec.............................................................................................................. 596
tDBBulkExec Standard properties........................................................................................................................... 596

tDBClose.....................................................................................................................597
tDBClose Standard properties...................................................................................................................................597

tDBColumnList.......................................................................................................... 598
tDBColumnList Standard properties....................................................................................................................... 598

tDBCommit.................................................................................................................599
tDBCommit Standard properties.............................................................................................................................. 599

tDBConnection.......................................................................................................... 600
tDBConnection Standard properties....................................................................................................................... 600

tDBInput.....................................................................................................................601
tDBInput Standard properties................................................................................................................................... 601

tDBLastInsertId......................................................................................................... 603
tDBLastInsertId Standard properties...................................................................................................................... 603

tDBOutput.................................................................................................................. 604
tDBOutput Standard properties................................................................................................................................604

tDBOutputBulk.......................................................................................................... 606
tDBOutputBulk Standard properties....................................................................................................................... 606

tDBOutputBulkExec.................................................................................................. 607
tDBOutputBulkExec Standard properties..............................................................................................................607

tDBRollback............................................................................................................... 608
tDBRollback Standard properties.............................................................................................................................608

tDBRow...................................................................................................................... 609
tDBRow Standard properties.....................................................................................................................................609

tDBSCD....................................................................................................................... 610
tDBSCD Standard properties..................................................................................................................................... 610

tDBSCDELT................................................................................................................ 611
tDBSCDELT Standard properties.............................................................................................................................. 611

tDBSP..........................................................................................................................612
tDBSP Standard properties.........................................................................................................................................612

tDBTableList.............................................................................................................. 613
tDBTableList Standard properties........................................................................................................................... 613

tDBFSConnection...................................................................................................... 614
tDBFSConnection Standard properties.................................................................................................................. 614

tDBFSGet....................................................................................................................615
tDBFSGet Standard properties..................................................................................................................................615

tDBFSPut....................................................................................................................617
tDBFSPut Standard properties.................................................................................................................................. 617

tDBSQLRow................................................................................................................619
tDBSQLRow Standard properties.............................................................................................................................619
Resetting a DB auto-increment................................................................................................................................621

tDenormalize............................................................................................................. 623
tDenormalize Standard properties.......................................................................................................................... 623
Denormalizing on one column................................................................................................................................. 624
Denormalizing on multiple columns......................................................................................................................626

tDenormalizeSortedRow.......................................................................................... 629
tDenormalizeSortedRow Standard properties.....................................................................................................629
Regrouping sorted rows.............................................................................................................................................. 630

tDie............................................................................................................................. 634
tDie Standard properties.............................................................................................................................................634
Related scenarios...........................................................................................................................................................635

tDotNETInstantiate................................................................................................... 636
tDotNETInstantiate Standard properties...............................................................................................................636
Related scenario.............................................................................................................................................................637

tDotNETRow.............................................................................................................. 638
tDotNETRow Standard properties........................................................................................................................... 638
Integrating .Net into Talend Studio: Introduction............................................................................................ 640
Integrating .Net into Talend Studio: Prerequisites........................................................................................... 640
Integrating .Net into Talend Studio: configuring the Job...............................................................................641
Utilizing .NET in Talend..............................................................................................................................................643

tDropboxConnection.................................................................................................647
tDropboxConnection Standard properties............................................................................................................647
Related scenario.............................................................................................................................................................647

tDropboxDelete.........................................................................................................648
tDropboxDelete Standard properties..................................................................................................................... 648
Related scenarios...........................................................................................................................................................649

tDropboxGet.............................................................................................................. 650
tDropboxGet Standard properties............................................................................................................................650
Related scenarios...........................................................................................................................................................651

tDropboxList..............................................................................................................652
tDropboxList Standard properties........................................................................................................................... 652
Related scenarios...........................................................................................................................................................653

tDropboxPut.............................................................................................................. 654
tDropboxPut Standard properties............................................................................................................................654
Uploading files to Dropbox....................................................................................................................................... 655

tDTDValidator............................................................................................................661
tDTDValidator Standard properties.........................................................................................................................661
Validating XML files..................................................................................................... 662

tDynamoDBInput.......................................................................................................665
tDynamoDBInput Standard properties...................................................................................................................665
Writing and extracting JSON documents from DynamoDB............................................................................668

tDynamoDBOutput....................................................................................................675
tDynamoDBOutput Standard properties............................................................................................................... 675
Related scenarios...........................................................................................................................................................677

tEDIFACTtoXML.........................................................................................................678
tEDIFACTtoXML Standard properties..................................................................................................................... 678
Reading an EDIFACT message file and saving it to XML...............................................................................679

tELTGreenplumInput................................................................................................ 682
tELTGreenplumInput Standard properties............................................................................................................682
Related scenarios...........................................................................................................................................................683

tELTGreenplumMap.................................................................................................. 684
tELTGreenplumMap Standard properties..............................................................................................................684
Mapping data using a simple implicit join..........................................................................................................686
Related scenarios...........................................................................................................................................................693

tELTGreenplumOutput............................................................................................. 694
tELTGreenplumOutput Standard properties........................................................................................................ 694
Related scenarios...........................................................................................................................................................696

tELTHiveInput............................................................................................................697
tELTHiveInput Standard properties........................................................................................................................ 697
Related scenarios...........................................................................................................................................................698

tELTHiveMap............................................................................................................. 699
tELTHiveMap Standard properties.......................................................................................................................... 699
Joining table columns and writing them into Hive.......................................................................................... 710
Related scenarios...........................................................................................................................................................717

tELTHiveOutput.........................................................................................................718
tELTHiveOutput Standard properties..................................................................................................................... 718
Related scenarios...........................................................................................................................................................720

tELTInput................................................................................................................... 721
tELTInput Standard properties................................................................................................................................. 721
Related scenarios...........................................................................................................................................................722

tELTMap..................................................................................................................... 723
tELTMap Standard properties................................................................................................................................... 723
Aggregating Snowflake data using context variables as table and connection names.......................725
Related scenarios...........................................................................................................................................................729

tELTOutput................................................................................................................ 730
tELTOutput Standard properties.............................................................................................................................. 730
Related scenarios...........................................................................................................................................................732

tELTMSSqlInput........................................................................................................ 733
tELTMSSqlInput Standard properties.....................................................................................................................733
Related scenarios...........................................................................................................................................................734

tELTMSSqlMap.......................................................................................................... 735
tELTMSSqlMap Standard properties.......................................................................................................................735
Related scenarios...........................................................................................................................................................737

tELTMSSqlOutput......................................................................................................738
tELTMSSqlOutput Standard properties..................................................................................................................738
Related scenarios...........................................................................................................................................................740

tELTMysqlInput......................................................................................................... 741
tELTMysqlInput Standard properties......................................................................................................................741
Related scenarios...........................................................................................................................................................742

tELTMysqlMap...........................................................................................................743
tELTMysqlMap Standard properties........................................................................................................................743
Aggregating table columns and filtering............................................................................................................. 745
Mapping data using an Alias table............................................................................................ 749
Related scenarios...........................................................................................................................................................753

tELTMysqlOutput...................................................................................................... 754
tELTMysqlOutput Standard properties.................................................................................................................. 754
Related scenarios...........................................................................................................................................................756

tELTNetezzaInput..................................................................................................... 757
tELTNetezzaInput Standard properties..................................................................................................................757
Related scenarios...........................................................................................................................................................758

tELTNetezzaMap....................................................................................................... 759
tELTNetezzaMap Standard properties....................................................................................................................759
Related scenarios...........................................................................................................................................................761

tELTNetezzaOutput.................................................................................................. 762
tELTNetezzaOutput Standard properties.............................................................................................................. 762
Related scenarios...........................................................................................................................................................764

tELTOracleInput........................................................................................................ 765
tELTOracleInput Standard properties.....................................................................................................................765
Related scenarios...........................................................................................................................................................766

tELTOracleMap.......................................................................................................... 767
tELTOracleMap Standard properties.......................................................................................................................767
Updating Oracle database entries...........................................................................................................................769
Related scenario.............................................................................................................................................................772

tELTOracleOutput..................................................................................................... 773
tELTOracleOutput Standard properties................................................................................................................. 773
Managing data using the Oracle MERGE function............................................................................................775

tELTPostgresqlInput................................................................................................. 780
tELTPostgresqlInput Standard properties.............................................................................................................780
Related scenarios...........................................................................................................................................................781

tELTPostgresqlMap...................................................................................................782
tELTPostgresqlMap Standard properties...............................................................................................................782
Related scenarios...........................................................................................................................................................784

tELTPostgresqlOutput.............................................................................................. 785
tELTPostgresqlOutput Standard properties......................................................................................................... 785
Related scenarios...........................................................................................................................................................787

tELTSybaseInput....................................................................................................... 788
tELTSybaseInput Standard properties....................................................................................................................788
Related scenarios...........................................................................................................................................................789

tELTSybaseMap......................................................................................................... 790
tELTSybaseMap Standard properties......................................................................................................................790
Related scenarios...........................................................................................................................................................792

tELTSybaseOutput.................................................................................................... 793
tELTSybaseOutput Standard properties................................................................................................................ 793
Related scenarios...........................................................................................................................................................795

tELTTeradataInput.................................................................................................... 796
tELTTeradataInput Standard properties................................................................................................................796
Related scenarios...........................................................................................................................................................797

tELTTeradataMap......................................................................................................798
tELTTeradataMap Standard properties..................................................................................................................798
Mapping data using a subquery.............................................................................................................................. 800
Related scenarios...........................................................................................................................................................809

tELTTeradataOutput................................................................................................. 810
tELTTeradataOutput Standard properties.............................................................................................................810
Related scenarios...........................................................................................................................................................812

tELTVerticaInput....................................................................................................... 813
tELTVerticaInput Standard properties....................................................................................................................813
Related scenarios...........................................................................................................................................................814

tELTVerticaMap.........................................................................................................815
tELTVerticaMap Standard properties......................................................................................................................815
Related scenarios...........................................................................................................................................................817

tELTVerticaOutput.................................................................................................... 818
tELTVerticaOutput Standard properties................................................................................................................ 818
Related scenarios...........................................................................................................................................................820

tESBConsumer........................................................................................................... 821
tESBConsumer Standard properties........................................................................................................................821
Using tESBConsumer to retrieve the valid email..............................................................................................826
Using tESBConsumer with custom SOAP Headers............................................................................................833

tESBProviderFault.....................................................................................................844
tESBProviderFault Standard properties................................................................................................................. 844
Requesting airport names based on country codes......................................................................................... 845

tESBProviderRequest................................................................................................857
tESBProviderRequest Standard properties........................................................................................................... 857
Sending a message without expecting a response.......................................................................................... 859

tESBProviderResponse............................................................................................. 869
tESBProviderResponse Standard properties........................................................................................................ 869
Returning Hello world response..............................................................................................................................870

tEXABulkExec............................................................................................................ 881
tEXABulkExec Standard properties......................................................................................................................... 881
Settings for different sources of import data..................................................................................................... 886
Importing data into an EXASolution database table from a local CSV file..............................................889

tEXAClose...................................................................................................................895
tEXAClose Standard properties................................................................................................................................ 895
Related scenario.............................................................................................................................................................896

tEXACommit.............................................................................................................. 897
tEXACommit Standard properties............................................................................................................................897
Related scenario.............................................................................................................................................................898

tEXAConnection........................................................................................................ 899
tEXAConnection Standard properties.....................................................................................................................899
Related scenario.............................................................................................................................................................901

tEXAInput...................................................................................................................902
tEXAInput Standard properties.................................................................................................................................902
Related scenario.............................................................................................................................................................905

tEXAOutput................................................................................................................906
tEXAOutput Standard properties............................................................................................................................. 906
Related scenario.............................................................................................................................................................911

tEXARollback............................................................................................................. 912
tEXARollback Standard properties.......................................................................................................................... 912
Related Scenario............................................................................................................................................................ 913

tEXARow.................................................................................................................... 914
tEXARow Standard properties...................................................................................................................................914
Related Scenario............................................................................................................................................................ 917

tEXistConnection.......................................................................................................918
tEXistConnection Standard properties...................................................................................................................918
Related scenarios...........................................................................................................................................................919

tEXistDelete...............................................................................................................920
tEXistDelete Standard properties............................................................................................................................ 920
Related scenarios...........................................................................................................................................................921

tEXistGet.................................................................................................................... 922
tEXistGet Standard properties.................................................................................................................................. 922
Retrieving resources from a remote eXist DB server...................................................................................... 923

tEXistList....................................................................................................................926
tEXistList Standard properties.................................................................................................................................. 926
Related scenario.............................................................................................................................................................927

tEXistPut.................................................................................................................... 928
tEXistPut Standard properties...................................................................................................................................928
Related scenarios...........................................................................................................................................................929

tEXistXQuery............................................................................................................. 930
tEXistXQuery Standard properties...........................................................................................................................930
Related scenarios...........................................................................................................................................................931

tEXistXUpdate........................................................................................................... 932
tEXistXUpdate Standard properties........................................................................................................................ 932
Related scenarios...........................................................................................................................................................933

tExternalSortRow......................................................................................................934
tExternalSortRow Standard properties.................................................................................................................. 934
Related scenario.............................................................................................................................................................936

tExtractDelimitedFields........................................................................................... 937
tExtractDelimitedFields Standard properties...................................................................................................... 937
Extracting a delimited string column of a database table............................................................................ 939

tExtractJSONFields....................................................................................................945
tExtractJSONFields Standard properties............................................................................................................... 945
Retrieving error messages while extracting data from JSON fields........................................................... 947
Collecting data from your favorite online social network............................................................................. 952
Extracting data from a JSON file through looping........................................................................................... 956

tExtractPositionalFields........................................................................................... 963
tExtractPositionalFields Standard properties......................................................................................................963
Related scenario.............................................................................................................................................................965

tExtractRegexFields..................................................................................................966
tExtractRegexFields Standard properties............................................................................................................. 966
Extracting name, domain and TLD from e-mail addresses............................................................................967

tExtractXMLField...................................................................................................... 971
tExtractXMLField Standard properties...................................................................................................................971
Extracting XML data from a field in a database table....................................................................................973
Extracting correct and erroneous data from an XML field in a delimited file........................................975

tFileArchive................................................................................................................979
tFileArchive Standard properties............................................................................................................................. 979
Zipping files using a tFileArchive........................................................................................................................... 981

tFileCompare............................................................................................................. 984
tFileCompare Standard properties.......................................................................................................................... 984
Comparing unzipped files...........................................................................................................................................985

tFileCopy.................................................................................................................... 988
tFileCopy Standard properties.................................................................................................................................. 988
Restoring files from bin.............................................................................................................................................. 990

tFileDelete................................................................................................................. 992
tFileDelete Standard properties...............................................................................................................................992
Deleting files................................................................................................................................................................... 993

tFileExist.................................................................................................................... 995
tFileExist Standard properties.................................................................................................................................. 995
Checking for the presence of a file and creating it if it does not exist.................................................... 996

tFileFetch.................................................................................................................1000
tFileFetch Standard properties.............................................................................................................................. 1000
Fetching data through HTTP.................................................................................................................................. 1003
Reusing stored cookie to fetch files through HTTP...................................................................................... 1005
Related scenario.......................................................................................................................................................... 1009

tFileInputARFF........................................................................................................ 1010
tFileInputARFF Standard properties.....................................................................................................................1010
Displaying the content of an ARFF file................................................................................ 1011

tFileInputDelimited................................................................................................ 1015
tFileInputDelimited Standard properties............................................................................................................1015
Reading data from a Delimited file and displaying the output.......................................1018
Reading data from a remote file in streaming mode....................................................................................1020

tFileInputExcel........................................................................................................ 1024
tFileInputExcel Standard properties.................................................................................................................... 1024
Related scenarios........................................................................................................................................................ 1027

tFileInputFullRow................................................................................................... 1028
tFileInputFullRow Standard properties...............................................................................................................1028
Reading full rows in a delimited file.................................................................................................................. 1029

tFileInputJSON........................................................................................................ 1032
tFileInputJSON Standard properties.....................................................................................................................1032
Extracting JSON data from a file using JSONPath without setting a loop node..................................1034
Extracting JSON data from a file using JSONPath..........................................................................................1037
Extracting JSON data from a file using XPath.................................................................................................1039
Extracting JSON data from a URL.........................................................................................................................1040

tFileInputLDIF......................................................................................................... 1045
tFileInputLDIF Standard properties......................................................................................................................1045
Related scenario.......................................................................................................................................................... 1047

tFileInputMail..........................................................................................................1048
tFileInputMail Standard properties...................................................................................................................... 1048
Extracting key fields from an email.................................................................................................................... 1050

tFileInputMSDelimited...........................................................................................1052
tFileInputMSDelimited Standard properties..................................................................................................... 1052
The Multi Schema Editor......................................................................................................................................... 1053
Reading a multi structure delimited file............................................................................................................1054

tFileInputMSPositional.......................................................................................... 1061
tFileInputMSPositional Standard properties..................................................................................................... 1061
Reading data from a positional file.....................................................................................................................1063

tFileInputMSXML....................................................................................................1067
tFileInputMSXML Standard properties................................................................................................................1067
Reading a multi-structure XML file..................................................................................................................... 1068

tFileInputPositional................................................................................................1072
tFileInputPositional Standard properties........................................................................................................... 1072
Reading a Positional file and saving filtered results to XML.....................................................................1075

tFileInputProperties............................................................................................... 1079
tFileInputProperties Standard properties...........................................................................................................1079
Reading and matching the keys and the values of different .properties files and outputting the results in a glossary...................................................................................1080

tFileInputRaw..........................................................................................................1085
tFileInputRaw Standard properties.......................................................................................................................1085
Related Scenario..........................................................................................................................................................1086

tFileInputRegex...................................................................................................... 1087
tFileInputRegex Standard properties...................................................................................................................1087
Reading data using a Regex and outputting the result to Positional file............................................. 1089

tFileInputXML......................................................................................................... 1092
tFileInputXML Standard properties...................................................................................................................... 1092
Reading and extracting data from an XML structure....................................................................................1095
Extracting erroneous XML data via a reject flow...........................................................................................1096

tFileList.................................................................................................................... 1100
tFileList Standard properties.................................................................................................................................. 1100
Iterating on a file directory.....................................................................................................................................1102
Finding duplicate files between two folders....................................................................................................1104

tFileOutputARFF..................................................................................................... 1110
tFileOutputARFF Standard properties................................................................................................................. 1110
Related scenario.......................................................................................................................................................... 1112

tFileOutputDelimited............................................................................................. 1113
tFileOutputDelimited Standard properties........................................................................................................ 1113
Writing data in a delimited file.............................................................................................................................1116
Utilizing Output Stream to save filtered data to a local file......................................................................1120

tFileOutputExcel..................................................................................................... 1123
tFileOutputExcel Standard properties.................................................................................................................1123
Related scenario.......................................................................................................................................................... 1126

tFileOutputJSON..................................................................................................... 1127
tFileOutputJSON Standard properties..................................................................................................................1127
Writing a JSON structured file............................................................................................................................... 1128

tFileOutputLDIF...................................................................................................... 1131
tFileOutputLDIF Standard properties.................................................................................................................. 1131
Writing data from a database table into an LDIF file...................................................................................1133

tFileOutputMSDelimited........................................................................................1138
tFileOutputMSDelimited Standard properties.................................................................................................. 1138
Related scenarios........................................................................................................................................................ 1139

tFileOutputMSPositional....................................................................................... 1140
tFileOutputMSPositional Standard properties..................................................................................................1140
Related scenarios........................................................................................................................................................ 1141

tFileOutputMSXML................................................................................................. 1142
tFileOutputMSXML Standard properties.............................................................................................................1142
Defining the MultiSchema XML tree................................................................................................................... 1143
Mapping XML data from multiple schema sources....................................................................................... 1144
Defining the node status......................................................................................................................................... 1145
Related scenarios........................................................................................................................................................ 1146

tFileOutputPositional.............................................................................................1147
tFileOutputPositional Standard properties........................................................................................................1147
Related scenario.......................................................................................................................................................... 1150

tFileOutputProperties............................................................................................ 1151
tFileOutputProperties Standard properties....................................................................................................... 1151
Related scenarios........................................................................................................................................................ 1152

tFileOutputRaw.......................................................................................................1153
tFileOutputRaw Standard properties................................................................................................................... 1153

tFileOutputXML...................................................................................................... 1155
tFileOutputXML Standard properties...................................................................................................................1155
Related scenarios........................................................................................................................................................ 1157

tFileProperties........................................................................................................ 1158
tFileProperties Standard properties..................................................................................................................... 1158
Displaying the properties of a processed file.................................................................................................. 1159

tFileRowCount.........................................................................................................1161
tFileRowCount Standard properties..................................................................................................................... 1161
Writing a file to MySQL if the number of its records matches a reference value............................... 1162

tFileTouch................................................................................................................1166
tFileTouch Standard properties............................................................................................................................. 1166
Related scenarios........................................................................................................................................................ 1167

tFileUnarchive......................................................................................................... 1168
tFileUnarchive Standard properties......................................................................................................................1168
Related scenario.......................................................................................................................................................... 1169

tFilterColumns........................................................................................................ 1170
tFilterColumns Standard properties..................................................................................................................... 1170
Related scenario..........................................................................................1171

tFilterRow................................................................................................................ 1172
tFilterRow Standard properties..............................................................................................................................1172
Filtering a list of names using simple conditions.......................................................................................... 1173
Filtering a list of names through different logical operations.................................................................. 1177

tFirebirdClose..........................................................................................................1179
tFirebirdClose Standard properties.......................................................................................................................1179
Related scenarios........................................................................................................................................................ 1180

tFirebirdCommit......................................................................................................1181
tFirebirdCommit Standard properties..................................................................................................................1181
Related scenario.......................................................................................................................................................... 1182

tFirebirdConnection................................................................................................1183
tFirebirdConnection Standard properties...........................................................................................................1183
Related scenarios........................................................................................................................................................ 1184

tFirebirdInput.......................................................................................................... 1185
tFirebirdInput Standard properties.......................................................................................................................1185
Related scenarios........................................................................................................................................................ 1187

tFirebirdOutput....................................................................................................... 1189
tFirebirdOutput Standard properties....................................................................................................................1189
Related scenarios........................................................................................................................................................ 1193

tFirebirdRollback.................................................................................................... 1194
tFirebirdRollback Standard properties................................................................................................................ 1194
Related scenario.......................................................................................................................................................... 1195

tFirebirdRow............................................................................................................1196
tFirebirdRow Standard properties.........................................................................................................................1196
Related scenarios........................................................................................................................................................ 1199

tFixedFlowInput......................................................................................................1200
tFixedFlowInput Standard properties..................................................................................................................1200
Related scenarios........................................................................................................................................................ 1201

tFlowMeter.............................................................................................................. 1202
tFlowMeter Standard properties............................................................................................................................1202
Related scenario.......................................................................................................................................................... 1203

tFlowMeterCatcher................................................................................................. 1204
tFlowMeterCatcher Standard properties.............................................................................................................1204
Catching flow metrics from a Job.........................................................................................................................1205

tFlowToIterate........................................................................................................ 1209
tFlowToIterate Standard properties..................................................................................................................... 1209
Transforming data flow to a list...........................................................................................................................1210

tForeach................................................................................................................... 1214
tForeach Standard properties................................................................................................................................. 1214
Iterating on a list and retrieving the values.................................................................................................... 1214

tFTPClose.................................................................................................................1217
tFTPClose Standard properties.............................................................................................................................. 1217
Related scenarios........................................................................................................................................................ 1217

tFTPConnection...................................................................................................... 1218
tFTPConnection Standard properties...................................................................................................................1218
Related scenarios........................................................................................................................................................ 1220

tFTPDelete...............................................................................................................1221
tFTPDelete Standard properties............................................................................................................................ 1221
Related scenario.......................................................................................................................................................... 1224

tFTPFileExist........................................................................................................... 1225
tFTPFileExist Standard properties........................................................................................................................ 1225
Related scenario.......................................................................................................................................................... 1227

tFTPFileList............................................................................................................. 1228
tFTPFileList Standard properties...........................................................................................................................1228
Listing and getting files/folders on an FTP directory...................................................................................1230

tFTPFileProperties..................................................................................................1236
tFTPFileProperties Standard properties..............................................................................................................1236
Related scenario.......................................................................................................................................................... 1238

tFTPGet.................................................................................................................... 1239
tFTPGet Standard properties.................................................................................................................................. 1239
Related scenario.......................................................................................................................................................... 1242

tFTPPut.................................................................................................................... 1243
tFTPPut Standard properties...................................................................................................................................1243
Putting files onto an FTP server...........................................................................................................................1246

tFTPRename............................................................ 1250
tFTPRename Standard properties......................................................................................................................... 1250
Renaming a file located on an FTP server........................................................................................................1253

tFTPTruncate...........................................................................................................1256
tFTPTruncate Standard properties........................................................................................................................1256
Related scenario.......................................................................................................................................................... 1258

tFuzzyMatch............................................................................................................ 1259
tFuzzyMatch Standard properties......................................................................................................................... 1259
Checking the Levenshtein distance of 0 in first names............................................................................... 1260
Checking the Levenshtein distance of 1 or 2 in first names......................................................................1263
Checking the Metaphonic distance in first names......................................................... 1264

tGoogleDataprocManage....................................................................................... 1266
tGoogleDataprocManage Standard properties................................................................................................. 1266

tGoogleDriveConnection........................................................................................1268
tGoogleDriveConnection Standard properties..................................................................................................1268
OAuth methods for accessing Google Drive.....................................................................................................1270
Related scenario.......................................................................................................................................................... 1279

tGoogleDriveCopy...................................................................................................1280
tGoogleDriveCopy Standard properties...............................................................................................................1280
Related scenario.......................................................................................................................................................... 1282

tGoogleDriveCreate................................................................................................ 1283
tGoogleDriveCreate Standard properties............................................................................................................1283
Related scenario.......................................................................................................................................................... 1285

tGoogleDriveDelete................................................................................................ 1286
tGoogleDriveDelete Standard properties........................................................................................................... 1286
Related scenario.......................................................................................................................................................... 1288

tGoogleDriveGet..................................................................................................... 1289
tGoogleDriveGet Standard properties..................................................................................................................1289
Related scenario.......................................................................................................................................................... 1291

tGoogleDriveList..................................................................................................... 1292
tGoogleDriveList Standard properties................................................................................................................. 1292
Related scenario.......................................................................................................................................................... 1294

tGoogleDrivePut..................................................................................................... 1295
tGoogleDrivePut Standard properties..................................................................................................................1295
Managing files with Google Drive........................................................................................................................1297

tGPGDecrypt............................................................ 1306
tGPGDecrypt Standard properties......................................................................................................................... 1306
Decrypting a GnuPG-encrypted file and displaying its content..................................1307

tGreenplumBulkExec..............................................................................................1311
tGreenplumBulkExec Standard properties.........................................................................................................1311
Related scenarios........................................................................................................................................................ 1314

tGreenplumClose.................................................................................................... 1315
tGreenplumClose Standard properties................................................................................................................ 1315
Related scenarios........................................................................................................................................................ 1316

tGreenplumCommit................................................................................................ 1317
tGreenplumCommit Standard properties............................................................................................................1317
Related scenarios........................................................................................................................................................ 1318

tGreenplumConnection.......................................................................................... 1319
tGreenplumConnection Standard properties.................................................................................................... 1319
Related scenarios........................................................................................................................................................ 1320

tGreenplumGPLoad................................................................................................ 1321
tGreenplumGPLoad Standard properties............................................................................................................1321
Related scenario.......................................................................................................................................................... 1326

tGreenplumInput.....................................................................................................1327
tGreenplumInput Standard properties.................................................................................................................1327
Related scenarios........................................................................................................................................................ 1329

tGreenplumOutput..................................................................................................1330
tGreenplumOutput Standard properties............................................................................................................. 1330
Related scenarios........................................................................................................................................................ 1334

tGreenplumOutputBulk..........................................................................................1336
tGreenplumOutputBulk Standard properties.................................................................................................... 1336
Related scenarios........................................................................................................................................................ 1338

tGreenplumOutputBulkExec..................................................................................1339
tGreenplumOutputBulkExec Standard properties........................................................................................... 1339
Related scenarios........................................................................................................................................................ 1341

tGreenplumRollback...............................................................................................1342
tGreenplumRollback Standard properties..........................................................................................................1342
Related scenarios........................................................................................................................................................ 1343

tGreenplumRow...................................................................................................... 1344
tGreenplumRow Standard properties.................................................................................................................. 1344
Related scenarios........................................................................................................................................................ 1347

tGreenplumSCD.......................................................................................................1348
tGreenplumSCD Standard properties...................................................................................................................1348
Related scenario.......................................................................................................................................................... 1351

tGroovy.....................................................................................................................1352
tGroovy Standard properties................................................................................................................................... 1352
Related scenarios........................................................................................1353

tGroovyFile.............................................................................................................. 1354
tGroovyFile Standard properties............................................................................................................................1354
Calling a file which contains Groovy code........................................................................................................1355

tGSBucketCreate..................................................................................................... 1357
tGSBucketCreate Standard properties................................................................................................................. 1357
Related scenario.......................................................................................................................................................... 1358

tGSBucketDelete.....................................................................................................1359
tGSBucketDelete Standard properties................................................................................................................. 1359
Related scenarios........................................................................................................................................................ 1360

tGSBucketExist........................................................................................................1361
tGSBucketExist Standard properties.................................................................................................................... 1361
Related scenario.......................................................................................................................................................... 1362

tGSBucketList.......................................................................................................... 1363
tGSBucketList Standard properties.......................................................................................................................1363
Related scenario.......................................................................................................................................................... 1364

tGSClose...................................................................................................................1365
tGSClose Standard properties.................................................................................................................................1365
Related scenario.......................................................................................................................................................... 1365

tGSConnection.........................................................................................................1366
tGSConnection Standard properties..................................................................................................................... 1366
Related scenario.......................................................................................................................................................... 1367

tGSCopy....................................................................................................................1368
tGSCopy Standard properties..................................................................................................................................1368
Related scenario.......................................................................................................................................................... 1369

tGSDelete.................................................................................................................1370
tGSDelete Standard properties.............................................................................................................................. 1370
Related scenario.......................................................................................................................................................... 1371

tGSGet...................................................................... 1372
tGSGet Standard properties.....................................................................................................................................1372
Related scenarios........................................................................................................................................................ 1374

tGSList......................................................................................................................1375
tGSList Standard properties.................................................................................................................................... 1375
Related scenario.......................................................................................................................................................... 1376

tGSPut...................................................................................................................... 1377
tGSPut Standard properties.....................................................................................................................................1377
Managing files with Google Cloud Storage...................................................................................................... 1378

tHashInput............................................................................................................... 1386
tHashInput Standard properties............................................................................................................................ 1386
Reading data from the cache memory for high-speed data access......................................................... 1387
Clearing the memory before loading data to it in case an iterator exists in the same subJob....... 1391

tHashOutput............................................................................................................ 1395
tHashOutput Standard properties......................................................................................................................... 1395
Related scenarios........................................................................................................................................................ 1397

tHBaseClose.............................................................................................................1398
tHBaseClose Standard properties..........................................................................................................................1398
Related scenario.......................................................................................................................................................... 1399

tHBaseConnection.................................................................................................. 1400
tHBaseConnection Standard properties..............................................................................................................1400
Related scenario.......................................................................................................................................................... 1404

tHBaseInput.............................................................................................................1405
HBase filters.................................................................................................................................................................. 1405
tHBaseInput Standard properties..........................................................................................................................1406
Exchanging customer data with HBase..............................................................................................................1411

tHBaseOutput..........................................................................................................1419
tHBaseOutput Standard properties.......................................................................................................................1419
Related scenario.......................................................................................................................................................... 1424

tHCatalogInput........................................................................................................1425
tHCatalogInput Standard properties.................................................................................................................... 1425
Related scenario.......................................................................................................................................................... 1430

tHCatalogLoad........................................................................................................ 1431
tHCatalogLoad Standard properties.....................................................................................................................1431
Related scenario.......................................................................................................................................................... 1435

tHCatalogOperation................................................................1436
tHCatalogOperation Standard properties...........................................................................................................1436
Managing HCatalog tables on Hortonworks Data Platform........................................................................1444

tHCatalogOutput.....................................................................................................1453
tHCatalogOutput Standard properties.................................................................................................................1453
Related scenario.......................................................................................................................................................... 1459

tHDFSCompare........................................................................................................1460
tHDFSCompare Standard properties.................................................................................................................... 1460
Related scenarios........................................................................................................................................................ 1465

tHDFSConnection....................................................................................................1466
tHDFSConnection Standard properties............................................................................................................... 1466
Related scenarios........................................................................................................................................................ 1472

tHDFSCopy...............................................................................................................1473
tHDFSCopy Standard properties............................................................................................................................ 1473
Related scenario.......................................................................................................................................................... 1478

tHDFSDelete............................................................................................................1479
tHDFSDelete Standard properties.........................................................................................................................1479
Related scenarios........................................................................................................................................................ 1483

tHDFSExist...............................................................................................................1484
tHDFSExist Standard properties............................................................................................................................ 1484
Checking the existence of a file in HDFS......................................................................................................... 1489

tHDFSGet................................................................................................................. 1493
tHDFSGet Standard properties............................................................................................................................... 1493
Computing data with Hadoop distributed file system..................................................................................1498

tHDFSInput.............................................................................................................. 1505
tHDFSInput Standard properties........................................................................................................................... 1505
Using HDFS components to work with Azure Data Lake Storage (ADLS)..............................................1511

tHDFSList................................................................................................................. 1517
tHDFSList Standard properties...............................................................................................................................1517
Iterating on an HDFS directory.............................................................................. 1523

tHDFSOutput........................................................................................................... 1528
tHDFSOutput Standard properties........................................................................................................................1528
Related scenario.......................................................................................................................................................... 1534

tHDFSOutputRaw....................................................................................................1535
tHDFSOutputRaw Standard properties............................................................................................................... 1535
Related scenario..........................................................................................1541

tHDFSProperties..................................................................................................... 1542
tHDFSProperties Standard properties..................................................................................................................1542
Related scenario.......................................................................................................................................................... 1547

tHDFSPut................................................................................................................. 1548
tHDFSPut Standard properties............................................................................................................................... 1548
Related scenario.......................................................................................................................................................... 1553

tHDFSRename......................................................................................................... 1554
tHDFSRename Standard properties......................................................................................................................1554
Related scenario.......................................................................................................................................................... 1559

tHDFSRowCount..................................................................................................... 1560
tHDFSRowCount Standard properties................................................................................................................. 1560
Related scenarios........................................................................................................................................................ 1565

tHiveClose................................................................................................................1566
tHiveClose Standard properties............................................................................................................................. 1566
Related scenarios........................................................................................................................................................ 1567

tHiveConnection..................................................................................................... 1568
tHiveConnection Standard properties................................................................................................................. 1568
Connecting to a custom Hadoop distribution.................................................................................................. 1579
Creating a partitioned Hive table......................................................................................................................... 1582
Creating a JDBC Connection to Azure HDInsight Hive................................................................................. 1589

tHiveCreateTable.................................................................................................... 1596
tHiveCreateTable Standard properties................................................................................................................1596
Related scenario.......................................................................................................................................................... 1608

tHiveInput................................................................................................................1609
tHiveInput Standard properties............................................................................................................................. 1609
Related scenarios........................................................................................................................................................ 1621

tHiveLoad.................................................................................................................1622
tHiveLoad Standard properties.............................................................................................................................. 1622
Related scenario.......................................................................................................................................................... 1633

tHiveRow................................................................................................................. 1634
tHiveRow Standard properties............................................................................................................................... 1634
Connecting to a security-enabled MapR............................................................................................................1646
Related scenarios........................................................................................................................................................ 1649

tHSQLDbInput......................................................................... 1650
tHSQLDbInput Standard properties......................................................................................................................1650
Related scenarios........................................................................................................................................................ 1652

tHSQLDbOutput...................................................................................................... 1653
tHSQLDbOutput Standard properties.................................................................................................................. 1653
Related scenarios........................................................................................................................................................ 1657

tHSQLDbRow...........................................................................................................1658
tHSQLDbRow Standard properties....................................................................................................................... 1658
Related scenarios........................................................................................................................................................ 1661

tHttpRequest........................................................................................................... 1662
tHttpRequest Standard properties........................................................................................................................ 1662
Sending an HTTP request to the server and saving the response information to a local file........ 1664
Sending a POST request from a local JSON file............................................................................................. 1666

tImpalaClose........................................................................................................... 1670
tImpalaClose Standard properties........................................................................................................................ 1670
Related scenarios........................................................................................................................................................ 1671

tImpalaConnection................................................................................................. 1672
tImpalaConnection Standard properties.............................................................................................................1672
Related scenario.......................................................................................................................................................... 1675

tImpalaCreateTable................................................................................................1676
tImpalaCreateTable Standard properties........................................................................................................... 1676
Related scenario.......................................................................................................................................................... 1682

tImpalaInput............................................................................................................1683
tImpalaInput Standard properties.........................................................................................................................1683
Related scenarios........................................................................................................................................................ 1687

tImpalaLoad............................................................................................................ 1688
tImpalaLoad Standard properties..........................................................................................................................1688
Related scenario.......................................................................................................................................................... 1692

tImpalaOutput.........................................................................................................1693
tImpalaOutput Standard properties..................................................................................................................... 1693
Related scenarios........................................................................................................................................................ 1697

tImpalaRow............................................................................................................. 1698
tImpalaRow Standard properties...........................................................................................................................1698
Related scenarios........................................................................................................................................................ 1702

tInfiniteLoop............................................................1704
tInfiniteLoop Standard properties.........................................................................................................................1704
Related scenario.......................................................................................................................................................... 1705

tInformixBulkExec.................................................................................................. 1706
tInformixBulkExec Standard properties..............................................................................................................1706
Related scenario.......................................................................................................................................................... 1710

tInformixClose.........................................................................................................1711
tInformixClose Standard properties..................................................................................................................... 1711
Related scenario.......................................................................................................................................................... 1712

tInformixCommit.................................................................................................... 1713
tInformixCommit Standard properties................................................................................................................ 1713
Related scenario..........................................................................................1714

tInformixConnection.............................................................................................. 1715
tInformixConnection Standard properties......................................................................................................... 1715
Related scenario.......................................................................................................................................................... 1716

tInformixInput.........................................................................................................1717
tInformixInput Standard properties..................................................................................................................... 1717
Related scenarios........................................................................................................................................................ 1719

tInformixOutput......................................................................................................1720
tInformixOutput Standard properties.................................................................................................................. 1720
Related scenarios........................................................................................................................................................ 1725

tInformixOutputBulk.............................................................................................. 1726
tInformixOutputBulk Standard properties......................................................................................................... 1726
Related scenario.......................................................................................................................................................... 1728

tInformixOutputBulkExec...................................................................................... 1729
tInformixOutputBulkExec Standard properties................................................................................................ 1729
Related scenario.......................................................................................................................................................... 1732

tInformixRollback................................................................................................... 1733
tInformixRollback Standard properties...............................................................................................................1733
Related scenario..........................................................................................1734

tInformixRow.......................................................................................................... 1735
tInformixRow Standard properties....................................................................................................................... 1735
Related scenarios........................................................................................................................................................ 1738

tInformixSCD........................................................................................................... 1739
tInformixSCD Standard properties........................................................................................................................1739
Related scenario.......................................................................................................................................................... 1742

tInformixSP..............................................................................................................1743
tInformixSP Standard properties...........................................................................................................................1743
Related scenarios........................................................................................................................................................ 1745

tIngresBulkExec...................................................................................................... 1747
tIngresBulkExec Standard properties.................................................................................................................. 1747
Related scenarios........................................................................................................................................................ 1750

tIngresClose.............................................................................................................1751
tIngresClose Standard properties..........................................................................................................................1751
Related scenarios........................................................................................................................................................ 1752

tIngresCommit.........................................................................................................1753
tIngresCommit Standard properties..................................................................................................................... 1753
Related scenario.......................................................................................................................................................... 1754

tIngresConnection.................................................................................................. 1755
tIngresConnection Standard properties.............................................................................................................. 1755
Related scenarios........................................................................................................................................................ 1756

tIngresInput.............................................................................................................1757
tIngresInput Standard properties.......................................................................................................................... 1757
Related scenarios........................................................................................................................................................ 1759

tIngresOutput..........................................................................................................1761
tIngresOutput Standard properties.......................................................................................................................1761
Related scenarios........................................................................................................................................................ 1765

tIngresOutputBulk.................................................................................................. 1766
tIngresOutputBulk Standard properties..............................................................................................................1766
Related scenarios........................................................................................................................................................ 1768

tIngresOutputBulkExec.......................................................................................... 1769
tIngresOutputBulkExec Standard properties.....................................................................................................1769
Loading data to a table in the Ingres DBMS................................................................................................... 1772
Related scenarios........................................................................................................................................................ 1774

tIngresRollback....................................................................................................... 1775
tIngresRollback Standard properties................................................................................................................... 1775
Related scenarios........................................................................................................................................................ 1776

tIngresRow.............................................................................................................. 1777
tIngresRow Standard properties............................................................................................................................1777
Related scenarios........................................................................................................................................................ 1780

tIngresSCD............................................................................................................... 1781
tIngresSCD Standard properties............................................................................................................................ 1781
Related scenario.......................................................................................................................................................... 1783

tInterbaseClose....................................................................................................... 1784
tInterbaseClose Standard properties................................................................................................................... 1784
Related scenarios........................................................................................................................................................ 1785

tInterbaseCommit................................................................................................... 1786
tInterbaseCommit Standard properties...............................................................................................................1786
Related scenario.......................................................................................................................................................... 1787

tInterbaseConnection.............................................................................................1788
tInterbaseConnection Standard properties........................................................................................................1788
Related scenarios........................................................................................................................................................ 1789

tInterbaseInput....................................................................................................... 1790
tInterbaseInput Standard properties....................................................................................................................1790
Related scenarios........................................................................................................................................................ 1793

tInterbaseOutput.................................................................................................... 1794
tInterbaseOutput Standard properties................................................................................................................ 1794
Related scenarios........................................................................................................................................................ 1799

tInterbaseRollback..................................................................................................1800
tInterbaseRollback Standard properties............................................................................................................. 1800
Related scenarios........................................................................................................................................................ 1801

tInterbaseRow......................................................................................................... 1802
tInterbaseRow Standard properties......................................................................................................................1802
Related scenarios........................................................................................................................................................ 1805

tIntervalMatch.........................................................................................................1806
tIntervalMatch Standard properties..................................................................................................................... 1806
Identifying server locations based on their IP addresses............................................................................ 1807

tIterateToFlow........................................................................................................ 1811
tIterateToFlow Standard properties..................................................................................................................... 1811
Transforming a list of files as a data flow......................................................................................... 1812

tJasperOutput..........................................................................................................1815
tJasperOutput Standard properties.......................................................................................................................1815
Generating a report against a .jrxml template................................................................................ 1817

tJasperOutputExec..................................................................................................1820
tJasperOutputExec Standard properties..............................................................................................................1820
Related scenario..........................................................................................1821

tJava......................................................................................................................... 1822
tJava Standard properties.........................................................................................................................................1822
Printing out a variable's content............................................................................................................. 1823

tJavaDBInput........................................................................................................... 1827
tJavaDBInput Standard properties........................................................................................................................ 1827
Related scenarios........................................................................................................................................................ 1829

tJavaDBOutput........................................................................................................ 1830
tJavaDBOutput Standard properties..................................................................................................................... 1830
Related scenarios........................................................................................................................................................ 1833

tJavaDBRow.............................................................................................................1834
tJavaDBRow Standard properties.......................................................................................................................... 1834
Related scenarios........................................................................................................................................................ 1836

tJavaFlex.................................................................................................................. 1837
tJavaFlex Standard properties................................................................................................................................ 1837
Generating data flow................................................................................................................................................. 1838
Processing rows of data with tJavaFlex............................................................................................................. 1841

tJavaRow..................................................................................................................1845
tJavaRow Standard properties................................................................................................................................1845
Transforming data line by line using tJavaRow.............................................................................................. 1847

tJDBCClose...............................................................................................................1850
tJDBCClose Standard properties............................................................................................................................ 1850
Related scenarios........................................................................................................................................................ 1851

tJDBCColumnList.................................................................................................... 1852
tJDBCColumnList Standard properties.................................................................................................................1852
Related scenario.......................................................................................................................................................... 1853

tJDBCCommit...........................................................................................................1854
tJDBCCommit Standard properties........................................................................................................................1854
Related scenario.......................................................................................................................................................... 1855

tJDBCConnection.................................................................................................... 1856
tJDBCConnection Standard properties.................................................................................................................1856
Importing a database driver................................................................................................................................... 1858
Related scenario.......................................................................................................................................... 1860

tJDBCInput............................................................................................................... 1861
tJDBCInput Standard properties.............................................................................................................................1861
Related scenarios........................................................................................................................................................ 1864

tJDBCOutput............................................................................................................ 1865
tJDBCOutput Standard properties......................................................................................................................... 1865
Related scenarios........................................................................................................................................................ 1869

tJDBCRollback......................................................................................................... 1870
tJDBCRollback Standard properties...................................................................................................................... 1870
Related scenario.......................................................................................................................................................... 1871

tJDBCRow.................................................................................................................1872
tJDBCRow Standard properties.............................................................................................................................. 1872
Related scenarios........................................................................................................................................................ 1875

tJDBCSCDELT...........................................................................................................1876
tJDBCSCDELT Standard properties....................................................................................................................... 1876
Tracking data changes in a Snowflake table using the tJDBCSCDELT component............................ 1879

tJDBCSP....................................................................................................................1889
tJDBCSP Standard properties.................................................................................................................................. 1889
Related scenario.......................................................................................................................................................... 1891

tJDBCTableList........................................................................................................ 1893
tJDBCTableList Standard properties.....................................................................................................................1893
Related scenario.......................................................................................................................................................... 1894

tJIRAInput................................................................................................................ 1895
tJIRAInput Standard properties.............................................................................................................................. 1895
Retrieving the project information from the JIRA application.....................................................1896

tJIRAOutput............................................................................................................. 1899
tJIRAOutput Standard properties...........................................................................................................................1899
Creating an issue in the JIRA application.............................................................................................1900
Updating an issue in the JIRA application........................................................................................... 1903

tJMSInput.................................................................................................................1908
tJMSInput Standard properties...............................................................................................................................1908
Related scenarios........................................................................................................................................................ 1910

tJMSOutput..............................................................................................................1911
tJMSOutput Standard properties........................................................................................................................... 1911
Enqueuing/dequeuing a message on the ActiveMQ server.........................................................................1912
Related scenarios........................................................................................................................................ 1915

tJoin.......................................................................................................................... 1916
tJoin Standard properties......................................................................................................................................... 1916
Doing an exact match on two columns and outputting the main and rejected data........................ 1917

tKafkaCommit......................................................................................................... 1922
tKafkaCommit Standard properties...................................................................................................................... 1922
Related scenarios........................................................................................................................................................ 1922

tKafkaConnection................................................................................................... 1923
tKafkaConnection Standard properties............................................................................................................... 1923
Related scenarios........................................................................................................................................................ 1924
Kafka and AVRO in a Job......................................................................................................................................... 1924

tKafkaCreateTopic.................................................................................................. 1926
tKafkaCreateTopic Standard properties..............................................................................................................1926
Related scenarios........................................................................................................................................................ 1927

tKafkaInput..............................................................................................................1928
tKafkaInput Standard properties........................................................................................................................... 1928
Related scenarios........................................................................................................................................................ 1931

tKafkaOutput...........................................................................................................1932
tKafkaOutput Standard properties........................................................................................................................1932
Related scenarios........................................................................................................................................................ 1934

tLDAPAttributesInput.............................................................................................1935
tLDAPAttributesInput Standard properties........................................................................................................ 1935
Related scenario.......................................................................................................................................................... 1938

tLDAPClose..............................................................................................................1939
tLDAPClose Standard properties........................................................................................................................... 1939
Related scenarios........................................................................................................................................................ 1939

tLDAPConnection....................................................................................................1940
tLDAPConnection Standard properties................................................................................................................1940
Related scenarios........................................................................................................................................................ 1941

tLDAPInput.............................................................................................................. 1942
tLDAPInput Standard properties............................................................................................................................1942
Displaying an LDAP directory's filtered content................................................................................ 1944

tLDAPOutput........................................................................................................... 1947
tLDAPOutput Standard properties........................................................................................................................ 1947
Editing data in an LDAP directory.......................................................................................................... 1950

tLDAPRenameEntry................................................................................................ 1953
tLDAPRenameEntry Standard properties............................................................................................................1953
Related scenarios........................................................................................................................................................ 1955

tLibraryLoad............................................................................................................ 1956
tLibraryLoad Standard properties......................................................................................................................... 1956
Importing an external library................................................................................................................................. 1957
Checking the format of an e-mail address........................................................................................................1958

tLineChart................................................................................................................ 1961
tLineChart Standard properties..............................................................................................................................1961
Creating a line chart to ease trend analysis.................................................................................................... 1963

tLogCatcher............................................................................................................. 1970
tLogCatcher Standard properties.......................................................................................................................... 1970
Catching messages triggered by a tWarn component.................................................................................. 1971
Catching the message triggered by a tDie component................................................................................ 1973

tLogRow...................................................................................................................1977
tLogRow Standard properties.................................................................................................................................1977
Related scenarios........................................................................................................................................................ 1978

tLoop........................................................................................................................ 1979
tLoop Standard properties.......................................................................................................................................1979
Executing a Job multiple times using a loop...................................................................................................1980

tMap......................................................................................................................... 1983
tMap Standard properties........................................................................................................................................ 1983
Mapping data using a filter and a simple explicit join................................................................................ 1985
Advanced mapping with lookup reload at each row.....................................................................................2003
Mapping with join output tables.......................................................................................................................... 2010

tMapRDBClose........................................................................................................ 2015
tMapRDBClose Standard properties..................................................................................................................... 2015
Related scenario.......................................................................................................................................................... 2016

tMapRDBConnection.............................................................................................. 2017
tMapRDBConnection Standard properties......................................................................................................... 2017
Related scenario.......................................................................................................................................................... 2021

tMapRDBInput.........................................................................................................2022
tMapRDBInput Standard properties..................................................................................................................... 2022
Related scenario.......................................................................................................................................................... 2027

tMapRDBOutput......................................................................................................2028
tMapRDBOutput Standard properties.................................................................................................................. 2028
Related scenario.......................................................................................................................................................... 2032

tMapROjaiInput.......................................................................................................2033
tMapROjaiInput Standard properties................................................................................................................... 2033

tMapROjaiOutput....................................................................................................2036
tMapROjaiOutput Standard properties................................................................................................................2036
Writing candidate data in a MapR-DB OJAI database................................................................................... 2039

tMapRStreamsCommit........................................................................................... 2043
tMapRStreamsCommit Standard properties...................................................................................................... 2043
Related scenarios........................................................................................................................................................ 2043

tMapRStreamsConnection..................................................................................... 2044
tMapRStreamsConnection Standard properties............................................................................................... 2044
Related scenarios........................................................................................................................................................ 2046

tMapRStreamsCreateStream................................................................................. 2047
tMapRStreamsCreateStream Standard properties...........................................................................................2047
Related scenarios........................................................................................................................................................ 2049

tMapRStreamsInput................................................................................................2050
tMapRStreamsInput Standard properties........................................................................................................... 2050
Related scenarios........................................................................................................................................................ 2054

tMapRStreamsOutput.............................................................................................2055
tMapRStreamsOutput Standard properties........................................................................................................2055
Related scenarios........................................................................................................................................................ 2057

tMarketoBulkExec...................................................................................................2058
tMarketoBulkExec Standard properties.............................................................................................................. 2058
Related scenario.......................................................................................................................................................... 2060

tMarketoConnection...............................................................................................2061
tMarketoConnection Standard properties.......................................................................................................... 2061
Related scenario.......................................................................................................................................................... 2062

tMarketoCampaign................................................................................................. 2063
tMarketoCampaign Standard properties.............................................................................................................2063

tMarketoInput......................................................................................................... 2067
tMarketoInput Standard properties...................................................................................................................... 2067
Related scenario..........................................................................................2072

tMarketoListOperation...........................................................................................2073
tMarketoListOperation Standard properties......................................................................................................2073
Adding a lead record to a Marketo list using SOAP API.............................................................................. 2075

tMarketoOutput...................................................................................................... 2078
tMarketoOutput Standard properties...................................................................................................................2078
Transmitting data with Marketo using REST API........................................................................................... 2081

tMarkLogicBulkLoad...............................................................................................2087
tMarkLogicBulkLoad Standard properties..........................................................................................................2087
Related scenario.......................................................................................................................................................... 2089

tMarkLogicClose..................................................................................................... 2090
tMarkLogicClose Standard properties..................................................................................................................2090
Related scenario.......................................................................................................................................................... 2091

tMarkLogicConnection........................................................................................... 2092
tMarkLogicConnection Standard properties......................................................................................................2092
Related scenario.......................................................................................................................................................... 2093

tMarkLogicInput......................................................................................................2094
tMarkLogicInput Standard properties..................................................................................................................2094
Related scenario.......................................................................................................................................................... 2096

tMarkLogicOutput...................................................................................................2097
tMarkLogicOutput Standard properties.............................................................................................................. 2097
Related scenario.......................................................................................................................................................... 2099

tMaxDBInput........................................................................................................... 2100
tMaxDBInput Standard properties........................................................................................................................ 2100
Related scenario.......................................................................................................................................................... 2102

tMaxDBOutput........................................................................................................ 2103
tMaxDBOutput Standard properties.....................................................................................................................2103
Related scenario.......................................................................................................................................................... 2106

tMaxDBRow.............................................................................................................2107
tMaxDBRow Standard properties.......................................................................................................................... 2107
Related scenario.......................................................................................................................................................... 2109

tMDMBulkLoad....................................................................................................... 2110
tMDMBulkLoad Standard properties....................................................................................................................2110
Loading records into a business entity.............................................................................................................. 2113

tMDMClose.............................................................................................................. 2118
tMDMClose Standard properties............................................................................................................................2118
Related scenario.......................................................................................................................................... 2119

tMDMCommit.......................................................................................................... 2120
tMDMCommit Standard properties.......................................................................................................................2120
Related scenario.......................................................................................................................................................... 2121

tMDMConnection.................................................................................................... 2122
tMDMConnection Standard properties................................................................................................................2122
Related scenario.......................................................................................................................................................... 2123

tMDMDelete............................................................................................................ 2124
tMDMDelete Standard properties......................................................................................................................... 2124
Deleting master data from an MDM Hub.......................................................................................................... 2128

tMDMInput.............................................................................................................. 2135
tMDMInput Standard properties............................................................................................................................2135
Reading master data from an MDM hub........................................................................................................... 2139

tMDMOutput............................................................................................................2142
tMDMOutput Standard properties.........................................................................................................................2142
Examples of partial update operations using tMDMOutput....................................................................... 2147
Writing master data in an MDM hub...................................................................................................................2153
Removing master data partially from the MDM hub.....................................................................................2158

tMDMReceive.......................................................................................................... 2165
tMDMReceive Standard properties....................................................................................................................... 2165
Extracting information from an MDM record in XML................................................................................... 2167

tMDMRollback.........................................................................................................2171
tMDMRollback Standard properties..................................................................................................................... 2171
Related scenario.......................................................................................................................................................... 2172

tMDMRouteRecord................................................................................................. 2173
tMDMRouteRecord Standard properties............................................................................................................. 2173
Routing an update report record to Event Manager..................................................................................... 2175

tMDMSP................................................................................................................... 2179
tMDMSP Standard properties................................................................................................................................. 2179
Executing a stored procedure using tMDMSP..................................................................................................2180

tMDMTriggerInput..................................................................................................2186
tMDMTriggerInput Standard properties..............................................................................................................2186
Exchanging the event information about an MDM record..........................................................................2188

tMDMTriggerOutput...............................................................................................2197
tMDMTriggerOutput Standard properties.......................................................................................................... 2197
Related scenario.......................................................................................................................................... 2198

tMDMViewSearch................................................................................................... 2199
tMDMViewSearch Standard properties............................................................................................................... 2199
Retrieving records from an MDM hub via an existing view....................................................................... 2203

tMemorizeRows...................................................................................................... 2206
tMemorizeRows Standard properties...................................................................................................................2206
Retrieving the different ages and lowest age data....................................................................................... 2207

tMicrosoftCrmInput................................................................................................ 2213
tMicrosoftCrmInput Standard properties............................................................................................................2213
Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows.............2217

tMicrosoftCrmOutput............................................................................................. 2223
tMicrosoftCrmOutput Standard properties........................................................................................................ 2223
Related scenario..........................................................................................2226

tMicrosoftMQInput................................................................................................. 2227
tMicrosoftMQInput Standard properties.............................................................................................................2227
Writing and fetching queuing messages from Microsoft message queue............................................. 2228

tMicrosoftMQOutput.............................................................................................. 2233
tMicrosoftMQOutput Standard properties..........................................................................................................2233
Related scenario.......................................................................................................................................................... 2234

tMomCommit...........................................................................................................2235
tMomCommit Standard properties....................................................................................................................... 2235
Related scenario.......................................................................................................................................................... 2236

tMomConnection.................................................................................................... 2237
tMomConnection Standard properties................................................................................................................ 2237
Related scenario.......................................................................................................................................................... 2239

tMomInput...............................................................................................................2240
tMomInput Standard properties............................................................................................................................ 2240
Asynchronous communication via a MOM server...........................................................................................2246
Transmitting XML files via a MOM server.........................................................................................................2249

tMomMessageIdList............................................................................................... 2255
tMomMessageIdList Standard properties...........................................................................................................2255
Related scenario.......................................................................................................................................................... 2256

tMomOutput............................................................................................................ 2257
tMomOutput Standard properties......................................................................................................................... 2257
Related scenario.......................................................................................................................................... 2262

tMomRollback......................................................................................................... 2263
tMomRollback Standard properties......................................................................................................................2263
Related scenario.......................................................................................................................................................... 2264

tMondrianInput....................................................................................................... 2265
tMondrianInput Standard properties................................................................................................................... 2265
Extracting multi-dimensional datasets from a MySQL database (Cross-join tables)........................2267

tMongoDBBulkLoad............................................................................................... 2270
tMongoDBBulkLoad Standard properties...........................................................................................................2270
Importing data into a MongoDB database............................................................................................2273

tMongoDBClose...................................................................................................... 2281
tMongoDBClose Standard properties...................................................................................................................2281
Related scenario.......................................................................................................................................................... 2281

tMongoDBConnection............................................................................................ 2282
tMongoDBConnection Standard properties.......................................................................................................2282
Related scenario.......................................................................................................................................................... 2284

tMongoDBGridFSDelete.........................................................................................2285
tMongoDBGridFSDelete Standard properties................................................................................................... 2285
Related scenario.......................................................................................................................................................... 2287

tMongoDBGridFSGet.............................................................................................. 2288
tMongoDBGridFSGet Standard properties..........................................................................................................2288
Related scenario.......................................................................................................................................................... 2291

tMongoDBGridFSList.............................................................................................. 2292
tMongoDBGridFSList Standard properties......................................................................................................... 2292
Related scenario.......................................................................................................................................................... 2295

tMongoDBGridFSProperties.................................................................................. 2296
tMongoDBGridFSProperties Standard properties............................................................................................ 2296
Related scenario.......................................................................................................................................................... 2299

tMongoDBGridFSPut.............................................................................................. 2300
tMongoDBGridFSPut Standard properties..........................................................................................................2300
Managing files using MongoDB GridFS..............................................................................................................2302

tMongoDBInput.......................................................................................................2311
tMongoDBInput Standard properties...................................................................................................................2311
Retrieving data from a collection by advanced queries...............................................................................2315
Related scenarios........................................................................................................................................ 2318

tMongoDBOutput....................................................................................................2319
tMongoDBOutput Standard properties................................................................................................................2319
Creating a collection and writing data to it.....................................................................................................2323
Upserting records in a collection..........................................................................................................................2328

tMongoDBRow........................................................................................................ 2336
tMongoDBRow Standard properties.....................................................................................................................2336
Using MongoDB functions to create a collection and write data to it................................................... 2339

tMsgBox................................................................................................................... 2345
tMsgBox Standard properties................................................................................................................................. 2345
'Hello world!' type test............................................................................................................................................. 2346

tMSSqlBulkExec...................................................................................................... 2348
tMSSqlBulkExec Standard properties.................................................................................................................. 2348
Related scenarios........................................................................................................................................................ 2352

tMSSqlClose............................................................................................................ 2353
tMSSqlClose Standard properties..........................................................................................................................2353
Related scenarios........................................................................................................................................................ 2354

tMSSqlColumnList.................................................................................................. 2355
tMSSqlColumnList Standard properties..............................................................................................................2355
Related scenario.......................................................................................................................................................... 2357

tMSSqlCommit........................................................................................................ 2358
tMSSqlCommit Standard properties.....................................................................................................................2358
Related scenarios........................................................................................................................................................ 2359

tMSSqlConnection.................................................................................................. 2360
tMSSqlConnection Standard properties..............................................................................................................2360
Inserting data into a database table and extracting useful information from it.................................2362

tMSSqlInput.............................................................................................................2368
tMSSqlInput Standard properties..........................................................................................................................2368
Related scenarios........................................................................................................................................................ 2371

tMSSqlLastInsertId................................................................................................. 2372
tMSSqlLastInsertId Standard properties.............................................................................................................2372
Related scenario.......................................................................................................................................................... 2374

tMSSqlOutput..........................................................................................................2375
tMSSqlOutput Standard properties.......................................................................................................................2375
Related scenarios........................................................................................................................................ 2381

tMSSqlOutputBulk.................................................................................................. 2382
tMSSqlOutputBulk Standard properties..............................................................................................................2382
Related scenarios........................................................................................................................................................ 2384

tMSSqlOutputBulkExec..........................................................................................2385
tMSSqlOutputBulkExec Standard properties.................................................................................................... 2385
Related scenarios........................................................................................................................................................ 2389

tMSSqlRollback.......................................................................................................2390
tMSSqlRollback Standard properties................................................................................................................... 2390
Related scenario.......................................................................................................................................................... 2391

tMSSqlRow.............................................................................................................. 2392
tMSSqlRow Standard properties............................................................................................................................2392
Related scenarios........................................................................................................................................................ 2396

tMSSqlSCD...............................................................................................................2397
tMSSqlSCD Standard properties............................................................................................................................ 2397
Related scenario.......................................................................................................................................................... 2400

tMSSqlSP................................................................................................................. 2401
tMSSqlSP Standard properties............................................................................................................................... 2401
Retrieving personal information using a stored procedure........................................................................ 2404
Related scenarios........................................................................................................................................................ 2409

tMSSqlTableList......................................................................................................2410
tMSSqlTableList Standard properties.................................................................................................................. 2410
Related scenario.......................................................................................................................................................... 2411

tMysqlBulkExec.......................................................................................................2412
tMysqlBulkExec Standard properties................................................................................................... 2412
Related scenarios........................................................................................................................................................ 2415

tMysqlClose............................................................................................................. 2416
tMysqlClose Standard properties.......................................................................................................................... 2416
Related scenario.......................................................................................................................................................... 2417

tMysqlColumnList...................................................................................................2418
tMysqlColumnList Standard properties...............................................................................................................2418
Iterating on a DB table and listing its column names................................................................................. 2419

tMysqlCommit......................................................................................................... 2423
tMysqlCommit Standard properties......................................................................................................................2423
Related scenario.......................................................................................................................................... 2424

tMysqlConnection...................................................................................................2425
tMysqlConnection Standard properties...............................................................................................................2425
Inserting data in mother/daughter tables......................................................................................................... 2426
Sharing a database connection between a parent Job and child Job......................................................2430

tMysqlInput............................................................................................................. 2437
tMysqlInput Standard properties...........................................................................................................................2437
Writing columns from a MySQL database to an output file using tMysqlInput...................................2440
Using context parameters when reading a table from a database.......................................................... 2443
Reading data from databases through context-based dynamic connections....................................... 2446

tMysqlLastInsertId.................................................................................................. 2453
tMysqlLastInsertId Standard properties..............................................................................................................2453
Getting the ID for the last inserted record with tMysqlLastInsertId........................................................2455

tMysqlLookupInput................................................................................................ 2459

tMysqlOutput.......................................................................................................... 2460
tMysqlOutput Standard properties....................................................................................................................... 2460
Inserting a column and altering data using tMysqlOutput......................................................................... 2466
Updating data using tMysqlOutput...................................................................................................................... 2471
Retrieving data in error with a Reject link....................................................................................................... 2474

tMysqlOutputBulk...................................................................................................2480
tMysqlOutputBulk Standard properties...............................................................................................................2480
Inserting transformed data in MySQL database..............................................................................................2482

tMysqlOutputBulkExec...........................................................................................2486
tMysqlOutputBulkExec Standard properties..................................................................................................... 2486
Inserting data in bulk in MySQL database........................................................................................................2489

tMysqlRollback........................................................................................................2491
tMysqlRollback Standard properties.................................................................................................................... 2491

tMysqlRow............................................................................................................... 2493
tMysqlRow Standard properties.............................................................................................................................2493
Removing and regenerating a MySQL table index........................................................................................ 2497
Using PreparedStatement objects to query data............................................................................................ 2498
Combining two flows for selective output........................................................................................................2503

tMysqlSCD............................................................................................................... 2508
tMysqlSCD Standard properties............................................................................................................................. 2508
SCD management methodology............................................................................................................................2511
Tracking data changes using Slowly Changing Dimensions (type 0 through type 3)........................ 2514

tMysqlSCDELT......................................................................................................... 2522
tMysqlSCDELT Standard properties......................................................................................................................2522
Related scenarios........................................................................................................................................2525

tMysqlSP.................................................................................................................. 2526
tMysqlSP Standard properties................................................................................................................................ 2526
Using tMysqlSP to find a State Label using a stored procedure...............................................................2528
Related scenarios........................................................................................................................................................ 2531

tMysqlTableList...................................................................................................... 2532
tMysqlTableList Standard properties...................................................................................................................2532
Iterating on DB tables and deleting their content using a user-defined SQL template................... 2533
Related scenario.......................................................................................................................................................... 2537

tNamedPipeClose................................................................................................... 2538
tNamedPipeClose Standard properties............................................................................................................... 2538
Related scenario.......................................................................................................................................................... 2539

tNamedPipeOpen....................................................................................................2540
tNamedPipeOpen Standard properties............................................................................................................... 2540
Related scenario.......................................................................................................................................................... 2541

tNamedPipeOutput.................................................................................................2542
tNamedPipeOutput Standard properties............................................................................................................ 2542

tNeo4jBatchOutput.................................................................................................2545
tNeo4jBatchOutput Standard properties............................................................................................................ 2545

tNeo4jBatchOutputRelationship...........................................................................2548
tNeo4jBatchOutputRelationship Standard properties................................................................................... 2548
Writing information of actors and movies to Neo4j with hierarchical relationship using Neo4j Batch components...................................................................................................................................... 2550

tNeo4jBatchSchema............................................................................................... 2560
tNeo4jBatchSchema Standard properties.......................................................................................................... 2560

tNeo4jClose............................................................................................................. 2562
tNeo4jClose Standard properties.......................................................................................................................... 2562
Related scenarios........................................................................................................................................................ 2562

tNeo4jConnection...................................................................................................2564
tNeo4jConnection Standard properties...............................................................................................................2564
Related scenarios........................................................................................................................................................ 2565

tNeo4jImportTool................................................................................................... 2567
tNeo4jImportTool Standard properties...............................................................................................................2567

tNeo4jInput............................................................................................................. 2569
tNeo4jInput Standard properties...........................................................................................................................2569
Related scenarios........................................................................................................................................................ 2571

tNeo4jOutput.......................................................................................................... 2572
tNeo4jOutput Standard properties....................................................................................................................... 2572
Writing data to a Neo4j database and reading specific data from it...................................................... 2576
Writing family information to Neo4j and creating relationships.............................................................. 2580

tNeo4jOutputRelationship.................................................................................... 2586
tNeo4jOutputRelationship Standard properties.............................................................................................. 2586
Writing information of actors and movies to Neo4j with hierarchical relationship........................... 2589

tNeo4jRow............................................................................................................... 2599
tNeo4jRow Standard properties............................................................................................................................ 2599
Creating nodes with a label using a Cypher query........................................................................................2602
Importing data from a CSV file to Neo4j using a Cypher query................................................................2606
Importing data from a CSV file to Neo4j and creating relationships using a single Cypher query.. 2612

tNetezzaBulkExec................................................................................................... 2616
tNetezzaBulkExec Standard properties...............................................................................................................2616
Related scenarios........................................................................................................................................................ 2619

tNetezzaClose......................................................................................................... 2620
tNetezzaClose Standard properties...................................................................................................................... 2620
Related scenarios........................................................................................................................................................ 2621

tNetezzaCommit..................................................................................................... 2622
tNetezzaCommit Standard properties................................................................................................................. 2622
Related scenario.......................................................................................................................................................... 2623

tNetezzaConnection............................................................................................... 2624
tNetezzaConnection Standard properties.......................................................................................................... 2624
Related scenarios........................................................................................................................................................ 2625

tNetezzaInput..........................................................................................................2626
tNetezzaInput Standard properties...................................................................................................................... 2626
Related scenarios........................................................................................................................................................ 2629

tNetezzaNzLoad......................................................................................................2630
tNetezzaNzLoad Standard properties.................................................................................................................. 2630
Related scenario.......................................................................................................................................................... 2636

tNetezzaOutput.......................................................................................................2637
tNetezzaOutput Standard properties................................................................................................................... 2637
Related scenarios........................................................................................................................................................ 2642

tNetezzaRollback....................................................................................................2643
tNetezzaRollback Standard properties................................................................................................................2643
Related scenarios........................................................................................................................................................ 2644

tNetezzaRow........................................................................................................... 2645
tNetezzaRow Standard properties........................................................................................................................ 2645
Related scenarios........................................................................................................................................................ 2648

tNetezzaSCD............................................................................................................2649
tNetezzaSCD Standard properties.........................................................................................................................2649
Related scenario.......................................................................................................................................................... 2652

tNetsuiteConnection...............................................................................................2653
tNetsuiteConnection Standard properties..........................................................................................................2653
Related scenario.......................................................................................................................................................... 2654

tNetsuiteInput......................................................................................................... 2655
tNetsuiteInput Standard properties......................................................................................................................2655
Handling data with NetSuite..................................................................................................................................2657

tNetsuiteOutput...................................................................................................... 2663
tNetsuiteOutput Standard properties.................................................................................................................. 2663
Related scenario.......................................................................................................................................................... 2666

tNormalize............................................................................................................... 2667
tNormalize Standard properties.............................................................................................................................2667
Normalizing data......................................................................................................................................................... 2669

tOpenbravoERPInput..............................................................................................2672
tOpenbravoERPInput Standard properties.........................................................................................................2672
Related Scenario..........................................................................................................................................................2673

tOpenbravoERPOutput...........................................................................................2674
tOpenbravoERPOutput Standard properties..................................................................................................... 2674
Related scenario.......................................................................................................................................................... 2675

tOracleBulkExec......................................................................................................2676
tOracleBulkExec Standard properties..................................................................................................................2676
Truncating and inserting file data into an Oracle database.......................................................................2681

tOracleClose............................................................................................................ 2684
tOracleClose Standard properties......................................................................................................................... 2684
Related scenarios........................................................................................................................................ 2685

tOracleCommit........................................................................................................ 2686
tOracleCommit Standard properties.................................................................................................................... 2686
Related scenario.......................................................................................................................................................... 2687

tOracleConnection.................................................................................................. 2688
tOracleConnection Standard properties............................................................................................................. 2688
Related scenario.......................................................................................................................................................... 2691

tOracleInput............................................................................................................ 2692
tOracleInput Standard properties......................................................................................................................... 2692
Using context parameters when reading a table from an Oracle database..........................................2695

tOracleOutput......................................................................................................... 2699
tOracleOutput Standard properties...................................................................................................................... 2699
Related scenarios........................................................................................................................................................ 2705

tOracleOutputBulk..................................................................................................2706
tOracleOutputBulk Standard properties............................................................................................................. 2706
Related scenarios........................................................................................................................................................ 2708

tOracleOutputBulkExec..........................................................................................2709
tOracleOutputBulkExec Standard properties.................................................................................................... 2709
Related scenarios........................................................................................................................................................ 2714

tOracleRollback.......................................................................................................2715
tOracleRollback Standard properties...................................................................................................................2715
Related scenario.......................................................................................................................................................... 2716

tOracleRow.............................................................................................................. 2717
tOracleRow Standard properties........................................................................................................................... 2717
Related scenarios........................................................................................................................................................ 2721

tOracleSCD...............................................................................................................2722
tOracleSCD Standard properties............................................................................................................................2722
Related scenario.......................................................................................................................................................... 2725

tOracleSCDELT........................................................................................................ 2726
tOracleSCDELT Standard properties.................................................................................................................... 2726
Related Scenarios........................................................................................................................................................2730

tOracleSP................................................................................................................. 2731
tOracleSP Standard properties...............................................................................................................................2731
Checking number format using a stored procedure...................................................................................... 2735
Related scenarios........................................................................................................................................ 2738

tOracleTableList......................................................................................................2739
tOracleTableList Standard properties..................................................................................................................2739
Related scenarios........................................................................................................................................................ 2740

tPaloCheckElements...............................................................................................2741
tPaloCheckElements Standard properties..........................................................................................................2741
Related scenario.......................................................................................................................................................... 2743

tPaloClose................................................................................................................2744
tPaloClose Standard properties............................................................................................................................. 2744
Related scenarios........................................................................................................................................................ 2745

tPaloConnection..................................................................................................... 2746
tPaloConnection Standard properties..................................................................................................................2746
Related scenario.......................................................................................................................................................... 2747

tPaloCube................................................................................................................ 2748
tPaloCube Standard properties.............................................................................................................................. 2748
Creating a cube in an existing database........................................................................................................... 2750

tPaloCubeList.......................................................................................................... 2752
Discovering the read-only output schema of tPaloCubeList...................................................................... 2752
tPaloCubeList Standard properties.......................................................................................................................2752
Retrieving detailed cube information from a given database................................................................... 2754

tPaloDatabase......................................................................................................... 2756
tPaloDatabase Standard properties......................................................................................................................2756
Creating a database................................................................................................................................................... 2757

tPaloDatabaseList...................................................................................................2759
Discovering the read-only output schema of tPaloDatabaseList..............................................................2759
tPaloDatabaseList Standard properties...............................................................................................................2759
Retrieving detailed database information from a given Palo server.......................................................2761

tPaloDimension.......................................................................................................2763
tPaloDimension Standard properties...................................................................................................................2763
Creating a dimension with elements.................................................................................................................. 2766

tPaloDimensionList................................................................................................ 2771
Discovering the read-only output schema of tPaloDimensionList........................................................... 2771
tPaloDimensionList Standard properties............................................................................................................2771
Retrieving detailed dimension information from a given database........................................................ 2773

tPaloInputMulti.......................................................................................................2776
tPaloInputMulti Standard properties................................................................................................................... 2776
Retrieving dimension elements from a given cube....................................................................................... 2778

tPaloOutput............................................................................................................. 2782
tPaloOutput Standard properties.......................................................................................................................... 2782
Related scenario.......................................................................................................................................................... 2784

tPaloOutputMulti....................................................................................................2785
tPaloOutputMulti Standard properties................................................................................................................2785
Writing data into a given cube..............................................................................................................................2787
Rejecting inflow data when the elements to be written do not exist in a given cube..................... 2790

tPaloRule................................................................................................................. 2795
tPaloRule Standard properties............................................................................................................................... 2795
Creating a rule in a given cube............................................................................................................................ 2796

tPaloRuleList...........................................................................................................2799
Discovering the read-only output schema of tPaloRuleList....................................................................... 2799
tPaloRuleList Standard properties........................................................................................................................2799
Retrieving detailed rule information from a given cube............................................................................. 2801

tParAccelBulkExec.................................................................................................. 2803
tParAccelBulkExec Standard properties..............................................................................................................2803
Related scenarios........................................................................................................................................................ 2806

tParAccelClose........................................................................................................ 2807
tParAccelClose Standard properties.....................................................................................................................2807
Related scenarios........................................................................................................................................................ 2808

tParAccelCommit.................................................................................................... 2809
tParAccelCommit Standard properties................................................................................................................ 2809
Related scenario.......................................................................................................................................................... 2810

tParAccelConnection.............................................................................................. 2811
tParAccelConnection Standard properties......................................................................................................... 2811
Related scenario.......................................................................................................................................................... 2812

tParAccelInput.........................................................................................................2813
tParAccelInput Standard properties..................................................................................................................... 2813
Related scenarios........................................................................................................................................................ 2816

tParAccelOutput......................................................................................................2817
tParAccelOutput Standard properties..................................................................................................................2817
Related scenarios........................................................................................................................................................ 2822

tParAccelOutputBulk..............................................................................................2823
tParAccelOutputBulk Standard properties......................................................................................................... 2823
Related scenarios........................................................................................................................................................ 2825

tParAccelOutputBulkExec......................................................................................2826
tParAccelOutputBulkExec Standard properties................................................................................................2826
Related scenarios........................................................................................................................................................ 2829

tParAccelRollback...................................................................................................2830
tParAccelRollback Standard properties...............................................................................................................2830
Related scenario.......................................................................................................................................................... 2831

tParAccelRow.......................................................................................................... 2832
tParAccelRow Standard properties....................................................................................................................... 2832
Related scenarios........................................................................................................................................................ 2835

tParAccelSCD...........................................................................................................2836
tParAccelSCD Standard properties........................................................................................................................2836
Related scenario.......................................................................................................................................................... 2839

tParseRecordSet......................................................................................................2840
tParseRecordSet Standard properties..................................................................................................................2840
Related Scenario..........................................................................................................................................................2841

tPatternUnmasking.................................................................................................2842
tPatternUnmasking Standard properties............................................................................................................ 2842
Unmasking Australian phone numbers...............................................................................................................2845
tPatternUnmasking properties for Apache Spark Batch............................................................................... 2849
tPatternUnmasking properties for Apache Spark Streaming...................................................................... 2853

tPivotToColumnsDelimited................................................................................... 2857
tPivotToColumnsDelimited Standard properties.............................................................................................2857
Using a pivot column to aggregate data...........................................................................................................2858

tPOP......................................................................................................................... 2861
tPOP Standard properties........................................................................................................................................ 2861
Retrieving a selection of email messages from an email server.............................................................. 2863

tPostgresPlusBulkExec.......................................................................................... 2865
tPostgresPlusBulkExec Standard properties..................................................................................................... 2865
Related scenarios........................................................................................................................................................ 2868

tPostgresPlusClose.................................................................................................2869
tPostgresPlusClose Standard properties.............................................................................................................2869
Related scenarios........................................................................................................................................................ 2870

tPostgresPlusCommit.............................................................................................2871
tPostgresPlusCommit Standard properties........................................................................................................2871
Related scenario.......................................................................................................................................................... 2872

tPostgresPlusConnection.......................................................................................2873
tPostgresPlusConnection Standard properties.................................................................................................2873
Related scenario.......................................................................................................................................................... 2874

tPostgresPlusInput................................................................................................. 2875
tPostgresPlusInput Standard properties.............................................................................................................2875
Related scenarios........................................................................................................................................................ 2878

tPostgresPlusOutput.............................................................................................. 2879
tPostgresPlusOutput Standard properties..........................................................................................................2879
Related scenarios........................................................................................................................................................ 2884

tPostgresPlusOutputBulk...................................................................................... 2885
tPostgresPlusOutputBulk Standard properties.................................................................................................2885
Related scenarios........................................................................................................................................................ 2887

tPostgresPlusOutputBulkExec.............................................................................. 2888
tPostgresPlusOutputBulkExec Standard properties....................................................................................... 2888
Related scenarios........................................................................................................................................................ 2890

tPostgresPlusRollback........................................................................................... 2891
tPostgresPlusRollback Standard properties...................................................................................................... 2891
Related scenarios........................................................................................................................................................ 2892

tPostgresPlusRow...................................................................................................2893
tPostgresPlusRow Standard properties...............................................................................................................2893
Related scenarios........................................................................................................................................................ 2896

tPostgresPlusSCD................................................................................................... 2897
tPostgresPlusSCD Standard properties............................................................................................................... 2897
Related scenario.......................................................................................................................................................... 2900

tPostgresPlusSCDELT.............................................................................................2901
tPostgresPlusSCDELT Standard properties........................................................................................................2901
Related Scenarios........................................................................................................................................................2905

tPostgresqlBulkExec...............................................................................................2906
tPostgresqlBulkExec Standard properties..........................................................................................................2906
Related scenarios........................................................................................................................................................ 2909

tPostgresqlClose..................................................................................................... 2910
tPostgresqlClose Standard properties................................................................................................................. 2910
Related scenarios........................................................................................................................................ 2911

tPostgresqlCommit.................................................................................................2912
tPostgresqlCommit Standard properties............................................................................................................ 2912
Related scenario.......................................................................................................................................................... 2913

tPostgresqlConnection...........................................................................................2914
tPostgresqlConnection Standard properties..................................................................................................... 2914
Related scenario.......................................................................................................................................................... 2915

tPostgresqlInput..................................................................................................... 2916
tPostgresqlInput Standard properties................................................................................................................. 2916
Related scenarios........................................................................................................................................................ 2919

tPostgresqlOutput.................................................................................................. 2920
tPostgresqlOutput Standard properties.............................................................................................................. 2920
Related scenarios........................................................................................................................................................ 2926

tPostgresqlOutputBulk.......................................................................................... 2927
tPostgresqlOutputBulk Standard properties..................................................................................................... 2927
Related scenarios........................................................................................................................................................ 2929

tPostgresqlOutputBulkExec.................................................................................. 2930
tPostgresqlOutputBulkExec Standard properties............................................................................................ 2930
Related scenarios........................................................................................................................................................ 2933

tPostgresqlRollback............................................................................................... 2934
tPostgresqlRollback Standard properties...........................................................................................................2934
Related scenario.......................................................................................................................................................... 2935

tPostgresqlRow.......................................................................................................2936
tPostgresqlRow Standard properties................................................................................................................... 2936
Related scenarios........................................................................................................................................................ 2939

tPostgresqlSCD....................................................................................................... 2940
tPostgresqlSCD Standard properties....................................................................................................................2940
Related scenario.......................................................................................................................................................... 2943

tPostgresqlSCDELT.................................................................................................2944
tPostgresqlSCDELT Standard properties............................................................................................................ 2944
Tracking data changes in a PostgreSQL table using the tPostgresqlSCDELT component............. 2948
Related Scenario..........................................................................................................................................................2957

tPostjob....................................................................................................................2958
tPostjob Standard properties.................................................................................................................................. 2958
Related scenarios........................................................................................................................................ 2958

tPrejob......................................................................................................................2959
tPrejob Standard properties.................................................................................................................................... 2959
Handling files before and after the execution of a data Job..................................................................... 2959
Related scenario.......................................................................................................................................................... 2962

tPubSubOutput....................................................................................................... 2963

tRedshiftBulkExec.................................................................................................. 2964
tRedshiftBulkExec Standard properties.............................................................................................................. 2964
Loading/unloading data to/from Amazon S3................................................................................................... 2970

tRedshiftClose.........................................................................................................2980
tRedshiftClose Standard properties......................................................................................................................2980
Related scenario.......................................................................................................................................................... 2981

tRedshiftCommit.....................................................................................................2982
tRedshiftCommit Standard properties.................................................................................................................2982
Related scenario.......................................................................................................................................................... 2983

tRedshiftConnection...............................................................................................2984
tRedshiftConnection Standard properties..........................................................................................................2984
Related scenario.......................................................................................................................................................... 2986

tRedshiftInput......................................................................................................... 2987
tRedshiftInput Standard properties......................................................................................................................2987
Handling data with Redshift...................................................................................................................................2991

tRedshiftOutput...................................................................................................... 2996
tRedshiftOutput Standard properties...................................................................................................................2996
Related scenarios........................................................................................................................................................ 3001

tRedshiftOutputBulk.............................................................................................. 3002
tRedshiftOutputBulk Standard properties..........................................................................................................3002
Related scenario.......................................................................................................................................................... 3006

tRedshiftOutputBulkExec...................................................................................... 3007
tRedshiftOutputBulkExec Standard properties................................................................................................ 3007
Related scenario.......................................................................................................................................................... 3013

tRedshiftRollback................................................................................................... 3014
tRedshiftRollback Standard properties............................................................................................................... 3014
Related scenario.......................................................................................................................................................... 3015

tRedshiftRow...........................................................................................................3016
tRedshiftRow Standard properties........................................................................................................................3016
Related scenarios........................................................................................................................................................ 3020

tRedshiftUnload...................................................................................................... 3021
tRedshiftUnload Standard properties.................................................................................................................. 3021
Related scenario..........................................................................................................................................................3025

tReplace................................................................................................................... 3026
tReplace Standard properties................................................................................................................................. 3026
Cleaning up and filtering a CSV file....................................................................................................................3027

tReplaceList.............................................................................................................3031
tReplaceList Standard properties..........................................................................................................................3031
Replacing state names with their two-letter codes.......................................................................................3032

tReplicate.................................................................................................................3036
tReplicate Standard properties.............................................................................................................................. 3036
Replicating a flow and sorting two identical flows respectively..............................................................3037

tREST........................................................................................................................3041
tREST Standard properties.......................................................................................................................................3041
Creating and retrieving data by invoking a REST Web service.................................................................. 3042

tRESTClient............................................................................................................. 3045
tRESTClient Standard properties...........................................................................................................................3045
Getting user information by interacting with a RESTful service...............................................................3050
Updating user information by interacting with a RESTful service........................................................... 3056

tRESTRequest..........................................................................................................3063
tRESTRequest Standard properties.......................................................................................................................3063
Using a REST service to accept HTTP GET requests and send responses............................................. 3066
Using URI Query parameters to explore the data of a database.............................................................. 3072
Using a REST service to accept HTTP POST requests.................................................................................. 3080
Using a REST service to accept HTTP POST requests and send responses...........................................3085
Using a REST service to accept HTTP POST requests in an HTML form................................................3093

tRESTResponse....................................................................................................... 3100
tRESTResponse Standard properties....................................................................................................................3100
Related scenario.......................................................................................................................................................... 3101

tRiakBucketList....................................................................................................... 3102
tRiakBucketList Standard properties....................................................................................................................3102
Related scenarios........................................................................................................................................................ 3103

tRiakClose................................................................................................................3104
tRiakClose Standard properties............................................................................................................................. 3104
Related scenario..........................................................................................................................................................3104
tRiakConnection......................................................................................................3105
tRiakConnection Standard properties..................................................................................................................3105
Related scenario.......................................................................................................................................................... 3106

tRiakInput................................................................................................................ 3107
tRiakInput Standard properties..............................................................................................................................3107
Exporting data from a Riak bucket to a local file..........................................................................................3108

tRiakKeyList............................................................................................................ 3113
tRiakKeyList Standard properties..........................................................................................................................3113
Related scenarios........................................................................................................................................................ 3114

tRiakOutput............................................................................................................. 3115
tRiakOutput Standard properties.......................................................................................................................... 3115
Related scenarios........................................................................................................................................................ 3117

tRouteFault..............................................................................................................3118
tRouteFault Standard properties........................................................................................................................... 3118
Exchanging messages between a Job and a Route....................................................................................... 3119

tRouteInput............................................................................................................. 3126
tRouteInput Standard properties...........................................................................................................................3126
Exchanging messages between a Job and a Route....................................................................................... 3127

tRouteOutput.......................................................................................................... 3132
tRouteOutput Standard properties....................................................................................................................... 3132
Related scenario.......................................................................................................................................................... 3133

tRowGenerator........................................................................................................ 3134
tRowGenerator Standard properties.....................................................................................................................3134
Generating random Java data...................................................................................................................................3136

tRSSInput.................................................................................................................3138
tRSSInput Standard properties...............................................................................................................................3138
Fetching frequently updated blog entries.........................................................................................................3139

tRSSOutput..............................................................................................................3141
tRSSOutput Standard properties........................................................................................................................... 3141
Creating an RSS flow and storing files on an FTP server........................................................................... 3142
Creating an RSS flow that contains metadata.................................................................................................3147
Creating an ATOM feed XML file..........................................................................................................................3149

tRunJob.................................................................................................................... 3153
tRunJob Standard properties...................................................................................................................................3153
Calling a Job and passing the parameter needed to the called Job........................................................ 3156
Running a list of child Jobs dynamically........................................................................................................... 3160
Propagating the buffered output data from the child Job to the parent Job........................................3164

tS3BucketCreate..................................................................................................... 3169
tS3BucketCreate Standard properties..................................................................................................................3169
Related scenario.......................................................................................................................................................... 3171

tS3BucketDelete..................................................................................................... 3172
tS3BucketDelete Standard properties................................................................................................................. 3172
Related scenario.......................................................................................................................................................... 3173

tS3BucketExist........................................................................................................ 3174
tS3BucketExist Standard properties.....................................................................................................................3174
Verifying the absence of a bucket, creating it and listing all the S3 buckets..................................... 3176

tS3BucketList.......................................................................................................... 3180
tS3BucketList Standard properties....................................................................................................................... 3180
Related scenario.......................................................................................................................................................... 3181

tS3Close...................................................................................................................3182
tS3Close Standard properties................................................................................................................................. 3182
Related scenario.......................................................................................................................................................... 3183

tS3Connection.........................................................................................................3184
tS3Connection Standard properties..................................................................................................................... 3184
Creating an IAM role on AWS................................................................................................................................ 3187
Setting up SSE KMS for your EMR cluster........................................................................................................ 3187
Setting up SSE KMS for your S3 bucket............................................................................................................ 3189
Related scenario.......................................................................................................................................................... 3191

tS3Copy....................................................................................................................3192
tS3Copy Standard properties.................................................................................................................................. 3192
Copying an S3 object from one bucket to another........................................................................................3194

tS3Delete.................................................................................................................3199
tS3Delete Standard properties...............................................................................................................................3199
Related scenario.......................................................................................................................................................... 3201

tS3Get...................................................................................................................... 3202
tS3Get Standard properties..................................................................................................................................... 3202
Related scenario.......................................................................................................................................................... 3205

tS3List...................................................................................................................... 3206
tS3List Standard properties.....................................................................................................................................3206
Listing files with the same prefix from a bucket........................................................................................... 3208
Tagging S3 objects................................................................................................ 3212
Tagging S3 objects: linking the components...................................................................................................3212
Tagging S3 objects: configuring the components..........................................................................................3212
Tagging S3 objects: executing the Job...............................................................................................................3213

tS3Put...................................................................................................................... 3215
tS3Put Standard properties..................................................................................................................................... 3215
Exchanging files with Amazon S3..........................................................................................................................3218

tSalesforceBulkExec............................................................................................... 3222
tSalesforceBulkExec Standard properties.......................................................................................................... 3222
Related scenario.......................................................................................................................................................... 3226

tSalesforceConnection........................................................................................... 3227
tSalesforceConnection Standard properties......................................................................................................3227
Connecting to Salesforce using OAuth implicit flow to authenticate the user (deprecated).......... 3230
Related scenario.......................................................................................................................................................... 3234

tSalesforceGetDeleted........................................................................................... 3235
tSalesforceGetDeleted Standard properties...................................................................................................... 3235
Recovering deleted data from Salesforce..........................................................................................................3238

tSalesforceGetServerTimestamp...........................................................................3243
tSalesforceGetServerTimestamp Standard properties................................................................................... 3243
Related scenario.......................................................................................................................................................... 3246

tSalesforceGetUpdated.......................................................................................... 3247
tSalesforceGetUpdated Standard properties.....................................................................................................3247
Related scenario.......................................................................................................................................................... 3251

tSalesforceInput......................................................................................................3252
tSalesforceInput Standard properties..................................................................................................................3252
How to set schema for the guess query feature of tSalesforceInput...................................................... 3257
Related scenario.......................................................................................................................................................... 3262

tSalesforceOutput...................................................................................................3263
tSalesforceOutput Standard properties...............................................................................................................3263
Upserting Salesforce data based on external IDs.......................................................................................... 3268

tSalesforceOutputBulk........................................................................................... 3279
tSalesforceOutputBulk Standard properties......................................................................................................3279
Related scenario.......................................................................................................................................................... 3280

tSalesforceOutputBulkExec...................................................................................3281
tSalesforceOutputBulkExec Standard properties............................................................................................ 3281
Inserting bulk data into Salesforce......................................................................................................................3286

tSalesforceEinsteinBulkExec................................................................................. 3290
tSalesforceEinsteinBulkExec Standard properties.......................................................................................... 3290
Related scenario.......................................................................................................................................................... 3293

tSalesforceEinsteinOutputBulkExec..................................................................... 3294
tSalesforceEinsteinOutputBulkExec Standard properties............................................................................ 3294
Related scenario.......................................................................................................................................................... 3298

tSampleRow............................................................................................................ 3299
tSampleRow Standard properties......................................................................................................................... 3299
Filtering rows and groups of rows.......................................................................................................................3300

tSAPHanaClose........................................................................................................3303
tSAPHanaClose Standard properties.................................................................................................................... 3303
Related scenarios........................................................................................................................................................ 3303

tSAPHanaCommit................................................................................................... 3304
tSAPHanaCommit Standard properties............................................................................................................... 3304
Related scenario.......................................................................................................................................................... 3305

tSAPHanaConnection............................................................................................. 3306
tSAPHanaConnection Standard properties........................................................................................................ 3306
Related scenarios........................................................................................................................................................ 3307

tSAPHanaInput........................................................................................................3308
tSAPHanaInput Standard properties.................................................................................................................... 3308
Related scenarios........................................................................................................................................................ 3311

tSAPHanaOutput.....................................................................................................3312
tSAPHanaOutput Standard properties.................................................................................................................3312
Related scenarios........................................................................................................................................................ 3317

tSAPHanaRollback.................................................................................................. 3318
tSAPHanaRollback Standard properties..............................................................................................................3318
Related scenarios........................................................................................................................................................ 3318

tSAPHanaRow......................................................................................................... 3319
tSAPHanaRow Standard properties...................................................................................................................... 3319
Related scenarios........................................................................................................................................................ 3322

Exporting data using tSAPHanaUnload...............................................................3323
Creating the SAP HANA database connection................................................................................................. 3323
Creating and running the Job.................................................................................................................................3324
tSchemaComplianceCheck.....................................................................................3325

tSCPClose.................................................................................................................3326
tSCPClose Standard properties.............................................................................................................................. 3326
Related scenario.......................................................................................................................................................... 3327

tSCPConnection...................................................................................................... 3328
tSCPConnection Standard properties...................................................................................................................3328
Related scenarios........................................................................................................................................................ 3329

tSCPDelete...............................................................................................................3330
tSCPDelete Standard properties............................................................................................................................ 3330
Related scenarios........................................................................................................................................................ 3331

tSCPFileExists......................................................................................................... 3332
tSCPFileExists Standard properties...................................................................................................................... 3332
Handling a file using SCP........................................................................................................................................3333

tSCPFileList............................................................................................................. 3338
tSCPFileList Standard properties...........................................................................................................................3338
Related scenario.......................................................................................................................................................... 3339

tSCPGet.................................................................................................................... 3340
tSCPGet Standard properties.................................................................................................................................. 3340
Related scenario.......................................................................................................................................................... 3341

tSCPPut.................................................................................................................... 3342
tSCPPut Standard properties.................................................................................................................................. 3342
Related scenario.......................................................................................................................................................... 3343

tSCPRename............................................................................................................ 3344
tSCPRename Standard properties......................................................................................................................... 3344
Related scenario.......................................................................................................................................................... 3345

tSCPTruncate...........................................................................................................3346
tSCPTruncate Standard properties........................................................................................................................3346
Related scenarios........................................................................................................................................................ 3347

tSendMail.................................................................................................................3348
tSendMail Standard properties.............................................................................................................................. 3348
Sending an email on error...................................................................................................................................... 3350

tServerAlive............................................................................................................. 3352
tServerAlive Standard properties.......................................................................................................................... 3352
Validating the status of the connection to a remote host.......................................................................... 3353
tServiceNowConnection.........................................................................................3356
tServiceNowConnection Standard properties...................................................................................................3356
Related scenario.......................................................................................................................................................... 3357

tServiceNowInput................................................................................................... 3358
tServiceNowInput Standard properties............................................................................................................... 3358
Related scenario.......................................................................................................................................................... 3360

tServiceNowOutput................................................................................................ 3361
tServiceNowOutput Standard properties............................................................................................................3361
Related scenario.......................................................................................................................................................... 3363

tSetEnv.....................................................................................................................3364
tSetEnv Standard properties................................................................................................................................... 3364
Modifying a variable during a Job execution................................................................................................... 3365

tSetGlobalVar.......................................................................................................... 3368
tSetGlobalVar Standard properties....................................................................................................................... 3368
Printing out the content of a global variable..................................................................................................3369

tSetKerberosConfiguration....................................................................................3371
tSetKerberosConfiguration Standard properties..............................................................................................3371
Related scenarios........................................................................................................................................................ 3372

tSetKeystore............................................................................................................3373
tSetKeystore Standard properties......................................................................................................................... 3373
Extracting customer information from a private WSDL file........................................................................3374

tSetProxy................................................................................................................. 3379
tSetProxy Standard properties............................................................................................................................... 3379
Related scenarios........................................................................................................................................................ 3381

tSleep....................................................................................................................... 3382
tSleep Standard properties......................................................................................................................................3382
Related scenarios........................................................................................................................................................ 3383

tSnowflakeBulkExec...............................................................................................3384
tSnowflakeBulkExec Standard properties.......................................................................................................... 3384
Loading data in a Snowflake table using a custom stage path................................................................. 3390
Related scenarios........................................................................................................................................................ 3397

tSnowflakeClose..................................................................................................... 3398
tSnowflakeClose Standard properties................................................................................................................. 3398
Related scenario.......................................................................................................................................................... 3398
tSnowflakeCommit................................................................................................. 3399
tSnowflakeCommit Standard properties.............................................................................................................3399
Related scenario for tSnowflakeCommit............................................................................................................3400

tSnowflakeConnection........................................................................................... 3401
tSnowflakeConnection Standard properties......................................................................................................3401
Related scenario.......................................................................................................................................................... 3403

tSnowflakeInput......................................................................................................3404
tSnowflakeInput Standard properties..................................................................................................................3404
Writing data into and reading data from a Snowflake table......................................................................3407

tSnowflakeOutput...................................................................................................3412
tSnowflakeOutput Standard properties.............................................................................................................. 3412
Related scenario.......................................................................................................................................................... 3415

tSnowflakeOutputBulk...........................................................................................3416
tSnowflakeOutputBulk Standard properties......................................................................................................3416
Related scenarios........................................................................................................................................................ 3422

tSnowflakeOutputBulkExec...................................................................................3423
tSnowflakeOutputBulkExec Standard properties............................................................................................ 3423
Loading data using the COPY command.............................................................................................................3430
Related scenarios........................................................................................................................................................ 3437

tSnowflakeRollback................................................................................................3438
tSnowflakeRollback Standard properties........................................................................................................... 3438
Related scenario: tSnowflakeRollback................................................................................................................ 3439

tSnowflakeRow....................................................................................................... 3440
tSnowflakeRow Standard properties....................................................................................................................3440
Querying data in a cloud file through a Snowflake external table and a materialized view......... 3443
Related scenario.......................................................................................................................................................... 3449

tSOAP....................................................................................................................... 3450
tSOAP Standard properties......................................................................................................................................3450
Fetching the country name information using a Web service................................................................... 3452
Using a SOAP message from an XML file to get country name information and saving the information to an XML file..................................................................................................................................... 3454

tSocketInput............................................................................................................ 3458
tSocketInput Standard properties......................................................................................................................... 3458
Passing on data to the listening port................................................................................................................. 3460

tSocketOutput......................................................................................................... 3463
tSocketOutput Standard properties......................................................................................................................3463
Related scenario..........................................................................................................................................................3464

tSortRow.................................................................................................................. 3465
tSortRow Standard properties................................................................................................................................ 3465
Sorting entries.............................................................................................................................................................. 3466

tSplitRow................................................................................................................. 3469
tSplitRow Standard properties............................................................................................................................... 3469
Splitting one row into two rows...........................................................................................................................3470

tSplunkEventCollector........................................................................................... 3474
tSplunkEventCollector Standard properties...................................................................................................... 3474
Related scenario.......................................................................................................................................................... 3475

tSQLDWHBulkExec................................................................................................. 3476
tSQLDWHBulkExec Standard properties.............................................................................................................3476
Related scenario.......................................................................................................................................................... 3480

tSQLDWHClose........................................................................................................3481
tSQLDWHClose Standard properties.................................................................................................................... 3481
Related scenario.......................................................................................................................................................... 3482

tSQLDWHCommit....................................................................................................3483
tSQLDWHCommit Standard properties............................................................................................................... 3483
Related scenario.......................................................................................................................................................... 3484

tSQLDWHConnection............................................................................................. 3485
tSQLDWHConnection Standard properties........................................................................................................ 3485
Related scenario.......................................................................................................................................................... 3487

tSQLDWHInput........................................................................................................3488
tSQLDWHInput Standard properties.................................................................................................................... 3488
Related scenario.......................................................................................................................................................... 3491

tSQLDWHOutput.....................................................................................................3492
tSQLDWHOutput Standard properties.................................................................................................................3492
Related scenario.......................................................................................................................................................... 3497

tSQLDWHRollback.................................................................................................. 3498
tSQLDWHRollback Standard properties..............................................................................................................3498
Related scenario.......................................................................................................................................................... 3499

tSQLDWHRow......................................................................................................... 3500
tSQLDWHRow Standard properties...................................................................................................................... 3500
Related scenario.......................................................................................................................................................... 3503
tSQLiteClose............................................................................................................3504
tSQLiteClose Standard properties.........................................................................................................................3504
Related scenarios........................................................................................................................................................ 3505

tSQLiteCommit........................................................................................................3506
tSQLiteCommit Standard properties.................................................................................................................... 3506
Related scenario.......................................................................................................................................................... 3507

tSQLiteConnection..................................................................................................3508
tSQLiteConnection Standard properties............................................................................................................. 3508
Related scenarios........................................................................................................................................................ 3509

tSQLiteInput............................................................................................................ 3510
tSQLiteInput Standard properties......................................................................................................................... 3510
Filtering SQLite data...................................................................................................................................................3512

tSQLiteOutput......................................................................................................... 3515
tSQLiteOutput Standard properties......................................................................................................................3515
Related scenario..........................................................................................................................................................3519

tSQLiteRollback...................................................................................................... 3520
tSQLiteRollback Standard properties...................................................................................................................3520
Related scenarios........................................................................................................................................................ 3521

tSQLiteRow..............................................................................................................3522
tSQLiteRow Standard properties...........................................................................................................................3522
Updating SQLite rows............................................................................................................................................... 3525
Related scenarios........................................................................................................................................................ 3527

tSQLTemplate......................................................................................................... 3528
tSQLTemplate Standard properties...................................................................................................................... 3528
Related scenarios........................................................................................................................................................ 3530

tSQLTemplateAggregate....................................................................................... 3531
tSQLTemplateAggregate Standard properties..................................................................................................3531
Filtering and aggregating table columns directly on the DBMS...............................................................3533

tSQLTemplateCommit............................................................................................3537
tSQLTemplateCommit Standard properties...................................................................................................... 3537
Related scenario.......................................................................................................................................................... 3538

tSQLTemplateFilterColumns................................................................................. 3539
tSQLTemplateFilterColumns Standard properties.......................................................................................... 3539
Related scenario..........................................................................................................................................................3540
tSQLTemplateFilterRows.......................................................................................3541
tSQLTemplateFilterRows Standard properties.................................................................................................3541
Related scenario..........................................................................................................................................................3542

tSQLTemplateMerge.............................................................................................. 3543
tSQLTemplateMerge Standard properties..........................................................................................................3543
Merging data directly on the DBMS.................................................................................................................... 3545

tSQLTemplateRollback.......................................................................................... 3552
tSQLTemplateRollback Standard properties.....................................................................................................3552
Related scenarios........................................................................................................................................................ 3553

tSqoopExport.......................................................................................................... 3554
Additional arguments................................................................................................................................................ 3554
tSqoopExport Standard properties....................................................................................................................... 3555
Related scenarios........................................................................................................................................................ 3564

tSqoopImport.......................................................................................................... 3565
tSqoopImport Standard properties....................................................................................................................... 3565
Importing a MySQL table to HDFS.......................................................................................................................3574

tSqoopImportAllTables..........................................................................................3580
tSqoopImportAllTables Standard properties.....................................................................................................3580
Related scenarios........................................................................................................................................................ 3587

tSqoopMerge...........................................................................................................3588
tSqoopMerge Standard properties........................................................................................................................3588
Merging two datasets in HDFS..............................................................................................................................3595

tSQSConnection...................................................................................................... 3600
tSQSConnection Standard properties.................................................................................................................. 3600
Related scenarios........................................................................................................................................................ 3602

tSQSInput.................................................................................................................3603
tSQSInput Standard properties.............................................................................................................................. 3603
Retrieving messages from an Amazon SQS queue........................................................................................ 3606

tSQSMessageChangeVisibility...............................................................................3611
tSQSMessageChangeVisibility Standard properties........................................................................................3611
Related scenario.......................................................................................................................................................... 3613

tSQSMessageDelete............................................................................................... 3614
tSQSMessageDelete Standard properties...........................................................................................................3614
Related scenario.......................................................................................................................................................... 3616
tSQSOutput..............................................................................................................3617
tSQSOutput Standard properties...........................................................................................................................3617
Delivering messages to an Amazon SQS queue............................................................................................. 3620

tSQSQueueAttributes............................................................................................. 3626
tSQSQueueAttributes Standard properties........................................................................................................ 3626
Related scenario.......................................................................................................................................................... 3628

tSQSQueueCreate................................................................................................... 3629
tSQSQueueCreate Standard properties............................................................................................................... 3629
Related scenario.......................................................................................................................................................... 3631

tSQSQueueDelete................................................................................................... 3632
tSQSQueueDelete Standard properties...............................................................................................................3632
Related scenario.......................................................................................................................................................... 3634

tSQSQueueList........................................................................................................ 3635
tSQSQueueList Standard properties.....................................................................................................................3635
Listing Amazon SQS queues in an AWS region...............................................................................................3637

tSQSQueuePurge.................................................................................................... 3641
tSQSQueuePurge Standard properties................................................................................................................ 3641
Related scenario.......................................................................................................................................................... 3643

tSSH..........................................................................................................................3644
tSSH Standard properties.........................................................................................................................................3644
Displaying remote system information via SSH..............................................................................................3647

tStatCatcher.............................................................................................................3649
tStatCatcher Standard properties..........................................................................................................................3649
Displaying the statistics log of Job execution................................................................................................. 3650

tSVNLogInput..........................................................................................................3654
tSVNLogInput Standard properties.......................................................................................................................3654
Retrieving a log message from an SVN repository........................................................................................ 3655

tSybaseBulkExec.....................................................................................................3658
tSybaseBulkExec Standard properties.................................................................................................................3658
Related scenarios........................................................................................................................................................ 3662

tSybaseClose........................................................................................................... 3663
tSybaseClose Standard properties........................................................................................................................ 3663
Related scenario.......................................................................................................................................................... 3664

tSybaseCommit....................................................................................................... 3665
tSybaseCommit Standard properties....................................................................................................................3665
Related scenario.......................................................................................................................................................... 3666

tSybaseConnection................................................................................................. 3667
tSybaseConnection Standard properties.............................................................................................................3667
Related scenarios........................................................................................................................................................ 3668

tSybaseInput............................................................................................................3669
tSybaseInput Standard properties.........................................................................................................................3669
Related scenarios........................................................................................................................................................ 3672

tSybaseIQBulkExec................................................................................................. 3673
tSybaseIQBulkExec Standard properties............................................................................................................ 3673
Related scenarios........................................................................................................................................................ 3680

tSybaseIQOutputBulkExec.....................................................................................3681
tSybaseIQOutputBulkExec Standard properties...............................................................................................3681
Bulk-loading data to a Sybase IQ 12 database............................................................................................... 3685
Related scenarios........................................................................................................................................................ 3688

tSybaseOutput.........................................................................................................3689
tSybaseOutput Standard properties..................................................................................................................... 3689
Related scenarios........................................................................................................................................................ 3694

tSybaseOutputBulk.................................................................................................3695
tSybaseOutputBulk Standard properties............................................................................................................ 3695
Related scenarios........................................................................................................................................................ 3697

tSybaseOutputBulkExec.........................................................................................3698
tSybaseOutputBulkExec Standard properties................................................................................................... 3698
Related scenarios........................................................................................................................................................ 3702

tSybaseRollback......................................................................................................3703
tSybaseRollback Standard properties.................................................................................................................. 3703
Related scenarios........................................................................................................................................................ 3704

tSybaseRow............................................................................................................. 3705
tSybaseRow Standard properties.......................................................................................................................... 3705
Related scenarios........................................................................................................................................................ 3708

tSybaseSCD..............................................................................................................3709
tSybaseSCD Standard properties...........................................................................................................................3709
Related scenarios........................................................................................................................................................ 3712

tSybaseSCDELT....................................................................................................... 3713
tSybaseSCDELT Standard properties................................................................................................................... 3713
Related scenario.......................................................................................................................................................... 3717

tSybaseSP................................................................................................................ 3718
tSybaseSP Standard properties.............................................................................................................................. 3718
Related scenarios........................................................................................................................................................ 3720

tSystem.................................................................................................................... 3722
tSystem Standard properties...................................................................................................................................3722
Echoing 'Hello World!'...............................................................................................................................................3724

tTeradataClose........................................................................................................3726
tTeradataClose Standard properties.....................................................................................................................3726
Related scenarios........................................................................................................................................................ 3727

tTeradataCommit....................................................................................................3728
tTeradataCommit Standard properties................................................................................................................3728
Related scenario.......................................................................................................................................................... 3729

tTeradataConnection..............................................................................................3730
tTeradataConnection Standard properties.........................................................................................................3730
Related scenario.......................................................................................................................................................... 3732

tTeradataFastExport...............................................................................................3733
tTeradataFastExport Standard properties.......................................................................................................... 3733
Related scenarios........................................................................................................................................................ 3735

tTeradataFastLoad..................................................................................................3736
tTeradataFastLoad Standard properties............................................................................................................. 3736
Related scenarios........................................................................................................................................................ 3738

tTeradataFastLoadUtility....................................................................................... 3739
tTeradataFastLoadUtility Standard properties................................................................................................. 3739
Related scenario.......................................................................................................................................................... 3741

tTeradataInput........................................................................................................ 3742
tTeradataInput Standard properties.....................................................................................................................3742
Related scenarios........................................................................................................................................................ 3745

tTeradataMultiLoad................................................................................................3746
tTeradataMultiLoad Standard properties........................................................................................................... 3746
Related scenario.......................................................................................................................................................... 3748

tTeradataOutput..................................................................................................... 3749
tTeradataOutput Standard properties................................................................................................................. 3749
Related scenarios........................................................................................................................................................ 3754
tTeradataRollback.................................................................................................. 3755
tTeradataRollback Standard properties.............................................................................................................. 3755
Related scenario.......................................................................................................................................................... 3756

tTeradataRow..........................................................................................................3757
tTeradataRow Standard properties.......................................................................................................................3757
Related scenarios........................................................................................................................................................ 3761

tTeradataSCD.......................................................................................................... 3762
tTeradataSCD Standard properties....................................................................................................................... 3762
Related scenario.......................................................................................................................................................... 3765

tTeradataSCDELT....................................................................................................3766
tTeradataSCDELT Standard properties................................................................................................................3766
Related scenario.......................................................................................................................................................... 3770

tTeradataTPTExec.................................................................................................. 3771
tTeradataTPTExec Standard properties.............................................................................................................. 3771
Supported optional attributes for each consumer operator....................................................................... 3775
Loading data into a Teradata database............................................................................................................. 3776

tTeradataTPTUtility................................................................................................3783
tTeradataTPTUtility Standard properties........................................................................................................... 3783
Related scenario.......................................................................................................................................................... 3787

tTeradataTPump..................................................................................................... 3788
tTeradataTPump Standard properties................................................................................................................. 3788
Inserting data into a Teradata database table................................................................................................ 3790

tUniqRow................................................................................................................. 3794
tUniqRow Standard properties...............................................................................................................................3794
Deduplicating entries.................................................................................................................................................3795

tUnite....................................................................................................................... 3799
tUnite Standard properties...................................................................................................................................... 3799
Iterating on files and merge the content.......................................................................................................... 3800

tVectorWiseCommit................................................................................................3803
tVectorWiseCommit Standard properties........................................................................................................... 3803
Related scenario.......................................................................................................................................................... 3804

tVectorWiseConnection......................................................................................... 3805
tVectorWiseConnection Standard properties.................................................................................................... 3805
Related scenario.......................................................................................................................................................... 3806
tVectorWiseInput.................................................................................................... 3807
tVectorWiseInput Standard properties................................................................................................................ 3807
Related scenario.......................................................................................................................................................... 3810

tVectorWiseOutput................................................................................................. 3811
tVectorWiseOutput Standard properties.............................................................................................................3811
Related scenario.......................................................................................................................................................... 3815

tVectorWiseRollback.............................................................................................. 3816
tVectorWiseRollback Standard properties..........................................................................................................3816
Related scenario.......................................................................................................................................................... 3817

tVectorWiseRow......................................................................................................3818
tVectorWiseRow Standard properties.................................................................................................................. 3818
Related scenario.......................................................................................................................................................... 3821

tVerticaBulkExec.....................................................................................................3822
tVerticaBulkExec Standard properties.................................................................................................................3822
Related scenarios........................................................................................................................................................ 3827

tVerticaClose........................................................................................................... 3828
tVerticaClose Standard properties........................................................................................................................ 3828
Related scenarios........................................................................................................................................................ 3829

tVerticaCommit....................................................................................................... 3830
tVerticaCommit Standard properties....................................................................................................................3830
Related scenario.......................................................................................................................................................... 3831

tVerticaConnection................................................................................................. 3832
tVerticaConnection Standard properties.............................................................................................................3832
Related scenario.......................................................................................................................................................... 3833

tVerticaInput........................................................................................................... 3834
tVerticaInput Standard properties.........................................................................................................................3834
Related scenarios........................................................................................................................................................ 3837

tVerticaOutput........................................................................................................ 3838
tVerticaOutput Standard properties..................................................................................................................... 3838
Related scenarios........................................................................................................................................................ 3843

tVerticaOutputBulk.................................................................................................3844
tVerticaOutputBulk Standard properties............................................................................................................ 3844
Related scenarios........................................................................................................................................................ 3846

tVerticaOutputBulkExec.........................................................................................3847
tVerticaOutputBulkExec Standard properties................................................................................................... 3847
Related scenarios........................................................................................................................................................ 3851

tVerticaRollback......................................................................................................3852
tVerticaRollback Standard properties..................................................................................................................3852
Related scenario.......................................................................................................................................................... 3853

tVerticaRow............................................................................................................. 3854
tVerticaRow Standard properties.......................................................................................................................... 3854
Related scenario.......................................................................................................................................................... 3857

tVerticaSCD............................................................................................................. 3858
tVerticaSCD Standard properties...........................................................................................................................3858
Related scenarios........................................................................................................................................................ 3861

tVtigerCRMInput..................................................................................................... 3862
tVtigerCRMInput Standard properties................................................................................................................. 3862
Related scenarios........................................................................................................................................................ 3863

tVtigerCRMOutput.................................................................................................. 3864
tVtigerCRMOutput Standard properties.............................................................................................................. 3864
Related scenarios........................................................................................................................................................ 3866

tWaitForFile.............................................................................................................3867
tWaitForFile Standard properties.......................................................................................................................... 3867
Waiting for a file to be created and stopping the iteration loop after a message is triggered.......3869
Waiting for a file to be created and continuing the iteration loop after a message is triggered....3871

tWaitForSocket....................................................................................................... 3873
tWaitForSocket Standard properties.................................................................................................................... 3873
Related scenarios........................................................................................................................................................ 3874

tWaitForSqlData..................................................................................................... 3875
tWaitForSqlData Standard properties..................................................................................................................3875
Waiting for insertion of rows in a table............................................................................................................ 3876

tWarn........................................................................................................................3879
tWarn Standard properties.......................................................................................................................................3879
Related scenarios........................................................................................................................................................ 3880

tWebService.............................................................................................................3881
tWebService Standard properties..........................................................................................................................3881
Getting country names using tWebService....................................................................................................... 3883

tWebServiceInput................................................................................................... 3890
tWebServiceInput Standard properties............................................................................................................... 3890
Getting country names using tWebServiceInput.............................................................................................3892

tWorkdayInput........................................................................................................ 3895
tWorkdayInput Standard properties..................................................................................................................... 3895
Related scenario.......................................................................................................................................................... 3896

tWriteJSONField......................................................................................................3897
Configuring a JSON Tree.......................................................................................................................................... 3897
tWriteJSONField Standard properties.................................................................................................................. 3897
Writing flat data into JSON fields.........................................................................................................................3899
Related Scenarios........................................................................................................................................................3903

tWriteXMLField.......................................................................................................3904
tWriteXMLField Standard properties................................................................................................................... 3904
Extracting the structure of an XML file and inserting it into the fields of a database table...........3906

tXMLMap................................................................................................................. 3910
tXMLMap Standard properties............................................................................................................................... 3910
Mapping and transforming XML data................................................................................................................. 3911
Restructuring products data using multiple loop elements....................................................................... 3933

tXMLRPCInput.........................................................................................................3943
tXMLRPCInput Standard properties..................................................................................................................... 3943
Guessing the State name from an XMLRPC..................................................................................................... 3944

tXSDValidator......................................................................................................... 3946
tXSDValidator Standard properties...................................................................................................................... 3946
Validating data flows against an XSD file........................................................................................................ 3948

tXSLT........................................................................................................................3953
tXSLT Standard properties.......................................................................................................................................3953
Transforming XML to html using an XSL stylesheet.................................................................................... 3954

Copyleft
Adapted for 7.3.1. Supersedes previous releases.
The content of this document is correct at the time of publication.
However, more recent updates may be available in the online version that can be found on Talend
Help Center.
This documentation is provided under the terms of the Creative Commons Public License (CCPL).
For more information about what you can and cannot do with this documentation in accordance with
the CCPL, please read: http://creativecommons.org/licenses/by-nc-sa/2.0/.
Notices
Talend is a trademark of Talend, Inc.
All brands, product names, company names, trademarks and service marks are the properties of their
respective owners.
License Agreement
The software described in this documentation is licensed under the Apache License, Version 2.0 (the
"License"); you may not use this software except in compliance with the License. You may obtain
a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.html. Unless required by
applicable law or agreed to in writing, software distributed under the License is distributed on an "AS
IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under the License.
This product includes software developed at AOP Alliance (Java/J2EE AOP standards), ASM, Amazon,
AntlR, Apache ActiveMQ, Apache Ant, Apache Avro, Apache Axiom, Apache Axis, Apache Axis 2,
Apache Batik, Apache CXF, Apache Cassandra, Apache Chemistry, Apache Common Http Client, Apache
Common Http Core, Apache Commons, Apache Commons Bcel, Apache Commons JxPath, Apache
Commons Lang, Apache Datafu, Apache Derby Database Engine and Embedded JDBC Driver, Apache
Geronimo, Apache HCatalog, Apache Hadoop, Apache Hbase, Apache Hive, Apache HttpClient, Apache
HttpComponents Client, Apache JAMES, Apache Log4j, Apache Lucene Core, Apache Neethi, Apache
Oozie, Apache POI, Apache Parquet, Apache Pig, Apache PiggyBank, Apache ServiceMix, Apache
Sqoop, Apache Thrift, Apache Tomcat, Apache Velocity, Apache WSS4J, Apache WebServices Common
Utilities, Apache Xml-RPC, Apache Zookeeper, Box Java SDK (V2), CSV Tools, Cloudera HTrace,
ConcurrentLinkedHashMap for Java, Couchbase Client, DataNucleus, DataStax Java Driver for Apache
Cassandra, Ehcache, Ezmorph, Ganymed SSH-2 for Java, Google APIs Client Library for Java, Google
Gson, Groovy, Guava: Google Core Libraries for Java, H2 Embedded Database and JDBC Driver, Hector:
A high level Java client for Apache Cassandra, Hibernate BeanValidation API, Hibernate Validator,
HighScale Lib, HsqlDB, Ini4j, JClouds, JDO-API, JLine, JSON, JSR 305: Annotations for Software Defect
Detection in Java, JUnit, Jackson Java JSON-processor, Java API for RESTful Services, Java Agent for
Memory Measurements, Jaxb, Jaxen, JetS3T, Jettison, Jetty, Joda-Time, Json Simple, LZ4: Extremely
Fast Compression algorithm, LightCouch, MetaStuff, Metrics API, Metrics Reporter Config, Microsoft
Azure SDK for Java, Mondrian, MongoDB Java Driver, Netty, Ning Compression codec for LZF encoding,
OpenSAML, Paraccel JDBC Driver, Parboiled, PostgreSQL JDBC Driver, Protocol Buffers - Google's
data interchange format, Resty: A simple HTTP REST client for Java, Riak Client, Rocoto, SDSU Java
Library, SL4J: Simple Logging Facade for Java, SQLite JDBC Driver, Scala Lang, Simple API for CSS,
Snappy for Java a fast compressor/decompresser, SpyMemCached, SshJ, StAX API, StAXON - JSON via
StAX, Super SCV, The Castor Project, The Legion of the Bouncy Castle, Twitter4J, Uuid, W3C, Windows
Azure Storage libraries for Java, Woden, Woodstox: High-performance XML processor, Xalan-J, Xerces2,
XmlBeans, XmlSchema Core, Xmlsec - Apache Santuario, YAML parser and emitter for Java, Zip4J,
atinject, dropbox-sdk-java: Java library for the Dropbox Core API, google-guice. Licensed under their
respective license.


tAccessBulkExec
Offers gains in performance when carrying out Insert operations in an Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two-step
process. These two steps are fused together in the tAccessOutputBulkExec component, detailed in a
separate section. The advantage of using a two-step process is that it makes it possible to carry out
transformations on the data before loading it into the database.
This component executes an Insert action on the data provided.
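To make the two-step bulk approach more concrete, the sketch below shows roughly what the load step
amounts to in plain Java: reading a delimited file and inserting each row into an Access table over JDBC.
This is not the code generated by the component; it is a simplified illustration that assumes the
UCanAccess JDBC driver is on the classpath and uses hypothetical names (a Database1.accdb file, an
employees table with two text columns, and a semicolon-separated input file).

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class AccessBulkInsertSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical paths, table and columns: adjust to your environment.
            String url = "jdbc:ucanaccess://C:/data/Database1.accdb";
            String insert = "INSERT INTO employees (name, city) VALUES (?, ?)";
            try (Connection conn = DriverManager.getConnection(url);
                 PreparedStatement ps = conn.prepareStatement(insert);
                 BufferedReader in = new BufferedReader(new FileReader("C:/data/employees.csv"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // The delimited file written by tAccessOutputBulk uses a field separator,
                    // assumed here to be a semicolon.
                    String[] fields = line.split(";", -1);
                    ps.setString(1, fields[0]);
                    ps.setString(2, fields[1]);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }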

tAccessBulkExec Standard properties


These properties are used to configure tAccessBulkExec running in the Standard Job framework.
The Standard tAccessBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data is stored centrally.

  Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


DB version Select the version of your database.

Database Type in the directory where your database is stored.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Table Name of the table to be written. Note that only one table
can be written at a time and that the table must exist
already for the insert operation to succeed.

Local filename Browse to the delimited file to be loaded into your database.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB
connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Include header Select this check box to include the column header.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with the tAccessOutputBulk
component. Used together, they can offer gains in
performance while feeding an Access database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, make sure that your JVM
and ODBC versions match: either both 64-bit or both 32-bit.
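If you are unsure whether the JVM running your Job is 32-bit or 64-bit, a quick check from a standalone
class (or a tJava component) can help. The following is a minimal sketch; note that sun.arch.data.model
is a HotSpot-specific property and may be absent on other JVMs, which is why a default value is supplied.

    public class JvmBitnessCheck {
        public static void main(String[] args) {
            // Architecture the JVM was built for, for example amd64 or x86.
            System.out.println("os.arch = " + System.getProperty("os.arch"));
            // HotSpot JVMs usually report "32" or "64" here; other JVMs may not set this property.
            System.out.println("data model = " + System.getProperty("sun.arch.data.model", "unknown"));
        }
    }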

Related scenarios
For use cases in relation with tAccessBulkExec, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482
• Inserting data in bulk in MySQL database on page 2489


tAccessClose
Closes an active connection to the Access database so as to release occupied resources.

tAccessClose Standard properties


These properties are used to configure tAccessClose running in the Standard Job framework.
The Standard tAccessClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAccessConnection component in the list if more
than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component is to be used along with other Access
components, especially with tAccessConnection and
tAccessCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.


Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match:
either both 64-bit or both 32-bit.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAccessCommit
Using a unique connection, commits a global transaction in one go, instead of committing on every
row or every batch, and thus provides a gain in performance.
tAccessCommit validates the data processed through the Job into the connected database.
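In plain JDBC terms, the behavior described above corresponds to disabling auto-commit on the shared
connection and committing once at the end of the transaction. The sketch below is only an illustration
of that idea, not the component's generated code; it assumes a UCanAccess connection and a
hypothetical employees table.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class GlobalCommitSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical database path and table.
            String url = "jdbc:ucanaccess://C:/data/Database1.accdb";
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.setAutoCommit(false); // do not commit on every row
                try (PreparedStatement ps =
                        conn.prepareStatement("INSERT INTO employees (name) VALUES (?)")) {
                    for (String name : new String[] {"Alice", "Bob", "Carol"}) {
                        ps.setString(1, name);
                        ps.executeUpdate();
                    }
                }
                conn.commit(); // one global commit for the whole transaction
            }
        }
    }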

tAccessCommit Standard properties


These properties are used to configure tAccessCommit running in the Standard Job framework.
The Standard tAccessCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAccessConnection component in the list if more
than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAccessCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAccess*
components, especially with the tAccessConnection and
tAccessRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match:
either both 64-bit or both 32-bit.

Related scenario
For a scenario related to tAccessCommit, see Inserting data in mother/daughter tables on page 2426.


tAccessConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAccessConnection opens a connection to the database for the current transaction.
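Other tAccess* components reuse the opened connection through their Use an existing connection
option. If you also need that connection in custom code, for example in a tJava component, it is
commonly exposed in the Job's globalMap under a key derived from the connection component's unique
name; the exact key can be verified in the generated code. A minimal sketch, assuming the component
is named tAccessConnection_1:

    // Inside a tJava component placed after tAccessConnection_1 (assumed unique name).
    // The "conn_<component name>" key is the usual convention; check the generated code to confirm it.
    java.sql.Connection conn =
            (java.sql.Connection) globalMap.get("conn_tAccessConnection_1");
    try (java.sql.Statement stmt = conn.createStatement();
         java.sql.ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM Table1")) {
        if (rs.next()) {
            System.out.println("Rows in Table1: " + rs.getInt(1));
        }
    }
    // Do not close the connection here; leave that to tAccessCommit or tAccessClose.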

tAccessConnection Standard properties


These properties are used to configure tAccessConnection running in the Standard Job framework.
The Standard tAccessConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data is stored centrally.

  Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version Access 2003 or later versions.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.


Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB
connection you are creating.

Usage

Usage rule This component is more commonly used with other
tAccess* components, especially with the tAccessCommit
and tAccessRollback components.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match:
either both 64-bit or both 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Inserting data in parent/child tables


The following Job is dedicated to advanced database users, who want to carry out multiple table
insertions using a parent table Table1 to generate two child tables: Name and Birthday.
• In Access 2007, create an Access database named Database1.
• Once the Access database is created, create a table named Table1 with two column headings:
Name and Birthday.
Back in the Integration perspective of Talend Studio, the Job requires twelve components,
including tAccessConnection, tAccessCommit, tAccessInput, tAccessOutput and tAccessClose.

• Drop the following components from the Palette to the design workspace: tFileList,
tFileInputDelimited, tMap, tAccessOutput (two), tAccessInput (two), tAccessCommit, tAccessClose and
tLogRow (two).
• Connect the tFileList component to the input file component using an Iterate link. Thus, the name
of the file to be processed will be dynamically filled in from the tFileList directory using a global
variable.
• Connect the tFileInputDelimited component to the tMap component and dispatch the flow
between the two output Access components. Use a Row link for each of these connections
representing the main data flow.
• Set the tFileList component properties, such as the directory where files will be fetched from.
• Add a tAccessConnection component and connect it to the starter component of this Job. In this
example, the tFileList component uses an OnComponentOk link to define the execution order.
• In the tAccessConnection Component view, set the connection details manually or fetch them
from the Repository if you centrally store them as a Metadata DB connection entry. For more
information about Metadata, see Talend Studio User Guide .
• In the tFileInputDelimited component's Basic settings view, press Ctrl+Space to access the
variable list. Set the File Name field to the global variable tFileList_1.CURRENT_FILEPATH (a sketch of
the resulting expression is shown after this procedure). For more information about using variables, see
Talend Studio User Guide.

• Set the rest of the fields as usual, defining the row and field separators according to your file
structure.
• Then set the schema manually through the Edit schema dialog box or select the schema from the
Repository . Make sure the data type is correctly set, in accordance with the nature of the data
processed.
• In the tMap Output area, add two output tables, one called Name for the Name table, the second
called Birthday, for the Birthday table. For more information about the tMap component, see
Talend Studio User Guide.
• Drag the Name column from the Input area, and drop it to the Name table.
• Drag the Birthday column from the Input area, and drop it to the Birthday table.

• Then connect the output row links to distribute the flow correctly to the relevant DB output
components.
• In each of the tAccessOutput components' Basic settings view, select the Use an existing
connection check box to retrieve the tAccessConnection details.

• Set the Table name making sure it corresponds to the correct table, in this example either Name
or Birthday.
• There is no action on the table as the tables are already created.
• Select Insert as Action on data for both output components.
• Click on Sync columns to retrieve the schema set in the tMap.
• Then connect the first tAccessOutput component to the first tAccessInput component using an
OnComponentOk link.
• In each of the tAccessInput components' Basic settings view, select the Use an existing
connection check box to retrieve the distributed data flow. Then set the schema manually through
Edit schema dialog box.
• Then set the Table Name accordingly. In tAccessInput_1, this will be Name.
• Click the Guess Query button.
• Connect each tAccessInput component to tLogRow component with a Row > Main link. In each of
the tLogRow components' basic settings view, select Table in the Mode field.
• Add the tAccessCommit component below the tFileList component in the design workspace and
connect them together using an OnComponentOk link in order to terminate the Job with the
transaction commit.
• In the basic settings view of tAccessCommit component and from the Component list, select the
connection to be used, tAccessConnection_1 in this scenario.
• Save your Job and press F6 to execute it.

The parent table Table1 is reused to generate the Name table and Birthday table.
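
As a point of reference, the global variable used in the File Name field above becomes a small Java expression once selected from the Ctrl+Space list. A minimal sketch of what that field typically contains when fed by tFileList_1 (the component name and the usual globalMap cast are shown for illustration only):

((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))

At each iteration of the tFileList loop, this expression resolves to the full path of the file currently being processed.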

tAccessInput
Reads a database and extracts fields based on a query.
tAccessInput executes a DB query with a strictly defined statement which must correspond to
the schema definition. Then it passes on the field list to the next component via a Row > Main
connection.

tAccessInput Standard properties


These properties are used to configure tAccessInput running in the Standard Job framework.
The Standard tAccessInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see the section describing how to
set up a DB connection of Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the version of Access that you are using.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
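
For instance, if the schema of tAccessInput defines the columns id, name and birthday in that order, the query entered in the Query field should return the same columns in the same order. A minimal illustration, assuming a table also named Name with those three columns (the table and column names are examples only):

"SELECT id, name, birthday FROM Name"

The quotation marks are part of the field content, since the query is entered as a Java string.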

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.

For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
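As an illustration of this mechanism, suppose the Job contains two connection components, tAccessConnection_1 and tAccessConnection_2, and a context variable (arbitrarily named connName here) that holds the name of the one to use. The Code cell of the Dynamic settings table would then simply reference that variable:

context.connName

At runtime, setting connName to "tAccessConnection_1" or "tAccessConnection_2", for example through a context group or the --context_param command-line option, decides which connection tAccessInput actually uses.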

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For related topics, see:
Related topic in description of tContextLoad on page 496.

tAccessOutput
Writes, updates, modifies or deletes entries in a database.
tAccessOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.

tAccessOutput Standard properties


These properties are used to configure tAccessOutput running in the Standard Job framework.
The Standard tAccessOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the version of Access that you are using.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if

you have selected the Use an existing connection check box


in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and, above all, better
performance at executions.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update or delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.

Note:
Make sure the Nullable check box is selected for the
corresponding columns in the schema.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.

NB_LINE_DELETED: the number of rows deleted. This is an


After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
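Once tAccessOutput has finished, these variables can be read from a subsequent component, for instance a tJava triggered by an OnComponentOk link. A minimal sketch, assuming the component is named tAccessOutput_1 (the casts follow the usual globalMap convention):

// tJava code: print the row counters exposed by tAccessOutput_1
System.out.println("inserted: " + ((Integer)globalMap.get("tAccessOutput_1_NB_LINE_INSERTED")));
System.out.println("updated: " + ((Integer)globalMap.get("tAccessOutput_1_NB_LINE_UPDATED")));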

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Access database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.

tAccessOutputBulk
Prepares the file which contains the data used to feed the Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two step
process. These two steps are fused together in the tAccessOutputBulkExec component, detailed in a
separate section. The advantage of using a two step process is that it makes it possible to carry out
transformations on the data before loading it in the database.
tAccessOutputBulk writes a delimited file.

tAccessOutputBulk Standard properties


These properties are used to configure tAccessOutputBulk running in the Standard Job framework.
The Standard tAccessOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Create directory if not exists Select this check box to create the as yet non-existent file
directory specified in the File name field.

Append Select this check box to add any new rows to the end of the
file.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Include header Select this check box to include the column header in the
file.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with the tAccessBulkExec
component. Used together, they offer gains in performance
while feeding an Access database.

Component family Databases/Access

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For use cases in relation with tAccessOutputBulk, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482
• Inserting data in bulk in MySQL database on page 2489

tAccessOutputBulkExec
Executes an Insert action on the data provided, in an Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two step
process. These two steps are fused together in tAccessOutputBulkExec.
As a dedicated component, tAccessOutputBulkExec improves performance during Insert operations in
an Access database.

tAccessOutputBulkExec Standard properties


These properties are used to configure tAccessOutputBulkExec running in the Standard Job
framework.
The Standard tAccessOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the version of Access that you are using.

DB name Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
already exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Table Name of the table to be written.

Note:
Only one table can be written at a time and the table
must already exist for the insert operation to succeed.

FileName Name of the file to be processed.


Related topic: see Talend Studio User Guide.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Create directory if not exists Select this check box to create the as yet non-existent file
directory specified in the File name field.

Append Select this check box to append new rows to the end of the
file.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Include header Select this check box to include the column header in the
file.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Usage

Usage rule This component is mainly used when no particular


transformation is required on the data to be loaded in the
database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.

The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, make sure that your JVM
and ODBC versions match up: both 64-bit or 32-bit.

Related scenarios
For use cases in relation with tAccessOutputBulkExec, see the following scenarios:
• Inserting data in bulk in MySQL database on page 2489
• Inserting transformed data in MySQL database on page 2482

tAccessRollback
Cancels the transaction commit in the connected database and avoids committing part of a transaction
involuntarily.

tAccessRollback Standard properties


These properties are used to configure tAccessRollback running in the Standard Job framework.
The Standard tAccessRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAccessConnection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAccess*
components, especially with the tAccessConnection and
tAccessCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection

parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.

Related scenarios
No scenario is available for the Standard version of this component yet.

tAccessRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAccessRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements. tAccessRow is the specific component for this database query. The Row suffix means that the
component implements a flow in the Job design although it does not provide output.

tAccessRow Standard properties


These properties are used to configure tAccessRow running in the Standard Job framework.
The Standard tAccessRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the Access database version that you are using.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Name of the source table where changes made to data
should be captured.

Query type The query can be Built-in for a particular Job or, for a
commonly used query, stored in the Repository to
ease its reuse.

  Built-in: Fill in the query statement manually or build it
graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-

free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
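
To make the mapping between the Query field and the Set PreparedStatement Parameter table concrete, here is roughly the plain JDBC equivalent of a configuration using the query "DELETE FROM Name WHERE name = ?" with one parameter (the table, value and variable names are examples only, and this is not the component's actual generated code):

// Assuming conn is an open java.sql.Connection to the Access database
java.sql.PreparedStatement ps = conn.prepareStatement("DELETE FROM Name WHERE name = ?");
ps.setString(1, "John"); // Parameter Index: 1, Parameter Type: String, Parameter Value: "John"
ps.executeUpdate();
ps.close();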

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For related topics, see:
• Procedure on page 622
• Removing and regenerating a MySQL table index on page 2497.

tAddCRCRow
Provides a unique ID which helps improve the quality of processed data. CRC stands for Cyclical
Redundancy Checking.
tAddCRCRow calculates a surrogate key based on one or several columns and adds it to the defined
schema.
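To make the idea of a CRC-based surrogate key concrete, the sketch below computes a CRC32 checksum over the concatenation of two column values with the standard java.util.zip.CRC32 class. It only illustrates the principle; the column names, the separator and the concatenation order are arbitrary, and the component's own generated code may differ.

import java.util.zip.CRC32;

public class CrcRowExample {
    public static void main(String[] args) {
        String name = "John";
        String birthday = "1980-05-12";
        // Concatenate the selected column values and compute the checksum
        CRC32 crc = new CRC32();
        crc.update((name + ";" + birthday).getBytes());
        long surrogateKey = crc.getValue(); // value placed in the added CRC column
        System.out.println(surrogateKey);
    }
}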

tAddCRCRow Standard properties


These properties are used to configure tAddCRCRow running in the Standard Job framework.
The Standard tAddCRCRow component belongs to the Data Quality family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
In this component, a new CRC column is automatically
added.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Implication Select the check boxes next to the relevant columns to be used
for the surrogate key checksum.

Advanced Settings

CRC type Select a CRC type in the list. The longer the CRC, the less
overlap you will have.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary step. It requires an input


flow as well as an output.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Adding a surrogate key to a file


This scenario describes a Job adding a surrogate key to a delimited file schema.

Setting up the Job


Procedure
1. Drop the following components: tFileInputDelimited, tAddCRCRow and tLogRow.
2. Connect them using a Main row connection.

Configuring the input component


Procedure
1. In the tFileInputDelimited Component view, set the File Name path and all related properties in
case these are not stored in the Repository.

2. Create the schema through the Edit Schema button if the schema is not already stored in the
Repository. Remember to set the data type for each column; for more information on the Date pattern
to be filled in, visit http://docs.oracle.com/javase/6/docs/api/index.html.
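
For example, if the Birthday column of the input file holds values such as 12-05-1980 (an illustrative format), its type would be Date and its Date Pattern would be the corresponding Java pattern:

"dd-MM-yyyy"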

Configuring the tAddCRCRow component


Procedure
1. In the tAddCRCRow Component view, select the check boxes of the input flow columns to be used
to calculate the CRC.

Notice that a CRC column (read-only) has been added at the end of the schema.
2. Select CRC32 as CRC Type to get a longer surrogate key.

3. In the Basic settings view of tLogRow, select the Print values in cells of a table option to display
the output data in a table on the Console.

Job execution
Then save your Job and press F6 to execute it.

An additional CRC Column has been added to the schema calculated on all previously selected
columns (in this case all columns of the schema).

tAddLocationFromIP
Replaces IP addresses with geographical locations.
tAddLocationFromIP geolocates visitors through their IP addresses: this component identifies visitors'
geographical locations (country, region, city, latitude, longitude, ZIP code, etc.) using an IP address
lookup database file.
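For readers curious about what such a lookup amounts to, the sketch below uses the legacy MaxMind GeoIP Java API (the geoip.jar module mentioned in the Limitation section below) to resolve an IP address against a GeoIP.dat file. The class and method names are those of that legacy API as commonly documented, the file path and IP address are placeholders, and the component's own generated code may differ.

import com.maxmind.geoip.Country;
import com.maxmind.geoip.LookupService;

public class IpLookupExample {
    public static void main(String[] args) throws Exception {
        // Open the IP address lookup database file (path is an example)
        LookupService lookup = new LookupService("/data/GeoIP.dat", LookupService.GEOIP_MEMORY_CACHE);
        // Resolve a sample IP address to a country
        Country country = lookup.getCountry("8.8.8.8");
        System.out.println(country.getCode() + " - " + country.getName());
        lookup.close();
    }
}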

tAddLocationFromIP Standard properties


These properties are used to configure tAddLocationFromIP running in the Standard Job framework.
The Standard tAddLocationFromIP component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the fields to be
processed and passed on to the next component. The
schema of this component is read-only.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: Select the Repository file where Properties are


stored. When selected, the fields that follow are pre-defined
using fetched data.

Database Filepath The path to the IP address lookup database file.

Input parameters Input column: Select the input column from which the input
values are to be taken.

  input value is a hostname: Check if the input column holds


hostnames.

  input value is an IP address: Check if the input column holds


IP addresses.

Location type Country code: Check to replace IP with country code.

  Country name: Check to replace IP with country name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary step in the data flow
allowing you to replace IP addresses with geolocation information. It
cannot be a start component as it requires an input flow. It also
requires an output component.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
• geoip.jar

Identifying a real-world geographic location of an IP


The following scenario creates a three-component Job that associates an IP with a geographical
location. It obtains a site visitor's geographical location based on its IP.

Dropping and linking components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tAddLocationFromIP, and tLogRow.
2. Connect the three components using Row Main links.

Configuring the components


Procedure
1. In the design workspace, select tFixedFlowInput, and click the Component tab to define the basic
settings for tFixedFlowInput.
2. Click the [...] button next to Edit Schema to define the structure of the data you want to use as
input. In this scenario, the schema is made of one column that holds an IP address.

3. Click OK to close the dialog box, and accept propagating the changes when prompted by the
system. The defined column is displayed in the Values panel of the Basic settings view.
4. In the Number of rows field, enter the number of rows to be generated, and click in the Value cell
and set the value for the IP address.

5. In the design workspace, select tAddLocationFromIP and click the Component tab to define the
basic settings for tAddLocationFromIP.

6. Click the Sync columns button to synchronize the schema with the input schema set with
tFixedFlowInput.
7. Browse to the GeoIP.dat file to set its path in the Database filepath field.

Note:
Ensure that you download the latest version of the IP address lookup database file from the relevant
site as indicated in the Basic settings view of tAddLocationFromIP.

8. In the Input parameters panel, set your input parameters as needed. In this scenario, the input
column is the ip column defined earlier that holds an IP address.
9. In the Location type panel, set location type as needed. In this scenario, we want to display the
country name.
10. In the design workspace, select tLogRow and click the Component tab and define the basic
settings for tLogRow as needed. In this scenario, we want to display values in cells of a table.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run in the Run tab to execute the Job.

Results
One row is generated to display the country name that is associated with the set IP address.

tAdvancedFileOutputXML
Writes an XML file with separated data values according to an XML tree structure.
tAdvancedFileOutputXML outputs data to an XML type of file and offers an interface to deal with loop
and group by elements if needed.

tAdvancedFileOutputXML Standard properties


These properties are used to configure tAdvancedFileOutputXML running in the Standard Job
framework.
The Standard tAdvancedFileOutputXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Use Output Stream Select this check box to process the data flow of interest. Once
you have selected it, the Output Stream field is displayed and
you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid the inconvenience of writing it by hand, you
could select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field on condition that
this variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.
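As a rough illustration of this pattern, and mirroring the streaming scenario referenced above, you could open a stream in a tJava component executed before this one and register it in globalMap under a key of your choosing; the key name and file path below are purely illustrative.

// tJava code executed before tAdvancedFileOutputXML: create and register the stream
java.io.OutputStream outStream = new java.io.FileOutputStream("/tmp/out.xml");
globalMap.put("out_stream", outStream);

The Output Stream field of tAdvancedFileOutputXML then contains the matching expression, for example (java.io.OutputStream)globalMap.get("out_stream").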

File name Name or path to the output file and/or the variable to be
used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Configure XML tree Opens the dedicated interface to help you set the XML
mapping. For details about the interface, see Defining the
XML tree on page 125.

Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or stored remotely in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and job
designs. Related topic: see Talend Studio User Guide.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.

Append the source xml file Select this check box to add the new lines at the end of
your source XML file.

Generate compact file Select this check box to generate a file that does not have
any empty space or line separators. All elements are then
written on a single line, which considerably reduces the
file size.

Include DTD or XSL Select this check box to add the DOCTYPE declaration,
indicating the root element, the access path and the DTD
file, or to add the processing instruction, indicating the
type of stylesheet used (such as XSL types), along with the
access path and file name.
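
For illustration only, a minimal sketch of how a stream variable for the Use Output Stream option
could be defined in a tJava component placed upstream of this component (the variable name
out_stream and the file path are assumptions, not values imposed by the component):

  // tJava code: create an output stream and store it in the globalMap
  // so that it can later be referenced in the Output Stream field.
  new java.io.File("C:/myFolder").mkdirs();
  globalMap.put("out_stream", new java.io.FileOutputStream("C:/myFolder/out.xml", false));

The Output Stream field could then reference it as
(java.io.OutputStream)globalMap.get("out_stream"), provided the variable has been defined as above.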

Advanced settings

Split output in several files If the output XML file is large, you can split it into several
files, each containing a given number of rows.

Trim data This check box is activated when you are using the dom4j
generation mode. Select this check box to trim the leading
or trailing whitespace from the value of an XML element.

Create directory only if not exists This check box is selected by default. It creates a directory
to hold the output XML files if required.

Create empty element if needed This box is selected by default. If no column is associated
with an XML node, this option will create an open/close tag in
place of the expected tag.

Create attribute even if its value is NULL Select this check box to generate the XML attribute for the
associated input column whose value is null.

Create attribute even if it is unmapped Select this check box to generate the XML attribute for the
associated input column that is unmapped.

Create associated XSD file If one of the XML elements is defined as a Namespace
element, this option will create the corresponding XSD file.

Note:
To use this option, you must select Dom4J as the
generation mode.

Add Document type as node Select this check box to add column(s) of the Document
type as node(s) instead of escaped string(s) in the output
XML file.
This check box appears only when the generation mode
is set to Slow and memory-consuming (Dom4j) in the
Advanced settings tab.

Advanced separator (for number) Select this check box to change the expected data
separators.
Thousands separator: define the thousands separator
between inverted commas.
Decimal separator: define the decimal separator between
inverted commas.

Generation mode Select the appropriate generation mode according to your


memory availability. The available modes are:
• Slow and memory-consuming (Dom4j)

Note:
This option allows you to use dom4j to process the
XML files of high complexity.

• Fast with low memory consumption


Once you select Append the source xml file in the Basic
settings view, this field disappears because in this situation,
your generation mode is set automatically as dom4j.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select the check box to avoid the generation of an empty
file.

tStatCatcher Statistics Select the check box to collect the log data at a Job level as
well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
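
As an illustration, assuming the component's unique name in the Job is tAdvancedFileOutputXML_1,
the NB_LINE variable could be read in a downstream tJava component as follows:

  // Read the NB_LINE After variable of tAdvancedFileOutputXML_1
  // (the component name is an assumption for this sketch).
  Integer nbLine = (Integer) globalMap.get("tAdvancedFileOutputXML_1_NB_LINE");
  System.out.println("Rows written: " + nbLine);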

Usage

Usage rule Use this component to write an XML file with data passed
on from other components using a Row link.

Defining the XML tree


Double-click on the tAdvancedFileOutputXML component to open the dedicated interface or click on
the three-dot button on the Basic settings vertical tab of the Component Settings tab.

To the left of the mapping interface, under Schema List, all of the columns retrieved from the
incoming data flow are listed (only if an input flow is connected to the tAdvancedFileOutputXML
component).
To the right of the interface, define the XML structure you want to obtain as output.
You can easily import the XML structure or create it manually, then map the input schema columns
onto each corresponding element of the XML tree.

Importing the XML tree


The easiest and most common way to fill out the XML tree panel is to import a well-formed XML file.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Import XML tree.
4. Browse to the file to import and click OK.
• You can import an XML tree from files in XML, XSD and DTD formats.
• When importing an XML tree structure from an XSD file, you can choose an element as the
root of your XML tree.
The XML Tree column is hence automatically filled out with the correct elements.
5. If you need to add or remove an element or sub-elements, right-click the relevant element of the
tree to display the contextual menu.
6. Select Delete to remove the selection from the tree or select the relevant option among: Add sub-
element, Add attribute, Add namespace to enrich the tree.

Creating the XML tree manually


If you don't have any XML structure defined as yet, you can create it manually.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Add sub-element to create the first element of the structure.
4. If you need to add an attribute or a child element to any element, or remove an element, right-
click to the left of the corresponding element name to display the contextual menu.
5. On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace
or Delete.

Mapping XML data


Once your XML tree is ready, you can map each input column with the relevant XML tree element or
sub-element to fill out the Related Column.

Procedure
1. Click one of the Schema column names.
2. Drag it onto the relevant sub-element to the right.
3. Release to implement the actual mapping.
4. If you need to disconnect any mapping on any element of the XML tree, select the element and
right-click to the left of the element name to display the contextual menu.
5. Select Disconnect linker.

Defining the node status


Defining the XML tree and mapping the data is not sufficient. You also need to define the loop
element and if required the group element.

Define a loop element


The loop element allows you to define the iterating object. Generally the Loop element is also the
row generator.

About this task


To define an element as loop element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Loop Element.

Results
The Node Status column shows the newly added status.
There can only be one loop element at a time.

Define a group element


The group element is optional; it represents a constant element where the group-by operation can be
performed. A group element can be defined only if a loop element has been defined before.

About this task


When using a group element, the rows should be sorted so that they can be grouped by the selected
node.
To define an element as group element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Group Element.

Results
The Node Status column shows the newly added status, and any other group statuses required are
defined automatically.
Click OK once the mapping is complete to validate the definition and continue the Job configuration
where needed.

Creating an XML file using a loop


The following scenario describes the creation of an XML file from a sorted flat file gathering a video
collection.

Configuring the source file


Procedure
1. Drop a tFileInputDelimited and a tAdvancedFileOutputXML from the Palette onto the design
workspace.
2. Alternatively, if you configured a description for the input delimited file in the Metadata area of
the Repository, then you can directly drag & drop the metadata entry onto the editor, to set up
automatically the input flow.
3. Right-click on the input component and drag a row main link towards the tAdvancedFileO
utputXML component to implement a connection.
4. Select the tFileInputDelimited component and display the Component settings tab located in the
tab system at the bottom of the Studio.

5. Select the Property type according to whether you stored the file description in the Repository or
not. If you dragged & dropped the component directly from the Metadata, no changes to the
setting should be needed.
If you didn't set up the file description in the Repository, then select Built-in and manually fill out
the fields displayed on the Basic settings vertical tab.
The input file contains the following columns separated by semicolons: id, name, category,
year, language, director and cast.

In this simple use case, the Cast field gathers different values and the id increments each time
the movie changes.
6. If needed, define the tFileInputDelimited schema according to the file structure.

7. Once you have checked that the schema of the input file meets your expectations, click OK to
validate.

Configuring the XML output and mapping


Procedure
1. Then select the tAdvancedFileOutputXML component and click on the Component settings tab to
configure the basic settings as well as the mapping. Note that a double-click on the component
will directly open the mapping interface.

2. In the File Name field, browse to the file to be written if it exists or type in the path and file name
that needs to be created for the output.
By default, the schema (file description) is automatically propagated from the input flow. But you
can edit it if you need.
3. Then click on the three-dot button or double-click on the tAdvancedFileOutputXML component
on the design workspace to open the dedicated mapping editor.
To the left of the interface are listed the columns from the input file description.
4. To the right of the interface, set the XML tree panel to reflect the expected XML structure output.
You can create the structure node by node. For more information about the manual creation of an
XML tree, see Defining the XML tree on page 125.
In this example, an XML template is used to populate the XML tree automatically.
5. Right-click on the root tag displaying by default and select Import XML tree at the end of the
contextual menu options.
6. Browse to the XML file to be imported and click OK to validate the import operation.

Note:
You can import an XML tree from files in XML, XSD and DTD formats.

7. Then drag & drop each column name from the Schema List to the matching (or relevant) XML tree
elements as described in Mapping XML data on page 127.
The mapping is shown as blue links between the left and right panels.

Finally, define the node status where the loop should take place. In this use case, Cast is the
changing element on which the iteration should operate, so this element will be the loop
element.
Right-click the Cast element in the XML tree and select Set as Loop Element.
8. To group by movie, this use case also needs a group element to be defined.
Right-click the Movie parent node of the XML tree and select Set as Group Element.
The newly defined node statuses show on the corresponding element lines.
9. Click OK to validate the configuration.
10. Press F6 to execute the Job.

The output XML file shows the structure as defined.
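
For illustration only, assuming a root element named collection, a movie group element mapped to
the id, name, category, year, language and director columns, and a cast loop element, the generated
file could look like the following sketch (the actual element names depend on the imported XML
template):

  <?xml version="1.0" encoding="UTF-8"?>
  <collection>
    <movie id="1">
      <name>...</name>
      <director>...</director>
      <cast>...</cast>
      <cast>...</cast>
    </movie>
    <movie id="2">
      ...
    </movie>
  </collection>

Because Movie is the group element, one movie node is written per id value; because Cast is the loop
element, one cast node is written per input row of that movie.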

tAggregateRow
Receives a flow and aggregates it based on one or more columns.
For each output line, the aggregation key and the relevant results of the set operations (min,
max, sum...) are provided.
tAggregateRow helps to provide a set of metrics based on values or calculations.

tAggregateRow Standard properties


These properties are used to configure tAggregateRow running in the Standard Job framework.
The Standard tAggregateRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Group by Define the aggregation sets, the values of which will be


used for calculations.

  Output Column: Select the column label in the list offered


based on the schema structure you defined. You can add as
many output columns as you wish to make more precise
aggregations.
Ex: Select Country to calculate an average of values for each
country of a list, or select Country and Region if you want
to compare one country's regions with another country's
regions.

  Input Column: Match the input column label with your


output columns, in case the output label of the aggregation
set needs to be different.

Operations Select the type of operation along with the value to use for
the calculation and the output field.

  Output Column: Select the destination field in the list.

Function: Select the operator among:


• count: calculates the number of rows
• min: selects the minimum value
• max: selects the maximum value
• avg: calculates the average
• sum: calculates the sum
• first: returns the first value
• last: returns the last value
• list: lists values of an aggregation by multiple keys.
• list (object): lists Java values of an aggregation by
multiple keys
• count (distinct): counts the number of the distinct rows
• standard deviation: calculates the variability of a set of
values.
• union (geometry): makes the union of a set of
Geometry objects
• population standard deviation: calculates the spread of
a data distribution. Use this function if the data to be
calculated is considered a population on its own. This
calculation supports 39 decimal places.
• sample standard deviation: calculates the spread of
a data distribution. Use this function if the data to
be calculated is considered a sample from a larger
population. This calculation supports 39 decimal
places.

  Input column: Select the input column from which the


values are taken to be aggregated.

  Ignore null values: Select the check boxes corresponding


to the names of the columns for which you want the NULL
value to be ignored.

Advanced settings

Delimiter (only for list operation) Enter the delimiter you want to use to separate the values
produced by a list operation.

Use financial precision, this is the max precision for "sum" and "avg" operations, checked option heaps
more memory and slower than unchecked Select this check box to use a financial precision. This is a
max precision but consumes more memory and slows the processing.

Warning:
We advise you to use the BigDecimal type for the output in
order to obtain precise results.

Check type overflow (slower) Checks the type of data to ensure that the Job doesn't
crash.

Check ULP (Unit in the Last Place), ensure that a value will be incremented or decremented correctly,
only float and double types. (slower) Select this check box to ensure the most precise results
possible for the Float and Double types.

tStatCatcher Statistics Check this box to collect the log data at component level.
Note that this check box is not available in the Map/Reduce
version of the component.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data, therefore it requires
input and output, and hence is defined as an intermediary step.
tAggregateRow is usually combined with the tSortRow
component.

Aggregating values and sorting data


This example shows you how to use Talend components to aggregate the students' comprehensive
scores and then sort the aggregated scores based on the student names.

Creating a Job for aggregating and sorting data


Create a Job to aggregate the students' comprehensive scores using the tAggregateRow component,
then sort the aggregated data using the tSortRow component, finally display the aggregated and
sorted data on the console.

Procedure
1. Create a new Job and add a tFixedFlowInput component, a tAggregateRow component, a
tSortRow component, and a tLogRow component by typing their names in the design workspace
or dropping them from the Palette.
2. Link the tFixedFlowInput component to the tAggregateRow component using a Row > Main
connection.
3. Do the same to link the tAggregateRow component to the tSortRow component, and the tSortRow
component to the tLogRow component.

Configuring the Job for aggregating and sorting data


Configure the Job to aggregate the students' comprehensive scores using the tAggregateRow
component and then sort the aggregated data using the tSortRow component.

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view.
2. Click the button next to Edit schema to open the schema dialog box and define the schema by
adding two columns, name of String type and score of Double type. When done, click OK to save
the changes and close the schema dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and in the Content field displayed,
enter the following input data:

Peter;92
James;93
Thomas;91
Peter;94
James;96
Thomas;95
Peter;96
James;92
Thomas;98
Peter;95
James;96
Thomas;93
Peter;98
James;97
Thomas;95

4. Double-click the tAggregateRow component to open its Basic settings view.

5. Click the button next to Edit schema to open the schema dialog box and define the schema by
adding five columns, name of String type, and sum, average, max, and min of Double type.

When done, click OK to save the changes and close the schema dialog box.
6. Add one row in the Group by table by clicking the button below it, and select name from both
the Output column and Input column position column fields to group the input data by the name
column.
7. Add four rows in the Operations table and define the operations to be carried out. In this example,
the operations are sum, average, max, and min. Then select score from all four Input column
position column fields to aggregate the input data based on it.
8. Double-click the tSortRow component to open its Basic settings view.

9. Add one row in the Criteria table and specify the column based on which the sort operation is
performed. In this example, it is the name column. Then select alpha from the sort num or alpha?
column field and asc from the Order asc or desc? column field to sort the aggregated data in
ascending alphabetical order.
10. Double-click the tLogRow component to open its Basic settings view, and then select Table (print
values in cells of a table) in the Mode area for better readability of the result.

Executing the Job to aggregate and sort data


After setting up the Job and configuring the components used in the Job for aggregating and sorting
data, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

Results
As shown above, the students' comprehensive scores are aggregated and then sorted in ascending
alphabetical order based on the student names.
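
Based on the sample data above, and depending on the exact column layout, the console output of
tLogRow should resemble the following sketch:

  name   | sum   | average | max  | min
  -------+-------+---------+------+------
  James  | 474.0 | 94.8    | 97.0 | 92.0
  Peter  | 475.0 | 95.0    | 98.0 | 92.0
  Thomas | 472.0 | 94.4    | 98.0 | 91.0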

tAggregateSortedRow
Aggregates the sorted input data for each output column based on a set of operations. Each output column
is configured with as many rows as required, the operations to be carried out and the input column from
which the data will be taken for better data aggregation.

tAggregateSortedRow Standard properties


These properties are used to configure tAggregateSortedRow running in the Standard Job framework.
The Standard tAggregateSortedRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Input rows count Specify the number of rows that are sent to the
tAggregateSortedRow component.

Note:
If you specified a Limit for the number of rows to be
processed in the input component, you will have to use
that same limit in the Input rows count field.

Group by Define the aggregation sets, the values of which will be


used for calculations.

  Output Column: Select the column label in the list offered


based on the schema structure you defined. You can add
as many output columns as you wish to make more precise
aggregations.
Ex: Select Country to calculate an average of values for each
country of a list or select Country and Region if you want
to compare one country's regions with another country's
regions.

  Input Column: Match the input column label with your


output columns, in case the output label of the aggregation
set needs to be different.

Operations Select the type of operation along with the value to use for
the calculation and the output field.

  Output Column: Select the destination field in the list.

Function: Select the operator among:


• count: calculates the number of rows
• min: selects the minimum value
• max: selects the maximum value
• avg: calculates the average
• sum: calculates the sum
• first: returns the first value
• last: returns the last value
• list: lists values of an aggregation by multiple keys.
• list (object): lists Java values of an aggregation by
multiple keys
• count (distinct): counts the number of the distinct rows
• standard deviation: calculates the variability of a set of
values.
• union (geometry): makes the union of a set of
Geometry objects
• population standard deviation: calculates the spread of
a data distribution. Use this function if the data to be
calculated is considered a population on its own. This
calculation supports 39 decimal places.
• sample standard deviation: calculates the spread of
a data distribution. Use this function if the data to
be calculated is considered a sample from a larger
population. This calculation supports 39 decimal
places.

  Input column: Select the input column from which the


values are taken to be aggregated.

  Ignore null values: Select the check boxes corresponding


to the names of the columns for which you want the NULL
value to be ignored.

Advanced settings

tStatCatcher Statistics Check this box to collect the log data at component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data, therefore it requires
input and output, and hence is defined as an intermediary step.

Sorting and aggregating the input data


This scenario describes a Job that sorts the entries of the input data based on two columns and
displays the sorted data on the console, then aggregates the sorted data based on one column and
displays the aggregated data on the console.

Adding and linking components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tSortRow
component, a tAggregateSortedRow component, and two tLogRow components.
2. Link tFixedFlowInput to tSortRow using a Row > Main connection.
3. Do the same to link tSortRow to the first tLogRow, link the first tLogRow to tAggregateSort
edRow, and link tAggregateSortedRow to the second tLogRow.

Configuring the components


Sorting the input data

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: Id and Age of Integer type, and Name and Team of String type.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.

3. In the Mode area, select Use Inline Content (delimited file), and then in the Content field
displayed, enter the input data to be sorted and aggregated. In this example, the input data is as
follows:

1;Thomas;28;Component Team
2;Harry;32;Doc Team
3;John;26;Component Team
4;Nicolas;27;QA Team
5;George;24;Component Team
6;Peter;30;Doc Team
7;Teddy;23;QA Team
8;James;26;Component Team

4. Double-click tSortRow to open its Basic settings view.

5. Click the [+] button below the Criteria table to add as many rows as required and then specify
the sorting criteria in the table. In this example, two rows are added, and the input entries will be
sorted based on the column Team and then the column Age, both in ascending order.
6. Double-click the first tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the sorting
result.

Aggregating the sorted data

Procedure
1. Double-click tAggregateSortedRow to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
five columns: AggTeam of String type, AggCount, MinAge, MaxAge, and AvgAge of Integer type.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
3. In the Input rows count field, enter the exact number of rows of the input data. In this example, it
is 8.
4. Click the [+] button below the Group by table to add as many rows as required and specify the
aggregation set in the table. In this example, the data will be aggregated based on the input
column Team.
5. Click the [+] button below the Operations table to add as many rows as required and specify the
operation to be carried out and the corresponding input column from which the data will be taken
for each output column. In this example, we want to calculate the number of the input entries, the
minimum age, the maximum age, and the average age for each team.
6. Double-click the second tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the
aggregation result.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.

As shown above, the input entries are sorted based on the column Team and then the column Age,
both in ascending order, and the sorted entries are then aggregated based on the column Team.
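
Based on the sample data above, the first tLogRow should list the rows grouped by Team and ordered
by Age (for example George, John, James and Thomas for the Component Team), and the second
tLogRow should display an aggregation resembling the following sketch:

  AggTeam        | AggCount | MinAge | MaxAge | AvgAge
  ---------------+----------+--------+--------+-------
  Component Team | 4        | 24     | 28     | 26
  Doc Team       | 2        | 30     | 32     | 31
  QA Team        | 2        | 23     | 27     | 25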

tAmazonAuroraClose
Closes an active connection to an Amazon Aurora database instance to release the occupied
resources.

tAmazonAuroraClose Standard properties


These properties are used to configure tAmazonAuroraClose running in the Standard Job framework.
The Standard tAmazonAuroraClose component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tAmazonAuroraConnection component that


opens the connection you need to close from the list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the
tAmazonAuroraConnection and tAmazonAuroraCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.

tAmazonAuroraCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus
provides a gain in performance, using a unique connection.
tAmazonAuroraCommit validates the data processed through the Job into the connected Amazon
Aurora database.

tAmazonAuroraCommit Standard properties


These properties are used to configure tAmazonAuroraCommit running in the Standard Job framework.
The Standard tAmazonAuroraCommit component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tAmazonAuroraConnection component for which


you want the commit action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the commit is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAmazonAuroraCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
Connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the
tAmazonAuroraConnection and tAmazonAuroraRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.

tAmazonAuroraConnection
Opens a connection to an Amazon Aurora database instance that can then be reused by other Amazon
Aurora components.

tAmazonAuroraConnection Standard properties


These properties are used to configure tAmazonAuroraConnection running in the Standard Job
framework.
The Standard tAmazonAuroraConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the properties


are stored. The database connection fields that follow are
completed automatically using the data retrieved.

Host Type in the IP address or hostname of the Amazon Aurora


database.

Port Type in the listening port number of the Amazon Aurora


database.

Database Type in the name of the database you want to use.

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.

This option is incompatible with the Use dynamic job and


Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
This check box is not available when the Specify a data
source alias check box is selected.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box disappears when the Use or register a shared
DB Connection check box is selected.

Data source alias Type in the alias of the data source created on the Talend
Runtime side.
This field appears only when the Specify a data source alias
check box is selected.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.

For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the
tAmazonAuroraCommit and tAmazonAuroraRollback components.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.

tAmazonAuroraInput
Reads an Amazon Aurora database and extracts fields based on a query.
tAmazonAuroraInput executes a database query with a strictly defined order which must correspond
to the schema definition. Then it passes on the field list to the next component via a Row > Main link.

tAmazonAuroraInput Standard properties


These properties are used to configure tAmazonAuroraInput running in the Standard Job framework.
The Standard tAmazonAuroraInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the properties


are stored. The database connection fields that follow are
completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Type in the IP address or hostname of the Amazon Aurora


database.

Port Type in the listening port number of the Amazon Aurora


database.

Database Type in the name of the database you want to use.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Type in the name of the table to be read.

Query Type and Query Enter the database query, paying particular attention to
the proper sequence of the fields in order to match the
schema definition.

Guess Query Click the button to generate the query which corresponds to
the table schema in the Query field.

Guess schema Click the button to retrieve the schema from the table.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box disappears when the Use an existing
connection check box is selected.

Data source alias Type in the alias of the data source created on the Talend
Runtime side.

This field appears only when the Specify a data source alias
check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. When you need to handle
data of the time-stamp type 0000-00-00 00:00:00 using
this component, set the parameter to
noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.
This field disappears when the Use an existing connection
check box in the Basic settings view is selected.

Enable stream Select this check box to enable streaming over buffering
which allows the code to read from a large table without
consuming a large amount of memory in order to optimize
the performance.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Select the check box(es) in the Trim column to remove
leading and trailing whitespace from the corresponding
column(s).
This option disappears when the Trim all the String/Char
columns check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it needs an output link.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Handling data with Amazon Aurora


This scenario describes a Job that writes the user information into Amazon Aurora, and then reads the
information in Amazon Aurora and displays it on the console.

The scenario requires the following seven components:


• tAmazonAuroraConnection: opens a connection to Amazon Aurora.
• tFixedFlowInput: defines the user information data structure, and sends the data to the next
component.
• tAmazonAuroraOutput: writes the data it receives from the preceding component into Amazon
Aurora.
• tAmazonAuroraCommit: commits in one go the data processed to Amazon Aurora.
• tAmazonAuroraInput: reads the data from Amazon Aurora.
• tLogRow: displays the data it receives from the preceding component on the console.
• tAmazonAuroraClose: closes the connection to Amazon Aurora.

Adding and linking the components


Procedure
1. Create a new Job and add seven components listed previously by typing their names in the design
workspace or dropping them from the Palette.
2. Connect tFixedFlowInput to tAmazonAuroraOutput using a Row > Main connection.
3. Do the same to connect tAmazonAuroraInput to tLogRow.
4. Connect tAmazonAuroraConnection to tFixedFlowInput using a Trigger > OnSubjobOk connection.
5. Do the same to connect tFixedFlowInput to tAmazonAuroraCommit, tAmazonAuroraCommit to
tAmazonAuroraInput, and tAmazonAuroraInput to tAmazonAuroraClose.

Configuring the components


Opening a connection to Amazon Aurora

Procedure
1. Double-click tAmazonAuroraConnection to open its Basic settings view.

2. In the Host, Port, Database, Username and Password fields, enter the information required for the
connection to Amazon Aurora.
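
For illustration only, assuming an Aurora MySQL-compatible cluster, the values could look like the
following (the endpoint, database name and credentials are placeholders, not values required by the
component):

  Host:     "mycluster.cluster-abc123xyz456.us-east-1.rds.amazonaws.com"
  Port:     "3306"
  Database: "talend"
  Username: "talend_user"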

Writing the data into Amazon Aurora

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: id of Integer type, and name and city of String type.

Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and enter the following user
information in the Content field.

1;George;Bismarck
2;Abraham;Boise
3;Taylor;Nashville
4;William;Jefferson City
5;Alexander;Jackson
6;James;Boise
7;Gerald;Little Rock
8;Tony;Richmond
9;Thomas;Springfield
10;Andre;Nashville

4. Double-click tAmazonAuroraOutput to open its Basic settings view.

5. Select the Use an existing connection check box and in the Component List that appears, select
the connection component you have configured.
6. In the Table field, enter or browse to the table into which you want to write the data. In this
example, it is TalendUser.
7. Select Drop table if exists and create from the Action on table drop-down list, and select Insert
from the Action on data drop-down list.
8. Double-click tAmazonAuroraCommit to open its Basic settings view.

9. Clear the Close Connection check box if it is selected.

Retrieving the data from Amazon Aurora

Procedure
1. Double-click tAmazonAuroraInput to open its Basic settings view.

2. Select the Use an existing connection check box and in the Component List that appears, select
the connection component you have configured.

3. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: id of Integer type, and name and city of String type. The data structure is the same as
the structure you have defined for tFixedFlowInput.
4. In the Table Name field, enter or browse to the table from which you want to read the data. In this
example, it is TalendUser.
5. Click the Guess Query button to generate the query. The Query field will be filled with the
automatically generated query (see the example after this procedure).
6. Double-click tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the result.
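
For reference, the query generated in step 5 for the TalendUser table should resemble the following
(the exact quoting depends on the database type selected):

  SELECT TalendUser.id, TalendUser.name, TalendUser.city FROM TalendUser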

Closing the connection to Amazon Aurora

Procedure
1. Double-click tAmazonAuroraClose to open its Basic settings view.

2. In the Component List, select the connection component you have configured.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 or click Run on the Run tab to run the Job.

As shown above, the user information is written into Amazon Aurora, and then the data is retrieved
from Amazon Aurora and displayed on the console.
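
Based on the sample data above, the tLogRow table should list the ten inserted rows, starting with:

  id | name    | city
  ---+---------+----------
  1  | George  | Bismarck
  2  | Abraham | Boise
  3  | Taylor  | Nashville

and so on for the remaining rows.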

tAmazonAuroraOutput
Writes, updates, makes changes or suppresses entries in an Amazon Aurora database.
tAmazonAuroraOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tAmazonAuroraOutput Standard properties


These properties are used to configure tAmazonAuroraOutput running in the Standard Job framework.
The Standard tAmazonAuroraOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The database connection fields that
follow are completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Type in the IP address or hostname of the Amazon Aurora


database.


Port Type in the listening port number of the Amazon Aurora


database.

Database Type in the name of the database you want to use.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Type in the name of the table to be written. Note that only
one table can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets
created.
• Create table if not exists: The table is created if it does
not exist.
• Drop table if exists and create: The table is removed if
it already exists and created again.
• Clear table: The table content is deleted.
• Truncate table: The table content is quickly deleted.
However, you will not be able to rollback the
operation.

Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new entries to the table. If duplicates are
found, the job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record. If the record with
the given reference already exists, an update would be
made.
• Update or insert: Update the record with the given
reference. If the record does not exist, a new record
would be inserted.
• Delete: Remove entries corresponding to the input
flow.
• Replace: Add new entries to the table. If an old row
in the table has the same value as a new row for a
PRIMARY KEY or a UNIQUE index, the old row is
deleted before the new row is inserted.
• Insert or update on duplicate key or unique index: Add
entries if the inserted value does not exist or update
entries if the inserted value already exists and there is
a risk of violating a unique index or primary key.
• Insert Ignore: Add only new rows to prevent duplicate
key errors.


Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You
can do that by clicking Edit schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box disappears when the Use an existing
connection check box is selected.

Data source alias Type in the alias of the data source created on the Talend
Runtime side.


This field appears only when the Specify a data source alias
check box is selected.

Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows. If needed, you can retrieve the rows in error via a Row
> Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.
This field disappears when the Use an existing connection
check box in the Basic settings view is selected.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.
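For example, a value such as the following (a common MySQL-compatible driver option, given
here only as an illustration; check the options supported by your driver version) forces the
connection to use UTF-8:
useUnicode=true&characterEncoding=UTF-8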

Extend Insert Select this check box to carry out a bulk insert of a defined
set of lines instead of inserting lines one by one. The gain in
system performance is considerable.
This check box appears only when the Insert option is
selected from the Action on data list in the Basic settings
view.

Note:
This option is not compatible with the Reject link. You
should therefore clear the check box if you are using a
Row > Rejects link with this component.

Number of rows per insert Enter the number of rows to be inserted per operation. Note
that the higher the value specified, the lower the performance
level will be, due to the increase in memory demands.
This field appears only when the Extend Insert check box is
selected.

Use Batch Select this check box to activate the batch mode for data
processing.
This check box is available only when the Update or Delete
option is selected from the Action on data list in the Basic
settings view.

Batch Size Specify the number of records to be processed in each
batch.
This field appears only when the Use Batch check box is
selected.

Commit every Enter the number of rows to be included in a batch before


it is committed to the database. This option ensures
transaction quality (but not rollback) and, above all, a
higher performance level.

Additional columns This option allows you to call SQL functions to perform
actions on columns, provided that these are not insert,
update or delete actions, or actions that require pre-
processing. This option is not available if you have
just created the database table (even if you delete it
beforehand). Click the [+] button under the table to add
column(s), and set the following parameters for each
column.
• Name: Type in the name of the schema column to be
altered or inserted.
• SQL expression: Type in the SQL statement to be
executed in order to alter or insert the data in the
corresponding column.
• Position: Select Before, After or Replace depending on
the action to be performed on the reference column.
• Reference column: Type in a reference column that
tAmazonAuroraOutput can use to locate or replace the
new column or the column to be modified.
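For example, a purely illustrative configuration (the column name and expression below are
hypothetical) could populate an extra update_time column with the database server time and
place it right after an existing city column:
• Name: "update_time"
• SQL expression: "NOW()"
• Position: After
• Reference column: "city"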

Use field options Select the check box for the corresponding column to
customize a request, particularly if multiple actions are
being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which the data is
updated.
• Key in delete: Select the check box for the
corresponding column based on which the data is
deleted.
• Updatable: Select the check box if the data in the
corresponding column can be updated.
• Insertable: Select the check box if the data in the
corresponding column can be inserted.

Use Hint Options Select this check box to configure the hint(s) which can help
you optimize a query's execution.

Hint Options Click the [+] button under the table to add hint(s) and set
the following parameters for each hint. This table appears
only when the Use Hint Options check box is selected.
• HINT: Specify the hint you need, using the syntax /*+
*/.
• POSITION: Specify where you put the hint in an SQL
statement.
• SQL STMT*: Select the SQL statement (INSERT, UPDATE,
or DELETE) you need to use.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use duplicate key update mode insert Select this check box to activate the ON DUPLICATE KEY
UPDATE mode, and then click the [+] button under the
table displayed to add column(s) to be updated and specify
the update action to be performed on the corresponding
column.
• Column: Enter the name of the column to be updated.
• Value: Enter the action to be performed on the column.
This check box is available only when the Insert option is
selected from the Action on data list in the Basic settings
view.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
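
For example, after this component has run, a downstream tJava component could print some of
these variables as follows. This is only a sketch: it assumes the component instance is named
tAmazonAuroraOutput_1, so adjust the name to match your Job.

System.out.println("Rows inserted: " + ((Integer)globalMap.get("tAmazonAuroraOutput_1_NB_LINE_INSERTED")));
System.out.println("Rows updated: " + ((Integer)globalMap.get("tAmazonAuroraOutput_1_NB_LINE_UPDATED")));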

Usage

Usage rule This component must be used as an output component. It


allows you to carry out actions on a table or on the data of
a table in an Amazon Aurora database. It also allows you to
create a reject flow using a Row > Rejects link to filter data
in error. For a similar scenario, see .

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.


tAmazonAuroraRollback
Rolls back any changes made in the Amazon Aurora database to prevent partial transaction commit if
an error occurs.

tAmazonAuroraRollback Standard properties


These properties are used to configure tAmazonAuroraRollback running in the Standard Job
framework.
The Standard tAmazonAuroraRollback component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tAmazonAuroraConnection component for which


you want the rollback action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the rollback is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the tAmazonAuroraConnection
and tAmazonAuroraCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related Scenario
No scenario is available for the Standard version of this component yet.


tAmazonEMRListInstances
Lists the details about the instance groups in a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRListInstances Standard properties


These properties are used to configure tAmazonEMRListInstances running in the Standard Job
framework.
The Standard tAmazonEMRListInstances component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.

Filter master and core instances Select this check box to ignore the master and core instance
groups and list only the task instance groups.

Cluster id Enter the ID of the cluster for which you want to list the
instance groups.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are retrieved
from.
This check box is available only when the Assume role
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CURRENT_GROUP_ID: the ID of the current instance group.


This is an After variable and it returns a string.
CURRENT_GROUP_NAME: the name of the current instance
group. This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tAmazonEMRListInstances is usually used as a start


component of a Job or subJob.

Related scenario
No scenario is available for the Standard version of this component yet.


tAmazonEMRManage
Launches or terminates a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRManage Standard properties


These properties are used to configure tAmazonEMRManage running in the Standard Job framework.
The Standard tAmazonEMRManage component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. The credentials can be used on Amazon EC2
instances or AWS ECS, and are delivered through the
Amazon EC2 metadata service. To use this option, your Job
must be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Action Select an action to be performed from the list, either Start


or Stop.
• Start: launch an Amazon EMR cluster.
• Stop: terminate an Amazon EMR cluster.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.

Cluster name Enter the name of the cluster.

Cluster version Select the version of the cluster.


You can also select the Customize Version and Application
check box on the Advanced settings view to customize the
cluster version information.


This property is not available when the Customize Version


and Application check box is selected.

Application Select the applications to be installed on the cluster.


You can also select the Customize Version and Application
check box on the Advanced settings view to customize the
applications information.
This property is available when an EMR version is selected
from the Cluster version list and the Customize Version and
Application check box is cleared.

Service role Enter the IAM (Identity and Access Management) role for the
Amazon EMR service. The default role is EMR_DefaultRole.
To use this default role, you must have already created it.

Job flow role Enter the IAM role for the EC2 instances that Amazon EMR
manages. The default role is EMR_EC2_DefaultRole. To use
this default role, you must have already created it.

Enable log Select this check box to enable logging and in the field
displayed specify the path to a folder in an S3 bucket where
you want Amazon EMR to write the log data.

Use EC2 key pair Select this check box to associate an Amazon EC2 (Elastic
Compute Cloud) key pair with the cluster and in the field
displayed enter the name of your EC2 key pair.

Predicate Specify the cluster(s) that you want to stop:


• All running clusters: all running clusters will be
stopped.
• All running clusters with predefined name: the running
cluster with a given name will be stopped. In the
Cluster name field displayed, you need to specify the
name of the cluster to be stopped.
• Running cluster with predefined id: the running cluster
with a given ID will be stopped. In the Cluster id field
displayed, you need to specify the ID of the cluster to
be stopped.
This list is available only when Stop is selected from the
Action list.

Instance count Enter the number of Amazon EC2 instances to initialize.

Master instance type Select the type of the master instance to initialize.

Slave instance type Select the type of the slave instance to initialize.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.


Wait for cluster ready Select this check box to let your Job wait until the launch of
the cluster is completed.

Visible to all users Select this check box to make the cluster visible to all IAM
users.

Termination Protect Select this check box to enable termination protection to


prevent instances in the cluster from shutting down due to
errors or issues during processing.

Enable debug Select this check box to enable the debug mode.

Customize Version and Application Select this check box to customize the version of the cluster
and the applications to be installed on the cluster.
• Cluster version: enter the version of the cluster.
• Applications: click the [+] button below the table
to add as many rows as needed, each row for an
application, and specify the application by clicking
the right side of the cell and selecting the application
from the drop-down list displayed, or just entering the
application name in the cell if it is not in the list.

Subnet id Specify the identifier of the Amazon VPC (Virtual Private


Cloud) subnet where you want the job flow to launch.

Availability Zone Specify the availability zone for your cluster's EC2 instances.

Master security group Specify the security group for the master instance.

Additional master security groups Specify additional security groups for the master instance
and separate them with a comma, for example, gname1,
gname2, gname3.

Slave security group Specify the security group for the slave instances.

Additional slave security groups Specify additional security groups for the slave instances
and separate them with a comma, for example, gname1,
gname2, gname3.

Service Access Security Group Specify the identifier of the Amazon EC2 security group for
the Amazon EMR service to access clusters in VPC private
subnet.
For how to create a private subnet to enable service access
security group on Amazon EMR, see Scenario 2: VPC with
Public and Private Subnets (NAT).

Actions Specify the bootstrap actions associated with the cluster, by


clicking the [+] button below the table to add as many rows
as needed, each row for a bootstrap action, and setting the
following parameters for each action:
• Name: enter the name of the bootstrap action.
• Script location: specify the location of the script run
by the bootstrap action, for example, s3://ap-northeast-1.elasticmapreduce/bootstrap-actions/run-if.
• Arguments: enter the list of command line arguments
(separated by commas) passed to the bootstrap action
script, for example, "arg0","arg1","arg2".


For more information about the bootstrap actions, see


BootstrapActionConfig.

Steps Specify the job flow step(s) to be invoked on the cluster


after its launch, by clicking the [+] button below the table
to add as many rows as needed, each row for a step, and
setting the following parameters for each step:
• Name: enter the name of the job flow step.
• Action on Failure: click the cell and from the drop-
down list select the action to take if the job flow step
fails.
• Main Class: enter the name of the main class in the
specified Java file. If not specified, the JAR file should
specify a Main-Class in its manifest file.
• Jar: enter the path to the JAR file run during the step,
for example, "s3://inputjar/test.jar".
• Args: enter the list of command line arguments
(separated by commas) passed to the JAR file's main
function when executed, for example, "arg0","arg1",
"arg2".
For more information about the job flow steps, see
StepConfig.

Keep alive after steps complete Select this check box to keep the job flow alive after
completing all steps.

Wait for steps to complete Select this check box to let your Job wait until the job flow
steps are completed.
This check box is available only when the Wait for cluster
ready check box is selected.

Properties Specify the classification and property information supplied


to the configuration object of the EMR cluster to be created,
by clicking the [+] button below the table to add as many
rows as needed, each row for a property, and setting the
following parameters:
• Classification: specify the classification of the
configuration.
• Key: enter the key of the property.
• Value: enter the value of the property.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CLUSTER_FINAL_ID The ID of the cluster. This is an After variable and it returns


a string.

CLUSTER_FINAL_NAME The name of the cluster. This is an After variable and it


returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule tAmazonEMRManage is usually used as a standalone


component.

Managing an Amazon EMR cluster


Here's an example of using Talend components to manage an Amazon EMR cluster.

Creating an Amazon EMR cluster management Job


Create a Job to start a new Amazon EMR cluster, then resize the cluster, and finally list the ID and
name information of the instance groups in the cluster.

Procedure
1. Create a new Job and add a tAmazonEMRManage component, a tAmazonEMRResize component, a
tAmazonEMRListInstances component, and a tJava component by typing their names in the design
workspace or dropping them from the Palette.
2. Link the tAmazonEMRManage component to the tAmazonEMRResize component using a Trigger >
OnSubjobOk connection.
3. Link the tAmazonEMRResize component to the tAmazonEMRListInstances component using a
Trigger > OnSubjobOk connection.
4. Link the tAmazonEMRListInstances component to the tJava component using a Row > Iterate
connection.

Starting a new Amazon EMR cluster


Configure the tAmazonEMRManage component to start a new Amazon EMR cluster.

Procedure
1. Double-click the tAmazonEMRManage component to open its Basic settings view.


2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. From the Action list, select Start to start a cluster.
4. Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
5. In the Cluster name field, enter the name of the cluster to be started. In this example, it is
talend-doc-emr-cluster.
6. From the Cluster version and Application drop-down list, select the version of the cluster and the
application to be installed on the cluster.
7. Select the Enable log check box and in the field displayed, specify the path to a folder in an S3
bucket where you want Amazon EMR to write the log data. In this example, it is
s3://talend-doc-emr-bucket.

Resizing the Amazon EMR cluster by adding a new task instance group
Configure the tAmazonEMRResize component to resize a running Amazon EMR cluster by adding a
new task instance group.

Procedure
1. Double-click the tAmazonEMRResize component to open its Basic settings view.


2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. From the Action drop-down list, select Add task instance group to resize the cluster by adding a
new task instance group.
4. In the Cluster id field, enter the ID of the cluster to be resized. In this example, the returned value
of the global variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used, as
shown in the expression after this procedure. Note that you can retrieve the global variable by
pressing Ctrl + Space and selecting the relevant global variable from the list.
5. In the Group name field, enter the name of the task instance group to be added in the cluster. In
this example, it is talend-doc-instance-group.
6. In the Instance count field, specify the number of the instances to be created.
7. From the Task instance type drop-down list, select the type of the instances to be created.
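
The value entered in the Cluster id field in step 4 can be, for example, the following expression.
This is a sketch assuming the tAmazonEMRManage component instance is named tAmazonEMRManage_1;
adjust the name to match your Job.

((String)globalMap.get("tAmazonEMRManage_1_CLUSTER_FINAL_ID"))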

Listing the instance groups in the Amazon EMR cluster


Configure the tAmazonEMRListInstances component and the tJava component to retrieve and display
the ID and name information of all instance groups in a running cluster.

Procedure
1. Double-click the tAmazonEMRListInstances component to open its Basic settings view.

2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
4. Clear the Filter master and core instances check box to list all instance groups, including the
Master, Core, and Task type instance groups.
5. In the Cluster id field, enter the ID of the cluster for which to list the instance groups. In
this example, the returned value of the global variable CLUSTER_FINAL_ID of the previous
tAmazonEMRManage component is used.
6. Double-click the tJava component to open its Basic settings view.


7. In the Code field, enter the following code to print the ID and Name information of each instance
group in the cluster.

System.out.println("\r\n===== Instance Group =====");
System.out.println("Instance Group ID: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_ID"));
System.out.println("Instance Group Name: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME"));

Executing the Job to manage the Amazon EMR cluster


After setting up the Job and configuring the components used in the Job for managing Amazon EMR
cluster, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.

As shown above, the Job starts and resizes the Amazon EMR cluster, and then lists all instance
groups in the cluster.
2. View the cluster details on the Amazon EMR Cluster List page to validate the Job execution result.


tAmazonEMRResize
Adds or resizes a task instance group in a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRResize Standard properties


These properties are used to configure tAmazonEMRResize running in the Standard Job framework.
The Standard tAmazonEMRResize component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Action Select an action to be performed from the drop-down list.


• Add task instance group: add a task instance group in a
cluster.
• Resize task instance group: resize a task instance group
in a cluster.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.

Cluster id Enter the ID of the cluster to be resized.

Group name Enter the name of the task instance group to be added.
This field is available only when Add task instance group is
selected from the Action drop-down list.


Group id Enter the ID of the task instance group to be resized.


This field is available only when Resize task instance group
is selected from the Action drop-down list.

Instance count Enter the number of instances for the task instance group.

Task instance type Select an instance type for all instances in the task instance
group to be added from the drop-down list.
This list is available only when Add task instance group is
selected from the Action drop-down list.

Request spot Select this check box to launch Spot instances, and in the
Bid price($) field displayed, enter the maximum hourly rate
(in dollars) you are willing to pay per instance.
This check box is available only when Add task instance
group is selected from the Action drop-down list.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables TASK_GROUP_ID: the ID of the task instance group. This is


an After variable and it returns a string.
TASK_GROUP_NAME: the name of the task instance group.
This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tAmazonEMRResize is usually used as a standalone


component.


Related scenario
No scenario is available for the Standard version of this component yet.


tAmazonMysqlClose
Closes the transaction committed in the connected DB.

tAmazonMysqlClose Standard properties


These properties are used to configure tAmazonMysqlClose running in the Standard Job framework.
The Standard tAmazonMysqlClose component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonMysqlConnection component in the list


if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with other tAmazonMysql*
components, especially with tAmazonMysqlConnection and
tAmazonMysqlCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAmazonMysqlCommit
Commits a global transaction in one go, instead of committing on every row or every batch, which
provides a gain in performance, using a unique connection.
tAmazonMysqlCommit validates the data processed through the Job into the connected database.

tAmazonMysqlCommit Standard properties


These properties are used to configure tAmazonMysqlCommit running in the Standard Job framework.
The Standard tAmazonMysqlCommit component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonMysqlConnection component in the list


if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAmazonMysqlCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonMysql* components, especially with the
tAmazonMysqlConnection and tAmazonMysqlRollback
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For tAmazonMysqlCommit related scenario, see Inserting data in mother/daughter tables on page
2426.


tAmazonMysqlConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAmazonMysqlConnection opens a connection to the database for a current transaction.

tAmazonMysqlConnection Standard properties


These properties are used to configure tAmazonMysqlConnection running in the Standard Job
framework.
The Standard tAmazonMysqlConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonMysql* components, especially with the
tAmazonMysqlCommit and tAmazonMysqlRollback
components.


Related scenario
For a related scenario using this component, see Inserting data in mother/daughter tables on page
2426


tAmazonMysqlInput
Reads a database and extracts fields based on a query.
tAmazonMysqlInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Row > Main link.

tAmazonMysqlInput Standard properties


These properties are used to configure tAmazonMysqlInput running in the Standard Job framework.
The Standard tAmazonMysqlInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.


Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be read.

Query type and Query Enter your DB query paying particular attention to
properly sequence the fields in order to match the schema
definition.
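For example, for a hypothetical table named employee whose schema contains the columns id and
name, in that order, the Query field could contain:
"SELECT id, name FROM employee"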

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
When you need to handle data of the time-stamp type
0000-00-00 00:00:00 using this component, set the
parameter as:
noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.

Enable stream Select this check box to enable streaming over buffering,
which allows the code to read from a large table without
consuming a large amount of memory, in order to optimize
performance.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Note: Deselect Trim all the String/Char columns to


enable Trim columns in this field.


tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for MySQL
databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For related scenarios, see tMysqlInput on page 2437.


tAmazonMysqlOutput
Writes, updates, makes changes or suppresses entries in a database.
tAmazonMysqlOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tAmazonMysqlOutput Standard properties


These properties are used to configure tAmazonMysqlOutput running in the Standard Job framework.
The Standard tAmazonMysqlOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.


Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is quickly deleted.
However, you will not be able to rollback the operation.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Replace: Add new entries to the table. If an old row in the
table has the same value as a new row for a PRIMARY KEY
or a UNIQUE index, the old row is deleted before the new
row is inserted.
Insert or update on duplicate key or unique index: Add
entries if the inserted value does not exist or update entries
if the inserted value already exists and there is a risk of
violating a unique index or primary key.
Insert Ignore: Add only new rows to prevent duplicate key
errors.


Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.
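
For reference, here is a minimal sketch, with hypothetical table and column names, of the MySQL statements that the Replace, Insert or update on duplicate key or unique index, and Insert Ignore actions described above roughly correspond to; the component builds the actual statements from your schema:

    // Hypothetical illustration only; the table and columns are examples, not code
    // generated by the component.
    public class ActionOnDataSketch {
        public static void main(String[] args) {
            String replaceRow =
                "REPLACE INTO employee (id, name) VALUES (1, 'Ada')";
            String insertOrUpdateOnDuplicateKey =
                "INSERT INTO employee (id, name) VALUES (1, 'Ada')"
                + " ON DUPLICATE KEY UPDATE name = VALUES(name)";
            String insertIgnore =
                "INSERT IGNORE INTO employee (id, name) VALUES (1, 'Ada')";
            System.out.println(replaceRow);
            System.out.println(insertOrUpdateOnDuplicateKey);
            System.out.println(insertIgnore);
        }
    }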

Schema and Edit schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows. If needed, you can retrieve the rows in error via a Row
> Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.
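
For example, assuming the MySQL Connector/J driver, encoding properties can be passed as key=value pairs separated by an ampersand; the sketch below (hypothetical host, port and database) only shows how such parameters end up on the JDBC URL:

    // Hypothetical sketch; the property names assume MySQL Connector/J and should be
    // checked against your driver documentation.
    public class AdditionalJdbcParametersSketch {
        public static void main(String[] args) {
            String baseUrl = "jdbc:mysql://localhost:3306/talend_demo";
            String additionalJdbcParameters = "useUnicode=true&characterEncoding=UTF-8";
            System.out.println(baseUrl + "?" + additionalJdbcParameters);
        }
    }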

Extend Insert Select this check box to carry out a bulk insert of a defined
set of lines instead of inserting lines one by one. The gain in
system performance is considerable.

Number of rows per insert: enter the number of rows to
be inserted per operation. Note that the higher the value
specified, the lower the performance, due to the increased
demand on memory.


Note:
This option is not compatible with the Reject link. You
should therefore clear the check box if you are using a
Row > Rejects link with this component.

Warning:
If you are using this component with tMysqlLastInsertID,
ensure that the Extend Insert check box in Advanced Settings
is not selected. Extend Insert allows for batch loading,
however, if the check box is selected, only the ID of the last
line of the last batch will be returned.
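
As a rough illustration with hypothetical table and values, an extended insert groups several rows into one statement instead of issuing one INSERT per row, which is where the performance gain comes from:

    // Hypothetical sketch of a row-by-row insert versus an extended (multi-row) insert.
    public class ExtendInsertSketch {
        public static void main(String[] args) {
            String rowByRow =
                "INSERT INTO employee (id, name) VALUES (1, 'Ada')";
            String extendedInsert =
                "INSERT INTO employee (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan')";
            System.out.println(rowByRow);
            System.out.println(extendedInsert);
        }
    }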

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Update or the Delete option in the Action on data
field.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.

Commit every Number of rows to be included in the batch before it is


committed to the DB. This option ensures transaction
quality (but not rollback) and, above all, a higher
performance level.

Additional Columns This option is not available if you have just created the DB
table (even if you delete it beforehand). This option allows
you to call SQL functions to perform actions on columns,
provided that these are not insert, update or delete actions,
or actions that require pre-processing.

  Name: Type in the name of the schema column to be


altered or inserted.

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the data in the corresponding
column.

  Position: Select Before, Replace or After, depending on the


action to be performed on the reference column.

  Reference column: Type in a reference column that
tAmazonMysqlOutput can use to locate or replace the new
column, or the column to be modified.
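
As a purely hypothetical illustration, an entry with Name set to created_at, SQL expression set to NOW(), Position set to After and Reference column set to name would make the generated insert look roughly as follows (table and column names are examples):

    // Hypothetical sketch of the effect of an Additional Columns entry; not the exact
    // statement generated by the component.
    public class AdditionalColumnsSketch {
        public static void main(String[] args) {
            String withoutAdditionalColumn =
                "INSERT INTO employee (id, name) VALUES (?, ?)";
            String withAdditionalColumn =
                "INSERT INTO employee (id, name, created_at) VALUES (?, ?, NOW())";
            System.out.println(withoutAdditionalColumn);
            System.out.println(withAdditionalColumn);
        }
    }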

Use field options Select this check box to customize a request, particularly if
multiple actions are being carried out on the data.

Use Hint Options Select this check box to activate the hint configuration area
which helps you optimize a query's execution. In this area,
parameters are:


- HINT: specify the hint you need, using the syntax

/*+ */.

- POSITION: specify where you put the hint in a SQL


statement.
- SQL STMT: select the SQL statement you need to use.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use duplicate key update mode insert Updates the values of the columns specified, in the event of
duplicate primary keys:
Column: Between double quotation marks, enter the name
of the column to be updated.
Value: Enter the action you want to carry out on the column.

Note:
To use this option you must first of all select the Insert
mode in the Action on data list found in the Basic
Settings view.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
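
For instance, a tJava component placed after this one could typically read these variables through the globalMap; the component label tAmazonMysqlOutput_1 below is hypothetical and must match the label used in your Job:

    // Hypothetical tJava code; adjust the component label to the one in your Job.
    Integer inserted = (Integer) globalMap.get("tAmazonMysqlOutput_1_NB_LINE_INSERTED");
    Integer rejected = (Integer) globalMap.get("tAmazonMysqlOutput_1_NB_LINE_REJECTED");
    String lastQuery = (String) globalMap.get("tAmazonMysqlOutput_1_QUERY");
    System.out.println("inserted=" + inserted + ", rejected=" + rejected);
    System.out.println("query=" + lastQuery);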


Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a MySQL database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tAmazonMysqlOutput in use, see .

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For related scenarios, see tMysqlSCD on page 2508.


tAmazonMysqlRollback
Cancels the transaction commit in the connected database and avoids committing part of a
transaction involuntarily.

tAmazonMysqlRollback Standard properties


These properties are used to configure tAmazonMysqlRollback running in the Standard Job
framework.
The Standard tAmazonMysqlRollback component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonMysqlConnection component in the list
if more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other


tAmazonMysql* components, especially with the
tAmazonMysqlConnection and tAmazonMysqlCommit
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Rollback from inserting data in mother/daughter tables on page 2429.


tAmazonMysqlRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAmazonMysqlRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily write
your SQL statements. tAmazonMysqlRow is the specific component for this database query. The row
suffix means the component implements a flow in the Job design although it does not provide output.

tAmazonMysqlRow Standard properties


These properties are used to configure tAmazonMysqlRow running in the Standard Job framework.
The Standard tAmazonMysqlRow component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description, that is to say, it defines the
number of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in the query statement manually or build it
graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
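
For example, if the schema defines the columns id and name in that order, the Query field (which holds a Java string) could contain a statement such as the following, with a hypothetical table name:

    "SELECT employee.id, employee.name FROM employee"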

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB
connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Propagate QUERY's recordset Select this check box to insert the result of the query in a
COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
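
As a sketch, with a hypothetical query such as "SELECT name FROM employee WHERE id = ?" and one row in the Set PreparedStatement Parameter table (Parameter Index 1, Parameter Type Int, Parameter Value 42), the behaviour corresponds roughly to the following plain JDBC code; the connection details, table and column names are examples only:

    // Hypothetical plain-JDBC equivalent; not what the component generates verbatim.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PreparedStatementSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://localhost:3306/talend_demo";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name FROM employee WHERE id = ?")) {
                ps.setInt(1, 42); // Parameter Index 1, Parameter Type Int, Parameter Value 42
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }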

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see:
• Combining two flows for selective output on page 2503


tAmazonOracleClose
Closes the transaction committed in the connected database.

tAmazonOracleClose Standard properties


These properties are used to configure tAmazonOracleClose running in the Standard Job framework.
The Standard tAmazonOracleClose component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonOracleConnection component in the list
if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with other tAmazonOracle*
components, especially with tAmazonOracleConnection and
tAmazonOracleCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
This component is to be used with tAmazonOracleConnection and tAmazonOracleRollback
components. It is generally used with a tAmazonOracleConnection to close a connection for the
ongoing transaction.
For a related scenario, see tMysqlConnection on page 2425.


tAmazonOracleCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus
provides a gain in performance, using a unique connection.
tAmazonOracleCommit validates the data processed through the Job into the connected database.

tAmazonOracleCommit Standard properties


These properties are used to configure tAmazonOracleCommit running in the Standard Job framework.
The Standard tAmazonOracleCommit component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonOracleConnection component in the list
if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAmazonOracleCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonOracle* components, especially with the
tAmazonOracleConnection and tAmazonOracleRollback
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For tAmazonOracleCommit related scenario, see Inserting data in mother/daughter tables on page
2426.


tAmazonOracleConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAmazonOracleConnection opens a connection to the database for a current transaction.

tAmazonOracleConnection Standard properties


These properties are used to configure tAmazonOracleConnection running in the Standard Job
framework.
The Standard tAmazonOracleConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Connection type Drop-down list of available drivers:


Oracle SID: Select this connection type to uniquely identify
a particular database on a system.

DB Version Oracle 11-5 is available.

Use tns file Select this check box to use the metadata of a context
included in a tns file.

Note:
One tns file may have many contexts.

TNS File: Enter the path to the tns file manually or browse
to the file by clicking the three-dot button next to the field.
Select a DB Connection in Tns File: Click the three-dot
button to display all the contexts held in the tns file and
select the desired one.

Host Database server IP address.

Port Listening port number of DB server.


Database Name of the database.

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
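
For reference, with the Oracle SID connection type, the Host, Port and Database (SID) values typically combine into a thin-driver URL of the following shape; all values below are hypothetical:

    // Hypothetical sketch of an Oracle SID-style thin JDBC URL; not the exact code
    // generated by the component.
    public class OracleSidUrlSketch {
        public static void main(String[] args) {
            String host = "10.0.0.12";
            String port = "1521";
            String sid = "ORCL";
            System.out.println("jdbc:oracle:thin:@" + host + ":" + port + ":" + sid);
        }
    }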

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements have been executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
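
In plain JDBC terms, the difference described above can be sketched as follows; the connection details and statements are hypothetical and this is not the code generated by the component:

    // Hypothetical sketch of auto commit versus an explicit, deferred commit.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AutoCommitSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@10.0.0.12:1521:ORCL"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                conn.setAutoCommit(false); // behaviour comparable to using a commit component
                stmt.executeUpdate("INSERT INTO employee (id, name) VALUES (1, 'Ada')");
                stmt.executeUpdate("INSERT INTO employee (id, name) VALUES (2, 'Grace')");
                conn.commit(); // both statements are committed together
                // With conn.setAutoCommit(true), each executeUpdate above would have been
                // committed immediately as its own transaction.
            }
        }
    }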

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonOracle* components, especially with the
tAmazonOracleCommit and tAmazonOracleRollback
components.

Related scenario
For tAmazonOracleConnection related scenario, see tMysqlConnection on page 2425.


tAmazonOracleInput
Reads a database and extracts fields based on a query.
tAmazonOracleInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Row > Main link.

tAmazonOracleInput Standard properties


These properties are used to configure tAmazonOracleInput running in the Standard Job framework.
The Standard tAmazonOracleInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Connection type Drop-down list of available drivers:


Oracle SID: Select this connection type to uniquely identify
a particular database on a system.

DB Version Select the Oracle version in use.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Oracle schema Oracle schema name.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table name Database table name.

Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Use cursor When selected, lets you specify the number of rows to fetch
and work with at a time, and thus optimize performance.
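
In plain JDBC terms this is comparable to setting a fetch size, which limits how many rows are pulled from the server per round trip; the sketch below uses hypothetical connection details and values:

    // Hypothetical sketch only; a JDBC fetch size controls how many rows are
    // retrieved from the database at a time.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CursorSizeSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@10.0.0.12:1521:ORCL"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(1000); // work with 1000 rows at a time
                try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM employee")) {
                    while (rs.next()) {
                        System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                    }
                }
            }
        }
    }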


Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Oracle
databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related scenarios, see:
• Reading data from different MySQL databases using dynamically loaded connection parameters
on page 497.


tAmazonOracleOutput
Writes, updates, makes changes or suppresses entries in a database.
tAmazonOracleOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tAmazonOracleOutput Standard properties


These properties are used to configure tAmazonOracleOutput running in the Standard Job framework.
The Standard tAmazonOracleOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Connection type Drop-down list of available drivers:


Oracle SID: Select this connection type to uniquely identify
a particular database on a system.


DB Version Select the Oracle version in use.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Oracle schema Name of the Oracle schema.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Warning:
If you select the Use an existing connection check box
and select an option other than None from the Action
on table list, a commit statement will be generated
automatically before the data update/insert/delete
operation.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.


Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Override any existing NLS_LANG environment variable Select this check box to override variables already set for a
NLS language environment.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, provided that these
are not insert, update or delete actions, or actions that
require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column.

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Use Hint Options Select this check box to activate the hint configuration area
which helps you optimize a query's execution. In this area,
parameters are:
- HINT: specify the hint you need, using the syntax

/*+ */.

- POSITION: specify where you put the hint in a SQL


statement.
- SQL STMT: select the SQL statement you need to use.
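
For example, with HINT set to /*+ FULL(e) */ and POSITION indicating that the hint goes right after the SELECT keyword, the statement sent to the database would look roughly like this (the table, alias and hint are hypothetical):

    // Hypothetical illustration of an Oracle optimizer hint embedded in a query.
    public class HintSketch {
        public static void main(String[] args) {
            String withoutHint = "SELECT e.id, e.name FROM employee e";
            String withHint    = "SELECT /*+ FULL(e) */ e.id, e.name FROM employee e";
            System.out.println(withoutHint);
            System.out.println(withHint);
        }
    }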

Convert columns and table to uppercase Select this check box to set the names of columns and table
in upper case.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Batch Size Specify the number of records to be processed in each
batch.
This field appears only when the Use batch mode check box
is selected.

Support null in "SQL WHERE" statement Select this check box to validate null in "SQL WHERE"
statement.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Oracle database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For such an example, see Retrieving data in error with a
Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.

The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For tAmazonOracleOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tAmazonOracleRollback
Cancels the transaction commit in the connected database and avoids committing part of a
transaction involuntarily.

tAmazonOracleRollback Standard properties


These properties are used to configure tAmazonOracleRollback running in the Standard Job
framework.
The Standard tAmazonOracleRollback component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonOracleConnection component in the list
if more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other


tAmazonOracle* components, especially with the
tAmazonOracleConnection and tAmazonOracleCommit
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For tAmazonOracleRollback related scenario, see tMysqlRollback on page 2491.


tAmazonOracleRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAmazonOracleRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily write
your SQL statements. tAmazonOracleRow is the specific component for this database query. The row
suffix means the component implements a flow in the Job design although it does not provide output.

tAmazonOracleRow Standard properties


These properties are used to configure tAmazonOracleRow running in the Standard Job framework.
The Standard tAmazonOracleRow component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Connection type Drop-down list of available drivers.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Query type Either Built-in or Repository.

  Built-in: Fill in the query statement manually or build it
graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.

Use NB_LINE_ This option allows you to feed the variable with the number
of rows inserted/updated/deleted to the next component or
subJob. This field only applies if the query entered in the Query
field is an INSERT, UPDATE or DELETE query.
• NONE: does not feed the variable.
• INSERTED: feeds the variable with the number of rows
inserted.
• UPDATED: feeds the variable with the number of rows
updated.
• DELETED: feeds the variable with the number of rows
deleted.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
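As a minimal sketch (the table and values below are hypothetical), the Query field could contain

"SELECT * FROM ORDERS WHERE ORDER_ID = ?"

and the Set PreparedStatement Parameter table would then hold one row with Parameter Index set to
1, Parameter Type set to Int and Parameter Value set to 1001.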

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
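As a minimal sketch (the component label tAmazonOracleRow_1 and the use of a downstream tJava
component are assumptions), these variables can be read from the globalMap once the component
has run:

String query = (String) globalMap.get("tAmazonOracleRow_1_QUERY");
Integer inserted = (Integer) globalMap.get("tAmazonOracleRow_1_NB_LINE_INSERTED");
System.out.println("Executed: " + query + " - rows inserted: " + inserted);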

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
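As an illustration (the variable name is an assumption), the Code field of a row in this table could be
set to

context.dbConnection

where dbConnection is a String context variable whose value at runtime is the name of the connection
component to be used, for example "tAmazonOracleConnection_1".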

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tAmazonRedshiftManage
Manages Amazon Redshift clusters and snapshots.
tAmazonRedshiftManage manages the work of creating a new Amazon Redshift cluster, creating a
snapshot of an Amazon Redshift cluster, resizing an existing Amazon Redshift cluster, and deleting an
existing cluster or snapshot.

tAmazonRedshiftManage Standard properties


These properties are used to configure tAmazonRedshiftManage running in the Standard Job
framework.
The Standard tAmazonRedshiftManage component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Basic settings

Access Key and Secret Key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated
with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Action Select an action to be performed from the list.


• Create cluster: create a new Amazon Redshift cluster.
• Delete cluster: delete a previously provisioned Amazon
Redshift cluster.
• Resize cluster: resize an existing Amazon Redshift
cluster.
• Restore from snapshot: create a new Amazon Redshift
cluster from a snapshot.
• Delete snapshot: delete the specified manual snapshot.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
supported AWS regions where you can provision an Amazon
Redshift cluster, see Regions and Endpoints.

Create snapshot Select this check box to create a final snapshot of the
Amazon Redshift cluster before it is deleted.
This check box is available only when Delete cluster is
selected from the Action list.

Snapshot id Enter the identifier of the snapshot.


This field is available when:
• Delete cluster is selected from the Action list and the
Create snapshot check box is selected.
• Restore from snapshot or Delete snapshot is selected
from the Action list.

Cluster id Enter the ID of the cluster.


This field is available when Create cluster, Delete cluster,
Resize cluster, or Restore from snapshot is selected from the
Action list.

Database Enter the name of the first database to be created when the
cluster is created.
This field is available only when Create cluster is selected
from the Action list.

Port Enter the port number on which the cluster accepts


connections.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

Master username and Master password The user name and the password associated with the master
user account for the cluster to be created.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
The two fields are available only when Create cluster is
selected from the Action list.

Node type Select the node type for the cluster.


This list is available when Create cluster, Resize cluster, or
Restore from snapshot is selected from the Action list.

Node count Enter the number of compute nodes in the cluster.


This field is available only when Create cluster or Resize
cluster is selected from the Action list.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.


This check box is available only when the Assume role


check box is selected.

Wait for cluster ready Select this check box to let your Job wait until the launch of
the cluster is completed.
This check box is available when Create cluster or Restore
from snapshot is selected from the Action list.

Original cluster id of snapshot Enter the name of the cluster the source snapshot was
created from.
This field is available when Restore from snapshot or Delete
snapshot is selected from the Action list.

Parameter group name Enter the name of the parameter group to be associated
with the cluster.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

Subnet group name Enter the name of the subnet group where you want the
cluster to be restored.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

Publicly accessible Select this check box so that the cluster can be accessed
from a public network.
This check box is available when Create cluster or Restore
from snapshot is selected from the Action list.

Set public ip address Select this check box and in the field displayed enter the
Elastic IP (EIP) address for the cluster.
This check box is available only when the Publicly
accessible check box is selected.

Availability zone Enter the EC2 Availability Zone in which you want Amazon
Redshift to provision the cluster.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

VPC security group ids Enter Virtual Private Cloud (VPC) security groups to be
associated with the cluster and separate them with a
comma, for example, gname1, gname2, gname3.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CLUSTER_FINAL_ID: the ID of the cluster. This is an After


variable and it returns a string.
ENDPOINT: the endpoint address of the cluster. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the

Die on error check box is cleared, if the component has this


check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tAmazonRedshiftManage is usually used as a standalone


component.

Related scenario
No scenario is available for the Standard version of this component yet.


tApacheLogInput
Reads the access-log file for an Apache HTTP server.
To effectively manage the Apache HTTP Server, it is necessary to get feedback about the activity and
performance of the server as well as any problems that may be occurring.

tApacheLogInput Standard properties


These properties are used to configure tApacheLogInput running in the Standard Job framework.
The Standard tApacheLogInput component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
In the context of tApacheLogInput usage, the schema is read-only.

  Built-in: You can create the schema and store it locally


for this component. Related topic: see Talend Studio User
Guide.

  Repository: You have already created and stored the


schema in the Repository. You can reuse it in various
projects and Job flowcharts. Related topic: see Talend Studio
User Guide.

File Name Name of the file and/or the variable to be processed.


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.


tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tApacheLogInput can be used with other components or as


a standalone component. It allows you to create a data flow
using a Row > Main connection, or to create a reject flow to
filter specified data using a Row > Reject connection. For an
example of how to use these two links, see Procedure on
page 975.

Reading an Apache access-log file


The following scenario creates a two-component Job, which aims at reading the access-log file for an
Apache HTTP server and displaying the output in the Run log console.

Procedure
1. Drop a tApacheLogInput component and a tLogRow component from the Palette onto the design
workspace.
2. Right-click on the tApacheLogInput component and connect it to the tLogRow component using
a Main Row link.

3. In the design workspace, select tApacheLogInput.


4. Click the Component tab to define the basic settings for tApacheLogInput.

5. If desired, click the Edit schema button to see the read-only columns.
6. In the File Name field, enter the file path or browse to the access-log file you want to read.
7. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977
8. Press F6 to execute the Job.

Results
The log lines of the defined file are displayed on the console.
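For reference, each line of such a file typically follows the Apache common or combined log format; a
sample line (illustrative values only) looks like:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326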


tAS400Close
Closes the transaction committed in the connected database.

tAS400Close Standard properties


These properties are used to configure tAS400Close running in the Standard Job framework.
The Standard tAS400Close component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAS400Connection component in the list if more
than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with AS/400


components, especially with tAS400Connection and
tAS400Commit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.


Related scenario
No scenario is available for the Standard version of this component yet.


tAS400Commit
Commits a global transaction in one go, instead of committing on every row or every batch, and
provides a gain in performance, using a unique connection.
tAS400Commit validates the data processed through the Job into the connected database.

tAS400Commit Standard properties


These properties are used to configure tAS400Commit running in the Standard Job framework.
The Standard tAS400Commit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAS400Connection component in the list if more
than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAS400Commit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Connection and
tAS400Rollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in


different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.


tAS400Connection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAS400Connection opens a connection to the database for a current transaction.

tAS400Connection Standard properties


These properties are used to configure tAS400Connection running in the Standard Job framework.
The Standard tAS400Connection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version Select the AS/400 version in use

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together

with a tRunJob component with either of these two options


enabled will cause your Job to fail.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.
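For example (the property names below are an assumption based on the IBM Toolbox for Java (jt400)
JDBC driver; check the documentation of your driver), the field could contain a semicolon-separated
list such as:

"naming=system;libraries=MYLIB;prompt=false"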

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component
commits only after all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Commit and
tAS400Rollback components.

Related scenario
For similar scenarios using other databases, see tMysqlConnection on page 2425.


tAS400Input
Reads a database and extracts fields based on a query.
tAS400Input executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Row > Main link.

tAS400Input Standard properties


These properties are used to configure tAS400Input running in the Standard Job framework.
The Standard tAS400Input component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.


For more information about setting up and storing database


connection parameters, see Talend Studio User Guide.

DB Version Select the AS/400 version in use

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
properly sequencing the fields in order to match the schema
definition.
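For example, with a three-column schema (id, name, city) such as the one used in the scenario below,
the Query field could contain:

"SELECT id, name, city FROM doct1018"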

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.


Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Handling data with AS/400


This scenario describes a Job that writes the user information into AS/400, and then reads the
information in AS/400 and displays it on the console.


Adding and linking the components


Procedure
1. Create a new Job and add a tFixedFlowInput component, a tAS400Output component, a
tAS400Input component, and a tLogRow component by typing their names in the design
workspace or dropping them from the Palette.
2. Connect tFixedFlowInput to tAS400Output using a Row > Main connection.
3. Do the same to connect tAS400Input to tLogRow.
4. Connect tFixedFlowInput to tAS400Input using a Trigger > OnSubjobOk connection.

Configuring the components


Writing the data into AS/400

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding three columns: id of Integer type, and name and city of String type.


Click OK to close the Schema dialog box and accept the propagation prompted by the pop-up
dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and enter the following user
information in the Content field.

1;George;Bismarck
2;Abraham;Boise
3;Taylor;Nashville
4;William;Jefferson City
5;Alexander;Jackson
6;James;Boise
7;Gerald;Little Rock
8;Tony;Richmond
9;Thomas;Springfield
10;Andre;Nashville

4. Double-click tAS400Output to open its Basic settings view.

5. In the Host, Database, Username and Password fields, enter the information required for the
connection to AS/400.
6. In the Table field, specify the table into which you want to write the data. In this example, it is
doct1018.
7. Select Drop table if exists and create from the Action on table drop-down list, and select Insert
from the Action on data drop-down list.


Retrieving the data from AS/400

Procedure
1. Double-click tAS400Input to open its Basic settings view.

2. In the Host, Database, Username and Password fields, enter the information required for the
connection to AS/400.
3. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding three columns: id of Integer type, and name and city of String type. The data structure is the
same as the structure you defined for tFixedFlowInput.
4. In the Table Name field, enter or browse to the table into which the data was written. In this
example, it is doct1018.
5. In the Query field, enter the SQL query statement to be used to retrieve the user data from AS/400.
In this example, it is SELECT * FROM doct1018.
6. Double-click tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.


2. Press F6 or click Run on the Run tab to run the Job.

As shown above, the user information is written into AS/400, and then the data is retrieved from
AS/400 and displayed on the console.

Related scenarios
For a similar scenario using another database, see the related topic in tContextLoad: Reading data
from different MySQL databases using dynamically loaded connection parameters on page 497.


tAS400LastInsertId
Obtains the primary key value of the record that was last inserted in an AS/400 table.
tAS400LastInsertId fetches the last inserted ID from a selected AS/400 Connection.

tAS400LastInsertId Standard properties


These properties are used to configure tAS400LastInsertId running in the Standard Job framework.
The Standard tAS400LastInsertId component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and job flow charts. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Component list Select the relevant tAS400Connection component in the list


if more than one connection is planned for the current job.


Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used as an intermediary


component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a similar scenario using another database, see Getting the ID for the last inserted record with
tMysqlLastInsertId on page 2455.


tAS400Output
Writes, updates, modifies or deletes entries in a database.
tAS400Output executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.

tAS400Output Standard properties


These properties are used to configure tAS400Output running in the Standard Job framework.
The Standard tAS400Output component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

DB Version Select the AS/400 version in use

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

 
Action on data Select the action to be performed on the data of the defined
table, for example Insert, Update or Delete.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.


When the schema to be reused has default values that are


integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use commit control Select this check box to have access to the Commit every
field where you can define the commit operation.

Commit every: Enter the number of rows to be completed


before committing batches of rows together into the DB.
This option ensures transaction quality (but not rollback)
and, above all, better performance at execution.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update or delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.


  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.
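As an illustration (the column name and SQL expression below are hypothetical), a row of the
Additional Columns table could be defined with Name set to creation_date, SQL expression set to
"CURRENT TIMESTAMP", Position set to After and Reference column set to price.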

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update or Delete option in the Action on data
field.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component offers the flexibility of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an AS/400 database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different

MySQL databases using dynamically loaded connection


parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For related scenario, see Handling data with AS/400 on page 245.
For similar scenarios using other databases, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tAS400Rollback
Cancels the transaction committed in the connected database and avoids committing part of a
transaction involuntarily.

tAS400Rollback Standard properties


These properties are used to configure tAS400Rollback running in the Standard Job framework.
The Standard tAS400Rollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAS400Connection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Connection and
tAS400Commit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection

parameters on page 497. For more information on


Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For a similar scenario using another database, see Rollback from inserting data in mother/daughter
tables on page 2429.


tAS400Row
Executes the stated SQL query on the specified database.
Depending on the nature of the query and the database, tAS400Row acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements. tAS400Row is the specific component for this database query. The Row suffix means the
component implements a flow in the Job design although it does not provide output.

tAS400Row Standard properties


These properties are used to configure tAS400Row running in the Standard Job framework.
The Standard tAS400Row component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version Select the AS/400 version in use


Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequencing the fields in order to match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via
a Row > Rejects link.


Advanced settings

Additional JDBC Parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.


The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For similar scenarios using other databases, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tAssert
Generates a boolean evaluation of the Job execution status and provides the Job status messages to
tAssertCatcher.
The status includes:
• Ok: the Job execution succeeds.
• Fail: the Job execution fails.
The tested Job's result does not match the expectation or an execution error occurred at runtime.
The tAssert component works alongside tAssertCatcher to evaluate the status of a Job execution. It
concludes with a boolean result based on an assertive statement related to the execution and feeds
the result to tAssertCatcher for proper Job status presentation.

tAssert Standard properties


These properties are used to configure tAssert running in the Standard Job framework.
The Standard tAssert component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Description Type in your descriptive message to help identify the


assertion of a tAssert.

Expression Type in the assertive statement you base the evaluation on.
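For example (this is the expression used in the scenario below), an assertion checking that a
tMysqlOutput component inserted at least 20 rows could be:

((Integer)globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED"))>=20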

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component follows the action to which the assertive condition is directly related. It can be the intermediate or end component of the main Job, or the start, intermediate, or end component of the secondary Job.


Limitation The evaluation of tAssert is captured only by tAssertCatcher.

Viewing product orders status (on a daily basis) against a benchmark number
This scenario allows you to insert the orders information into a database table and to evaluate the orders status (once scheduled to run every day) by using tAssert to compare the orders against a fixed number and tAssertCatcher to indicate the results. In this case, Ok is returned if the number of orders is greater than or equal to 20 and Failed is returned if it is less than 20.
In practice, this Job can be scheduled to run every day for the daily orders report and tFixedFlowInput
as well as tLogRow are replaced by input and output components in the Database/File families.

Linking the components


Procedure
1. Drop tFixedFlowInput, tMysqlOutput, tAssert, tAssertCatcher, and tLogRow onto the workspace.
2. Rename tFixedFlowInput as orders, tAssert as orders >=20, tAssertCatcher as catch comparison
result and tLogRow as ok or failed.
3. Link tFixedFlowInput to tMysqlOutput using a Row > Main connection.
4. Link tFixedFlowInput to tAssert using the Trigger > On Subjob OK connection.
5. Link tAssertCatcher to tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.


Select Use Inline Content (delimited file) in the Mode area.


In the Content field, enter the data to write to the Mysql database, for example:

AS2152;Washingto Berry Juice;2013-02-19 11:14:15;3.6


AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 13:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 14:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6

Note that the orders listed are just for illustration of how tAssert functions and the number here is
less than 20.
2. Click the Edit schema button to open the schema editor.


3. Click the [+] button to add four columns, namely product_id, product_name, date and price, of the String, String, Date and Float types respectively.
Click OK to validate the setup and close the editor.
4. Double-click tMysqlOutput to display the Basic settings view.

5. In the Host, Port, Database, Username and Password fields, enter the connection details and the
authentication credentials.
6. In the Table field, enter the name of the table, for example order.
7. In the Action on table list, select the option Drop table if exists and create.
8. In the Action on data list, select the option Insert.
9. Double-click tAssert to display the Basic settings view.


10. In the description field, enter the descriptive information for the purpose of tAssert in this case.
11. In the expression field, enter the expression allowing you to compare the data to a fixed number:

((Integer)globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED"))>=20

12. Double-click tLogRow to display the Basic settings view.

13. In the Mode area, select Table (print values in cells of a table) for a better display.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to run the Job.

As shown above, the orders status indicates Failed as the number of orders is less than 20.

Setting up the assertive condition for a Job execution


This scenario describes how to set up an assertive condition in tAssert in order to evaluate that a Job
execution succeeds or not. Moreover, you can also find out how the two different evaluation results
display and the way to read them. Apart from tAssert, the scenario uses the following components as
well:
• tFileInputDelimited and tFileOutputDelimited. The two components compose the main Job of
which the execution status is evaluated. For the detailed information on the two components, see
tFileInputDelimited on page 1015 and tFileOutputDelimited on page 1113.
• tFileCompare. It realizes the comparison between the output file of the main Job and a standard
reference file. The comparative result is evaluated by tAssert against the assertive condition set


up in its settings. For more detailed information on tFileCompare, see tFileCompare on page
984.
• tAssertCatcher. It captures the evaluation generated by tAssert. For more information on
tAssertCatcher, see tAssertCatcher on page 273.
• tLogRow. It allows you to read the captured evaluation. For more information on tLogRow, see
tLogRow on page 1977.
First proceed as follows to design the main Job:
• Prepare a delimited .csv file as the source file read by your main Job.
• Edit two rows in the delimited file. The contents you edit are not important, so feel free to
simplify them.
• Name it source.csv.
• In Talend Studio, create a new Job named JobAssertion.
• Place tFileInputDelimited and tFileOutputDelimited on the workspace.
• Connect them with a Row Main link to create the main Job.

• Double-click tFileInputDelimited to open its Component view.


• In the File Name field of the Component view, fill in the path or browse to source.csv.

• Still in the Component view, set Property Type to Built-In and click the [...] button next to Edit schema to define the data to pass on to tFileOutputDelimited. In the scenario, define the data presented in source.csv you created.
For more information about schema types, see Talend Studio User Guide.
• Define the other parameters in the corresponding fields according to source.csv you created.
• Double-click tFileOutputDelimited to open its Component view.
• In the File Name field of the Component view, fill in or browse to specify the path to the output
file, leaving the other fields as they are by default.


• Press F6 to execute the main Job. It reads source.csv, passes the data to tFileOutputDelimited and outputs a delimited file, out.csv.
Then continue to edit the Job to see how tAssert evaluates the execution status of the main Job.
• Rename out.csv as reference.csv. This file is used as the expected result the main Job should output.
• Place tFileCompare, tAssert and tLogRow on the workspace.
• Connect them with Row Main link.
• Connect tFileInputDelimited to tFileCompare with OnSubjobOk link.

• Double-click tFileCompare to open its Component view.


• In the Component view, fill in the corresponding file paths in the File to compare field and the
Reference file field, leaving the other fields as default.


For more information on the tFileCompare component, see tFileCompare on page 984.
• Then click tAssert and click the Component tab on the lower side of the workspace.

• In the Component view, edit the assertion row2.differ==0 in the expression field and the
descriptive message of the assertion in description field.
In the expression field, row2 is the data flow transmitted from tFileCompare to tAssert, differ is one of the columns of the tFileCompare schema and presents whether the compared files are identical, and 0 means no difference is detected between out.csv and reference.csv by tFileCompare. Hence, when the compared files are identical, the assertive condition is fulfilled and tAssert concludes that the main Job succeeds; otherwise, it concludes failure.

Note:
The differ column is in the read-only tFileCompare schema. For more information on its schema, see
tFileCompare on page 984.

• Press F6 to execute the Job.


• Check the result presented in the Run view

The console shows the comparison result of tFileCompare: Files are identical. However, the evaluation result of tAssert is nowhere to be seen, so you need tAssertCatcher to capture the evaluation.
• Place tAssertCatcher and tLogRow on the workspace.
• Connect them with Row Main link.


• Use the default configuration in the Component view of tAssertCatcher.

• Press F6 to execute the Job.


• Check the result presented in the Run view. You will see the Job status information is added in:

2010-01-29 15:37:33|fAvAzH|TASSERT|JobAssertion|java|tAssert_1|Ok|--|
The output file should be identical with the reference file

The descriptive information on JobAssertion in the console is organized according to the tAssertCatcher schema. This schema includes, in the following order: the execution time, the process ID, the project name, the Job name, the code language, the evaluation origin, the evaluation result, the detailed information of the evaluation, and the descriptive message of the assertion. For more information on the schema of tAssertCatcher, see tAssertCatcher on page 273.
The console indicates that the execution status of Job JobAssertion is Ok. In addition to the evaluation, you can also see other descriptive information about JobAssertion, including the descriptive message you edited in the Basic settings of tAssert.
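As a minimal illustration (this code is not part of the Job), the pipe-delimited log line shown above can be split into the schema columns in plain Java; the seventh field is the evaluation status:

String line = "2010-01-29 15:37:33|fAvAzH|TASSERT|JobAssertion|java|tAssert_1|Ok|--|The output file should be identical with the reference file";
// Order: moment, pid, project, job, language, origin, status, substatus, description
String[] fields = line.split("\\|");
System.out.println(fields[6]); // prints: Ok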


Then you will perform operations to make the main Job fail to generate the expected file. To do so,
proceed as follows in the same Job you have executed:
• Delete a row in reference.csv.
• Press F6 to execute the Job again.
• Check the result presented in Run view.

2010-02-01 19:47:43|GeHJNO|TASSERT|JobAssertion|tAssert_1|Failed|Test
logically failed|The output file should be identical with the reference
file

The console shows that the execution status of the main Job is Failed. The detailed explanation for
this status is closely behind it, reading Test logically failed.
You can thus get a basic idea about your present Job status: it fails to generate the expected file
because of a logical failure. This logical failure could come from a logical mistake during the Job
design.
The status and its explanatory information are presented respectively in the status and the substatus
columns of the tAssertCatcher schema. For more information on the columns, see tAssertCatcher on
page 273.


tAssertCatcher
Generates a data flow consolidating the status information of a Job execution and transfers the data into defined output files.
Based on its pre-defined schema, tAssertCatcher fetches the execution status information from
repository, Job execution and tAssert.

tAssertCatcher Standard properties


These properties are used to configure tAssertCatcher running in the Standard Job framework.
The Standard tAssertCatcher component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the fields to be
processed and passed on to the next component. In this
particular case, the schema is read-only, as this component
gathers standard log information including:

  Moment: Processing time and date.

  Pid: Process ID.

  Project: Project which the job belongs to.

  Job: Job name.

  Language: Language used by the Job (Java)

  Origin: Status evaluation origin. The origin may be different tAssert components.

  Status: Evaluation fetched from tAssert. It may be:
- Ok: if the assertive statement of tAssert is evaluated as true at runtime.
- Failed: if the assertive statement of tAssert is evaluated as false or an execution error occurs at runtime. The tested Job's result does not match the expectation or an execution error occurred at runtime.

  Substatus: Detailed explanation for failed execution. The explanation can be:
- Test logically failed: the investigated Job does not produce the expected result.
- Execution error: an execution error occurred at runtime.

  Description: Descriptive message typed in Basic settings of tAssert (when Catch tAssert is selected) and/or the message of the exception captured (when Catch Java Exception is selected).


  Exception: The Exception object thrown by the Job, namely the original exception.
Available when Get original exception is selected.

Catch Java Exception Select this check box to capture Java exception errors and show the message in the Description column (when Get original exception is not selected) or in the Exception column (when Get original exception is selected).

Get original exception Select this check box to show the original exception object in the Exception column.
Available when Catch Java Exception is selected.

Catch tAssert Select this check box to capture the evaluations of tAssert.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is the start component of a secondary Job which fetches the execution status information from several sources. It generates a data flow to transfer the information to the component that follows.

Limitation This component must be used together with tAssert.

Related scenarios
For a use case involving tAssertCatcher, see the following tAssert scenario:
• Setting up the assertive condition for a Job execution on page 267


tAzureAdlsGen2Input
Retrieves data from an ADLS Gen2 file system of an Azure storage account and passes the data to the
subsequent component connected to it through a Main>Row link.

tAzureAdlsGen2Input Standard properties


These properties are used to configure tAzureAdlsGen2Input running in the Standard Job framework.
The Standard tAzureAdlsGen2Input component belongs to the Cloud family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically becomes built-in.

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Guess schema Click this button to retrieve the schema from the data object
specified.

Authentication method Select one of the following authentication methods from the drop-down list.
• Shared key, which requires an account access key. See
Manage a storage account for related information.
• Shared access signature, which requires a shared
access signature. See Constructing the Account SAS
URI for related information.

Account name Enter the name of the Data Lake Storage account you need
to access. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
account.

Endpoint suffix Enter the Azure Storage service endpoint.


The combination of the account name and the Azure
Storage service endpoint forms the endpoint of the storage
account.
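For example, assuming the account is named mystorageaccount and the endpoint suffix is dfs.core.windows.net (the usual suffix for Data Lake Storage Gen2 endpoints), the resulting storage account endpoint would be mystorageaccount.dfs.core.windows.net.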

Shared key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access. To know
how to get your key, read Manage a storage account.
This field is available if you select Shared key from
Authentication method drop-down list.

SAS token Enter your account SAS token. You can get the SAS token
for each allowed service on the Microsoft Azure portal
after generating SAS. The SAS token format is https://
<$storagename><$service>.core.windows.net/
<$sastoken>, where <$storagename> is the storage
account name, <$service> is the allowed service name
(blob, file, queue or table), and <$sastoken> is the SAS
token value. For more information, read Constructing the
Account SAS URI.
This field is available if you select Shared access signature
from Authentication method drop-down list.

Check connection Click this button to validate the connection parameters provided.

Filesystem Enter the name of the target Blob container.


You can also click the ... button to the right of this field and
select the desired Blob container from the list in the dialog
box.

Blobs Path Enter the path to the target blobs.

Format Set the format for the incoming data. Currently, the
following formats are supported: CSV, AVRO, JSON, and
Parquet.

Field Delimiter Set the field delimiter. You can select Semicolon, Comma,
Tabulation, and Space from the drop-down list; you can
also select Other and enter your own in the Custom field
delimiter field.


Record Separator Set the record separator. You can select LF, CR, and CRLF
from the drop-down list; you can also select Other and enter
your own in the Custom Record Separator field.

Text Enclosure Character Enter the character used to enclose text.

Escape character Enter the character of the row to be escaped.

Header Select this check box to insert a header row to the data
retrieved.

Note:
• Select this option if the data to be retrieved has a
header row. In this case, you need also to make sure
that the column names in the schema are consistent
with the column headers of the data.
• Clear this option if the data to be retrieved does not
have a header row. In this case, you need to name
the columns in the schema as field0, field1,
field2, and so on.

File Encoding Select the file encoding from the drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows successfully processed. This is an After variable and it returns an integer.

Usage

Usage rule This component is usually used as a start component of a Job or subJob and it always needs an output link.

Related scenario
For a related scenario, see Accessing Azure ADLS Gen2 storage on page 280.


tAzureAdlsGen2Output
Uploads incoming data to an ADLS Gen2 file system of an Azure storage account in the specified
format.

tAzureAdlsGen2Output Standard properties


These properties are used to configure tAzureAdlsGen2Output running in the Standard Job framework.
The Standard tAzureAdlsGen2Output component belongs to the Cloud family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically becomes built-in.

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Sync columns Click this button to retrieve the schema from the previous component connected in the Job.

Authentication method Select one of the following authentication methods from the drop-down list.
• Shared key, which requires an account access key. See
Manage a storage account for related information.
• Shared access signature, which requires a shared
access signature. See Constructing the Account SAS
URI for related information.

Account name Enter the name of the Data Lake Storage account you need
to access. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
account.

Endpoint suffix Enter the Azure Storage service endpoint.


The combination of the account name and the Azure
Storage service endpoint forms the endpoint of the storage
account.

Shared key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access. To know
how to get your key, read Manage a storage account.
This field is available if you select Shared key from
Authentication method drop-down list.

SAS token Enter your account SAS token. You can get the SAS token
for each allowed service on the Microsoft Azure portal
after generating SAS. The SAS token format is https://
<$storagename><$service>.core.windows.net/
<$sastoken>, where <$storagename> is the storage
account name, <$service> is the allowed service name
(blob, file, queue or table), and <$sastoken> is the SAS
token value. For more information, read Constructing the
Account SAS URI.
This field is available if you select Shared access signature
from Authentication method drop-down list.

Check connection Click this button to validate the connection parameters provided.

Filesystem Enter the name of the target Blob container.


You can also click the ... button to the right of this field and
select the desired Blob container from the list in the dialog
box.

Blobs Path Enter the path to the target blobs.

Format Set the format for the incoming data. Currently, the
following formats are supported: CSV, AVRO, JSON, and
Parquet.

Field Delimiter Set the field delimiter. You can select Semicolon, Comma,
Tabulation, and Space from the drop-down list; you can
also select Other and enter your own in the Custom field
delimiter field.


Record Separator Set the record separator. You can select LF, CR, and CRLF
from the drop-down list; you can also select Other and enter
your own in the Custom Record Separator field.

Text Enclosure Character Enter the character used to enclose text.

Escape character Enter the character of the row to be escaped.

Header Select this check box to insert a header row to the data. The
schema column names will be used as column headers.

File Encoding Select the file encoding from the drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Max batch size Set the maximum number of lines allowed in each batch.
Do not change the default value unless you are facing
performance issues. Increasing the batch size can improve
the performance but a value too high could cause Job
failures.

Blob Template Name Enter a string as the name prefix for the Blob files
generated. The name of a Blob file generated will be the
name prefix followed by another string.
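For illustration, with a Blob Template Name of data- (the prefix also used in the scenario Accessing Azure ADLS Gen2 storage below), every generated blob name starts with data- followed by a string generated at runtime.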

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows successfully processed. This is an After variable and it returns an integer.

Usage

Usage rule This component is usually used as an end component of a Job or subJob and it always needs an input link.

Accessing Azure ADLS Gen2 storage


This scenario demonstrates the use of the tAzureAdlsGen2Output and tAzureAdlsGen2Input
components. In the first subJob, a tFixedFlowInput component passes data to tAzureAdlsGen2Output,
which then uploads the data to Azure ADLS Gen2 storage; in the second subJob, tAzureAdlsGen2Input
reads the data and passes it to tLogRow.


In this scenario, the following data is uploaded and then retrieved.

1;James
2;Josephine
3;Donette
4;Simona
5;Mitsue
6;Leota

This scenario requires an Azure storage user account with permissions for reading and writing files.
Optionally, you can monitor the data using Microsoft Azure Storage Explorer, a utility for managing
your Azure storage resources. Check Azure Storage Explorer for related information.

Accessing Azure ADLS Gen2 storage: establishing the Job


Procedure
1. Create a standard Job and drop tFixedFlowInput, tAzureAdlsGen2Output, tAzureAdlsGen2Input,
and tLogRow onto the workspace.
2. Connect tFixedFlowInput and tAzureAdlsGen2Output using the Row > Main link.
3. Connect tAzureAdlsGen2Input and tLogRow using the Row > Main link.
4. Connect tFixedFlowInput and tAzureAdlsGen2Input using the Trigger > OnSubjobOk link.

Accessing Azure ADLS Gen2 storage: setting up the Job


Procedure
1. In the Basic settings view of tFixedFlowInput:
• Click the Edit schema button and add two columns: id (type Integer) and name (type String);
• Select Use Inline Content(delimited file) and enter the following into the Content field.

1;James
2;Josephine
3;Donette
4;Simona
5;Mitsue
6;Leota

• Leave other options as they are.


2. In the Basic settings view of tAzureAdlsGen2Output:
• Click the Edit schema button and add two columns: id (type Integer) and name (type String);
• Provide your Azure storage user account credentials in the Authentication method, Account name, Endpoint suffix, and Shared key fields.
• Validate your Azure storage user account by clicking Check connection.


• Enter the name of an existing Blob container in Filesystem. You can also click ... to the right of
this field and select the Blob container from the list in the dialog box.
• In Blobs Path, enter the name of the directory where you want to put the data.
• Select CSV for Format; Semicolon for Field Delimiter; and CRLF for Record Separator. Select
the Header option.
• Leave other options as they are.
3. In the Advanced settings view of tAzureAdlsGen2Output, enter the prefix for the Blob files generated in the Blob Template Name field (data- in this example).
4. Do exactly the same as described in step 2 for the tAzureAdlsGen2Input component. Be sure to propagate the schema to the subsequent component when prompted.
5. In the Basic settings view of tLogRow:
• Select Table (print values in cells of a table).
• Leave other options as they are.

Accessing Azure ADLS Gen2 storage: executing the Job


Procedure
1. Press F6 to run the Job.
2. Check the result in the Run console.

3. (Optional) Check the Blob file generated using Microsoft Azure Storage Explorer. See Get started with Storage Explorer for related information.


tAzureStorageConnection
Uses authentication and the protocol information to create a connection to the Microsoft Azure
Storage system that can then be reused by other Azure Storage components.

tAzureStorageConnection Standard properties


These properties are used to configure tAzureStorageConnection running in the Standard Job
framework.
The Standard tAzureStorageConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.


Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is generally used with other Azure Storage
components.
Knowledge about Microsoft Azure Storage is required.

Related scenario
For related scenarios, see:
• Retrieving files from an Azure Storage container on page 303
• Creating a container in Azure Storage on page 286
• Handling data with Microsoft Azure Table storage on page 313


tAzureStorageContainerCreate
Creates a new storage container used to hold Azure blobs (Binary Large Object) for a given Azure
storage account.

tAzureStorageContainerCreate Standard properties


These properties are used to configure tAzureStorageContainerCreate running in the Standard Job
framework.
The Standard tAzureStorageContainerCreate component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The

SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the blob container you need to create.

Access control Select the access restriction level you need to apply on the
container to be created.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Creating a container in Azure Storage


In this scenario, a four-component Job uses Azure Storage components to create a container in a given
Azure Storage system and check whether this container is successfully created.


Before replicating this scenario, you must have appropriate rights and permissions to read and write
files in the Azure storage account to be used. For further information, see Microsoft's documentation
for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job, named azureTalend for
example, from the Job Designs node in the Repository tree view.
2. Drop tAzureStorageConnection, tAzureStorageContainerCreate, tAzureStorageContainerExist and
tJava onto the workspace.
3. Connect them using the Trigger > OnSubjobOk link.

Connecting to an Azure storage account


Procedure
1. Double-click tAzureStorageConnection to open its Component view.

2. In the Account name field, enter the name of the storage account to be connected to. In this example, it is talendstorage, an account that has been created for demonstration purposes.
3. In the Account key field, paste the primary or the secondary key associated with the storage account to be used. These keys can be found in the Manage Access Key dashboard in the Azure Storage system to be connected to.
4. From the Protocol list, select the protocol for the endpoint of the storage account to be used. In
this example, it is HTTPS.


Creating a container
Procedure
1. Double-click tAzureStorageContainerCreate to open its Component view.

2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container you need to create. If a container
using the same name exists, that container will be overwritten at runtime.
4. From the Access control list, select the access restriction level for the container to be created. In
this example, select Private.

Verifying the creation


Procedure
1. Double-click tAzureStorageContainerExist to open its Component view.

2. Select the component whose connection details will be used to set up the Azure storage connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container whose existence you need to check.
4. Double-click tJava to open its Component view.

5. In the Code field, enter System.out.println();


6. In the Outline panel, which, by default, is found to the left side of the Component view, expand
the tAzureStorageContainerExist node.


7. From the Outline panel, drop the CONTAINER_EXIST global variable into the parentheses in the code in the Component view in order to make the code read: System.out.println(((Boolean)globalMap.get("tAzureStorageContainerExist_1_CONTAINER_EXIST")));

Executing the Job


Procedure
1. Press F6 to run this Job.
2. Check the execution result on the Run console.

You can read that the Job returns true as the verification result, that is to say, the
talendcontainer container has been created in the storage account being used.
3. Double-check the result in the web console of the Azure storage account.


You can read as well that the talendcontainer container has been created.


tAzureStorageContainerDelete
Automates the removal of a given blob container from the space of a specific storage account.

tAzureStorageContainerDelete Standard properties


These properties are used to configure tAzureStorageContainerDelete running in the Standard Job
framework.
The Standard tAzureStorageContainerDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the blob container to be removed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAzureStorageContainerExist
Automates the verification of whether a given blob container exists or not within a storage account.

tAzureStorageContainerExist Standard properties


These properties are used to configure tAzureStorageContainerExist running in the Standard Job
framework.
The Standard tAzureStorageContainerExist component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the blob container whose existence you need to verify.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable and it returns a string.

CONTAINER_EXIST The result of whether the given container exists or not. This is an After variable and it returns a boolean.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
For a related scenario, see Creating a container in Azure Storage on page 286


tAzureStorageContainerList
Lists all containers in a given Azure storage account.

tAzureStorageContainerList Standard properties


These properties are used to configure tAzureStorageContainerList running in the Standard Job
framework.
The Standard tAzureStorageContainerList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with a single
column ContainerName of String type, which indicates
the name of each container to be listed.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and it returns an integer.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is usually used as a start component of a Job or subJob and it always needs an output link.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
No scenario is available for this component yet.


tAzureStorageDelete
Deletes blobs from a given container for an Azure storage account according to the specified blob
filters.

tAzureStorageDelete Standard properties


These properties are used to configure tAzureStorageDelete running in the Standard Job framework.
The Standard tAzureStorageDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the container from which you need to
delete blobs.

Blob filter Complete this table to select the blobs to be deleted. The
parameters to be provided are:
• Blob prefix: enter the common prefix of the names of
the blobs you need to delete. This prefix allows you to
filter the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
• Include subdirectories: select this check box to select
all of the sub-folders and the blobs in those folders
beneath the designated directory level. If you leave
this check box clear, tAzureStorageDelete deletes only
the blobs directly beneath that directory level.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
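These After variables can be read in a downstream component such as tJava through the globalMap, as in the fragment below; the instance name tAzureStorageDelete_1 is assumed and should be replaced with the label of your own component.

    // tJava code fragment: read the After variables once the subJob has finished.
    String container = (String) globalMap.get("tAzureStorageDelete_1_CONTAINER");
    String error = (String) globalMap.get("tAzureStorageDelete_1_ERROR_MESSAGE");
    System.out.println("Processed container: " + container + (error != null ? ", error: " + error : ""));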


Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAzureStorageGet
Retrieves blobs from a given container for an Azure storage account according to the specified filters
applied on the virtual hierarchy of the blobs, and then writes the selected blobs to a local folder.

tAzureStorageGet Standard properties


These properties are used to configure tAzureStorageGet running in the Standard Job framework.
The Standard tAzureStorageGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Container Enter the name of the container you need to retrieve blobs
from.

Local folder Enter the path, or browse to the folder in which you need to
store the retrieved blobs.

Blobs Complete this table to select the blobs to be retrieved. The


parameters to be provided are:
• Prefix: enter the common prefix of the names of the
blobs you need to retrieve. This prefix allows you to
filter the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
If you want to select the blobs stored directly beneath
the container level, that is to say, the blobs without
virtual path in their names, remove quotation marks
and enter null.
• Include sub-directories: select this check box to
retrieve all of the sub-folders and the blobs in those
folders beneath the directory level designated in
the Blob prefix column. If you leave this check box
clear, tAzureStorageGet returns only the blobs directly
beneath that directory level.
• Create parent directories: select this check box to
replicate the virtual directory of the retrieved blobs in
the local folder.
Note that if you leave this check box clear, the local
folder must already contain the same directory structure
as the retrieved blobs have in the container; otherwise,
those blobs cannot be retrieved.
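By way of illustration, the following sketch reproduces this retrieval logic with the legacy Azure Storage SDK for Java. It is an assumption about the underlying behaviour, not the code generated by tAzureStorageGet; the container, prefix and local folder values are the ones used in the scenario later in this section, and the account name and key are placeholders.

    import java.io.File;
    import com.microsoft.azure.storage.CloudStorageAccount;
    import com.microsoft.azure.storage.blob.CloudBlobContainer;
    import com.microsoft.azure.storage.blob.CloudBlockBlob;
    import com.microsoft.azure.storage.blob.ListBlobItem;

    public class GetBlobsByPrefix {
        public static void main(String[] args) throws Exception {
            CloudStorageAccount account = CloudStorageAccount.parse(
                    "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<your key>");
            CloudBlobContainer container =
                    account.createCloudBlobClient().getContainerReference("talendcontainer");

            String localFolder = "E:/screenshots";
            // Flat listing under the prefix, as when Include sub-directories is selected.
            for (ListBlobItem item : container.listBlobs("photos/mongodb/", true)) {
                if (item instanceof CloudBlockBlob) {
                    CloudBlockBlob blob = (CloudBlockBlob) item;
                    File target = new File(localFolder, blob.getName());
                    // Equivalent of Create parent directories: replicate the blob's virtual path locally.
                    target.getParentFile().mkdirs();
                    blob.downloadToFile(target.getAbsolutePath());
                }
            }
        }
    }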

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

LOCAL_FOLDER The local directory used in this component. This is an After


variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Retrieving files from an Azure Storage container


In this scenario, a five-component Job uses Azure Storage components to write files in a given Azure
Storage system and then retrieve selected files (blobs in terms of Azure Storage) from that system.

Before replicating this scenario, you must have appropriate rights and permissions to read and write
files in the Azure storage account to be used. For further information, see Microsoft's documentation
for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.


The talendcontainer container used in this scenario was created using tAzureStorageContainerCreate
in the scenario Creating a container in Azure Storage on page 286.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job, named azureTalend for
example, from the Job Designs node in the Repository tree view.
2. Drop tAzureStorageConnection, tAzureStoragePut, tAzureStorageList, tJava and tAzureStorageGet
onto the workspace.
3. Connect the Azure Storage components using Trigger > OnSubjobOk links, and connect
tAzureStorageList to tJava using a Row > Iterate link.

Connecting to an Azure storage account


Procedure
1. Double-click tAzureStorageConnection to open its Component view.

2. In the Account name field, enter the name of the storage account to be connected to. In this
example, it is talendstorage, an account that has been created for demonstration purposes.
3. In the Account key field, paste the primary or the secondary key associated with the storage
account to be used. These keys can be found in the Manage Access Key dashboard in the Azure
Storage system to be connected to.
4. From the Protocol list, select the protocol for the endpoint of the storage account to be used. In
this example, it is HTTPS.

Writing files in Azure Storage


Procedure
1. Double-click tAzureStoragePut to open its Component view.


2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container you need to write files in. In this
example, it is talendcontainer, a container created in the scenario Creating a container in
Azure Storage on page 286.
4. In the Local folder field, enter the path, or browse, to the directory where the files to be used are
stored. In this scenario, they are some pictures showing a technical process and are stored locally in
E:/photos. Therefore, enter E:/photos; this allows tAzureStoragePut to upload all the files of
this folder and its sub-folders into the talendcontainer container.
For demonstration purposes, the example photos are organized as follows in the E:/photos
folder.
• Directly beneath the E:/photos level:

components-use_case_triakinput_1.png
components-use_case_triakinput_2.png
components-use_case_triakinput_3.png
components-use_case_triakinput_4.png

• In the E:/photos/mongodb/step1 directory:

components-use_case_tmongodbbulkload_1.png
components-use_case_tmongodbbulkload_2.png
components-use_case_tmongodbbulkload_3.png
components-use_case_tmongodbbulkload_4.png

• In the E:/photos/mongodb/step2 directory:

components-use_case_tmongodbbulkload_5.png
components-use_case_tmongodbbulkload_6.png
components-use_case_tmongodbbulkload_7.png
components-use_case_tmongodbbulkload_8.png

5. In the Azure Storage folder field, enter the directory where you want to write files. This directory
will be created in the container to be used if it does not exist. In this example, enter photos.
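As a reading aid, the fragment below shows the blob name you can expect for one of the example files once it is uploaded with this configuration; it is an illustration of the virtual-hierarchy naming described in this guide, not the component's internal code.

    // Could be pasted into a tJava component to check the expected naming.
    String localFolder = "E:/photos";
    String azureStorageFolder = "photos";
    String localFile = "E:/photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png";

    // Path of the file relative to the local folder, with forward slashes.
    String relativePath = localFile.substring(localFolder.length() + 1).replace('\\', '/');
    String blobName = azureStorageFolder + "/" + relativePath;
    System.out.println(blobName);
    // -> photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png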

Verifying the file transfer


Configuring tAzureStorageList

Procedure
1. Double-click tAzureStorageList to open its Component view.


2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container in which you need to check whether
the given files exist. In this scenario, it is talendcontainer.
4. Under the Blob filter table, click the [+] button to add one row in the table.
5. In the Prefix column, enter the common prefix of the names of the files (blobs) to be checked.
This prefix represents a virtual directory level you designate as the starting point down from
which files (blobs) are checked. In this example, it is photos/.
For further information about blob names, see http://msdn.microsoft.com/en-us/library/dd135715.aspx.
6. In the Include sub-directories column, select the check box in the newly added row. This allows
tAzureStorageList to check all the files at any hierarchical level beneath the designated starting
point.

Configuring tJava

Procedure
1. Double-click tJava to open its Component view.

2. In the Code field, enter System.out.println();


3. In the Outline panel, which by default is found on the left side of the Component view, expand
the tAzureStorageList node.


4. From the Outline panel, drop the CURRENT_BLOB global variable into the parentheses in the
code in the Component view so that the code reads: System.out.println(((String)
globalMap.get("tAzureStorageList_1_CURRENT_BLOB")));

Retrieving selected files


Procedure
1. Double-click tAzureStorageGet to open its Component view.

2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container from which you need to retrieve files.
In this scenario, it is talendcontainer.
4. In the Local folder field, enter the path, or browse, to the directory where you want to put the
retrieved files. In this example, it is E:/screenshots.
5. Under the Blob table, click the [+] button to add one row in the table.
6. In the Prefix column, enter the common name prefix of the files (blobs) to be retrieved. In this
example, it is photos/mongodb/.
7. In the Include sub-directories column, select the check box in the newly added row. This allows
tAzureStorageGet to retrieve all the files (blobs) beneath the photos/mongodb/ level.
8. In the Create parent directories column, select the check box in the newly added row to create the
same directory in the specified local folder as the retrieved blobs have in the container.


Note that having this same directory is necessary for successfully retrieving blobs. If you leave
this check box clear, then you need to create the same directory yourself in the target local folder.

Executing the Job


Procedure
1. Press F6 to run this Job.
2. Check the execution result on the Run console.

You can read that the Job returns the list of the blobs with the photos/ prefix in the container.
3. Double-check the result in the web console of the Azure storage account.

4. Check the retrieved files in the specified local folder.


You can see the blobs with the photos/mongodb/ prefix have been retrieved and their prefix
transformed to directories.


tAzureStorageInputTable
Retrieves a set of entities that satisfy the specified filter criteria from an Azure storage table.

tAzureStorageInputTable Standard properties


These properties are used to configure tAzureStorageInputTable running in the Standard Job
framework.
The Standard tAzureStorageInputTable component belongs to the Cloud family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Table name Specify the name of the table from which the entities will
be retrieved.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns that describe the three system properties
of each entity:
• PartitionKey: the partition key for the partition that the
entity belongs to.
• RowKey: the row key for the entity within the partition.
PartitionKey and RowKey are string type values that
uniquely identify every entity in a table, and the user
must include them in every insert, update, and delete
operation.
• Timestamp: the time that the entity was last modified.
This DateTime value is maintained by the Azure server
and it cannot be modified by the user.
For more information about these system properties, see
Understanding the Table Service Data Model.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Use filter expression Select this check box and complete the Filter expressions
table displayed to specify the conditions used to filter the
entities to be retrieved. Click the [+] button to add as many
rows as needed, one row per condition, and set the value of
the following parameters for each condition.
• Column: specify the name of the property to which you
want to apply the condition.
• Function: click the cell and select the comparison
operator you want to use from the drop-down list.
• Value: specify the value used to compare the property
to.
• Predicate: select the predicate used to combine the
conditions.
• Field type: click the cell and select the type of the
column from the drop-down list.
The generated filter expression will be displayed in the
read-only Effective filter field.
For more information about the filter expressions, see
Querying Tables and Entities.
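For reference, the sketch below shows how two such conditions combined with an AND predicate translate into an OData filter expression, which is roughly what the Effective filter field reflects. It relies on the legacy Azure Storage SDK for Java as an assumption about tooling, not on the code generated by the component, and the property names and values are only examples.

    // Requires: import com.microsoft.azure.storage.table.TableQuery;
    // One condition per row of the Filter expressions table, combined with the predicate.
    String bySite = TableQuery.generateFilterCondition(
            "PartitionKey", TableQuery.QueryComparisons.EQUAL, "Beijing");
    String byJob = TableQuery.generateFilterCondition(
            "Job", TableQuery.QueryComparisons.EQUAL, "Software Tester");
    String effectiveFilter = TableQuery.combineFilters(bySite, TableQuery.Operators.AND, byJob);
    // effectiveFilter now holds the OData expression:
    // (PartitionKey eq 'Beijing') and (Job eq 'Software Tester')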

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

Name mappings Complete this table to map the column name of the
component schema with the property name of the Azure
table entity if they are different.
• Schema column name: enter the column name of the
component schema between double quotation marks.
• Entity property name: enter the property name of the
Azure table entity between double quotation marks.
For example, suppose there are three schema columns,
CompanyID, EmployeeID, and EmployeeName, that
are used to feed the values for the PartitionKey,
RowKey, and Name entity properties respectively. Since
the PartitionKey and RowKey columns are added to the
schema automatically, you do not need to specify the
mapping relationship for them; you only need to add one
row, setting the Schema column name cell to "EmployeeName"
and the Entity property name cell to "Name", to specify the
mapping relationship for the EmployeeName column when
retrieving data from the Azure table.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Handling data with Microsoft Azure Table storage


Here is an example of using Talend components to connect to a Microsoft Azure storage account that
gives you access to Azure storage table service, write some employee data into an Azure storage table,
and then retrieve the employee data from the table and display it on the console.
The employee data used in this example is as follows:
#Id;Name;Site;Job;Date;Salary
12000;Gerald Roosevelt;Beijing;Software Developer;2008-01-01;15000.01
12001;Benjamin Harrison;Paris;Software Developer;2008-11-22;13000.11
12002;Bob Clinton;Beijing;Software Tester;2008-05-12;12000.22
12003;James Quincy;Paris;Technical Writer;2009-03-10;12000.33
12004;Gerald Harrison;Beijing;Software Tester;2009-06-20;12500.44
12005;Harry Madison;Paris;Software Developer;2009-10-15;14000.55
12006;Helen Roosevelt;Beijing;Software Tester;2009-03-25;13500.66
12007;Mary Clinton;Beijing;Software Developer;2010-02-20;16000.77
12008;Cathey Quincy;Paris;Software Developer;2010-07-15;14000.88
12009;John Smith;Beijing;Technical Writer;2011-02-10;12500.99

Creating a Job for handling data with Azure Table storage


Create a Job to connect to an Azure storage account, write some employee data into an Azure storage
table, and then retrieve that information from the table and display it on the console.


Procedure
1. Create a new Job and add a tAzureStorageConnection component, a tFixedFlowInput component,
a tAzureStorageOutputTable component, a tAzureStorageInputTable component, and a tLogRow
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFixedFlowInput component to the tAzureStorageOutputTable component using a Row >
Main connection.
3. Do the same to link the tAzureStorageInputTable component to the tLogRow component.
4. Link the tAzureStorageConnection component to the tFixedFlowInput component using a Trigger
> OnSubjobOk connection.
5. Do the same to link the tFixedFlowInput component to the tAzureStorageInputTable component.

Connecting to an Azure Storage account


Configure the tAzureStorageConnection component to open the connection to an Azure Storage
account.

Before you begin


The Azure Storage account, which allows you to access the Azure Table storage service and store
the provided employee data, has already been created. For more information about how to create an
Azure Storage account, see About Azure storage accounts.

Procedure
1. Double-click the tAzureStorageConnection component to open its Basic settings view on the
Component tab.

2. In the Account Name field, specify the name of the storage account you need to access.
3. In the Account Key field, specify the key associated with the storage account you need to access.

Writing data into an Azure Storage table


Configure the tFixedFlowInput component and the tAzureStorageOutputTable component to write
the employee data into an Azure Storage table.

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view on the Component
tab.


2. Click the [...] button next to Edit schema to open the schema dialog box and define the schema by adding
six columns: Id, Name, Site, and Job of String type, Date of Date type, and Salary of Double
type. Then click OK to save the changes and accept the propagation prompted by the pop-up
dialog box.

Note that in this example, the Site and Id columns are used to feed the values of the
PartitionKey and RowKey system properties of each entity and they should be of String type,
and the Name column is used to feed the value of the EmployeeName property of each entity.
3. In the Mode area, select Use Inline Content(delimited file) and in the Content field displayed,
enter the employee data that will be written into the Azure Storage table.
4. Double-click the tAzureStorageOutputTable component to open its Basic settings view on the
Component tab


5. From the connection component drop-down list, select the component whose connection
details will be used to set up the connection to the Azure Storage service, tAzureStorageConnection_1
in this example.
6. In the Table name field, enter the name of the table into which the employee data will be written,
employee in this example.
7. From the Action on table drop-down list, select the operation to be performed on the specified
table, Drop table if exist and create in this example.
8. Click Advanced settings to open its view.

9. Click the [+] button under the Name mappings table to add three rows and map the schema column
names with the property names of the entities in the Azure table. In this example:
• the Site column is used to feed the value of the PartitionKey system property, so in the
first row set the Schema column name cell to the value "Site" and the Entity property name
cell to the value "PartitionKey".
• the Id column is used to feed the value of the RowKey system property, so in the second row
set the Schema column name cell to the value "Id" and the Entity property name cell to the
value "RowKey".
• the Name column is used to feed the value of the EmployeeName property, so in the third row
set the Schema column name cell to the value "Name" and the Entity property name cell to the
value "EmployeeName".
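To picture the result of these mappings, the sketch below writes the first employee row as an Azure table entity with the legacy Azure Storage SDK for Java. It is only an assumption about what the stored entity looks like, not the code generated by the Job; cloudTable stands for a CloudTable reference to the employee table obtained elsewhere, and exception handling is omitted.

    // Requires: import com.microsoft.azure.storage.table.DynamicTableEntity;
    //           import com.microsoft.azure.storage.table.EntityProperty;
    //           import com.microsoft.azure.storage.table.TableOperation;
    // First employee row: Site -> PartitionKey, Id -> RowKey, Name -> EmployeeName.
    DynamicTableEntity entity = new DynamicTableEntity("Beijing", "12000");
    entity.getProperties().put("EmployeeName", new EntityProperty("Gerald Roosevelt"));
    entity.getProperties().put("Job", new EntityProperty("Software Developer"));
    entity.getProperties().put("Salary", new EntityProperty(15000.01));
    cloudTable.execute(TableOperation.insertOrReplace(entity));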

Retrieving data from the Azure Storage table


Configure the tAzureStorageInputTable component and the tLogRow component to retrieve the
employee data from the Azure Storage table.

Procedure
1. Double-click the tAzureStorageInputTable component to open its Basic settings view.


2. From the connection component drop-down list, select the component whose connection
details will be used to set up the connection to the Azure Storage service, tAzureStorageConnection_1
in this example.
3. In the Table name field, enter the name of the table from which the employee data will be
retrieved, employee in this example.
4. Click the [...] button next to Edit schema to open the schema dialog box.

Note that the schema has already been predefined with two read-only columns RowKey and
PartitionKey of String type, and another column Timestamp of Date type. The RowKey
and PartitionKey columns correspond to the Id and Site columns of the tAzureStorageOutputTable
schema.
5. Define the schema by adding another four columns that hold other employee data, Name and Job
of String type, Date of Date type, and Salary of Double type. Then click OK to save the changes
and accept the propagation prompted by the pop-up dialog box.
6. Click Advanced settings to open its view.


7. Click the [+] button under the Name mappings table to add one row and set the Schema column name cell
with the value "Name" and the Entity property name cell with the value "EmployeeName" to
map the schema column name with the property name of each entity in the Azure table.
Note that for the tAzureStorageInputTable component, the PartitionKey and RowKey
columns have already been added automatically to the schema and you do not need to specify the
mapping relationship for them.
8. Double-click the tLogRow component to open its Basic settings view and in the Mode area, select
Table (print values in cells of a table) for a better display of the result.

Executing the Job to handle data with Azure Table storage


After setting up the Job and configuring the components used in the Job for handling data with Azure
Table storage, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the Job is executed successfully and the employee data is displayed on the
console, with the timestamp value that indicates when each entity was inserted.
3. Double-check the employee data that has been written into the Azure Storage table employee
using Microsoft Azure Storage Explorer if you want.


tAzureStorageList
Lists blobs in a given container according to the specified blob filters.

tAzureStorageList Standard properties


These properties are used to configure tAzureStorageList running in the Standard Job framework.
The Standard tAzureStorageList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Container name Enter the name of the container from which you need to
select blobs to be listed.

Blob filter Complete this table to select the blobs to be listed. The
parameters to be provided are:
• Prefix: enter the common prefix of the names of the
blobs you need to list. This prefix allows you to filter
the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
If you want to select the blobs stored directly beneath
the container level, that is to say, the blobs without
virtual path in their names, remove quotation marks
and enter null.
• Include sub-directories: select this check box to select
all of the sub-folders and the blobs in those folders
beneath the designated directory level. If you leave
this check box clear, tAzureStorageList returns only
the blobs, if any, directly beneath that directory level.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with a single
column BlobName of String type, which indicates the name
of each blob to be listed.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and


decide whether to propagate the changes to all the


Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

CURRENT_BLOB The blob name being processed by this component. This is


an After variable and it returns a string.

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
For a related scenario, see Retrieving files from an Azure Storage container on page 303.


tAzureStorageOutputTable
Performs the defined action on a given Azure storage table and inserts, replaces, merges or deletes
entities in the table based on the incoming data from the preceding component.

tAzureStorageOutputTable Standard properties


These properties are used to configure tAzureStorageOutputTable running in the Standard Job
framework.
The Standard tAzureStorageOutputTable component belongs to the Cloud family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Table name Specify the name of the table into which the entities will be
written.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Partition Key Select the schema column that holds the partition key value
from the drop-down list.

Row Key Select the schema column that holds the row key value
from the drop-down list.

Action on data Select an action to be performed on data of the table


defined.
• Insert: insert a new entity into the table.
• Insert or replace: replace an existing entity or insert
a new entity if it does not exist. When replacing an
entity, any properties from the previous entity will be
removed if the new entity does not define them.
• Insert or merge: merge an existing entity or insert a
new entity if it does not exist. When merging an entity,
any properties from the previous entity will be retained
if the new entity does not define or include them.


• Merge: update an existing entity without removing the


property value of the previous entity if the new entity
does not define its value.
• Replace: update an existing entity and remove the
property value of the previous entity if the new entity
does not define its value.
• Delete: delete an existing entity.
For performance reasons, the incoming data is processed
in parallel and in random order. Therefore, it is not
recommended to perform any order-sensitive data operation
(for example, insert or replace) if there are duplicated rows
in your data.

Action on table Select an operation to be performed on the table defined.


• Default: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exist and create: The table is removed if it
already exists and created again.

Process in batch Select this check box to process the input entities in batch.
Note that the entities to be processed in batch should
belong to the same partition group, which means they
should have the same partition key value.
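The sketch below illustrates this constraint with the legacy Azure Storage SDK for Java: a table batch operation only accepts entities that share one partition key. It is an illustration of the underlying service behaviour, not the component's generated code; cloudTable is an assumed CloudTable reference and exception handling is omitted.

    // Requires: import com.microsoft.azure.storage.table.DynamicTableEntity;
    //           import com.microsoft.azure.storage.table.TableBatchOperation;
    // All entities in one batch must share the same PartitionKey ("Beijing" here).
    TableBatchOperation batch = new TableBatchOperation();
    batch.insert(new DynamicTableEntity("Beijing", "12000"));
    batch.insert(new DynamicTableEntity("Beijing", "12002"));
    cloudTable.execute(batch);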

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

Name mappings Complete this table to map the column name of the
component schema with the property name of the Azure
table entity if they are different.
• Schema column name: enter the column name of the
component schema between double quotation marks.
• Entity property name: enter the property name of the
Azure table entity between double quotation marks.
For example, if there are three schema columns
CompanyID, EmployeeID, and EmployeeName that are
used to feed the values for the PartitionKey, RowKey,
and Name entity properties respectively, then you need to
add the following rows for the mapping when writing data
into the Azure table.
• the Schema column name cell with the value
"CompanyID" and the Entity property name cell with
the value "PartitionKey".
• the Schema column name cell with the value
"EmployeeID" and the Entity property name cell
with the value "RowKey".
• the Schema column name cell with the value
"EmployeeName" and the Entity property name cell
with the value "Name".


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

NB_SUCCESS The number of rows successfully processed. This is an After


variable and it returns an integer.

NB_REJECT The number of rows rejected. This is an After variable and it


returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as an end component of a


Job or subJob and it always needs an input link.

Related scenario
For a related scenario, see Handling data with Microsoft Azure Table storage on page 313.


tAzureStoragePut
Uploads local files into a given container for an Azure storage account.

tAzureStoragePut Standard properties


These properties are used to configure tAzureStoragePut running in the Standard Job framework.
The Standard tAzureStoragePut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Container name Enter the name of the container you need to write files in.
This container must exist in the Azure Storage system you
are using.

Local folder Enter the path, or browse to the folder from which you need
to upload files.

Azure storage folder Enter the path to the virtual blob folder in the remote Azure
storage system you want to upload files into.
If you do not put any value in this field but leave this
field as it is with only its default quotation marks,
tAzureStoragePut writes files directly beneath the
container level.
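For illustration, the fragment below uploads one local file beneath such a virtual folder using the legacy Azure Storage SDK for Java. It is an assumption about the resulting blob name, not the code generated by tAzureStoragePut; container stands for a CloudBlobContainer obtained as in the earlier sketches, the file path and folder name are placeholders, and exception handling is omitted.

    // Requires the com.microsoft.azure.storage.blob classes used in the earlier sketches.
    String azureStorageFolder = "photos";
    java.io.File localFile = new java.io.File("E:/photos/components-use_case_triakinput_1.png");

    // The blob name is the Azure storage folder followed by the file name.
    String blobName = azureStorageFolder + "/" + localFile.getName();
    container.getBlockBlobReference(blobName).uploadFromFile(localFile.getAbsolutePath());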

Use file list Select this check box to be able to define file filtering
conditions. Once it is selected, the Files table is displayed.

Files Complete this table to select the files to be uploaded into


Azure. The parameters to be provided are:
• Filemask: file names or path to the files to be
uploaded.
• New name: name to give to the files after they are
uploaded.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

LOCAL_FOLDER The local directory used in this component. This is an After


variable and it returns a string.

REMOTE_FOLDER The remote directory used in this component. This is an


After variable and it returns a string.


ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
For a related scenario, see Retrieving files from an Azure Storage container on page 303.


tAzureStorageQueueCreate
Creates a new queue under a given Azure storage account.

tAzureStorageQueueCreate Standard properties


These properties are used to configure tAzureStorageQueueCreate running in the Standard Job
framework.
The Standard tAzureStorageQueueCreate component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Queue name Specify the name of the Azure queue to be created. For
more information about the queue naming rules, see
Naming Queues and Metadata.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Related scenario
No scenario is available for this component yet.


tAzureStorageQueueDelete
Deletes a specified queue permanently under a given Azure storage account.

tAzureStorageQueueDelete Standard properties


These properties are used to configure tAzureStorageQueueDelete running in the Standard Job
framework.
The Standard tAzureStorageQueueDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Queue name Specify the name of the Azure queue to be deleted.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Related scenario
No scenario is available for this component yet.


tAzureStorageQueueInput
Retrieves one or more messages from the front of an Azure queue.

tAzureStorageQueueInput Standard properties


These properties are used to configure tAzureStorageQueueInput running in the Standard Job
framework.
The Standard tAzureStorageQueueInput component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue from which the
messages will be retrieved.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns:
• MessageId: the id of the message.
• MessageContent: the body of the message.
• InsertionTime: the time when the message was added
to the queue.
• ExpirationTime: the time when the message will expire.
• NextVisibleTime: the time when the message will next become visible.
• DequeueCount: the number of times that the message
has been dequeued. This value is incremented each
time the message is dequeued, but it will not be
incremented when the message is peeked.
• PopReceipt: the pop receipt value that is required to
delete the message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Number of messages Enter the number of messages to be retrieved from the


specified queue at a time, up to a maximum of 32.


Peek messages Select this check box to retrieve messages without
removing them from the queue or altering their visibility.
The messages remain available to other consumers.

Delete the message while streaming Select this check box to delete the messages while
retrieving them from the queue.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

Visibility timeout in seconds Enter the visibility timeout value (in seconds) relative
to the server time. This timeout value is added to the
time at which the message is retrieved to determine its
NextVisibleTime value. The message will not be visible
to other consumers for this time interval after it has been
retrieved.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
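These variables can be read from any downstream component through the globalMap, as in any Talend Job. For example, assuming the component is labelled tAzureStorageQueueInput_1 (the label depends on your Job), a tJava component could print the number of retrieved messages with:

System.out.println(((Integer)globalMap.get("tAzureStorageQueueInput_1_NB_LINE")));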

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
No scenario is available for this component yet.
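As no scenario is provided, here is a rough, standalone illustration of what the component does (this is not the code generated by the Studio). It assumes the legacy Azure Storage SDK for Java (com.microsoft.azure.storage) is on the classpath; the account name, key placeholder and queue name are made up:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueClient;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public class QueueInputSketch {
    public static void main(String[] args) throws Exception {
        // Connection string built from the Account Name, Account Key and Protocol settings.
        String conn = "DefaultEndpointsProtocol=https;"
                + "AccountName=mystorageaccount;"   // hypothetical account
                + "AccountKey=<account-key>";
        CloudQueueClient client =
                CloudStorageAccount.parse(conn).createCloudQueueClient();
        CloudQueue queue = client.getQueueReference("myqueue"); // Queue name setting

        // Peek messages: read without removing them or altering their visibility.
        for (CloudQueueMessage m : queue.peekMessages(32)) {
            System.out.println(m.getId() + " -> " + m.getMessageContentAsString());
        }

        // Normal retrieval: up to 32 messages (Number of messages), hidden for
        // 30 seconds (Visibility timeout in seconds), then deleted, which is what
        // "Delete the message while streaming" does.
        for (CloudQueueMessage m : queue.retrieveMessages(32, 30, null, null)) {
            System.out.println(m.getMessageContentAsString());
            queue.deleteMessage(m);
        }
    }
}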


tAzureStorageQueueInputLoop
Runs an endless loop to retrieve messages from the front of an Azure queue.

tAzureStorageQueueInputLoop Standard properties


These properties are used to configure tAzureStorageQueueInputLoop running in the Standard Job
framework.
The Standard tAzureStorageQueueInputLoop component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue from which the
messages will be retrieved.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns:
• MessageId: the id of the message.
• MessageContent: the body of the message.
• InsertionTime: the time when the message was added
to the queue.
• ExpirationTime: the time when the message will expire.
• NextVisibleTime: the time when the message will next become visible.
• DequeueCount: the number of times that the message
has been dequeued. This value is incremented each
time the message is dequeued, but it will not be
incremented when the message is peeked.
• PopReceipt: the pop receipt value that is required to
delete the message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Number of messages Enter the number of messages to be retrieved from the


specified queue at a time, up to a maximum of 32.


Loop wait time Specify the duration (in seconds) for which the loop will
wait for the message to arrive in the queue before returning.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
No scenario is available for this component yet.


tAzureStorageQueueList
Returns all queues associated with the given Azure storage account.

tAzureStorageQueueList Standard properties


These properties are used to configure tAzureStorageQueueList running in the Standard Job
framework.
The Standard tAzureStorageQueueList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with one single
column QueueName that stores the name of each queue to
be returned.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NUMBER_OF_QUEUES The number of queues returned. This is an After variable


and it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
No scenario is available for this component yet.
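For reference only, listing the queues of an account amounts to the following sketch with the legacy Azure Storage SDK for Java (an assumed library; the connection details are placeholders). The loop counter plays the role of the NUMBER_OF_QUEUES variable:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueClient;

public class QueueListSketch {
    public static void main(String[] args) throws Exception {
        String conn = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
                + "AccountKey=<account-key>";
        CloudQueueClient client =
                CloudStorageAccount.parse(conn).createCloudQueueClient();
        int count = 0;
        for (CloudQueue q : client.listQueues()) {
            System.out.println(q.getName()); // one QueueName row per queue
            count++;
        }
        System.out.println(count + " queue(s) found");
    }
}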


tAzureStorageQueueOutput
Adds messages to the back of an Azure queue.
Note that this component can only be used with Java 8.

tAzureStorageQueueOutput Standard properties


These properties are used to configure tAzureStorageQueueOutput running in the Standard Job
framework.
The Standard tAzureStorageQueueOutput component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue to which the messages
will be added.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with one single
column MessageContent that stores the body of each
message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of messages processed. This is an After variable


and it returns an integer.


NB_SUCCESS The number of messages successfully enqueued. This is an


After variable and it returns an integer.

NB_REJECT The number of messages rejected. This is an After variable


and it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as an end component of a


Job or subJob and it always needs an input link.

Related scenario
No scenario is available for this component yet.
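As an illustration only, enqueuing one message with the legacy Azure Storage SDK for Java (assumed library, placeholder connection details) is a single call per incoming row, roughly:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public class QueueOutputSketch {
    public static void main(String[] args) throws Exception {
        String conn = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
                + "AccountKey=<account-key>";
        CloudQueue queue = CloudStorageAccount.parse(conn)
                .createCloudQueueClient()
                .getQueueReference("myqueue");
        // Each incoming row's MessageContent column becomes one message.
        queue.addMessage(new CloudQueueMessage("hello from the Job"));
    }
}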


tAzureStorageQueuePurge
Purges messages in an Azure queue.

tAzureStorageQueuePurge Standard properties


These properties are used to configure tAzureStorageQueuePurge running in the Standard Job
framework.
The Standard tAzureStorageQueuePurge component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue in which the messages
will be purged.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Related scenario
No scenario is available for this component yet.
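For illustration, purging a queue corresponds to a single clear() call in the legacy Azure Storage SDK for Java (assumed library, placeholder connection details):

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;

public class QueuePurgeSketch {
    public static void main(String[] args) throws Exception {
        String conn = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
                + "AccountKey=<account-key>";
        CloudQueue queue = CloudStorageAccount.parse(conn)
                .createCloudQueueClient()
                .getQueueReference("myqueue");
        queue.clear(); // removes all messages from the queue
    }
}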


tBarChart
Generates a bar chart from the input data to ease technical analysis.
tBarChart reads data from an input flow and transforms the data into a bar chart in a PNG image file.

tBarChart Standard properties


These properties are used to configure tBarChart running in the Standard Job framework.
The Standard tBarChart component belongs to the Business Intelligence family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Note:
The schema of tBarChart contains three read-only
columns named series (string), category (string), and
value (integer), in a fixed order. The data in any extra
columns will only be passed to the next component, if
any, without being presented in the bar chart.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.


Generated image path Name and path of the output image file.

Chart title Enter the title of the bar chart to be generated.

Include legend Select this check box if you want the bar chart to include a
legend, indicating all series in different colors.

3Dimensions Select this check box to create an image with 3D effect. By


default, this check box is selected and the bars representing
the series of each category will be stacked one over
another. If this check box is cleared, a 2D image will be
created, with the bars displayed one beside another along
the category axis.

Image width and Image height Enter the width and height of the image file, in pixels.

Category axis name and Value axis name Enter the category axis name and value axis name.

Foreground alpha Enter an integer in the range of 0 to 100 to define the


transparency of the image. The smaller the number you
enter, the more transparent the image will be.

Plot orientation Select the plot orientation of the bar chart: VERTICAL or
HORIZONTAL.
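The settings above map closely onto a plain charting program. Purely as an illustration, and assuming the JFreeChart library (the component's internal implementation is not documented here), a minimal equivalent of the series/category/value model could look like this:

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;

public class BarChartSketch {
    public static void main(String[] args) throws Exception {
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        // addValue(value, series, category): the same three columns as the tBarChart schema.
        dataset.addValue(10233, "Population (x1000 people)", "Beijing");
        dataset.addValue(10452, "Population (x1000 people)", "Moscow");

        JFreeChart chart = ChartFactory.createBarChart3D(
                "Large cities",           // Chart title
                "City",                   // Category axis name
                "Value",                  // Value axis name
                dataset,
                PlotOrientation.VERTICAL, // Plot orientation
                true,                     // Include legend
                false, false);
        // Generated image path, Image width and Image height.
        ChartUtilities.saveChartAsPNG(new File("barchart.png"), chart, 760, 400);
    }
}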

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is mainly used as Output component. It


requires an Input component and Row main link as input.


Creating a bar chart from the input data


This scenario describes a Job that reads source data from a CSV file and transforms the data into a bar
chart showing a comparison of several large cities. The input file is shown below:

City;Population(x1000);LandArea(km2);PopulationDensity(people/km2)
Beijing;10233;1418;7620
Moscow;10452;1081;9644
Seoul;10422;605;17215
Tokyo;8731;617;14151
Jakarta;8490;664;12738
New York;8310;789;10452

Because the input file has a different structure than the one required by the tBarChart component,
this use case uses the tMap component to adapt the source data to the three-column schema of
tBarChart so that a temporary CSV file can be created as the input to the tBarChart component.

Note:
You will usually use the tMap component to adjust the input schema in accordance with the
schema structure of the tBarChart component. For more information about how to use the tMap
component, see Talend Studio User Guide and tMap on page 1983.

To ensure correct generation of the temporary input file, a pre-treatment subJob is used to delete the
temporary file in case it already exists before the main Job is executed; as this temporary file serves
this specific Job only, a post-treatment subJob is used to delete it after the main Job is executed.

Dropping and linking components


Procedure
1. Drop the following components from the Palette to the design workspace: a tPrejob, a tPostjob,
two tFileDelete components, two tFileInputDelimited components, a tMap, three tFileOutputDel
imited components, and a tBarChart.
2. Connect the tPrejob component to one tFileDelete component using a Trigger > On Component
Ok connection, and connect the tPostjob component to the other tFileDelete component using the
same type of connection.
3. Connect the first tFileInputDelimited to the tMap component using a Row > Main connection.
4. Connect the tMap component to the first tFileOutputDelimited component using a Row > Main
connection, and name the connection Population.
5. Repeat the step above to connect the tMap component to the other two tFileOutputDelimited
components using Row > Main connections, and name the connections Area and Density
respectively.
6. Connect the second tFileInputDelimited component to the tBarChart component using a Row > Main
connection.
7. Connect the first tFileInputDelimited component to the second tFileInputDelimited component
using a Trigger > On Subjob Ok connection.
8. Relabel the components to best describe their functionality.


Results

Reading the source data


Procedure
1. Double-click the first tFileInputDelimited component, which is labelled Large_Cities, to display its
Basic settings view.

2. Fill in the File name field by browsing to the input file.


3. In the Header field, specify the number of header rows. In this use case, you have only one header
row.
4. Click Edit schema to describe the data structure of the input file. In this use case, the input
schema is made of four columns: City, Population, Area, and Density. Upon defining the column
names and data types, click OK to close the schema dialog box.


Adapting the source data to the tBarChart schema


Procedure
1. Double-click the tMap to open the Map Editor.
You can see an input table on the input panel, row1 in this example, and three empty output
tables, named Population, Area, and Density on the output panel.
2. Use the Schema editor to add three columns to each output table: series (string), category (string),
and value (integer).
3. In the relevant Expression field of the output tables, enter the text to be presented in the
legend area of the bar chart, "Population (x1000 people)", "Land area (km2)", and
"Population density (people/km2)" respectively in this example.
4. Drop the City column of the input table onto the category column of each output table.
5. Drop the Population column of the input table onto the value column of the Population table.
6. Drop the Area column of the input table onto the value column of the Area table.
7. Drop the Density column of the input table onto the value column of the Density table.


8. Click OK to save the mappings and close the Map Editor and propagate the output schemas to the
output components.

Generating the temporary input file


Procedure
1. Double-click the first tFileOutputDelimited component to display its Basic settings view.

2. In the File Name field, define a temporary CSV file to send the mapped data flows to. In this use
case, we name this file Temp.csv. This file will be used as the input to the tBarChart component.
3. Select the Append check box.
4. Repeat the steps above to define the properties of the other two tFileOutputDelimited
components, using exactly the same settings as in the first tFileOutputDelimited component.


Note:
Note that the order of output flows from the tMap component is not necessarily the actual
order of writing data to the target file. To ensure the target file is correctly generated, delete
the file by the same name if it already exists before Job execution and select the Append check
box in all the tFileOutputDelimited components in this step.
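Assuming the mappings described above and the default semicolon field separator of tFileOutputDelimited, the accumulated Temp.csv contains one line per series/city pair, for example (excerpt; the order of the lines may vary because three output flows append to the same file):

Population (x1000 people);Beijing;10233
Population (x1000 people);Moscow;10452
Land area (km2);Beijing;1418
Population density (people/km2);Beijing;7620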

Configuring bar chart generation


Procedure
1. Double-click the second tFileInputDelimited component, which is labelled Temp_Input, to display
its Basic settings view.

2. Fill in the File name field with the path to the temporary input file generated by the
tFileOutputDelimited components. In this use case, the temporary input file to the tBarChart is
Temp.csv.
3. Double-click the tBarChart component to display its Basic settings view.

4. In the Generated image path field, define the file path of the image file to be generated.
5. In the Chart title field, define a title for the bar chart.
6. Define the category and series axis names.


7. Define the size and transparency degree of the image if needed. In this use case, we simply use
the default settings.
8. Click Edit schema to open the schema dialog box.

9. Copy all the columns from the output schema to the input schema by clicking the left-pointing
double arrow button. Then, click OK to close the schema dialog box.

Deleting the temporary file


About this task
As the tPrejob and tPostjob components simply trigger the connected subJobs and do not have any
settings to define, all you need to do is to define the properties of the two tFileDelete components.

Procedure
1. Double-click the first tFileDelete component to display its Basic settings view.

2. Fill in the File name field with the path to the temporary input file.
If the Fail on error check box is selected and the pre-treatment subJob fails because of errors, such as
the file to delete not existing, this failure will prevent the main subJob from being launched. In this
situation, you can clear the Fail on error check box to avoid this interruption.


3. Specify the same file path in the other tFileDelete component.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to launch it.
A bar chart is generated, showing a graphical comparison of the specified large cities.


tBigQueryBulkExec
Transfers given data to Google BigQuery.
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as part
of a two-step process. In the first step, an output file is generated. In the second step, this file is used
to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed
in a separate section. The advantage of using two separate components is that the data can be
transformed before it is loaded in the dataset.
This component transfers a given file from Google Cloud Storage to Google BigQuery, or uploads a
given file into Google Cloud Storage and then transfers it to Google BigQuery.

tBigQueryBulkExec Standard properties


These properties are used to configure tBigQueryBulkExec running in the Standard Job framework.
The Standard tBigQueryBulkExec component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is
formatted as String data.
• The numeric data of BigQuery is converted to
BigDecimal.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.
• Service account: authenticate using a Google account
that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery


service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using
this component; when the Job pauses execution to print out
a URL, navigate to that address and copy the authorization
code displayed.

Dataset Enter the name of the dataset you need to transfer data to.

Table Enter the name of the table you need to transfer data to.
If this table does not exist, select the Create the table if it
doesn't exist check box.

Action on data Select the action to be performed from the drop-down list
when transferring data to the target table. The action may
be:
• Truncate: it empties the contents of the table and
repopulates it with the transferred data.
• Append: it adds rows to the existing data in the table.
• Empty: it populates the empty table.


Bulk file already exists in Google storage Select this check box to reuse the authentication
information for the Google Cloud Storage connection, then
complete the File and the Header fields.

Access key and Secret key Paste the authentication information obtained from Google
for making requests to Google Cloud Storage.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project.

File to upload When the data to be transferred to Google BigQuery is not


stored on Google Cloud Storage, browse to, or enter the
path to it.

Bucket Enter the name of the bucket, the Google Cloud Storage
container, which holds the data to be transferred to Google
BigQuery.

File Enter the directory of the data stored on Google Cloud


Storage and to be transferred to Google BigQuery. This data
must be stored directly under the bucket root. For example,
enter gs://my_bucket/my_file.csv.
If the data is not on Google Cloud Storage, this directory
is used as the intermediate destination before the data is
transferred to Google BigQuery.

Header Set the number of header rows to ignore in the transferred
data. For example, enter 0 if the data has no header rows to
skip.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.

Set the field delimiter Enter a character, string, or regular expression to separate
fields in the transferred data.


Drop table if exists Select the Drop table if exists check box to remove the
table specified in the Table field, if this table already exists.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This is a standalone component.


This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.

Related Scenario
For a related topic, see Writing data in Google BigQuery on page 371.


tBigQueryInput
Performs the queries supported by Google BigQuery.
This component connects to Google BigQuery and performs queries in it.

tBigQueryInput Standard properties


These properties are used to configure tBigQueryInput running in the Standard Job framework.
The Standard tBigQueryInput component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is
formatted as String data.
• The numeric data of BigQuery is converted to
BigDecimal.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.


• Service account: authenticate using a Google account


that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery


service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using
this component; when the Job pauses execution to print out
a URL, navigate to that address and copy the authorization
code displayed.

Use legacy SQL and Query Enter the query you need to use.
If the query to be used is the legacy SQL of BigQuery, select
this Use legacy SQL check box. For further information
about this legacy SQL, see Legacy SQL query reference from
the Google BigQuery documentation.

Result size Select the option depending on the volume of the query
result.
By default, the Small option is used, but when the query
result is larger than the maximum response size, you need
to select the Large option.
If the volume of the result is not certain, select Auto.

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.

Advanced Separator (for number) Select this check box to change the separator used for the
numbers.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Use custom temporary Dataset name Select this check box to use an existing dataset to which
you have access, instead of creating one, and in the field
that is displayed, enter the name of this dataset. This way,
you avoid rights and permissions issues related to dataset
creation.
This check box is available only when you have selected
Large from the Result size drop-down list in the Basic
settings tab.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This is an input component. It sends the extracted data to


the component that follows it.
This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.


Performing a query in Google BigQuery


This scenario uses two components to perform the SELECT query in BigQuery and present the result
in the Studio.

The following figure shows the schema of the UScustomer table that we use as an example to perform
the SELECT query.

We will select the State records and count the occurrence of each State among those records.

Linking the components


Procedure
1. In the Integration perspective of Studio, create an empty Job, named BigQueryInput for example,
from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tBigQueryInput and tLogRow onto the workspace.
3. Connect them using the Row > Main link.


Creating the query


Building access to BigQuery

Procedure
1. Double-click tBigQueryInput to open its Component view.

2. Click Edit schema to open the schema editor.

3. Click the button twice to add two rows and enter the names of your choice for each of them in
the Column column. In this scenario, they are: States and Count.
4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
5. In the Authentication area, add the authentication information. In most cases, the Service account
mode is more straightforward and easier to handle.
Authentication mode Description

Service account Authenticate using a Google account that is


associated with your Google Cloud Platform
project.
When selecting this mode, the Service account
credentials file field is displayed. In this field,
enter the path to the credentials file created
for the service account to be used. This file
must be stored in the machine in which your
Talend Job is actually launched and executed.
For further information about how to
create a Google service account and obtain
the credentials file, see Getting Started
with Authentication from the Google
documentation.

OAuth 2.0 Authenticate the access using OAuth


credentials. When selecting this mode,
the parameters to be defined in the Basic
settings view are Client ID, Client secret and
Authorization code.
1. Navigate to the Google APIs Console in
your web browser to access the Google
project hosting the BigQuery and the
Cloud Storage services you need to use.
2. Click the API Access tab to open its view,
and copy the Client ID, Client secret, and
Project ID.
3. In the Component view of the Studio,
paste Client ID, Client secret and Project
ID from the API Access tab view to the
corresponding fields, respectively.
4. In the Run view of the Studio, click Run
to execute this Job. The execution will
pause at a given moment to print out in
the console the URL address used to get
the authorization code.
5. Navigate to this address in your web
browser and copy the authorization code
displayed.
6. In the Component view of tBigQueryInput,
paste the authorization code in the
Authorization Code field.

Writing the query

Procedure
In the Query field, enter select States, count(*) as Count from documentation.UScustomer group by States.
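Note that the way the table is referenced depends on the Use legacy SQL check box. As a rough guideline (the exact quoting requirements depend on your project and table names), the same query could be written as:

select States, count(*) as Count from [documentation.UScustomer] group by States -- legacy SQL
select States, count(*) as Count from `documentation.UScustomer` group by States -- standard SQL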


Executing the Job


About this task
The tLogRow component presents the execution result of the Job. You can configure the presentation
mode on its Component view.
To do this, double-click tLogRow to open the Component view and in the Mode area, select the Table
(print values in cells of a table) option.

Procedure
To execute this Job, press F6.

Results
Once done, the Run view is opened automatically, where you can check the execution result.


tBigQueryOutput
Transfers the data provided by its preceding component to Google BigQuery.
This component writes the data it receives in a user-specified directory and transfers the data to
Google BigQuery via Google Cloud Storage.

tBigQueryOutput Standard properties


These properties are used to configure tBigQueryOutput running in the Standard Job framework.
The Standard tBigQueryOutput component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is
formatted as String data.
• The numeric data of BigQuery is converted to
BigDecimal.

Property type Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Local filename Browse to, or enter the path to the file you want to write the
received data in.

Append Select this check box to add rows to the existing data in the
file specified in Local filename.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.
• Service account: authenticate using a Google account
that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using this component; when the Job pauses and prints out a URL address, navigate to this address and copy the authorization code displayed.

Dataset Enter the name of the dataset you need to transfer data to.

Table Enter the name of the table you need to transfer data to.
If this table does not exist, select the Create the table if it
doesn't exist check box.


Action on data Select the action to be performed from the drop-down list
when transferring data to the target table. The action may
be:
• Truncate: it empties the contents of the table and
repopulates it with the transferred data.
• Append: it adds rows to the existing data in the table.
• Empty: it populates the empty table.

Access key and Secret key Paste the authentication information obtained from Google
for making requests to Google Cloud Storage.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project.

Bucket Enter the name of the bucket, the Google Cloud Storage
container, which holds the data to be transferred to Google
BigQuery.

File Enter the directory of the data stored on Google Cloud Storage and to be transferred to Google BigQuery. This data
must be stored directly under the bucket root. For example,
enter gs://my_bucket/my_file.csv.
If the data is not on Google Cloud Storage, this directory
is used as the intermediate destination before the data is
transferred to Google BigQuery.
Note that this file name must be identical to the name of the file specified in the Local filename field.

Header Set the number of header rows to ignore in the transferred data. For example, enter 0 if the data has no header, or 1 if the first row of the data is a header.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.


Field Separator Enter a character, string, or regular expression to separate fields in the transferred data.

Drop table if exists Select the Drop table if exists check box to remove the
table specified in the Table field, if this table already exists.

Create directory if not exists Select this check box to create the directory you defined in
the File field for Google Cloud Storage, if it does not exist.

Custom the flush buffer size Enter the number of rows to be processed before the
memory is freed.

Check disk space Select this check box to throw an exception during
execution if the disk is full.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://docs.oracle.com.
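If you are unsure which encodings your JVM supports, a short Java check (independent of the Studio) lists them:

import java.nio.charset.Charset;

public class ListEncodings {
    public static void main(String[] args) {
        // Prints every charset name supported by the running JVM;
        // the Encoding list of the component ultimately depends on this set.
        Charset.availableCharsets().keySet().forEach(System.out::println);
    }
}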

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
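In the Java code generated for a Job, these variables are read from the globalMap of the Job. For example, assuming a component instance named tBigQueryOutput_1 (the instance name depends on your Job), the ERROR_MESSAGE variable can be read in a tJava component as follows:

// tJava snippet (illustrative instance name): read the After variable once the component has run.
String bqError = (String) globalMap.get("tBigQueryOutput_1_ERROR_MESSAGE");
if (bqError != null) {
    System.err.println("tBigQueryOutput_1 reported: " + bqError);
}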

Usage

Usage rule This is an output component used at the end of a Job.


It receives data from its preceding component such as
tFileInputDelimited, tMap or tMysqlInput.
This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.

Writing data in Google BigQuery


This scenario uses two components to write data in Google BigQuery.


Linking the components


Procedure
1. In the Integration perspective of Talend Studio, create an empty Job, named WriteBigQuery for example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tRowGenerator and tBigQueryOutput onto the workspace.
The tRowGenerator component generates the data to be transferred to Google BigQuery in this
scenario. In the real-world case, you can use other components such as tMysqlInput or tMap in the
place of tRowGenerator to design a sophisticated process to prepare your data to be transferred.
3. Connect them using the Row > Main link.

Preparing the data to be transferred


Procedure
1. Double-click tRowGenerator to open its Component view.


2. Click RowGenerator Editor to open the editor.


3. Click the plus button three times to add three rows in the Schema table.
4. In the Column column, enter the name of your choice for each of the new rows. For example,
fname, lname and States.
5. In the Functions column, select TalendDataGenerator.getFirstName for the fname row,
TalendDataGenerator.getLastName for the lname row and TalendDataGenerator.getUsState for the
States row.
6. In the Number of Rows for RowGenerator field, enter, for example, 100 to define the number of
rows to be generated.
7. Click OK to validate these changes.
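As a point of reference, the three functions selected above are plain Java routines that can also be called directly, for example in a tJava component; the following lines are illustrative only and print one sample value per function used in this scenario.

// Illustrative tJava snippet: the same system routines used by tRowGenerator.
String fname = TalendDataGenerator.getFirstName();
String lname = TalendDataGenerator.getLastName();
String state = TalendDataGenerator.getUsState();
System.out.println(fname + ";" + lname + ";" + state);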

Configuring the access to BigQuery and Cloud Storage


Building access to Cloud Storage

Procedure
1. Double-click tBigQueryOutput to open its Component view.


2. Click Sync columns to retrieve the schema from its preceding component.
3. In the Local filename field, enter the directory where you need to create the file to be transferred
to BigQuery.
4. Navigate to the Google APIs Console in your web browser to access the Google project hosting
the BigQuery and the Cloud Storage services you need to use.
5. Click Google Cloud Storage > Interoperable Access to open its view.
6. In the Google storage configuration area of the Component view, paste the Access key and Access secret from the Interoperable Access tab view into the corresponding fields, respectively.
7. In the Bucket field, enter the path to the bucket you want to store the transferred data in. In this example, it is talend/documentation.
This bucket must already exist in Cloud Storage.

8. In the File field, enter the directory in Google Cloud Storage where the file to be transferred to BigQuery is received and created. In this example, it is gs://talend/documentation/biquery_UScustomer.csv. The file name must be the same as the one you defined in the Local filename field.


Troubleshooting: if you encounter issues such as Unable to read source URI of the file stored in
Google Cloud Storage, check whether you put the same file name in these two fields.
9. Enter 0 in the Header field to ignore no rows in the transferred data.

Building access to BigQuery

Procedure
1. In the Dataset field of the Component view, enter the dataset you need to transfer data to. In this scenario, it is documentation.
This dataset must exist in BigQuery. The following figure shows the dataset used by this scenario.

2. In the Table field, enter the name of the table you need to write data in, for example, UScustomer.
3. In the Action on data field, select the action. In this example, select Truncate to empty the contents of the target table, if there are any, and to repopulate it with the transferred data.
4. In the Authentication area, add the authentication information. In most cases, the Service account mode is more straightforward and easier to handle.
Authentication mode Description

Service account Authenticate using a Google account that is associated with your Google Cloud Platform project.
When selecting this mode, the Service account
credentials file field is displayed. In this field,
enter the path to the credentials file created
for the service account to be used. This file
must be stored in the machine in which your
Talend Job is actually launched and executed.
For further information about how to
create a Google service account and obtain
the credentials file, see Getting Started
with Authentication from the Google
documentation.

OAuth 2.0 Authenticate the access using OAuth credentials. When selecting this mode, the parameters to be defined in the Basic settings view are Client ID, Client secret and Authorization code.
1. Navigate to the Google APIs Console in
your web browser to access the Google

project hosting the BigQuery and the
Cloud Storage services you need to use.
2. Click the API Access tab to open its view.
3. In the Component view of the Studio,
paste Client ID, Client secret and Project
ID from the API Access tab view to the
corresponding fields, respectively.
In the Advanced settings tab, see the file
path in the token properties File Name
field. The Studio automatically generates
this file during the first successful login
and stores all future successful logins in it.
4. In the Run view of the Studio, click Run
to execute this Job. The execution will
pause at a given moment to print out in
the console the URL address used to get
the authorization code.
5. Navigate to this address in your web
browser and copy the authorization code
displayed.
6. In the Component view of tBigQueryOutput, paste the authorization code in the Authorization Code field.

5. If you have been using the OAuth 2.0 authentication mode, in the Action on data field, select the action to be performed on your data. In this example, select Truncate to empty the contents of the target table, if there are any, and to repopulate it with the transferred data. If you are using Service account, ignore this step.
If the table to be used does not exist in BigQuery, select Create the table if it doesn't exist.

Executing the Job


Procedure
Press F6.

Results
Once done, the Run view is opened automatically, where you can check the execution result.


The data is transferred to Google BigQuery.


tBigQueryOutputBulk
Creates a .txt or .csv file for large volumes of data so that you can process it according to your needs before transferring it to Google BigQuery.
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used
to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed
in a separate section. The advantage of using two separate components is that the data can be
transformed before it is loaded in the dataset.
This component writes given data into a .txt or .csv file, ready to be transferred to Google
BigQuery.

tBigQueryOutputBulk Standard properties


These properties are used to configure tBigQueryOutputBulk running in the Standard Job framework.
The Standard tBigQueryOutputBulk component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically becomes built-in.

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is formatted as String data.


• The numeric data of BigQuery is converted to BigDecimal.

File name Browse to, or enter the path to the .txt or .csv file you need to generate.

Append Select the check box to write new data at the end of
the existing data. Otherwise, the existing data will be
overwritten.

Advanced settings

Field Separator Enter a character, string, or regular expression to separate fields in the transferred data.

Create directory if not exists Select this check box to create the directory you defined in
the File field for Google Cloud Storage, if it does not exist.

Custom the flush buffer size Enter the number of rows to be processed before the
memory is freed.

Check disk space Select this check box to throw an exception during execution if the disk is full.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

tStatCatcher Statistics Select this check box to collect the log data at the component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This is an output component which needs the data provided
by its preceding component.


This component automatically detects and supports both multi-regional locations and regional locations. When using the regional locations, the buckets and the datasets to be used must be in the same locations.

Related Scenario
For related topic, see Writing data in Google BigQuery on page 371


tBigQuerySQLRow
Connects to Google BigQuery and performs queries to select data from tables row by row, or to create or delete tables in Google BigQuery.

tBigQuerySQLRow Standard properties


These properties are used to configure tBigQuerySQLRow running in the Standard Job framework.
The Standard tBigQuerySQLRow component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.
• Service account: authenticate using a Google account
that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.


For further information about how to create a Google service account and obtain the credentials file, see Getting Started with Authentication from the Google documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using this component; when the Job pauses and prints out a URL address, navigate to this address and copy the authorization code displayed.

Use legacy SQL and Query Enter the query you need to use.
If the query to be used is written in the legacy SQL of BigQuery, select the Use legacy SQL check box. For further information
about this legacy SQL, see Legacy SQL query reference from
the Google BigQuery documentation.
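The Query field holds a Java string expression. Assuming a hypothetical project my_project with the dataset documentation and the table UScustomer, the same query reads as follows in legacy SQL (check box selected) and in standard SQL (check box cleared):

// Legacy SQL: the table is referenced as [project:dataset.table].
String legacyQuery = "SELECT fname, lname, states FROM [my_project:documentation.UScustomer]";

// Standard SQL: the table is referenced as `project.dataset.table`.
String standardQuery = "SELECT fname, lname, states FROM `my_project.documentation.UScustomer`";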

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.

Advanced Separator (for number) Select this check box to change the separator used for the
numbers.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.


Result size Select the option depending on the volume of the query
result.
By default, the Small option is used, but when the query
result is larger than the maximum response size, you need
to select the Large option.
If the volume of the result is not certain, select Auto.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule It can be a starting or an end component. When starting a Job, it sends the extracted data to the component that follows it; when ending a Job, it deletes a given table.
This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.


tBonitaDeploy
Deploys a specific Bonita process to a Bonita Runtime.
This component configures any Bonita Runtime engine and deploys a specific Bonita process (a .bar
file exported from the Bonita solution) to this engine.

tBonitaDeploy Standard properties


These properties are used to configure tBonitaDeploy running in the Standard Job framework.
The Standard tBonitaDeploy component belongs to the Business family.
The component in this framework is available in all Talend products.

Basic settings

Bonita version Select a version number for the Bonita Runtime engine.

Bonita Runtime Environment File Browse to, or enter the path to the Bonita Runtime
environment file.

Note:
This field is displayed only when you select Bonita
version 5.3.1 from the Bonita version list.

Bonita Runtime Home Browse to, or enter the path to the Bonita Runtime
environment directory.

Note:
This field is displayed only when you select Bonita
version 5.6.1 from the Bonita version list.

Bonita Runtime Jaas File Browse to, or enter the path to the Bonita Runtime jaas file.

Bonita Runtime logging file Browse to, or enter the path to the Bonita Runtime logging file.

Login Module Type in the name of the login module used to log in to the Bonita Runtime engine, as defined in the Bonita Runtime jaas file.

Business Archive Browse to, or enter the path to the Bonita process .bar file
you want to use.

User name Type in your user name used to log in to Bonita Studio.

Password Type in your password used to log in to Bonita Studio.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.


Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ProcessDefinitionUUID: the identifier number of the process being deployed. This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Usually used as a stand-alone component.


To use this component, you have to manually download the
Bonita solution you need to use.

Connections Outgoing links (from this component to another):


Trigger: Run if; On Component Ok; On Component Error, On
Subjob Ok, On Subjob Error.

Incoming links (from one component to this one):


Trigger: Run if, On Component Ok, On Component Error, On
Subjob Ok, On Subjob Error

For further information regarding connections, see


Connection types in Talend Studio User Guide.

Limitation The Bonita Runtime environment file, the Bonita Runtime jaas file, and the Bonita Runtime logging file must all be stored on the execution server of the Job using this component.

Related Scenario
For related topic, see Executing a Bonita process via a Talend Job on page 390.


tBonitaInstantiateProcess
Starts an instance for a specific process deployed in a Bonita Runtime engine.
This component instantiates a process already deployed in a Bonita Runtime engine.

tBonitaInstantiateProcess Standard properties


These properties are used to configure tBonitaInstantiateProcess running in the Standard Job
framework.
The Standard tBonitaInstantiateProcess component belongs to the Business family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
In this component the schema is related to the Module
selected.

Note:
The ProcessInstanceUUID column is pre-defined in the
schema of this component, reserved for the identifier
number of the process instance being created.

Bonita Client Mode Select the client mode you want to use to instantiate a
Bonita process.
For more information about all the Bonita client modes, see
Bonita's manuals.

URL Enter the URL of the Bonita Web application server you
need to access for the process instantiation.
This field is available only in the HTTP client mode.

Auth Username and Auth Password Enter the authentication details used to connect to the
Bonita Web application server as technical user.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
The default authentication information is provided in these
fields. For further information about them, see Bonita's
manuals.
These fields are available only in the HTTP client mode.

Bonita version Select the version number of the Bonita Runtime engine to
be used.


This field is available only in the Java client mode.

Bonita Runtime Environment File Browse to, or enter the path to the Bonita Runtime
environment file.
This field is available only in the Java client mode.

Note:
This field is displayed only when you select Bonita
version 5.3.1 from the Bonita version list.

Bonita Runtime Home Browse to, or enter the path to the Bonita Runtime
environment directory.

Note:
This field is displayed only when you select Bonita
version 5.6.1 from the Bonita version list.

Bonita Runtime Jaas File Browse to, or enter the path to the Bonita Runtime jaas file.
This field is available only in the Java client mode.

Bonita Runtime logging file Browse to, or enter the path to the Bonita Runtime logging
file.
This field is available only in the Java client mode.

Use Process ID Select this check box to instantiate an existing process. Once selected, the Process definition ID field is activated, in which you can enter the definition ID of this process.
This field is available only in the Java client mode.

Note:
The process definition ID is created when the process is
deployed into the Bonita Runtime engine.

Process Name and Process Version Enter the ID information of a specific process you want
to instantiate. This information is used to automatically
generate the ID of this process.
This field is available in both the Java client mode and the HTTP client mode.

User name Type in your user name used to instantiate this process.
This field is available in both the Java client mode and the HTTP client mode.

Password Type in your password used to instantiate this process.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This field is available only in the Java client mode.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ProcessInstanceUUID: the identifier number of the process instance being created. This is a Flow variable and it returns a string. It can also be retrieved over the Row > Main output link.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Usually used as a stand-alone component or as an output component.
To use this component, you have to manually download the Bonita solution you need to use.

Connections Outgoing links (from this component to another):


Row: Main (providing the output parameters from this
process)
Trigger: Run if; On Component Ok; On Component Error, On
Subjob Ok, On Subjob Error.

Incoming links (from one component to this one):


Row: Main (providing the input parameters to this process)
Trigger: Run if, On Component Ok, On Component Error, On
Subjob Ok, On Subjob Error

For further information regarding connections, see


Connection types in Talend Studio User Guide.

Limitation The Bonita Runtime environment file, the Bonita Runtime jaas file, and the Bonita Runtime logging file must all be stored on the execution server of the Job using this component.


Executing a Bonita process via a Talend Job


This scenario describes a Job that deploys a Bonita process into the Bonita Runtime and executes this
process, in which a personnel request is treated.

The Job in this scenario uses three components.


• tBonitaDeploy: this component deploys a Bonita process into the Bonita Runtime.
• tFixedFlowInput: this component generates the schema used as execution parameters of this
deployed process.
• tBonitaInstantiateProcess: this component executes this deployed process.
Before beginning to replicate this schema, prepare your Bonita.bar file. You need to manually export
this file from the Bonita system and then deploy it into the Bonita Runtime engine, using, for example,
tBonitaDeploy as presented later in this scenario. In this scenario, this file is TEST--4.0.bar. Once
deployed, this process can be checked via the Bonita interface.

Setting up the Job


Procedure
1. Drop tBonitaDeploy, tFixedFlowInput and tBonitaInstantiateProcess onto the design workspace.
2. Right-click tBonitaDeploy and connect it to tFixedFlowInput using a Trigger > On Subjob Ok connection.
3. Right-click tFixedFlowInput and connect this component to tBonitaInstantiateProcess using a
Row > Main connection.


Configuring the deployment of the process


About this task
To replicate this scenario, proceed as follows:

Procedure
1. Double-click tBonitaDeploy to open its Basic settings view.

2. Select Bonita version 5.3.1 from the Bonita version list. The version you select should be in sync
with the version number of the Bonita Runtime engine you are using.
3. In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files. In the
Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita
Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File
field, browse to the logging.properties file.
For users based on Bonita version 5.2.3, only the Bonita Runtime Jaas File field and the Bonita
Runtime Logging File field need to be filled.

For users based on Bonita version 5.6.1, in the Bonita Runtime Home field, browse to the Bonita
Runtime environment directory.


4. In the Business Archive field, browse to the Bonita .bar file that is the process exported from your
Bonita system and will be deployed into the Bonita Runtime engine.
5. In the Username and the Password fields, type in your authentication information to connect to
your Bonita.

Configuring the input flow


Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the three-dot button next to Edit schema to open the schema editor.


3. Click the plus button to add one row and rename it as Name.
This name is identical to the parameter set in Bonita to execute the same process. This way, Bonita can recognize this column as a valid parameter and read its value to instantiate this process.
4. Click OK.
5. In the Mode area of the Basic settings view, select the Use inline table option and click the plus
button to add one row in the table.
6. In the inline table, click the added row and type in, between the quotation marks, the name of the person from your personnel whose request will be treated by this deployed process: ychen.

Configuring the Basic settings of tBonitaInstantiateProcess


Procedure
1. Double-click tBonitaInstantiateProcess to open its Basic settings view.

2. Select Bonita version 5.3.1 from the Bonita version list. The version you select should be in sync
with the version number of the Bonita Runtime engine you are using.
3. In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files. In the
Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita
Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File
field, browse to the logging.properties file.
For users based on Bonita version 5.2.3, only the Bonita Runtime Jaas File field and the Bonita
Runtime Logging File field need to be filled.


For users based on Bonita version 5.6.1, in the Bonita Runtime Home field, browse to the Bonita
Runtime environment directory.

4. Select the Use Process ID check box to activate the Process Definition Id field.
5. In the Process Definition Id field, click between the quotation marks and press Ctrl+space to open
the auto-completion drop-down list containing the available global variables for this Job.
6. Double-click the variable you need to use to add it between the quotation marks. In this scenario, double-click tBonitaDeploy_1_ProcessDefinitionUUID, which retrieves the process definition ID of the process being deployed by tBonitaDeploy (see the sketch after the note below).

Note:
You can also clear the Use Process ID check box to activate the Process name and Process version fields and enter the corresponding information in the two fields. tBonitaInstantiateProcess concatenates the process name and the process version you type in to construct the process definition ID.
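Once inserted, the Process Definition Id field contains an expression of the following form; the instance name tBonitaDeploy_1 is only an example and depends on how the component is labeled in your Job:

// Expression placed in the Process Definition Id field (illustrative instance name).
((String) globalMap.get("tBonitaDeploy_1_ProcessDefinitionUUID"))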

7. In the Username and Password fields, enter the username and password to connect to your Bonita.


Executing the Job


Procedure
Press F6 to run the Job.

Results

This process is deployed into the Bonita Runtime and an instance is created for the personnel
requests.

Outputting the process instance UUID over the Row > Main
link
This scenario deploys a Bonita process into the Bonita Runtime, starts an instance and outputs the
process instance UUID via the Row > Main link.

Linking the components


Procedure
1. Drop tBonitaDeploy, tBonitaInstantiateProcess and tLogRow onto the workspace.
2. Rename tBonitaDeploy as deploy_process, tBonitaInstantiateProcess as start_instance and
tLogRow as show_instance_uuid.
3. Link tBonitaDeploy to tBonitaInstantiateProcess using the OnSubjobOk trigger.
4. Link tBonitaInstantiateProcess to tLogRow using a Row > Main connection.


Configuring the components


Procedure
1. Double-click tBonitaDeploy to open its Basic settings view.

2. In the Bonita Runtime Jaas File field, specify the path and name of the jaas file.
In the Bonita Runtime Logging File field, specify the path and name of the logging file.
In the Business Archive field, specify the path and name of the Bonita process.
3. In the Username and Password fields, enter the user authentication credentials.
4. Double-click tBonitaInstantiateProcess to open its Basic settings view.

5. In the Bonita Runtime Jaas File field, specify the path and name of the jaas file.
In the Bonita Runtime Logging File field, specify the path and name of the logging file.
6. In the Process Name and Process Version fields, enter the process information.
7. In the Username and Password fields, enter the user authentication credentials.
8. Double-click tLogRow to open its Basic settings view.
9. In the Mode area, select Table (print values in cells of a table) for better display.


Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the instance is created and the UUID is output.


tBoxConnection
Creates a Box connection that the other Box components can reuse.
This component creates the connection to a given Box account.

tBoxConnection Standard properties


These properties are used to configure tBoxConnection running in the Standard Job framework.
The Standard tBoxConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Access token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
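Outside the Studio, the same four credentials are what the official Box Java SDK (com.box.sdk) needs to open a self-refreshing connection. The sketch below is illustrative only, does not describe how the component is implemented, and uses placeholder values for all credentials.

import com.box.sdk.BoxAPIConnection;

public class BoxConnectionSketch {
    public static void main(String[] args) {
        // Placeholder values; in the Studio these correspond to the
        // Client Key, Client Secret, Access token and Refresh Token fields.
        String clientId = "YOUR_CLIENT_ID";
        String clientSecret = "YOUR_CLIENT_SECRET";
        String accessToken = "YOUR_ACCESS_TOKEN";
        String refreshToken = "YOUR_REFRESH_TOKEN";

        // A connection able to refresh its access token automatically.
        BoxAPIConnection api = new BoxAPIConnection(clientId, clientSecret, accessToken, refreshToken);
        System.out.println("Connection can auto-refresh: " + api.canRefresh());
    }
}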

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used standalone as a subJob to create the Box connection to be used. In a Job design, it is often connected to the other Box components using Trigger links such as OnSubjobOk.

Related scenario
For a related scenario, see Uploading and downloading files from Box on page 411.


tBoxCopy
Copies or moves a given folder or file from Box.

tBoxCopy Standard properties


These properties are used to configure tBoxCopy running in the Standard Job framework.
The Standard tBoxCopy component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Move Directory Select this check box to move a directory in Box.

Copy Directory Select this check box to copy a directory in Box.

File Name Enter the name, with its path in Box, of the file you want to copy.

Source Directory This option appears when the Move Directory or Copy
Directory check box is selected. Enter the source directory
in Box to be moved or copied.


Destination Directory Enter the destination directory in Box where the specified
file or directory will be copied or moved.

Rename Select this check box to rename the file or directory to be copied. When copying a file, specify the new file name in
the Destination File Name field. When copying a directory,
enter the new directory name in the New Directory Name
field.

Remove Source File Select this check box to remove the source file during the
copy action.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
four columns named destinationFilePath, destinationFil
eName, sourceDirectory, and destinationDirectory.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
DESTINATION_FILENAME: the destination file name. This is
an After variable and it returns a string.
DESTINATION_FILEPATH: the destination file path. This is
an After variable and it returns a string.
SOURCE_DIRECTORY: the source directory. This is an After
variable and it returns a string.
DESTINATION_DIRECTORY: the destination directory. This is
an After variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used standalone in a subJob to copy or move data from Box.


Related scenarios
No scenario is available for the Standard version of this component yet.


tBoxDelete
Removes a given folder or file from Box.
This component connects to a given Box account and removes a specified file or folder.

tBoxDelete Standard properties


These properties are used to configure tBoxDelete running in the Standard Job framework.
The Standard tBoxDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Box pointing to the folder or the file you
need to remove.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
one column named filepath.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
REMOVED_PATH: the path of the folder or file being deleted
on Box. This is a Flow variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used standalone in a subJob to remove data from Box.

Related scenarios
No scenario is available for the Standard version of this component yet.


tBoxGet
Downloads a selected file from a Box account.
This component connects to a given Box account and downloads files to a specified local directory.

tBoxGet Standard properties


These properties are used to configure tBoxGet running in the Standard Job framework.
The Standard tBoxGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Box pointing to the file you need to
download.

Save as file Select this check box to display the Save To field and browse to, or enter the local directory where you want to store the downloaded file. The existing file, if any, is replaced.
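Because the Box Java SDK addresses items by ID rather than by path, downloading a file by name outside the Studio typically means walking the folder tree first. The following sketch is an illustration only, not the component's internal code; the developer token, file name, and local path are placeholders.

import java.io.FileOutputStream;
import java.io.OutputStream;
import com.box.sdk.BoxAPIConnection;
import com.box.sdk.BoxFile;
import com.box.sdk.BoxFolder;
import com.box.sdk.BoxItem;

public class BoxGetSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder developer token; tBoxGet itself uses the full credential set.
        BoxAPIConnection api = new BoxAPIConnection("DEVELOPER_TOKEN");
        BoxFolder root = BoxFolder.getRootFolder(api);
        for (BoxItem.Info itemInfo : root) {
            // Look for a file by name directly under the root folder.
            if (itemInfo instanceof BoxFile.Info && "report.csv".equals(itemInfo.getName())) {
                try (OutputStream out = new FileOutputStream("/tmp/report.csv")) {
                    new BoxFile(api, itemInfo.getID()).download(out);
                }
            }
        }
    }
}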

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.
Note that the schema of this component is read-only with
two columns named fileName and content.
The Schema field is not available when you have selected
the Save as file check box.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
FILE_NAME: the name of the file being processed. This is a
Flow variable and it returns a string.
INPUT_STREAM: the content of the file being fetched. This
is a Flow variable and it returns an InputStream.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used alone or along with other
components via the Iterate link or a trigger link such as
OnSubjobOk.

Related scenario
For a related scenario, see Uploading and downloading files from Box on page 411.


tBoxList
Lists the files stored in a specified directory in Box.
This component reads the file(s) in Box held in the directory you specify and lists the metadata and
the contents of that file or those files.

tBoxList Standard properties


These properties are used to configure tBoxList running in the Standard Job framework.
The Standard tBoxList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path pointing to the folder you need to list the
files from, or enter the path pointing to the exact file you
need to read.

List type Select the type of data you need to list from the specified
path, Files, Folders, or Both.

Include subdirectories Select this check box to list files from any existing sub-
folders in addition to the files in the directory defined in
the Path field.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.
Note that the schema of this component is read-only with
six columns named name, path, lastModified, size, id, and
type.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NAME: the name of the remote file being processed. This is
a Flow variable and it returns a string.
FILE_PATH: the path pointing to the folder or the file being
processed on Box. This is a Flow variable and it returns a
string.
FILE_DIRECTORY: the directory of the folder or the file
being processed on Box. This is a Flow variable and it
returns a string.
LAST_MODIFIED: the timestamp of the last modification
of the file being processed. This is a Flow variable and it
returns a long.
SIZE: the volume of the file being processed. This is a Flow
variable and it returns a long.
ID: the ID of the folder or the file being processed on Box.
This is a Flow variable and it returns a string.
TYPE: the type of the objects being processed on Box, file or
folder. This is a Flow variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
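In the generated Java code of a Job, these variables are read from the globalMap object, for example in a tJava component connected to tBoxList through an Iterate link. The sketch below is a minimal, self-contained illustration of that cast-and-get pattern; the instance name tBoxList_1 and the sample values are hypothetical stand-ins for what the component publishes at runtime.

    import java.util.HashMap;
    import java.util.Map;

    public class GlobalMapSketch {
        public static void main(String[] args) {
            // Stand-in for the globalMap that a running Job maintains.
            Map<String, Object> globalMap = new HashMap<>();
            globalMap.put("tBoxList_1_NAME", "report.csv");            // hypothetical values
            globalMap.put("tBoxList_1_FILE_PATH", "/demo/report.csv");
            globalMap.put("tBoxList_1_SIZE", 1024L);

            // The usual pattern in Talend expressions: cast the stored Object to the documented type.
            String name = (String) globalMap.get("tBoxList_1_NAME");
            String path = (String) globalMap.get("tBoxList_1_FILE_PATH");
            long size = (Long) globalMap.get("tBoxList_1_SIZE");

            System.out.println(name + " (" + size + " bytes) at " + path);
        }
    }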

Usage

Usage rule This component is typically used standalone.

Related scenarios
No scenario is available for the Standard version of this component yet.

tBoxPut
Uploads files to a Box account.
This component uploads data to Box from either a local file or a given data flow.

tBoxPut Standard properties


These properties are used to configure tBoxPut running in the Standard Job framework.
The Standard tBoxPut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Remote Path Enter the path pointing to the file you need to write
contents in. This file will be created on the fly if it does not
exist.

Replace if Existing Select this check box to use the uploaded file to replace the
existing one.

Upload mode Select the upload mode to be used:

• Upload incoming content as file: Select this radio
button to read data directly from the input flow of the
preceding component and write the data into the file
specified in the Remote Path field.
• Upload local file: Select this radio button to upload
a locally stored file to Box. In the File field that is
displayed, you need to enter the path or browse to
this file.
• Expose as OutputStream: Select this check box to
expose the output stream of this component, which
can be used by the other components to write the file
content. For example, you can use the Use output
stream feature of the tFileOutputDelimited component
to feed a given tBoxPut's exposed output stream. For
further information, see tFileOutputDelimited on page
1113.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
a single column named content and it receives data from
the content column of its input schema only. This means
that you must use a content column in the input data flow
to carry the data to be uploaded. This type of column is
typically provided by the tFileInputRaw component. For further
information, see tFileInputRaw on page 1085.
The Schema field is not available when you have selected
the Expose as OutputStream or the Upload local file upload
mode.
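As a rough, standalone illustration of the kind of payload the content column carries, the snippet below reads a local file into a byte[] in the way tFileInputRaw would before the row reaches tBoxPut. The file path is an assumption used only for the example.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ContentColumnSketch {
        public static void main(String[] args) throws IOException {
            // Hypothetical local file; tFileInputRaw produces an equivalent byte[] column.
            byte[] content = Files.readAllBytes(Paths.get("D:/Input/hello.txt"));
            System.out.println("content column payload: " + content.length + " bytes");
        }
    }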

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used either standalone in a subJob to


directly upload a local file to Box or as an end component
of a Job flow to upload given data being handled in this
flow.

Uploading and downloading files from Box


In this scenario, a three-component Job consisting of three subJobs is created to upload a file to Box
and then download a file from Box to the local file system.

Before replicating this scenario, you need to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used. For more information about Box App, see
https://app.box.com/developers/services/edit/. The client key and client secret can be obtained from
the account application settings. For how to get the access token and refresh token, check the Box
documentation you can access from https://developers.box.com/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job from the Job Designs node in
the Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2. In the workspace, enter the name of the component to be used and select this component from
the list that opens. In this scenario, the components are tBoxConnection, tBoxPut and tBoxGet.
3. Connect tBoxConnection to tBoxPut using the Trigger > OnSubjobOk link.
4. Connect tBoxPut to tBoxGet using the Trigger > OnSubjobOk link.

Configuring the components


Procedure
1. Double-click tBoxConnection to open its Component view.

2. Enter the client key, client secret, access token and refresh token in double quotation marks in
the relevant fields for accessing the Box account.
3. Double-click tBoxPut to open its Component view.

4. Select the Use Existing Connection check box to reuse the connection created by tBoxConnection.
In the Remote Path field, enter the destination path where you want to upload the file.
In the Upload mode area, select Upload Local File. In the File field, enter the file path or browse to
the file you want to upload.
5. Double-click tBoxGet to open its Component view.

6. Select the Use Existing Connection check box to reuse the connection created by tBoxConnection.
In the Path field, enter the path of the file that you want to download.
Select the Save As File check box. In the Save To field, enter the path where the downloaded file is
to be saved on the local file system.
7. Save the Job.

Executing the Job


Execute the Job by pressing F6 or clicking the Run button on the Run tab.
The local file, hello.txt in this example, is uploaded to your Box account.

The file box.txt from Box is downloaded to the local file system.

tBufferInput
Retrieves data bufferized via a tBufferOutput component, for example, to process it in another
subJob.
This component retrieves bufferized data in order to process it in a second subJob.

tBufferInput Standard properties


These properties are used to configure tBufferInput running in the Standard Job framework.
The Standard tBufferInput component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
In the case of tBufferInput, the column position is more
important than the column label, as it is the position that is
taken into account.

  Built-in: You create the schema and store it locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and stored


it in the Repository, hence can be reused in various projects
and Job designs. Related topic: see Talend Studio User Guide.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is the start component of a secondary Job


which is triggered automatically at the end of the main Job.

Retrieving bufferized data


This scenario describes a Job that retrieves bufferized data from a subJob and displays it on the
console.

• Drop the following components from the Palette onto the design workspace: tFileInputDelimited
and tBufferOutput.
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the
access parameters to the input file.

• In the File Name field, browse to the delimited file holding the data to be bufferized.
• Define the Row and Field separators, as well as the Header.

• Click [...] next to the Edit schema field to describe the structure of the file.

• Describe the Schema of the data to be passed on to the tBufferOutput component.


• Select the tBufferOutput component and set the parameters on the Basic Settings tab of the
Component view.

Note:
Generally speaking, the schema is propagated from the input component and automatically fed into
the tBufferOutput schema. But you can also set part of the schema to be bufferized if you want to.

• Drop the tBufferInput and tLogRow components from the Palette onto the design workspace
below the subJob you just created.
• Connect tFileInputDelimited and tBufferInput via a Trigger > OnSubjobOk link and connect
tBufferInput and tLogRow via a Row > Main link.
• Double-click tBufferInput to set its Basic settings in the Component view.
• In the Basic settings view, click [...] next to the Edit Schema field to describe the structure of the
file.

• Use the schema defined for the tFileInputDelimited component and click OK.
• The schema of the tBufferInput component is automatically propagated to the tLogRow.
Otherwise, double-click tLogRow to display the Component view and click Sync column.
• Save your Job and press F6 to execute it.

The standard console returns the data retrieved from the buffer memory.
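Conceptually, tBufferOutput appends each incoming row to an in-memory buffer that tBufferInput replays in the next subJob. The sketch below is only a plain-Java analogy of that hand-off, not the code the Studio generates; the sample rows are hypothetical.

    import java.util.ArrayList;
    import java.util.List;

    public class BufferHandOffSketch {
        // Stand-in for the shared buffer that tBufferOutput fills and tBufferInput drains.
        private static final List<Object[]> BUFFER = new ArrayList<>();

        // What tBufferOutput conceptually does with each incoming row of the first subJob.
        static void bufferize(Object... row) {
            BUFFER.add(row);
        }

        // What tBufferInput conceptually does before handing the rows to tLogRow.
        static void replay() {
            for (Object[] row : BUFFER) {
                StringBuilder line = new StringBuilder();
                for (Object field : row) {
                    line.append(field).append('|');
                }
                System.out.println(line);
            }
        }

        public static void main(String[] args) {
            bufferize(1, "Alex");   // hypothetical rows standing in for the delimited file
            bufferize(2, "Peter");
            replay();
        }
    }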

tBufferOutput
Collects data in a buffer in order to access it later, via a Webservice for example.
tBufferOutput has been designed to be exported as a Webservice in order to access data on the web
application server directly. For more information, see Talend Studio User Guide.

tBufferOutput Standard properties


These properties are used to configure tBufferOutput running in the Standard Job framework.
The Standard tBufferOutput component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
In the case of tBufferOutput, the column position is
more important than the column label, as it is the position
that is taken into account.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Buffering data
This scenario describes an intentionally basic Job that bufferizes data in a child job while a parent Job
simply displays the bufferized data onto the standard output console. For an example of how to use
tBufferOutput to access output data directly on the Web application server, see Buffering output data
on the webapp server on page 421.

• Create two Jobs: a first Job (BufferFatherJob) runs the second Job and displays its content onto the
Run console. The second Job (BufferChildJob) stores the defined data into a buffer memory.
• On the first Job, drop the following components: tRunJob and tLogRow from the Palette to the
design workspace.
• On the second Job, drop the following components: tFileInputDelimited and tBufferOutput the
same way.
Let's set the parameters of the second Job first:
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the
access parameters to the input file.

• In File Name, browse to the delimited file whose data are to be bufferized.
• Define the Row and Field separators, as well as the Header.

• Describe the Schema of the data to be passed on to the tBufferOutput component.


• Select the tBufferOutput component and set the parameters on the Basic Settings tab of the
Component view.

• Generally the schema is propagated from the input component and automatically fed into the
tBufferOutput schema. But you could also set part of the schema to be bufferized if you want to.
• Now on the other Job (BufferFatherJob) Design, define the parameters of the tRunJob component.

• Edit the Schema if relevant and select the column to be displayed. The schema can be identical to
the bufferized schema or different.
• You could also define context parameters to be used for this particular execution. To keep it
simple, the default context with no particular setting is used for this use case.
Press F6 to execute the parent Job. The tRunJob component takes care of executing the child Job and returns the
data to the standard console:

Buffering data to be used as a source system


This scenario describes a Job that buffers data to be used as a source system by MDM.
An MDM process will invoke this Job to retrieve data by looking up the defined elements (agent region
values) from the buffered data. The process can then display the retrieved data in the Talend MDM
Web User Interface without really saving them in the MDM hub.

Creating a data buffer Job


Procedure
1. Create a Job named DetermineRegion.
2. Drop the following components from the Palette onto the design workspace: tJava,
tFixedFlowInput, and tBufferOutput.
3. Connect tJava to tFixedFlowInput using a Trigger > On Component Ok link.
4. Connect tFixedFlowInput to tBufferOutput using a Row > Main link.

Configuring the Job to buffer data


Procedure
1. In the Contexts view, add a new context variable with the Name of xmlInput and the Type of
String.
In this example, the context variable xmlInput of the Job will be specified in the MDM process
which wants to invoke this Job.
You can search for further information about MDM processes on Talend Help Center (https://
help.talend.com).

If you cannot find the Contexts view, go to Window > Show view > Talend, and select Contexts.
For more information about how to define context variables, see Talend Studio User Guide.
You can search for further information about how to define context variables on Talend Help
Center (https://help.talend.com).
2. Double-click the tJava component to open its Component view, and in the Code area, enter the
code according to your needs.
In this example, enter System.out.println("#############################"+context.xmlInput);.
3. Double-click the tFixedFlowInput component to open its Component view.
4. Click the [...] button next to Edit schema to open the dialog box and define the schema for the
data to be used by the source system.
In this example, add one new column col0 of the type String.
5. After the schema is defined, click Yes in the Propagate dialog box to propagate the schema
changes to the following component tBufferOutput.
6. In the Number of rows field, enter 1.
7. In the Mode area, select Use Single Table and enter "Paris" in the Value column that
corresponds to the column col0 you have defined.
In this example, the value of the col0 provides the agent region information to be retrieved by
MDM.
8. Double-click the tBufferOutput component to open its Component view, and then make sure its
schema is synchronized with the previous component tFixedFlowInput.
9. Run the Job and make sure the execution succeeds.

Buffering output data on the webapp server


This scenario describes a Job that is called as a Webservice and stores the output data in a buffer
directly on the server of the Web application. This scenario first creates a Webservice-oriented Job
with context variables, and then exports the Job as a Webservice.

Creating a Job
Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput and
tBufferOutput.
2. Connect tFixedFlowInput to tBufferOutput using a Row Main link.

Creating a context variable


About this task
For this scenario, you will define two context variables: nb_lines and lastname. The first variable will
set the number of lines the tFixedFlowInput component will generate, and the second one will set
the last name to display in the output list. For more information about how to create and use context
variables, see Talend Studio User Guide.
To define the two context variables:

Procedure
1. Select the Contexts tab view of your Job, and click the [+] button at the bottom of the view to add
two variables, respectively nb_lines of type Integer and lastname of type String.
2. In the Value field for the variables, set the last name to be displayed and the number of lines to
be generated, respectively Ford and 3 in this example.

Configuring the input data


Procedure
1. In the design workspace, select tFixedFlowInput.
2. Click the Component tab to define the basic settings for tFixedFlowInput.
3. Click the three-dot [...] button next to Edit Schema to describe the data structure you want to
create from internal variables. In this scenario, the schema is made of three columns, now of type
Date, firstname of type String, and lastname of type String.

4. Click OK to close the dialog box and accept propagating the changes when prompted by the
system. The three defined columns display in the Values panel of the Basic settings view of
tFixedFlowInput.

5. Click in the Value cell of each of the first two defined columns and press Ctrl+Space to access the
global variable list.
6. From the global variable list, select TalendDate.getCurrentDate() and TalendDataGenerator.getFirstName(),
for the now and firstname columns respectively.
7. Click in the Value cell of lastname column and press Ctrl+Space to access the global variable list.
8. From the global variable list, select context.lastname, the context variable you created for the last
name column.

Building your Job as a Webservice


About this task
Before building your Job as a Web service, see Talend Studio User Guide for more information.

Procedure
1. In the Repository tree view, right-click on the above created Job and select Build Job. The Build
Job dialog box appears.

2. Click the Browse... button to select a directory to archive your Job in.
3. In the Build type panel, select the build type you want to use in the Tomcat webapp directory
(WAR in this example) and click Finish. The Build Job dialog box disappears.
4. Copy the War folder and paste it in a Tomcat webapp directory.

Calling a Job with context variables from a browser


This scenario describes how to call the Job you created in Buffering output data on the webapp server
on page 421 from your browser with/without modifying the values of the context variables.
Type the following URL into your browser: http://localhost:8080//export_job/services/export_job3?
method=runJob where "export_job" is the name of the webapp directory deployed in Tomcat and
"export_job3" is the name of the Job.

Press Enter to execute your Job from your browser.

The Job uses the default values of the context variables: nb_lines and lastname, that is it generates
three lines with the current date, first name and Ford as a last name.
You can modify the values of the context variables directly from your browser. To call the Job from
your browser and modify the values of the two context variables, type the following URL:
http://localhost:8080//export_job/services/export_job3?method=runJob&arg1=--context_param%20lastna
me=MASSY&arg2=--context_param%20nb_lines=2.
%20 stands for a blank space in URL encoding. In the first argument "arg1", you set the value
of the context variable to display "MASSY" as last name. In the second argument "arg2", you set the
value of the context variable to "2" to generate only two lines.
Press Enter to execute your Job from your browser.

The Job generates two lines with MASSY as last name.
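The same call can be made from code rather than from a browser. The sketch below uses the standard java.net.http client (Java 11 or later) to invoke the deployed Job with the two context parameters of this scenario; %20 is the encoded space discussed above, and the host, port and directory names are the ones assumed in this example.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CallExportedJob {
        public static void main(String[] args) throws Exception {
            // URL of the Job deployed in Tomcat, with the two context parameters overridden.
            String url = "http://localhost:8080/export_job/services/export_job3?method=runJob"
                    + "&arg1=--context_param%20lastname=MASSY"
                    + "&arg2=--context_param%20nb_lines=2";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        }
    }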

Calling a Job exported as Webservice in another Job


This scenario describes a Job that calls another Job exported as a Webservice using the
tWebServiceInput. This scenario will call the Job created in Buffering output data on the webapp
server on page 421.
• Drop the following components from the Palette onto the design workspace: tWebServiceInput
and tLogRow.
• Connect tWebServiceInput to tLogRow using a Row Main link.

• In the design workspace, select tWebServiceInput.


• Click the Component tab to define the basic settings for tWebServiceInput.

• Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to
describe the data structure you want to call from the exported Job. In this scenario, the schema is
made of three columns, now, firstname, and lastname.

• Click the plus button to add the three parameter lines and define your variables. Click OK to close
the dialog box.
• In the WSDL field of the Basic settings view of tWebServiceInput, enter the URL
http://localhost:8080/export_job/services/export_job3?WSDL where "export_job" is the name of the webapp
directory where the Job to call is stored and "export_job3" is the name of the Job itself.

• In the Method name field, enter runJob.


• In the Parameters panel, click the plus button to add two parameter lines to define your context
variables.
• Click in the first Value cell to enter the parameter to set the number of generated lines using the
following syntax: --context_param nb_lines=3.
• Click in the second Value cell to enter the parameter to set the last name to display using the
following syntax: --context_param lastname=Ford.
• Select tLogRow and click the Component tab to display the component view.
• Set the Basic settings for the tLogRow component to display the output data in a tabular mode.
For more information, see tLogRow on page 1977.
• Save your Job and press F6 to execute it.

The system generates three columns with the current date, first name, and last name and displays
them onto the log console in a tabular mode.

tCassandraBulkExec
Improves performance during Insert operations to a Cassandra column family.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts
of a two-step process. In the first step, an SSTable is generated. In the second step, this SSTable
is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that
the data can be transformed before it is loaded into Cassandra.
tCassandraBulkExec writes data from an SSTable into Cassandra.

tCassandraBulkExec Standard properties


These properties are used to configure tCassandraBulkExec running in the Standard Job framework.
The Standard tCassandraBulkExec component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

DB Version Select the Cassandra version you are using. Cassandra 2.0.0
only works with JVM 1.7.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use configuration file Select this check box and in the field that is displayed,
enter the path, or browse to cassandra.yaml, the main
configuration file for Cassandra.
This way, this component can import and directly use the
configuration from cassandra.yaml, which can contain many
advanced Cassandra properties, such as the properties for
SSL encryption.
When you need to run your Job in different Cassandra
environments, this feature allows your Job to easily switch
between the configurations.

For further information about this cassandra.yaml file, see
Cassandra configuration.

Keyspace Type in the name of the keyspace into which you want to
write the SSTable.

Column family Type in the name of the column family into which you want
to write the SSTable.

SSTable directory Specify the local directory of the SSTable to be loaded into
Cassandra. Note that the complete path to the SSTable will
be the local directory appended by the specified keyspace
name and column family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.
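A minimal sketch of how that complete path is composed from the local directory, keyspace name and column family name, using the values of the example above:

    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SstablePathSketch {
        public static void main(String[] args) {
            // Local directory, keyspace and column family from the example above.
            Path sstablePath = Paths.get("/home/talend/sstable", "testk", "testc");
            System.out.println(sstablePath);  // prints /home/talend/sstable/testk/testc
        }
    }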

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Limitation Currently, the execution of this component ends the entire


Job.

Related scenarios
No scenario is available for the Standard version of this component yet.

tCassandraClose
Disconnects a connection to a Cassandra server so as to release occupied resources.

tCassandraClose Standard properties


These properties are used to configure tCassandraClose running in the Standard Job framework.
The Standard tCassandraClose component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Component List Select an active Cassandra connection to be closed.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Cassandra


components, particularly tCassandraConnection.

Related Scenario
For a scenario in which tCassandraClose is used, see Handling data with Cassandra on page 439.

tCassandraConnection
Enables the reuse of the connection it creates to a Cassandra server.
tCassandraConnection opens a connection to a Cassandra server.

tCassandraConnection Standard properties


These properties are used to configure tCassandraConnection running in the Standard Job framework.
The Standard tCassandraConnection component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

Server Type in the IP address or hostname of the Cassandra server.

Port Type in the listening port number of the Cassandra server.

Required authentication Select this check box to enable the database authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use SSL connection Select this check box to enable the SSL or TLS encrypted
connection.
Then you need to use the tSetKeystore component in the
same Job to specify the encryption information.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Cassandra


components, particularly tCassandraClose.

Related scenario
For a scenario in which tCassandraConnection is used, see Handling data with Cassandra on page
439.

tCassandraInput
Extracts the desired data from a standard or super column family of a Cassandra keyspace so as to
apply changes to the data.
tCassandraInput allows you to read data from a Cassandra keyspace and send data in the Talend flow.

Mapping tables between Cassandra type and Talend data type

The first of the following two tables presents the mapping relationships between Cassandra types, with
the Datastax Cassandra API, and Talend data types.

Cassandra 2.0 or later versions

Cassandra Type Talend Data Type

Ascii String; Character

BigInt Long

Blob Byte[]

Boolean Boolean

Counter Long

Inet Object

Int Integer; Short; Byte

List List

Map Object

Set Object

Text String; Character

Timestamp Date

UUID String

TimeUUID String

VarChar String; Character

VarInt Object

Boolean Boolean

Float Float

Double Double

Decimal BigDecimal

Cassandra Hector API (for Cassandra versions older than 2.0)

The following table presents the mapping relationships between Cassandra types, with the Hector API,
and Talend data types.

Cassandra Type Talend Data Type

BytesType byte[]

AsciiType String

UTF8Type String

IntegerType Object

Int32Type Integer

LongType Long

UUIDType String

TimeUUIDType String

DateType Date

BooleanType Boolean

FloatType Float

DoubleType Double

DecimalType BigDecimal

tCassandraInput Standard properties


These properties are used to configure tCassandraInput running in the Standard Job framework.
The Standard tCassandraInput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

API type This drop-down list is displayed only when you have
selected the 2.0 version (deprecated) of Cassandra from the
DB version list. From this API type list, you can either select
Datastax to use CQL 3 (Cassandra Query Language) with
Cassandra, or select Hector (deprecated) to use CQL 2.
Note that the Hector API is deprecated along with the
support for Cassandra V2.0.
Along with the evolution of the CQL commands, the
parameters to be set in the Basic settings view varies.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Keyspace Type in the name of the keyspace from which you want to
read data.

Column family Type in the name of the column family from which you want
to read data.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query Enter the query statements to be used to read data from the
Cassandra database.
By default, the query is not case-sensitive. This means that
at runtime, the column names you put in the query are
always taken in lower case. If you need to make the query
case-sensitive, put the column names in double quotation
marks.
The [...] button next to this field allows you to generate the
sample code that shows what the pre-defined variables are
for the data to be read and how these variables can be used.
This feature is available only for the Datastax API of
Cassandra 2.0 (deprecated) or a later version.

Column family type Standard: Column family is of standard type.


Super: Column family is of super type.

Include key in output columns Select this check box to include the key of the column
family in output columns.
• Key column: select the key column from the list.

Row key type Select the appropriate Talend data type for the row key
from the list.

Row key Cassandra type Select the corresponding Cassandra type for the row key
from the list.

Warning:
The value of the Default option varies with the selected
row key type. For example, if you select String from the
Row key type list, the value of the Default option will be
UTF8.

For more information about the mapping table between


Cassandra type and Talend data type, see Mapping tables
between Cassandra type and Talend data type on page
434.

Include super key output columns Select this check box to include the super key of the column
family in output columns.
• Super key column: select the desired super key column
from the list.
This check box appears only if you select Super from the
Column family type drop-down list.

Super column type Select the type of the super column from the list.

Super column Cassandra type Select the corresponding Cassandra type for the super
column from the list.
For more information about the mapping table between
Cassandra type and Talend data type, see Mapping tables
between Cassandra type and Talend data type on page
434.

Specify row keys Select this check box to specify the row keys of the column
family directly.

Row Keys Type in the specific row keys of the column family in the
correct format depending on the row key type.
This field appears only if you select the Specify row keys
check box.

Key start Type in the start row key of the correct data type.

Key end Type in the end row key of the correct data type.

Key limit Type in the number of rows to be read between the start
row key and the end row key.

Specify columns Select this check box to specify the column names of the
column family directly.

Columns Type in the specific column names of the column family in


the correct format depending on the column type.
This field appears only if you select the Specify columns
check box.

Columns range start Type in the start column name of the correct data type.

Columns range end Type in the end column name of the correct data type.

Columns range limit Type in the number of columns to be read between the start
column and the end column.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component always needs an output link.

Handling data with Cassandra


This scenario applies only to Talend products with Big Data.
This scenario describes a simple Job that reads the employee data from a CSV file, writes the data to
a Cassandra keyspace, then extracts the personal information of some employees and displays the
information on the console.

This scenario requires six components, which are:


• tCassandraConnection: opens a connection to the Cassandra server.
• tFileInputDelimited: reads the input file, defines the data structure and sends it to the next
component.
• tCassandraOutput: writes the data it receives from the preceding component into a Cassandra
keyspace.
• tCassandraInput: reads the data from the Cassandra keyspace.
• tLogRow: displays the data it receives from the preceding component on the console.
• tCassandraClose: closes the connection to the Cassandra server.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tCassandraConn
ection, tFileInputDelimited, tCassandraOutput, tCassandraInput, tLogRow and tCassandraClose.
2. Connect tFileInputDelimited to tCassandraOutput using a Row > Main link.
3. Do the same to connect tCassandraInput to tLogRow.
4. Connect tCassandraConnection to tFileInputDelimited using a Trigger > OnSubjobOk link.
5. Do the same to connect tFileInputDelimited to tCassandraInput and tCassandraInput to
tCassandraClose.
6. Label the components to better identify their functions.

Configuring the components


Opening a Cassandra connection

Procedure
1. Double-click the tCassandraConnection component to open its Basic settings view in
the Component tab.

2. Select the Cassandra version that you are using from the DB Version list. In this example, it is
Cassandra 1.1.2.
3. In the Server field, type in the hostname or IP address of the Cassandra server. In this example, it
is localhost.
4. In the Port field, type in the listening port number of the Cassandra server.
5. If required, type in the authentication information for the Cassandra connection: Username and
Password.

Reading the input data

Procedure
1. Double-click the tFileInputDelimited component to open its Component view.

2. Click the [...] button next to the File Name/Stream field to browse to the file that you want to
read data from. In this scenario, the directory is D:/Input/Employees.csv. The CSV file contains four
columns: id, age, name and ManagerID:

    id;age;name;ManagerID
    1;20;Alex;1
    2;40;Peter;1
    3;25;Mark;1
    4;26;Michael;1
    5;30;Christophe;2
    6;26;Stephane;3
    7;37;Cedric;3
    8;52;Bill;4
    9;43;Jack;2
    10;28;Andrews;4
3. In the Header field, enter 1 so that the first row in the CSV file will be skipped.
4. Click Edit schema to define the data to pass on to the tCassandraOutput component.

Writing data to a Cassandra keyspace

Procedure
1. Double-click the tCassandraOutput component to open its Basic settings view in the Component
tab.

2. Type in required information for the connection or use the existing connection you have
configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example,
and select Drop keyspace if exists and create from the Action on keyspace list.
4. In the Column family configuration area, type in the name of the column family: Employee_Info in
this example, and select Drop column family if exists and create from the Action on column family
list.
The Define column family structure check box appears. In this example, clear this check box.
5. In the Action on data list, select the action you want to carry on, Upsert in this example.
6. Click Sync columns to retrieve the schema from the preceding component.
7. Select the key column of the column family from the Key column list. In this example, it is id.
If needed, select the Include key in columns check box.

Reading data from the Cassandra keyspace

Procedure
1. Double-click the tCassandraInput component to open its Component view.

2. Type in required information for the connection or use the existing connection you have
configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example.

4. In the Column family configuration area, type in the name of the column family: Employee_Info in
this example.
5. Select Edit schema to define the data structure to be read from the Cassandra keyspace. In this
example, three columns id, name and age are defined.

6. If needed, select the Include key in output columns check box, and then select the key column of
the column family you want to include from the Key column list.
7. From the Row key type list, select Integer because id is of integer type in this example.
Keep the Default option for the row key Cassandra type because its value will become the
corresponding Cassandra type Int32 automatically.
8. In the Query configuration area, select the Specify row keys check box and specify the row keys
directly. In this example, three rows will be read. Next, select the Specify columns check box and
specify the column names of the column family directly. This scenario will read three columns
from the keyspace: id, name and age.
9. If needed, the Key start and the Key end fields allow you to define the range of rows, and the
Key limit field allows you to specify the number of rows within the range of rows to be read.
Similarly, the Columns range start and the Columns range end fields allow you to define the range
of columns of the column family, and the Columns range limit field allows you to specify the
number of columns within the range of columns to be read.

Displaying the information of interest

Procedure
1. Double-click the tLogRow component to open its Component view.
2. In the Mode area, select Table (print values in cells of a table).

Closing the Cassandra connection

Procedure
1. Double-click the tCassandraClose component to open its Component view.

2. Select the connection to be closed from the Component List.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.
The personal information of three employees is displayed on the console.
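Outside the Studio, a roughly equivalent read could be written directly against the DataStax Java driver (3.x). The sketch below is an illustration under assumptions, not the code the Job generates: it presumes a recent Cassandra version listening on the default native port 9042, that id is the partition key of Employee_Info, and that the keyspace and table names keep the case used in this scenario.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class ReadEmployeesSketch {
        public static void main(String[] args) {
            // Connection details mirror the tCassandraConnection settings of this scenario.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("localhost")
                    .withPort(9042)
                    .build();
                 Session session = cluster.connect("Employee")) {

                // Same selection as the tCassandraInput configuration: three employees, three columns.
                ResultSet rows = session.execute(
                        "SELECT id, name, age FROM Employee_Info WHERE id IN (1, 2, 3)");
                for (Row row : rows) {
                    System.out.println(row.getInt("id") + " | " + row.getString("name")
                            + " | " + row.getInt("age"));
                }
            }
        }
    }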

tCassandraOutput
Writes data into or deletes data from a column family of a Cassandra keyspace.
tCassandraOutput receives data from the preceding component, and writes data into Cassandra.

tCassandraOutput Standard properties


These properties are used to configure tCassandraOutput running in the Standard Job framework.
The Standard tCassandraOutput component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

API type This drop-down list is displayed only when you have
selected the 2.0 version (deprecated) of Cassandra from the
DB version list. From this API type list, you can either select
Datastax to use CQL 3 (Cassandra Query Language) with
Cassandra, or select Hector (deprecated) to use CQL 2.
Note that the Hector API is deprecated along with the
support for Cassandra V2.0.
Along with the evolution of the CQL commands, the
parameters to be set in the Basic settings view varies.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.

To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use SSL Select this check box to enable the SSL or TLS encrypted
connection.
Then you need to use the tSetKeystore component in the
same Job to specify the encryption information.

Keyspace Type in the name of the keyspace into which you want to
write data.

Action on keyspace Select the operation you want to perform on the keyspace
to be used:
• None: No operation is carried out.
• Drop and create keyspace: The keyspace is removed
and created again.
• Create keyspace: The keyspace does not exist and gets
created.
• Create keyspace if not exists: A keyspace gets created if
it does not exist.
• Drop keyspace if exists and create: The keyspace is
removed if it already exists and created again.

Column family Type in the name of the column family into which you want
to write data.

Action on column family Select the operation you want to perform on the column
family to be used:
• None: no operation is carried out.
• Drop and create column family: the column family is
removed and created again.
• Create column family: the column family does not exist
and gets created.
• Create column family if not exists: a column family gets
created if it does not exist.
• Drop column family if exists and create: the column
family is removed if it already exists and created again.

Action on data On the data of the table defined, you can perform:
• Upsert: insert the columns if they do not exist or
update the existing columns.
• Insert: insert the columns if they do not exist. This
action also updates the existing ones.
• Update: update the existing columns or add the
columns that do not exist. This action does not support
the Counter Cassandra data type.
• Delete: remove columns corresponding to the input
flow.
Note that the action list varies depending on the Hector
(deprecated) or Datastax API you are using. When the API is
Datastax, more actions become available.
For more advanced actions, use the Advanced settings view.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Sync columns Click this button to retrieve schema from the previous
component connected in the Job.

Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Features available only with the Hector API (deprecated)

Row key column Select the row key column from the list.

Include row key in columns Select this check box to include row key in columns.

Super columns Select the super column from the list.


This drop-down list appears only if you select Super from
the Column family type drop-down list.

Include super columns in standard columns Select this check box to include the super columns in
standard columns.

Delete row Select this check box to delete the row.


This check box appears only if you select Delete from the
Action on data drop-down list.

Delete columns Customize the columns you want to delete.

Delete super columns Select this check box to delete super columns.
This check box appears only if you select the Delete Row
check box.

Advanced settings

Batch Size Number of lines in each processed batch.


When you are using the Datastax API, this feature is
displayed only when you have selected the Use unlogged
batch check box.

Use unlogged batch Select this check box to handle data in batch but with
Cassandra's UNLOGGED approach. This feature is available
to the following three actions: Insert, Update and Delete.
Then you need to configure how the batch mode works:
• Batch size: enter the number of lines in each batch to
be processed.
• Group batch method: select how to group rows into
batches:
1. Partition: rows sharing the same partition keys are
grouped.
2. Replica: rows to be written to the same replica are
grouped.
3. None: rows are grouped randomly. This option is
suitable for a single node Cassandra.
• Cache batch group: select this check box to load rows
into memory before grouping them. This way, grouping
is not impacted by the order of the rows.
If you leave this check box clear, only successive rows
that meet the same criteria are grouped.
• Async execute: select this check box if you want
tCassandraOutput to send batches in parallel. If you
leave it clear, tCassandraOutput waits for the result of
a batch before sending another batch to Cassandra.
• Maximum number of batches executed in parallel: once
you have selected Async execute, enter the number of
batches to be sent in parallel to Cassandra.
This number must be greater than 0, and it is recommended
not to use too large a value.
The ideal situation to use batches with Cassandra is when
a small number of tables must synchronize the data to be
inserted or updated.
In this UNLOGGED approach, the Job does not write batches
into Cassandra's batchlog system and thus avoids the
performance issue incurred by this writing. For further
information about Cassandra BATCH statement and
UNLOGGED approach, see Batches.
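For illustration only, an unlogged batch corresponds to CQL of the following shape; the ks.tb table and its columns are hypothetical, and the component builds and sends such batches for you at runtime:

begin unlogged batch
insert into ks.tb (id, name, birthday) values (1, 'Ann', '1985-02-14');
insert into ks.tb (id, name, birthday) values (2, 'Bob', '1987-11-02');
apply batch;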

Insert if not exists Select this check box to insert rows. This row insertion takes
place only when they do not exist in the target table.
This feature is available to the Insert action only.

Delete if exists Select this check box to remove from the target table only
the rows that have the same records in the incoming flow.


This feature is available only to the Delete action.
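As a rough CQL equivalent, these two check boxes add lightweight-transaction conditions to the generated statements, for example (the ks.tb table is hypothetical):

insert into ks.tb (id, name) values (1, 'Ann') if not exists;
delete from ks.tb where id = 1 if exists;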

Use TTL Select this check box to write the TTL data in the target
table. In the column list that is displayed, you need to select
the column to be used as the TTL column. The DB type of
this column must be Int.
This feature is available to the Insert action and the Update
action only.

Use Timestamp Select this check box to write the timestamp data in the
target table. In the column list that is displayed, you need to
select the column to be used to store the timestamp data.
The DB type of this column must be BigInt.
This feature is available to the following actions: Insert,
Update and Delete.
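Assuming a hypothetical ks.tb table, the generated statement roughly takes the following form, with the TTL and timestamp values bound from the selected schema columns at runtime:

insert into ks.tb (id, name) values (?, ?) using ttl ? and timestamp ?;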

IF condition Add the condition to be met for the Update or the Delete
action to take place. This condition allows you to be more
precise about the columns to be updated or deleted.

Special assignment operation Complete this table to construct advanced SET commands
of Cassandra to make the Update action more specific.
For example, add a record to the beginning or a particular
position of a given column.
In the Update column column of this table, you need
to select the column to be updated and then select the
operations to be used from the Operation column. The
following operations are available:
• Append: it adds incoming records to the end of the
column to be updated. The Cassandra data types it can
handle are Counter, List, Set and Map.
• Prepend: it adds incoming records to the beginning of
the column to be updated. The only Cassandra data
type it can handle is List.
• Remove: it removes records from the target table
when the same records exist in the incoming flow. The
Cassandra data types it can handle are Counter, List,
Set and Map.
• Assign based on position/key: it adds records to a
particular position of the column to be updated. The
Cassandra data types it can handle are List and Map.
Once you select this operation, the Map key/list
position column becomes editable. From this column,
you need to select the column to be used as reference
to locate the position to be updated.
For more details about these operations, see Datastax's
related documentation in http://docs.datastax.com/en/cql/3.1/cql/cql_reference/update_r.html?scroll=reference_ds_g4h_qzq_xj__description_unique_34.
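For reference, these operations correspond roughly to the following CQL UPDATE forms; the ks.profiles table and its tags list column are hypothetical:

update ks.profiles set tags = tags + ['cql'] where id = 1;   -- Append
update ks.profiles set tags = ['cql'] + tags where id = 1;   -- Prepend
update ks.profiles set tags = tags - ['cql'] where id = 1;   -- Remove
update ks.profiles set tags[2] = 'cql' where id = 1;         -- Assign based on position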

Row key in the List type Select the column to be used to construct the WHERE clause
of Cassandra to perform the Update or the Delete action on
only selected rows. The column(s) to be used in this table
should be from the set of the Primary key columns of the
Cassandra table.

Delete collection column based on position/key Select the column to be used as reference to locate the
particular row(s) to be removed.


This feature is available only to the Delete action.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as an output component and it


always needs an incoming link.

Related Scenario
For a scenario in which tCassandraOutput is used, see Handling data with Cassandra on page 439.


tCassandraOutputBulk
Prepares an SSTable of large size and processes it according to your needs before loading this
SSTable into a column family of a Cassandra keyspace.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts
of a two step process. In the first step, an SSTable is generated. In the second step, this SSTable
is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that
the data can be transformed before it is loaded into Cassandra.
tCassandraOutputBulk receives data from the preceding component, and creates an SSTable locally.

tCassandraOutputBulk Standard properties


These properties are used to configure tCassandraOutputBulk running in the Standard Job framework.
The Standard tCassandraOutputBulk component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.


You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Table type Select the type of the data model to be used for the table
to be created. It can be CQL (actually CQL3) or non-CQL (the
legacy thrift-based API of Cassandra before CQL3).
This drop-down list is available only when the DB version
you are using is Cassandra 2.0.0 (deprecated). For the
Cassandra versions later than 2.0.0, CQL becomes the only
model used by this component and so this list is no longer
available.

DB Version Select the Cassandra version you are using.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use configuration file Select this check box and in the field that is displayed,
enter the path, or browse to cassandra.yaml, the main
configuration file for Cassandra.
This way, this component can import and directly use the
configuration from cassandra.yaml, which can contain many
advanced Cassandra properties, such as the properties for
SSL encryption.
When you need to run your Job in different Cassandra
environments, this feature allows your Job to easily switch
between the configurations.
For further information about this cassandra.yaml file, see
Cassandra configuration.

Keyspace Type in the name of the keyspace into which you want to
write the SSTable.

Column family Type in the name of the column family into which you want
to write the SSTable.

Partitioner Select the partitioner which determines how data is


distributed across the Cassandra cluster.
• Random
• Murmur3
• Order preserving: not recommended because it
assumes keys are UTF8 strings.


For more information about the partitioner, see http://wiki.apache.org/cassandra/Partitioners.

Schema statement Enter the statement to define the schema of the column
family to be used or to be created on the fly.
• This statement is a Cassandra prepared statement,
which stores query results locally in the SSTable
directory you define with this component before
sending them to the server. For further information
about the prepared statements, see Prepared
statements.
• A Cassandra column family is a container for a
collection of rows of records that have a similar kind.
Its schema must contain strictly the same columns as
the component schema you have defined, that is to
say, the column names and the order of the columns in
both the schemas must be identical.
An example of this schema statement is provided in the
Schema statement field:

create table ks.tb (id int, name text, birthday timestamp,
primary key(id, birthday)) with clustering order by (birthday desc)
It will create a column family called tb containing the id, the
name and the birthday columns under the keyspace ks.
For further information about a column family, see Standard
column family.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.

Insert statement Enter the statement to instruct how to write the data from
the input flow into the columns of the column family to be
used.
This statement is a Cassandra prepared statement, which
stores query results locally in the SSTable directory you
define with this component before sending them to
the server. For further information about the prepared
statements, see Prepared statements.
An example of this insert statement is provided in the Insert
statement field:

insert into ks.tb (id, name, birthday) values (?, ?, ?)

It will write data into the id, the name and the birthday
columns, respectively, of a column family called tb in the
keyspace ks. The question marks in the statement are the
bind variable markers for the three columns. For further
information about bind variables and their usage, see Bound
parameters.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.


Column name comparator Select the data type for the column names, which is used to
sort columns. This list is not available when the data model
to be used is CQL3.
For more information about the comparators, see http://
www.datastax.com/docs/1.1/ddl/column_family#about-
data-types-comparators-and-validators.

SSTable directory Specify the local directory for the SSTable. Note that the
complete path to the SSTable will be the local directory
appended by the specified keyspace name and column
family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.

Buffer size Specify what size the SSTable must reach before it is
written into Cassandra.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component always needs an incoming link.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCassandraOutputBulkExec
Improves performance during Insert operations to a column family of a Cassandra keyspace.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together to
output data to an SSTable and then to write the SSTable into Cassandra, in a two step process. These
two steps are fused together in the tCassandraOutputBulkExec component.
tCassandraOutputBulkExec receives data from the preceding component, creates an SSTable and then
writes the SSTable into Cassandra.

tCassandraOutputBulkExec Standard properties


These properties are used to configure tCassandraOutputBulkExec running in the Standard Job
framework.
The Standard tCassandraOutputBulkExec component belongs to the Big Data and the Databases
NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.


You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Table type Select the type of the data model to be used for the table
to be created. It can be CQL (actually CQL3) or non-CQL (the
legacy thrift-based API of Cassandra before CQL3).
This drop-down list is available only when the DB version
you are using is Cassandra 2.0.0 (deprecated). For the
Cassandra versions later than 2.0.0, CQL becomes the only
model used by this component and so this list is no longer
available.

DB Version Select the Cassandra version you are using.

Warning:
• Cassandra 2.0.0 (deprecated) only works with
JVM1.7.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Keyspace Type in the name of the keyspace into which you want to
write the SSTable.

Column family Type in the name of the column family into which you want
to write the SSTable.

Partitioner Select the partitioner which determines how the data is


distributed across the Cassandra cluster.
• Random
• Murmur3
• Order preserving: not recommended because it
assumes keys are UTF8 strings.
For more information about the partitioner, see http://
wiki.apache.org/cassandra/Partitioners.

Schema statement Enter the statement to define the schema of the column
family to be used or to be created on the fly.
• This statement is a Cassandra prepared statement,
which stores query results locally in the SSTable
directory you define with this component before
sending them to the server. For further information


about the prepared statements, see Prepared


statements.
• A Cassandra column family is a container for a
collection of rows of records that have a similar kind.
Its schema must contain strictly the same columns as
the component schema you have defined, that is to
say, the column names and the order of the columns in
both the schemas must be identical.
An example of this schema statement is provided in the
Schema statement field:

create table ks.tb (id int, name text, birthday timestamp,
primary key(id, birthday)) with clustering order by (birthday desc)
It will create a column family called tb containing the id, the
name and the birthday columns under the keyspace ks.
For further information about a column family, see Standard
column family.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.

Insert statement Enter the statement to instruct how to write the data from
the input flow into the columns of the column family to be
used.
This statement is a Cassandra prepared statement, which
stores query results locally in the SSTable directory you
define with this component before sending them to
the server. For further information about the prepared
statements, see Prepared statements.
An example of this insert statement is provided in the Insert
statement field:

insert into ks.tb (id, name, birthday) values (?, ?, ?)

It will write data into the id, the name and the birthday
columns, respectively, of a column family called tb in the
keyspace ks. The question marks in the statement are the
bind variable markers for the three columns. For further
information about bind variables and their usage, see Bound
parameters.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.

Column name comparator Select the data type for the column names, which is used to
sort columns.
For more information about the comparators, see http://
www.datastax.com/docs/1.1/ddl/column_family#about-
data-types-comparators-and-validators.

SSTable directory Specify the local directory for the SSTable. Note that the
complete path to the SSTable will be the local directory


appended by the specified keyspace name and column


family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.

Buffer size Specify what size the SSTable must reach before it is
written into Cassandra.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is mainly used when no particular


transformation is required on the data to be loaded into the
database.

Limitation Currently, the execution of this component ends the entire


Job.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCassandraRow
Acts on the actual DB structure or on the data, depending on the nature of the query and the
database.
tCassandraRow is the specific component for this database query. It executes the Cassandra Query
Language (CQL) query stated in the specified database. The row suffix means the component
implements a flow in the Job design although it does not provide output.

tCassandraRow Standard properties


These properties are used to configure tCassandraRow running in the Standard Job framework.
The Standard tCassandraRow component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

Host Type in the IP address or hostname of the Cassandra server.

Port Type in the listening port number of the Cassandra server.

Required Authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Keyspace Type in the name of the keyspace on which you want to


execute the CQL commands.

Column family Name of the column family.


Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query Type in the CQL command to be executed.


By default, the query is not case-sensitive. This means that
at runtime, the column names you put in the query are
always taken in lower case. If you need to make the query
case-sensitive, put the column names in double quotation
marks.
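For example, with a hypothetical ks.tb table:

select name from ks.tb where id = 1
select "Name" from ks.tb where "Id" = 1

The first query reads the columns as name and id; in the second one, the double quotation marks preserve the exact case of Name and Id.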

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Related scenario
For related topics, see


• Removing and regenerating a MySQL table index on page 2497.


• Using PreparedStatement objects to query data on page 2498.


tChangeFileEncoding
Transforms the character encoding of a given file and generates a new file with the transformed
character encoding.
tChangeFileEncoding changes the encoding of a given file.

tChangeFileEncoding Standard properties


These properties are used to configure tChangeFileEncoding running in the Standard Job framework.
The Standard tChangeFileEncoding component belongs to the Data Quality and the File families.
The component in this framework is available in all Talend products.

Basic settings

Use Custom Input Encoding Select this check box to customize input encoding type.
When it is selected, a list of input encoding types appears,
allowing you to select an input encoding type or specify an
input encoding type by selecting CUSTOM.

Encoding From this list of character encoding types, you can select
one of the offered options or customize the character
encoding by selecting CUSTOM and specifying a character
encoding type.

Input File Name Path of the input file.

Output File Name Path of the output file.

Advanced settings

Create directory if does not exist This check box is selected by default. It creates the directory for the output file if it does not already exist.

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Transforming the character encoding of a file


This Java scenario describes a very simple Job that transforms the character encoding of a text file and
generates a new file with the new character encoding.

Procedure
1. Drop a tChangeFileEncoding component onto the design workspace.

2. Double-click the tChangeFileEncoding component to display its Basic settings view.

3. Select the Use Custom Input Encoding check box. Set the Encoding type to GB2312.
4. In the Input File Name field, enter the file path or browse to the input file.
5. In the Output File Name field, enter the file path or browse to the output file.
6. Select CUSTOM from the second Encoding list and enter UTF-16 in the text field.


7. Press F6 to execute the Job.

Results
The encoding type of the file in.txt is transformed and out.txt is generated with the UTF-16 encoding
type.


tChronometerStart
Operates as a chronometer device that starts calculating the processing time of one or more subJobs
in the main Job, or that starts calculating the processing time of part of your subJob.
Starts measuring the time a subJob takes to be executed.

tChronometerStart Standard properties


These properties are used to configure tChronometerStart running in the Standard Job framework.
The Standard tChronometerStart component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Global Variables

Global Variables STARTTIME: the start time to calculate the processing time
of subjob(s). This is a Flow variable and it returns a long.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule You can use tChronometerStart as a start or middle


component. It can precede one or more processing tasks in
the subJob. It can precede one or more subJobs in the main
Job.

Related scenario
For related scenario, see Measuring the processing time of a subJob and part of a subJob on page
467.


tChronometerStop
Operates as a chronometer device that stops calculating the processing time of one or more subJobs
in the main Job, or that stops calculating the processing time of part of your subJob. tChronometerStop
displays the total execution time.
Measures the time a subJob takes to be executed.

tChronometerStop Standard properties


These properties are used to configure tChronometerStop running in the Standard Job framework.
The Standard tChronometerStop component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Since options Select either check box to select measurement starting


point:
Since the beginning: stops time measurement launched at
the beginning of a subJob.
Since a tChronometerStart: stops time measurement
launched at one of the tChronometerStart components used
on the data flow of the subJob.

Display duration in console When selected, it displays subJob execution information on


the console.

Display component name When selected, it displays the name of the component on
the console.

Caption Enter the desired text, for example to identify your subJob.

Display human readable duration When selected, it displays subJob execution information in
readable time units.

Global Variables

Global Variables STOPTIME: the stop time to calculate the processing time of
subjob(s). This is a Flow variable and it returns a long.
DURATION: the processing time of subjob(s). This is a Flow
variable and it returns a long.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule Cannot be used as a start component.

Measuring the processing time of a subJob and part of a


subJob
This scenario is a subJob that does the following in a sequence:
• generates 1000 000 rows of first and last names,
• gathers first names with their corresponding last names,
• stores the output data in a delimited file,
• measures the duration of the subJob as a whole,
• measures the duration of the name replacement operation,
• displays the gathered information about the processing time on the Run log console.
To measure the processing time of the subJob:
• Drop the following components from the Palette onto the design workspace: tRowGenerator,
tMap, tFileOutputDelimited, and tChronometerStop.
• Connect the first three components using Main Row links.

Note: When connecting tMap to tFileOutputDelimited, you will be prompted to name the output
table. The name used in this example is "new_order".

• Connect tFileOutputDelimited to tChronometerStop using an OnComponentOk link.


• Select tRowGenerator and click the Component tab to display the component view.
• In the component view, click Basic settings. The Component tab opens on the Basic settings view
by default.


• Click Edit schema to define the schema of the tRowGenerator. For this Job, the schema is
composed of two columns: First_Name and Last_Name, so click the [+] button twice to add two
columns and rename them.
• Click the RowGenerator Editor three-dot button to open the editor and define the data to be
generated.

• In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows
for RowGenerator field and click OK. The RowGenerator Editor closes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Double-click on the tMap component to open the Map editor. The Map editor opens displaying the
input metadata of the tRowGenerator component.

• In the Schema editor panel of the Map editor, click the plus button of the output table to add two
rows and define them.


• In the Map editor, drag the First_Name row from the input table to the Last_Name row in the
output table and drag the Last_Name row from the input table to the First_Name row in the output
table.
• Click Apply to save changes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Click OK to close the editor.

• Select tFileOutputDelimited and click the Component tab to display the component view.
• In the Basic settings view, set tFileOutputDelimited properties as needed.

• Select tChronometerStop and click the Component tab to display the component view.
• In the Since options panel of the Basic settings view, select Since the beginning option to measure
the duration of the subJob as a whole.


• Select/clear the other check boxes as needed. In this scenario, we want to display the subJob
duration on the console preceded by the component name.
• If needed, enter a text in the Caption field.
• Save your Job and press F6 to execute it.

Note: You can measure the duration of the subJob the same way by placing tChronometerStop
below tRowGenerator, and connecting the latter to tChronometerStop using an OnSubjobOk link.


tCloudStart
Starts instances on Amazon EC2 (Amazon Elastic Compute Cloud).
This component accesses the cloud provider to be used (Amazon EC2) and launches instances, which
are virtual servers in that cloud. If an instance to be launched does not exist, tCloudStart creates it.

tCloudStart Standard properties


These properties are used to configure tCloudStart running in the Standard Job framework.
The Standard tCloudStart component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services.
These access credentials are generated from the Security
Credential tab of your Amazon account page.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Cloud provider Select the cloud provider to be used.

Image Enter the name of the Amazon Machine Image (AMI) to


be used to launch an instance. This AMI defines the basic
configuration of that instance.

Region and Zone Enter the region and the zone to be used as the geographic
location where you want to launch an instance.
The syntax used to express a location is predefined by
Amazon, for example, us-east-1 representing the US East
(Northern Virginia) region and us-east-1a representing
one of the Availability Zones within that region. For further
information about available regions for Amazon, see
Amazon's documentation about regions and endpoints and
as well Amazon's FAQ about region and Availability Zone.

Instance name Enter the name of the instance to be launched. For example,
you can enter Talend.
Note that uppercase letters are converted to lowercase.

Instance count Enter the number of instances to be launched. At runtime,


the name specified in the Instance name field, for example
Talend, will be used as the initial part of each instance
name, and letters and numbers will be randomly added to
complete each name.

Instance type Select the type of the instance(s) to be launched. Each type
is predefined by Amazon and defines the performance of
every instance you want to launch.


This drop-down list presents the API name of each instance


type. For further information, see Amazon's documentation
about instance types.

Proceed with a Key pair Select this check box to use Amazon Key Pair for your login
to Amazon EC2. Once selecting it, a drop-down list appears
to allow you to select :
• Use an existing Key Pair to enter the name of that Key
Pair in the field next to the drop-down list. If required,
Amazon will prompt you at runtime to find and use
that Key Pair.
• Create a Key Pair to enter the name of the new Key
Pair in the field next to the drop-down list and define
the location where you want to store this Key Pair in
the Advanced settings tab view.

Security group Add rows to this table and enter the names of the security
groups to which you need to assign the instance(s) to be
launched. The security groups set in this table must exist on
your Amazon EC2.
A security group applies specific rules on inbound traffic
to instances assigned to the group, such as the ports to be
used. For further information about security groups, see
Amazon's documentation about security groups.
Note that an instance can be assigned to a group by setting
its security group name or key pair name to jclouds#<$group_name>, where <$group_name> identifies the
group to which the instance belongs. In this way, you can
change the status of all instances or running instances
in one group at the same time using the tCloudStop
component.
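For example, if you set the security group name of several instances to jclouds#talend (talend being an arbitrary group name chosen here for illustration), you can later change the status of all of them at once with tCloudStop by selecting Instances in a specific group and entering talend as the group name.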

Advanced settings

Key Pair folder Browse to, or enter the path to the folder you use to store
the created Key Pair file.
This field appears when you select Creating a Key Pair in
the Basic settings tab view.

Volumes Add rows and define the volume(s) to be created for the
instances to be launched in addition to the volumes
predefined and allocated by the given Amazon EC2.
The parameters to be set in this table are the same
parameters used by Amazon for describing a volume.
If you need to remove automatically an additional volume
after terminating its related instance, select the check box
in the Delete on termination column.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables NODE_GROUP: the name of the instance. This is an After


variable and it returns a string.


NODES: the instances launched. This is an After variable and


it returns an object.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone to launch an instance


on Amazon EC2. You can use this component to start the
instance you need to deploy Jobs on.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCloudStop
Changes the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud).
This component accesses the cloud provider to be used (Amazon EC2) and suspends, resumes or
terminates given instance(s).

tCloudStop Standard properties


These properties are used to configure tCloudStop running in the Standard Job framework.
The Standard tCloudStop component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services.
These access credentials are generated from the Security
Credential view of your Amazon account page.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Cloud provider Select the cloud provider to be used.

Action Select the action you need tCloudStop to take in order to


change the status of a given instance. This action may be:
• Suspend
• Resume
• Terminate
Note that if you terminate an instance, this instance will be
deleted, while a suspended instance can still be resumed.

Predicate Select the instance(s) of which you need to change the


status. The options are:
• Running instances: status of all the running instances
will be changed.
• Instances in a specific group: status of the instances of
a specific instance group will be changed. You need to
enter the name of that group in the Group name field.
• Running instances in a specific group: status of the
running instances of a specific instance group will be
changed. You need to enter the name of that group in
the Group name field.
• Instance with predefined id: status of a given instance
will be changed. You need to enter the ID of that
instance in the Id field. You can find this ID on your
Amazon EC2.
An instance group is composed of the instances using the
same instance name you have defined in the Instance name
field of tCloudStart.


Group name Enter the name of the group in which you want to change
the status of given instances whose security group name or
key pair name is set to jclouds#<$group_name> in the
tCloudStart component, where <$group_name> identifies
the group to which the instance belongs.
This field is available only when Instances in a specific
group or Running instances in a specific group is selected
from the Predicate list.

Id Enter the ID of the instance of which you need to change


the status, for instance, "${region}/${instance id}". This field
appears when you select Instance with predefined id from
the Predicate list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone to change the status


of given instances on Amazon EC2. You can use this
component to suspend, resume or terminate the instance(s)
you have deployed Jobs on.
This component often works alongside tCloudStart to
change the status of the instances launched by the latter
component.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCombinedSQLAggregate
Provides a set of matrix based on values or calculations.
tCombinedSQLAggregate collects data values from one or more columns of a table for statistical
purposes. This component has real-time capabilities since it runs the data transformation on the
DBMS itself.

tCombinedSQLAggregate Standard properties


These properties are used to configure tCombinedSQLAggregate running in the Standard Job
framework.
The Standard tCombinedSQLAggregate component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Group by Define the aggregation sets, the values of which will be


used for calculations.

  Output Column: Select the column label in the list offered


according to the schema structure you defined. You can add
as many output columns as you wish to make more precise
aggregations.


  Input Column: Select the input column label to match the


output column's expected content, in case the output label
of the aggregation set needs to be different.

Operations Select the type of operation along with the value to use for
the calculation and the output field.

  Output Column: Select the destination field in the list.

  Function: Select any of the following operations to perform


on data: count, min, max, avg, sum, first, last, distinct and
count (distinct).

  Input column: Select the input column from which you want
to collect the values to be aggregated.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.


Filtering and aggregating table columns directly on the


DBMS
The following scenario creates a Job that opens a connection to a MySQL database and:
• populates a database table with the input data,
• creates the output table for the filtered data,
• instantiates the schema from a database table in part (for column filtering),
• filters two columns in the same table to get only the data that meets two filtering conditions,
• collects data from the filtered column(s), grouped by specific value(s) and writes aggregated data
in a target database table.


Adding and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tMysqlConnection,
tFixedFlowInput, tMysqlOutput, tCreateTable, tCombinedSQLInput, tCombinedSQLFilter,
tCombinedSQLAggregate, tCombinedSQLOutput, tMysqlCommit, tMysqlInput and tLogRow.
2. Connect tMysqlConnection to tFixedFlowInput using a Trigger > On Subjob Ok link
3. Do the same to connect tFixedFlowInput to tCreateTable, tCreateTable to tCombinedSQLInput,
tCombinedSQLInput to tMysqlCommit, and tMysqlCommit to tMysqlInput.
4. Connect tFixedFlowInput and tMysqlOutput using a Row > Main link.
5. Connect tCombinedSQLInput to tCombinedSQLFilter using a Row > Combine link.
6. Do the same to connect tCombinedSQLFilter to tCombinedSQLAggregate, and tCombinedSQLAggregate
to tCombinedSQLOutput
7. Connect tMysqlInput and tLogRow using a Row > Main link.

Configuring the components


The schema defined through tCombinedSQLInput can be different from that of the source table as you
can just instantiate the desired columns of the source table. Therefore, tCombinedSQLInput also plays
a role of column filtering.
In this scenario, the source database table has seven columns: id, first_name, last_name, city, state,
date_of_birth, and salary while tCombinedSQLInput only instantiates four columns that are needed for
the aggregation: id, state, date_of_birth, and salary from the source table.
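In SQL terms, this column filtering amounts to selecting only the needed columns from the source table, roughly:

select id, state, date_of_birth, salary from employees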

Opening a MySQL connection

Procedure
1. Launch MySQL Workbench and start a local connection on port 3306.
2. Create a new schema and name it test.
3. Back in the design workspace, select tMysqlConnection and click the Component tab to define its
basic settings.


4. In the Basic settings view, set the database connection details manually or select Repository from
the Property Type list and select your DB connection if it has already been defined and stored in
the Metadata area of the Repository tree view.
For more information on centralizing DB connection details in the Repository, see Talend Studio
User Guide.

Populating the database table with input data

Procedure
1. In the design workspace, select tFixedFlowInput and click the Component tab to define its basic
settings

2. In the Basic settings view, in the Number of rows field, enter 500.
3. In this scenario, the source database table has seven columns: id, first_name, last_name, city, state,
date_of_birth, and salary
Click the [...] button next to Edit schema to define the following data structure.


4. Click the floppy disk icon to save the schema as a generic schema for later reuse.
5. In the Select folder window, select default and click OK.
6. Choose a name for your generic schema and click Finish.
7. Click OK.
8. The first column of the Values table automatically reflects the data structure you entered
previously.
9. In the Values table, enter a value for each column.
10. In the design workspace, select tMysqlOutput and click the Component tab to define its basic set
tings.

The output schema will automatically be the same as the previous component, in this case
tFixedFlowInput.
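For reference, the employees table populated here could be expressed in MySQL roughly as follows; the column types are assumptions, since the scenario only fixes the column names:

create table employees (id int, first_name varchar(64), last_name varchar(64),
city varchar(64), state varchar(64), date_of_birth date, salary float)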

Creating the target database table

Procedure
1. In the design workspace, select tCreateTable and click the Component tab to define its basic set
tings.


2. Click the [...] button next to Edit schema to define the following data structure.

The schema you enter at this step must reflect the different aggregation operations you
want to perform on the input data.
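As an illustration, the empl_by_state target table created by tCreateTable could be expressed in MySQL roughly as follows; the exact column types are assumptions based on the aggregation results they hold:

create table empl_by_state (state varchar(64), empl_count int, avg_salary float,
min_salary float, max_salary float, oldest_empl date, youngest_empl date)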

Extracting and filtering data

Procedure
1. In the design workspace, select tCombinedSQLInput and click the Component tab to access the
configuration panel.

2. Enter the source table name, in this case employees in the Table field.
3. In the Schema field, select Repository from the list and click the [...] button to the right of the empty
field to load the schema you saved while configuring the settings for tFixedFlowInput.
4. In the Repository Content window, expand Generic schemas and select your schema.


5. Click the [...] button to the right of Edit schema.


6. Select View schema, and in the first column of the table, clear the check boxes for first_name,
last_name and city.

Filtering and aggregating the input data

Procedure
1. In the design workspace, select tCombinedSQLFilter and click the Component tab to access the
configuration panel.

2. Click the Sync columns button to retrieve the schema from the previous component, or configure
the schema manually by selecting Built-in from the Schema list and clicking the [...] button next
to Edit schema.
When you define the data structure for tCombinedSQLFilter, column names automatically appear
in the Input column list in the Conditions table.
In this scenario, the tCombinedSQLFilter component instantiates four columns: id, state,
date_of_birth, and salary.
3. In the Conditions table, set input parameters, operators and expected values in order to only
extract the records that fulfill these criteria.
Click two times on the [+] button under the Conditions table, and in Input column, select state and
date_of_birth from the drop-down list.
In this scenario, the tCombinedSQLFilter component filters the state and date_of_birth columns in
the source table to extract the employees who were born after Oct. 19, 1960 and who live in the
states Utah, Ohio and Iowa.
4. For the column state, select IN as operator from the drop-down list, and enter ('Utah','Ohio','Iowa')
as value.
5. For the column date_of_birth, select > as operator from the drop-down list, and enter ('1960-10-19')
as value.
6. Select And in the Logical operator between conditions list to apply the two conditions at the same
time. You can also customize the conditions by selecting the Use custom SQL box and editing the
conditions in the code box.
7. In the design workspace, select tCombinedSQLAggregate and click the Component tab to access
the configuration panel.


8. Click the [...] button next to Edit schema to enter the following configuration:

The tCombinedSQLAggregate component instantiates four columns: id, state, date_of_birth, and
salary, coming from the previous component.

9. The Group by table helps you define the data sets to be processed based on a defined column. In
this example: State.
In the Group by table, click the [+] button to add one line.
10. In the Output column drop-down list, select State. This column will be used to hold the data
filtered on State.
11. The Operations table helps you define the type of aggregation operations to be performed.
The Output column list available depends on the schema you want to output (through the
tCombinedSQLOutput component). In this scenario, we want to group employees based on the
state they live in. Then we want to count the number of employees per state, calculate the
average/lowest/highest salaries as well as the oldest/youngest employees for each state.
12. In the Operations table, click the [+] button to add a line and then click in the Output column list
to select the output column that will hold the computed data.
13. In the Function field, select the relevant operation to be carried out.
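
Taken together, the filter and aggregation settings defined in this procedure are pushed down to the database as a single statement. The following is a rough SQL sketch of that statement, using the column and table names of this scenario; the SQL actually generated by the components may differ in detail:

SELECT
    state,
    COUNT(id)          AS empl_count,
    AVG(salary)        AS avg_salary,
    MIN(salary)        AS min_salary,
    MAX(salary)        AS max_salary,
    MIN(date_of_birth) AS oldest_empl,
    MAX(date_of_birth) AS youngest_empl
FROM employees
WHERE state IN ('Utah', 'Ohio', 'Iowa')
  AND date_of_birth > '1960-10-19'
GROUP BY state;

The tCombinedSQLOutput component configured in the next procedure inserts the result of this query into the empl_by_state table.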

Writing the output data into MySQL

Procedure
1. In the design workspace, select tCombinedSQLOutput and click the Component tab to access the
configuration panel.

2. On the Database type list, select the relevant database.


3. On the Component list, select the relevant database connection component if more than one
connection is used.
4. In the Table field, enter the name of the target table that will store the results of the
aggregation operations, empl_by_state in this case.
The tCombinedSQLOutput component requires the output table to already exist in the database.
That is why the empl_by_state table was created earlier in the scenario.
In this example, the Schema field doesn't need to be filled out as the database is not Oracle.
5. Click the Sync columns button to retrieve the schema from the previous component.
In this scenario, tCombinedSQLOutput instantiates seven columns coming from the previous
component in the Job design (tCombinedSQLAggregate): state, empl_count, avg_salary, min_salary,
max_salary, oldest_empl and youngest_empl.

Committing the data into the database

Procedure
1. In the design workspace, select tCombinedSQLCommit and click the Component tab to access the
configuration panel.
2. On the Component list, select the relevant database connection component if more than one
connection is used.
3. Clear the check box Close Connection.


Retrieving the filtered and aggregated data

Procedure
1. In the design workspace, select tMysqlInput and click the Component tab to define its basic settings.

2. Select the Use an existing connection check box and choose tMysqlConnection_1 from the list.
3. Click the [...] button next to Edit schema to enter the following schema:

4. In the Table Name field, enter empl_by_state and, in the Query field, enter select * from
empl_by_state.
5. In the design workspace, select tLogRow and click the Component tab to define its basic settings.


6. Click the Sync columns button to retrieve the schema from the previous component and select the
Table (print values in cells of a table) mode.

Saving and executing the Job


Procedure
1. Save your Job and press F6 to execute it.
2. The Run tab opens, where you can observe the result of the Job execution.
3. The output data retrieved by the tLogRow is visible in a table.

Results
Rows are inserted into a seven-column table empl_by_state in the database. The table shows, per
defined state, the number of employees, the average salary, the lowest and highest salaries as well as
the oldest and youngest employees.


tCombinedSQLFilter
Filters data by reorganizing, deleting or adding columns based on the source table, and filters the
given data source using the defined filter conditions.
tCombinedSQLFilter allows you to alter the schema of a source table through column name mapping
and to define a row filter on that table. Therefore, it can be used to filter columns and rows at the
same time. This component has real-time capabilities since it runs the data filtering on the DBMS
itself.

tCombinedSQLFilter Standard properties


These properties are used to configure tCombinedSQLFilter running in the Standard Job framework.
The Standard tCombinedSQLFilter component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Logical operator between conditions Select the logical operator between the filter conditions
defined in the Conditions panel.
Two operators are available: Or, And.


Conditions Select the type of WHERE clause along with the values and
the columns to use for row filtering.

  Input Column: Select the column to filter in the list.

  Operator: Select the type of the WHERE clause: =, < >, >, <,
>=, <=, LIKE, IN, NOT IN, and EXIST IN.

  Values: Type in the values to be used in the WHERE clause.

  Negate: Select this check box to enable the condition that is


opposite to the current setting.

Use custom SQL Customize the WHERE clause by selecting this check box and
editing it in the SQL Condition field.
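
For example, a custom condition entered in the SQL Condition field could look like the following sketch (the state and date_of_birth columns come from the related scenario and must be replaced by columns of your own input schema):

state IN ('Utah', 'Ohio', 'Iowa') AND date_of_birth > '1960-10-19'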

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.

Related Scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.


tCombinedSQLInput
Extracts fields from a database table based on its schema definition.
Then it passes on the field list to the next component via a Combine row link. The schema of
tCombinedSQLInput can be different from that of the source database table but must correspond to it
in terms of the column order.
tCombinedSQLInput extracts fields from a database table based on its schema. This component also
has column filtering capabilities since its schema can be different from that of the database table.

tCombinedSQLInput Standard properties


These properties are used to configure tCombinedSQLInput running in the Standard Job framework.
The Standard tCombinedSQLInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Table Name of the source database table.

Schema Name of the source table's schema. This field has to be


filled if the database is Oracle.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Add additional columns This option allows you to call SQL functions to perform
actions on columns, provided that these are not insert, update or delete actions,
or actions that require pre-processing.

  Name: Type in the name of the schema column to be altered.

  SQL expression: Type in the SQL statement to be executed in order to alter the data in the corresponding column.
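
For instance, to replace the values of a column with their upper-case equivalent, the Name field could contain the column to alter (city, say, assuming such a column exists in the schema) and the SQL expression field could contain a standard SQL function call such as:

UPPER(city)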

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.

Related scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.


tCombinedSQLOutput
Inserts records from the incoming flow to an existing database table.

tCombinedSQLOutput Standard properties


These properties are used to configure tCombinedSQLOutput running in the Standard Job framework.
The Standard tCombinedSQLOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Database Type Select the database type.

Component list Select the relevant DB connection component in the list if


more than one connection is used for the current Job.

Table Name of the target database table.

Schema Name of the target database table's schema. This field has
to be filled if the database is Oracle.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Action on data Select INSERT from the list to insert the records from the
incoming flow to the target database table.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.

Related scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.


tContextDump
Copies the context setup of the current Job to a flat file, a database table, etc., which can then be used
by tContextLoad.
Together with tContextLoad, this component makes it simple to apply the context setup of one Job to
another.
tContextDump dumps the context setup of the current Job to the subsequent component.

tContextDump Standard properties


These properties are used to configure tContextDump running in the Standard Job framework.
The Standard tContextDump component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the fields that will
be processed and passed on to the next component.

Note:
The schema of tContextDump is read only and made
up of two columns, Key and Value, corresponding to the
parameter name and the parameter value of the Job
context.

Hide Password Select this check box to hide password values, that is, to
display the value of context parameters whose Type is
Password as *.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule As a start component, tContextDump dumps the context


setup of the current Job to a file, a database table, etc.

Related scenarios
No scenario is available for the Standard version of this component yet.


tContextLoad
Loads a context from a flow.
This component also performs two controls. It warns when the parameters defined in the incoming
flow are not defined in the context, and, the other way around, it also warns when a context value is
not initialized in the incoming flow. Note that neither case blocks the processing.
tContextLoad dynamically modifies the values of the active context.

tContextLoad Standard properties


These properties are used to configure tContextLoad running in the Standard Job framework.
The Standard tContextLoad component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the fields that will
be processed and passed on to the next component.
In tContextLoad, the schema must be made of two columns,
including the parameter name and the parameter value to
be loaded.

If a variable loaded, but not in the context If a variable is loaded but does not appear in the context,
select how the notification must be displayed: as an error,
a warning or an information message (info).

If a variable in the context, but not loaded If a variable appears in the context but is not loaded, select
how the notification must be displayed: as an error, a
warning or an information message (info).

Print operations Select this check box to display the context parameters set
in the Run view.

Disable errors Select this check box to prevent the error from displaying.

Disable warnings Select this check box to prevent the warning from
displaying.

Disable infos Select this check box to prevent the information from
displaying.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.

Advanced settings

tStat Catcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
KEY_NOT_INCONTEXT: the variables that are loaded but do not
appear in the context. This is an After variable and it returns
a string.
KEY_NOT_LOADED: the variables that appear in the context but
are not loaded. This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component relies on the data flow to load the context
values to be used, therefore it requires a preceding input
component and thus cannot be a start component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Print
operations option dynamically at runtime.
When a dynamic parameter is defined, the corresponding
Print operations option in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation tContextLoad does not create any non-defined variable in


the default context.

Reading data from different MySQL databases using dynamically loaded connection parameters
The Job in this scenario is made of two subJobs. The first subJob aims at dynamically loading the
context parameters from two text files, and the second subJob uses the loaded context parameters to
connect to two different databases and to display the content of an existing database table of each
of them. With the context settings in the Job, we can decide which database to connect to and choose
whether to display the set context parameters on the console dynamically at runtime.

Dropping and linking the components


Procedure
1. Drop a tFileInputDelimited component and a tContextLoad component from the Palette onto the
design workspace, and link them using a Row > Main connection to form the first subJob.
2. Drop a tMysqlInput component and a tLogRow component onto the design workspace, and link
them using a Row > Main connection to form the second subJob.
3. Link the two subJobs using a Trigger > On Subjob Ok connection.

Preparing the contexts and context variables


Procedure
1. Create two delimited files corresponding to the two contexts in this scenario, namely two
databases we will access, and name them test_connection.txt and prod_connection.txt, which
contain the database connection details for testing and actual production purposes respectively.
Each file is made of two columns, containing the parameter names and the corresponding values
respectively. Below is an example of test_connection.txt (a sketch of prod_connection.txt is shown
after this procedure):

host;localhost
port;3306
database;test
username;root
password;talend

2. Select the Contexts view of the Job, and click the [+] button at the bottom of the view to add
seven rows in the table to define the following parameters:
• host, String type
• port, String type
• database, String type
• username, String type
• password, Password type
• filename, File type
• printOperations, Boolean type


Note that the host, port, database, username and password parameters correspond to the parameter
names in the delimited files and are used to set up the desired database connection; the filename
parameter is used to define the delimited file to read at Job execution; and the printOperations
parameter is used to decide whether to print the context parameters set by the tContextLoad
component on the console.
3. Click the Contexts tab and click the [+] button at the upper right corner of the panel to open the
Configure Contexts dialog box.
4. Select the default context, click the Edit button and rename the context to Test.
5. Click New to add a new context named Production. Then click OK to close the dialog box.

6. Back in the Contexts tab view, define the value of the filename variable under each context by
clicking in the respective Value field and browsing to the corresponding delimited file.
7. Select the Prompt check box next to the Value field of the filename variable for both contexts to
show the Prompt fields and enter the prompt message to be displayed at the execution time.
8. For the printOperations variable, click in the Value field under the Production context and select
false from the list; click in the Value field under the Test context and select true from the list.
Then select the Prompt check box under both contexts and enter the prompt message to be
displayed at the execution time.
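
The prod_connection.txt file follows the same key;value layout as test_connection.txt, but holds the connection details of the production database. For example (the values below are purely illustrative and not taken from the scenario):

host;prod-db-server
port;3306
database;production
username;prod_user
password;prod_password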


Configuring the components


Procedure
1. In the tFileInputDelimited component Basic settings panel, fill the File name/Stream field with
the relevant context variable we just defined: context.filename.

2. Define the file schema manually (Built-in). It contains two columns defined as: key and value.
3. Accept the defined schema to be propagated to the next component (tContextLoad).
4. In the Dynamic settings view of the tContextLoad component, click the [+] button to add a row
in the table, and fill the Code field with context.printOperations to use context variable
printOperations we just defined. Note that the Print operations check box in the Basic settings
view now becomes highlighted and unusable.

5. Then double-click to open the tMysqlInput component Basic settings view.


6. Fill the Host, Port, Database, Username, and Password fields with the relevant variables stored
in the delimited files and defined in the Contexts tab view: context.host, context.port,
context.database, context.username, and context.password respectively in this
example, and fill the Table Name field with the actual database table name to read data from,
customers for both databases in this example.


7. Then fill in the Schema information. If you stored the schema in the Repository Metadata, then
you can retrieve it by selecting Repository and the relevant entry in the list.
In this example, the schema of both database tables is made of four columns: id (INT, 2 characters
long), firstName (VARCHAR, 15 characters long), lastName (VARCHAR, 15 characters long), and city
(VARCHAR, 15 characters long).
8. In the Query field, type in the SQL query to be executed on the DB table specified. In this
example, simply click Guess Query to retrieve all the columns of the table, which will be displayed
on the Run tab, through the tLogRow component.
9. In the Basic settings view of the tLogRow component, select the Table option to display data
records in the form of a table.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job, and press F6 to run the Job using the default context, which is Test in
this use case.
A dialog box appears to prompt you to specify the delimited file to read and decide whether to
display the set context parameters on the console.


You can specify a file other than the default one if needed, and clear the Show loaded variables
check box if you do not want to see the set context variables on the console. To run the Job using
the default settings, click OK.

The context parameters and content of the database table in the Test context are all displayed on
the Run console.
2. Now select the Production context and press F6 to launch the Job again. When the prompt dialog
box appears, simply click OK to run the Job using the default settings.


The content of the database table in the Production context is displayed on the Run console.
Because the printOperations variable is set to false, the set context parameters are not displayed
on the console this time.


tConvertType
Converts one Talend java type to another automatically, and thus avoids compiling errors.
tConvertType allows specific conversions at runtime from one Talend java type to another.

tConvertType Standard properties


These properties are used to configure tConvertType running in the Standard Job framework.
The Standard tConvertType component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Auto Cast This check box is selected by default. It performs an


automatic java type conversion.

Manual Cast This mode is not visible if the Auto Cast check box is
selected. It allows you to manually specify the columns
where a java type conversion is needed.

Set empty values to Null before converting This check box is selected to set the empty values of String
or Object type to null for the input data.

Die on error This check box is selected to kill the Job when an error
occurs.


Note:
Not available for Map/Reduce Jobs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component cannot be used as a start component as it


requires an input flow to operate.

Converting java types


This Java scenario describes a four-component Job where the tConvertType component is used to
convert Java types in three columns, and a tMap is used to adapt the schema and have as an output
the first of the three columns and the sum of the two others after conversion.

Note:
In this scenario, the input schemas for the input delimited file are stored in the repository, you can
simply drag and drop the relevant file node from Repository - Metadata - File delimited onto the
design workspace to automatically retrieve the tFileInputDelimited component's setting. For more
information, see Talend Studio User Guide.


Dropping the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tConvertType, tMap,
and tLogRow.
2. In the Repository tree view, expand Metadata and from File delimited drag the relevant node,
JavaTypes in this scenario, to the design workspace.
The Components dialog box displays.
3. From the component list, select tFileInputDelimited and click Ok.
A tFileInputDelimited component called Java types displays in the design workspace.
4. Connect the components using Row > Main links.

Configuring the components


Procedure
1. Double-click tFileInputDelimited to enter its Basic settings view.
2. Set Property Type to Repository since the file details are stored in the repository. The following
fields are pre-defined using the fetched data.

The input file used in this scenario is called input. It is a text file that holds string, integer, and
float java types.

Fill in all other fields as needed. For more information, see tFileInputDelimited on page 1015.
In this scenario, the header and the footer are not set and there is no limit for the number of
processed rows.


3. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is
made of three columns, StringtoInteger, IntegerField, and FloatToInteger.

4. Click Ok to close the dialog box.


5. Double-click tConvertType to enter its Basic settings view.

6. Set Schema Type to Built in, and click Sync columns to automatically retrieve the columns from
the tFileInputDelimited component.
7. Click Edit schema to describe manually the data structure of this processing component.

In this scenario, we want to convert a string type data into an integer type and a float type data
into an integer type.
Click OK to close the Schema of tConvertType dialog box.
8. Double-click tMap to open the Map editor.
The Map editor displays the input metadata of the tFileInputDelimited component.


9. In the Schema editor panel of the Map editor, click the plus button of the output table to add two
rows and name them StringToInteger and Sum.
10. In the Map editor, drag the StringToInteger row from the input table to the StringToInteger row in
the output table.
11. In the Map editor, drag each of the IntegerField and the FloatToInteger rows from the input table to
the Sum row in the output table and click OK to close the Map editor.

12. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977.


Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to execute it.

The string type data is converted into an integer type and displayed in the StringToInteger column
on the console. The float type data is converted into an integer and added to the IntegerField
value to give the addition result in the Sum column on the console.


tCosmosDBBulkLoad
Imports data files in different formats (CSV, TSV or JSON) into the specified Cosmos database so that
the data can be further processed.

tCosmosDBBulkLoad Standard properties


These properties are used to configure tCosmosDBBulkLoad running in the Standard Job framework.
The Standard tCosmosDBBulkLoad component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

MongoDB directory Fill in this field with the MongoDB home directory.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Collection Type in the name of the collection to import data to.

Drop collection if exist Select this check box to remove the collection if it already
exists.


Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Required authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Data file Type in the full path of the file from which the data will be
imported or click the [...] button to browse to the desired
data file.
Make sure that the data file is in standard format. For
example, the fields in CSV files should be separated with
commas.

File type Select the proper file type from the list. CSV, TSV and JSON
are supported.

The JSON file starts with an array Select this check box to allow tCosmosDBBulkLoad to read
the JSON files starting with an array.
This check box appears when the File type you have
selected is JSON.

Action on data Select the action that you want to perform on the data.
• Insert: Insert the data into the database.
Note that when inserting data from CSV or TSV files
into the MongoDB database, you need to specify fields
either by selecting the First line is header check box or
defining them in the schema.
• Upsert: Insert the data if they do not exist or update
the existing data.
Note that when upserting data into the MongoDB
database, you need to specify a list of fields for the
query portion of the upsert operation.


Upsert fields Customize the fields that you want to upsert as needed.
This table is available when you select Upsert from the
Action on data list.

First line is header Select this check box to use the first line in CSV or TSV files
as a header.
This check box is available only when you select CSV or TSV
from the File type list.

Ignore blanks Select this check box to ignore the empty fields in CSV or
TSV files.
This check box is available only when you select CSV or TSV
from the File type list.

Print log Select this check box to print logs.

Advanced settings

Additional arguments Complete this table to use the additional arguments as


required.
For example, you can use the argument "--jsonArray" to
accept the import of data expressed with multiple MongoDB
documents within a single JSON array. For more information
about the additional arguments, go to
http://docs.mongodb.org/manual/reference/program/mongoimport/ and
read the description of options.

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component can be used together with the


tCosmosDBInput component to verify if the data is
imported as expected.

Limitation The MongoDB client tool needs to be installed on the


machine where Jobs using this component are executed.


tCosmosDBConnection
Creates a connection to a CosmosDB database so that the connection can be reused by other components.

tCosmosDBConnection Standard properties


These properties are used to configure tCosmosDBConnection running in the Standard Job framework.
The Standard tCosmosDBConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the


password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors at the end of 10-minute inactivity of
these cursors. In this situation, an idle cursor will stay open
until either the results of this cursor are exhausted or you
manually close it using the cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box being
clear, a MongoDB server automatically stops idle cursors
after a given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.

Usage

Usage rule This component is generally used with other CosmosDB


components, particularly tCosmosDBClose.


tCosmosDBInput
Retrieves certain documents from a Cosmos database collection by supplying a query document
containing the fields the desired documents should match.

tCosmosDBInput Standard properties


These properties are used to configure tCosmosDBInput running in the Standard Job framework.
The Standard tCosmosDBInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Set read preference Select this check box and from the Read preference drop-
down list that is displayed, select the member to which you
need to direct the read operations.
If you leave this check box clear, the Job uses the default
Read preference, that is to say, uses the primary member in
a replica set.
For further information, see MongoDB's documentation
about Replication and its Read preferences.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.


For details about the other mechanisms in this list,


see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Collection Name of the collection in the database.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
If a column in the database is a JSON document and you
need to read the entire document, put an asterisk (*) in the
DB column column, without quotation marks around it.

Query Specify the query condition. This field is available only


when you have selected Find query from the Query type
drop-down list.
For example, type in "{id:4}" to retrieve the record
whose id is 4 from the collection specified in the Collection
field.
Different from the query statements required in the
MongoDB client software, the query here refers to the
contents inside find(), such as the query here {id:4}
versus the MongoDB client query db.blog.find({id:4}).

Mapping Each column of the schema defined for this component


represents a field of the documents to be read. In this table,
you need to specify the parent nodes of these fields, if any.
For example, in the document reading as follows

{
    _id: ObjectId("5099803df3f4948bd2f98391"),
    person: { first: "Joe", last: "Walker" }
}
The first and the last fields have person as their parent node
but the _id field does not have any parent node. So once
completed, this Mapping table should read as follows:

Column Parent node path


_id
first "person"
last "person"

Sort by Specify the column and choose the order for the sort
operation.
This field is available only when you have selected Find
query from the Query type drop-down list.

Limit Type in the maximum number of records to be retrieved.


This field is available only when you have selected Find
query from the Query type drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors at the end of 10-minute inactivity of
these cursors. In this situation, an idle cursor will stay open
until either the results of this cursor are exhausted or you
manually close it using the cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box being
clear, a MongoDB server automatically stops idle cursors
after a given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.

Enable external sort Since the aggregation pipeline stages have a maximum
memory use limit (100 megabytes) and a stage exceeding
this limit will produce errors, when handling large datasets,
select this check box to avoid aggregation stages exceeding
this limit.
For further information about this external sort, see Large
sort operation with external sort.


Usage

Usage rule As a start component, tCosmosDBInput allows you to


retrieve records from a collection in the Cosmos database
and transfer them to the following component for display or
storage.


tCosmosDBOutput
Inserts, updates, upserts or deletes documents in a Cosmos database collection based on the incoming
flow from the preceding component in the Job.

tCosmosDBOutput Standard properties


These properties are used to configure tCosmosDBOutput running in the Standard Job framework.
The Standard tCosmosDBOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Set write concern Select this check box to set the level of acknowledgement
requested for write operations. Then you need to
select the level of this operation.
For further information, see the related MongoDB
documentation on http://docs.mongodb.org/manual/core/
write-concern/.

Bulk write Select this check box to insert, update or remove data in
bulk. Note this feature is available only when the version of
MongoDB you are using is 2.6+.
Then you need to select Ordered or Unordered to define
how the MongoDB database processes the data sent by the
Studio.


• If you select Ordered, MongoDB processes the queries


sequentially.
• If you select Unordered, MongoDB optimizes the bulk
write operations without keeping the order in which
the individual operations were inserted in the bulk
write.
In the Bulk write size field, enter the size of each query
group to be processed by MongoDB. In the documentation
of MongoDB, some restrictions and expected behaviors as
to this size are explained. You can find the details on http://
docs.mongodb.org/manual/core/bulk-write-operations/.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Collection Name of the collection in the database.

Drop collection if exist Select this check box to drop the collection if it already
exists.

Action on data The following operations are available:


• Insert: insert documents.
• Set: modifies the existing fields of an existing
document and appends a field if it does not exist in
this document.
If you need to apply this action on all the documents
in the collection to be used, select the Update all
document check box that is displayed; otherwise, only
the first document is updated.


• Update: replaces the existing documents with the


incoming data but keeps the technical ID of these
documents.
• Upsert: inserts a document if it does not exist;
otherwise it applies the same rules as Update.
• Upsert with set: inserts a document if it does not exist;
otherwise it applies the same rules as Set.
If you need to apply this action on all the documents
in the collection to be used, select the Update all
document check box that is displayed; otherwise, only
the first document is updated.
• Delete: delete documents.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Mapping Each column of the schema defined for this component


represents a field of the documents to be read. In this table,
you need to specify the parent nodes of these fields, if any.
For example, in the document reading as follows

{
  _id: ObjectId("5099803df3f4948bd2f98391"),
  person: { first: "Joe", last: "Walker" }
}


The first and the last fields have person as their parent node
but the _id field does not have any parent node. So once
completed, this Mapping table should read as follows:

Column    Parent node path
_id
first     "person"
last      "person"
Not available when the Generate JSON Document check box
is selected in Advanced settings.

Die on error This check box is cleared by default, meaning that rows on
error are skipped and the process completes for the
error-free rows.

Advanced settings

Generate JSON Document Select this check box for JSON configuration:
Configure JSON Tree: click the [...] button to open the
interface for JSON tree configuration. For more information,
see Configuring a JSON Tree on page 3897.
Group by: click the [+] button to add lines and choose the
input columns for grouping the records.
Remove root node: select this check box to remove the root
node.
Data node and Query node (available for update and upsert
actions): type in the name of data node and query node
configured on the JSON tree.
These nodes are mandatory for update and upsert actions.
They enable the update and upsert actions but are not
stored in the database.

No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors after 10 minutes of inactivity. In this
situation, an idle cursor stays open until either the results
of this cursor are exhausted or you manually close it using
the cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box cleared,
a MongoDB server automatically stops idle cursors after a
given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Usage

Usage rule tCosmosDBOutput executes the action defined on the


collection in the database based on the flow incoming from
the preceding component in the Job.

Limitation • The "multi" parameter, which allows to update multiple


documents at a time, is not supported. Therefore, if
two documents have the same key, the first is always
updated, but the second never will.


• For the update operation, the key cannot be a JSON


array.


tCosmosDBRow
Executes the commands of the Cosmos database.

tCosmosDBRow Standard properties


These properties are used to configure tCosmosDBRow running in the Standard Job framework.
The Standard tCosmosDBRow component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism best
suited to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.


For further information about the MongoDB Authentication


database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

Execute command Select this check box to enter MongoDB commands in the
Command field for execution.
• Command: in this field, enter the command to be
executed, if this command contains one single
variable.
For example, if you need to construct the command

{"isMaster": 1}

You simply need to enter isMaster within quotation
marks.
• Construct command from keys and values: if the
command to be executed contains multiple variables,
select this check box and in the Command keys and
values table, add the variables and their respective
values to be used.


For example, if you need to construct the following


command

{ renameCollection : "<source_names
pace>" , to : "<target_namespace>" ,
dropTarget : < true | false > }

You need to add three rows to the Command keys and


values table and enter one variable-value pair to each
row within quotation marks:

"renameCollection" "old_name"
"to" "new_name"
"dropTarget" "false"

• Construct command from a JSON string: if you want


to directly enter the command to be used, select this
check box and enter this command in the JSON string
command field that is displayed. Only one command is
allowed per tCosmosDBRow.
For example:

"{createIndexes: 'restaurants',
indexes : [{key : {restaurant_id
: 1}, name: 'id_index_2', unique:
true}]}"

Note that you must use single quotation marks to


surround the string values used in the command and
double quotation marks to surround the command
itself.
For further information about the MongoDB commands
you can use in this field, see https://docs.mongodb.org/
manual/reference/command/.
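For illustration, using the example values above, the command assembled from the Command keys and values table is equivalent to the following (a sketch; old_name and new_name are the placeholder values from the example):

{ renameCollection : "old_name" , to : "new_name" , dropTarget : false }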

Die on error This check box is cleared by default, meaning that rows on
error are skipped and the process completes for the
error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Usage

Usage rule tCosmosDBRow allows you to manipulate the Cosmos


database through the MongoDB commands.


tCouchbaseDCPInput
Queries the documents from the Couchbase database, under the Database Change Protocol (DCP), a
streaming protocol.

tCouchbaseDCPInput Standard properties


These properties are used to configure tCouchbaseDCPInput running in the Standard Job framework.
The Standard tCouchbaseDCPInput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.

Password Provide the authentication credentials to a bucket.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. The content
column stores the documents to be used, the key column
the IDs of these documents and the other columns the
Couchbase technical information.

Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.


Ensure that the credentials you are using have the


appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Advanced settings

Connect Timeout Define the timeout interval (in seconds) for the connection
to be aborted.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
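For example, you could print the NB_LINE variable of this component in a downstream tJava component as follows (a sketch; the component label tCouchbaseDCPInput_1 depends on your Job):

System.out.println("Rows read: " + ((Integer)globalMap.get("tCouchbaseDCPInput_1_NB_LINE")));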

Usage

Usage rule As a start component, tCouchbaseDCPInput reads the


documents from the Couchbase database.


tCouchbaseDCPOutput
Upserts documents in the Couchbase database based on the incoming flat data from preceding
components, under the Database Change Protocol (DCP), a streaming protocol.
This means that it adds a new document or replaces its value if it already exists.

tCouchbaseDCPOutput Standard properties


These properties are used to configure tCouchbaseDCPOutput running in the Standard Job framework.
The Standard tCouchbaseDCPOutput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Password Provide the authentication credentials to a bucket.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.


Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Field to use as ID Enter, without double quotation marks, the name of the
column from the schema to provide IDs for the documents
to be written to Couchbase.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_SUCCESS: the number of rows successfully processed.
This is an After variable and it returns an integer.


NB_REJECT: the number of rows rejected. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Preceded by an input component, tCouchbaseDCPOutput


wraps flat data into documents for storage in the Couchbase
database.


tCouchbaseInput
Queries the documents from the Couchbase database.

tCouchbaseInput Standard properties


These properties are used to configure tCouchbaseInput running in the Standard Job framework.
The Standard tCouchbaseInput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.

Username and Password Provide the authentication credentials to your Couchbase


cluster.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
When using non-JSON documents, define an id column of
the String type, then define a content column. The type
of this content column should be String for the string
documents and byte[] for the binary documents.
For JSON documents, define the fields that are present in
your JSON documents.


Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Document type Data stored in a Couchbase database could be JSON, strings


or binary. From this drop-down list, select the type of the
data you need to use with Couchbase.
Note that it is not recommended to mix JSON, binary and
string documents in a same bucket, as this mixture could
make the document processing error-prone.
If you need to use N1QL to query string or binary
documents, the only possible way is to use the document
ID to get the document. For example, if you need to get a
document for which the ID number is 2, the N1QL query
should be

SELECT meta().id as `_meta_id_` FROM


`bucket_name` where meta().id = '2';

Note that the quotation marks around _meta_id_ and
bucket_name are backticks (`).

Query Type Select the type of queries to be used from the following
options:
• Select All: select all the contents of a given bucket.
• N1QL: use a N1QL statement to perform fine-tuned
queries.
• Document ID: use the document IDs to select
documents. You need to enter the ID to be used in
the Document ID field that is displayed. Only one
document ID is allowed per component.

Use N1QL query Select this check box and in the Query field that is
displayed, enter a N1QL query statement to perform
complex actions.
Only one statement is allowed; do not put quotation
marks around your statement.
• When you use wildcards in your query such as SELECT
*, the returned result of this query is wrapped in the
bucket name used in this query. In this situation, define
only one column for the result in the schema of this
component.
For example, when performing this query

SELECT * FROM `travel_sample` limit 3


The returned result is wrapped in the


travel_sample bucket, reading like this:

[
{
"travel_sample": {
"callsign": "MILE-AIR",
"country": "United States",
"iata": "Q5",
"icao": "MLA",
"id": 10,
"name": "40-Mile Air",
"type": "airline"
}
},
{
"travel_sample": {
"callsign": "TXW",
"country": "United States",
"iata": "TQ",
"icao": "TXW",
"id": 10123,
"name": "Texas Wings",
"type": "airline"
}
},
{
"travel_sample": {
"callsign": "atifly",
"country": "United States",
"iata": "A1",
"icao": "A1F",
"id": 10226,
"name": "Atifly",
"type": "airline"
}
}
]

In the schema, define one single column called, for


example, travel_sample to store the result and
select String as its type.
• If you use a query without wildcards, such as

SELECT callsign, country, iata, icao, id, name, type
FROM `travel_sample` limit 3;


The returned result is not wrapped, reading like this:

[
{
"callsign": "MILE-AIR",
"country": "United States",
"iata": "Q5",
"icao": "MLA",
"id": 10,
"name": "40-Mile Air",
"type": "airline"
},
{
"callsign": "TXW",
"country": "United States",
"iata": "TQ",
"icao": "TXW",
"id": 10123,
"name": "Texas Wings",
"type": "airline"
},
{
"callsign": "atifly",
"country": "United States",
"iata": "A1",
"icao": "A1F",
"id": 10226,
"name": "Atifly",
"type": "airline"
}
]

In this situation, define the columns that represent


the structure of the actual business data, such as the
following columns: callsign, country, iata, icao,
id, name, and type.

Advanced settings

Connect Timeout Enter, without quotation marks, the timeout interval (in
seconds) for the connection to be aborted.

Limit rows Enter the maximum number of rows to be read. This field is
not available when you use a N1QL query.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule As a start component, tCouchbaseInput reads the documents


from the Couchbase database.


tCouchbaseOutput
Upserts documents in the Couchbase database based on the incoming flat data from preceding
components.
This means that it adds a new document or replaces its value if it already exists.

tCouchbaseOutput Standard properties


These properties are used to configure tCouchbaseOutput running in the Standard Job framework.
The Standard tCouchbaseOutput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.

Username and Password Provide the authentication credentials to your Couchbase


cluster.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
When using non-JSON documents, define an id column of
the String type, then define a content column. The type
of this content column should be String for the string
documents and byte[] for the binary documents.


For JSON documents, define the fields that are present in
your JSON documents.

Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Document type Data stored in a Couchbase database could be JSON, strings


or binary. From this drop-down list, select the type of the
data you need to use with Couchbase.
Note that it is not recommended to mix JSON, binary and
string documents in a same bucket, as this mixture could
make the document processing error-prone.

Field to use as ID Enter, without double quotation marks, the name of the
column from the schema to provide IDs for the documents
to be written to Couchbase.

Partial update Select this check box to update only a subset of a


document, without changing any other property that is not
provided by the incoming data.
If you leave this check box clear, when a document already
exists in the database, that is to say, when this document
and a document from the incoming data have the same ID,
the whole existing document is replaced with the incoming
one.

Use N1QL Query with parameters Select this check box to use variables in your N1QL
queries. Once you select it, the Query field and the Query
Parameters table are displayed for you to enter your query
and define the variables to be used in your query.
Only one query is allowed per tCouchbaseOutput.
For example, enter this query in the Query field:

INSERT INTO 'travel-sample' (KEY, VALUE)
VALUES
($nm,
{
"name":$nm,
"type":$tp,
"country":$cnty,
"callsign":$call,
"id":$zid
}
)

Then you need to define all of the variables (the strings


starting with $) used in this query in the Query Parameters
table.

Query Parameter Name    Column
nm                      name
tp                      type
cnty                    countries
call                    company
zid                     docid


This table creates a map between the variables in your


query and the columns from the schema you have defined
in the component for your data. The values in the Column
column are the column names from this schema; the values
in the Query Parameter Name column are the variables from
your query.
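For illustration, with hypothetical row values where name is "Atifly", type is "airline", countries is "United States", company is "A1F" and docid is 10226, the statement executed by the component is equivalent to:

INSERT INTO 'travel-sample' (KEY, VALUE)
VALUES
("Atifly",
{
"name":"Atifly",
"type":"airline",
"country":"United States",
"callsign":"A1F",
"id":10226
}
)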

Advanced settings

Connect Timeout Enter, without quotation marks, the timeout interval (in
seconds) for the connection to be aborted.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_SUCCESS: the number of rows successfully processed.
This is an After variable and it returns an integer.
NB_REJECT: the number of rows rejected. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Preceded by an input component, tCouchbaseOutput
wraps flat data into documents for storage in the Couchbase
database.

tCreateTable
Creates a table for a specific type of database.

tCreateTable Standard properties


These properties are used to configure tCreateTable running in the Standard Job framework.
The Standard tCreateTable component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Database Type Select the type of the database. The connection properties
may differ slightly according to the database type selected.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to
be shared in the Basic settings view of the connection
component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.

DB Version Select the version of the database.

Host The IP address or hostname of the database.

Port The listening port number of the database.

Database name The name of the database.

Schema The name of the database schema.


This property is available for DB2, Exasol, Greenplum,


Informix, MS SQL Server, Oracle, PostgresPlus, Postgresql,
Redshift, Sybase, and Vertica database types.

Access File The path to the Access database file.

Firebird File The path to the Firebird database file.

Interbase File The path to the Interbase database file.

SQLite File The path to the SQLite database file.

Running Mode Select the Server Mode that corresponds to your database
setup.
This property is available only for the HSQLDb database
type.

Use TLS/SSL Sockets Select this check box to enable the security mode if
required.
This property is available only for the HSQLDb database
type.

DB Alias The name of the database.


This property is available only for the HSQLDb database
type.

Framework Type Select the framework type for your database.


This property is available only for the JavaDb database type.

DB Root Path Browse to your database root.


This property is available only for the JavaDb database type.

ODBC Name The name of the ODBC database.

Connection Type Select the Oracle database connection type.


• Oracle SID: select this connection type to uniquely
identify a particular database on a system.
• Oracle Service: select this connection type to use
the TNS alias that you give when you connect to the
remote database.
• Oracle OCI: select this connection type to use Oracle
Call Interface with a set of C-language software APIs
that provide an interface to the Oracle database.
• Oracle Custom: select this connection type to access a
clustered database.
• WALLET: select this connection type to store
credentials in an Oracle wallet.

Account In the Account field, enter, in double quotation marks, the


account name that has been assigned to you by Snowflake.
This property is available only for the Snowflake database
type.

Username and Password The database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the


password between double quotes and click OK to save the


settings.

Role Enter, in double quotation marks, the default access control


role to use to initiate the Snowflake session.
This role must already exist and has been granted to the
user ID you are using to connect to Snowflake. If this field
is left empty, the PUBLIC role is automatically granted. For
information about Snowflake access control model, see
Understanding the Access Control Model.
This property is available only for the Snowflake database
type.

Table name The name of the table to be created.

Table Action Select the action to be carried out on the table.


• Create table: the specified table doesn't exist and gets
created.
• Create table if not exists: the specified table is created
if it does not exist.
• Drop table if exists and create: the table is removed if it
already exists and gets created again.

Temporary Table Select this check box to create a temporary table during
an operation, which is automatically dropped at the end
of the operation. Since temporary tables exist in a special
schema, you cannot specify a schema name when creating a
temporary table, and the name of the temporary table must
be distinct from the name of any other table, sequence,
index, and view in the same schema.
Note that once you select to create a temporary table, you
should empty the values when you edit schema.
This field is available only when Postgresql is selected from
the Database Type drop-down list.

Unlogged Table Select this check box to create an unlogged table during an
operation. This way, data is loaded considerably faster than
an ordinary table where the data is logged and then written.
However, the data in an unlogged table is not crash-safe.
This field is available only when Postgresql is selected from
the Database Type drop-down list and Temporary Table is
not selected.

Case Sensitive Select this check box to make the table/column name case
sensitive.
This property is available only for the HSQLDb database
type.

Temporary Table Select this check box if you want to save the created table
temporarily.
This property is available only for the MySQL database type.

Create Select the type of the table to be created.


• SET TABLE: the table that does not allow duplicate
rows.
• MULTISET TABLE: the table that allows duplicate rows.


This property is available only for the Teradata database


type.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Additional JDBC Parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.
This property is available for the AS/400 and MSSQL Server
database types.

Create projection Select this check box to create a projection.


This property is available only for the Vertica database type.

Enforce database delimited identifiers Select this check box to enable delimited identifiers.
This property is available only for the Snowflake database
type.
For more information on delimited identifiers, see
https://docs.intersystems.com/latest/csp/docbook/
DocBook.UI.Page.cls?KEY=GSQL_identifiers.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

QUERY The query statement being processed. This is a Flow


variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Creating new table in a Mysql Database


The Job described below aims at creating a table in a database, made of a dummy schema taken from
a delimited file schema stored in the Repository. This Job is composed of a single component.

Procedure
1. Drop a tCreateTable component from the Databases family in the Palette to the design
workspace.
2. In the Basic settings view, from the Database Type list, select Mysql for this scenario.

3. From the Table Action list, select Create table.


4. Select the Use Existing Connection check box only if you are using a dedicated DB connection
component tMysqlConnection on page 2425. In this example, we won't use this option.
5. In the Property type field, select Repository so that the connection fields that follow are
automatically filled in. If you have not defined your DB connection metadata in the DB connection
directory under the Metadata node, fill in the details manually as Built-in.
6. In the Table Name field, fill in a name for the table to be created.
7. If you want to retrieve the Schema from the Metadata (it doesn't need to be a DB connection
Schema metadata), select Repository then the relevant entry.


8. In either case (Built-in or Repository), click Edit Schema to check the data type mapping and
define the data structure.

9. Click the Reset DB Types button in case the DB type column is empty or shows discrepancies
(marked in orange). This allows you to map any data type to the relevant DB data type. Then, click
OK to validate your changes and close the dialog box.
10. Save your Job and press F6 to execute it.

Results
The table is created empty but with all columns defined in the Schema.
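For reference, with a simple two-column schema (an id column of Integer type and a name column of String type), the statement issued for MySQL would be roughly equivalent to the following (a sketch; the actual column types come from the DB type mapping in the schema and the table name is the one you entered):

CREATE TABLE my_table (id INT, name VARCHAR(255));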


tCreateTemporaryFile
Creates a temporary file in a specified directory. This component allows you to either keep the
temporary file or delete it after the Job execution.

tCreateTemporaryFile Standard properties


These properties are used to configure tCreateTemporaryFile running in the Standard Job framework.
The Standard tCreateTemporaryFile component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Remove file when execution is over Select this check box to delete the temporary file after the
Job execution.

Use default temporary system directory Select this check box to create the file in the default system
temporary directory.

Directory Specify the directory under which the temporary file will be
created.
This field is available only when the Use default temporary
system directory check box is cleared.

Use Prefix Select this check box to use a string as the prefix of the
temporary file name.
A file name prefix string helps you prevent existing files from
being overwritten.

Prefix Specify the file name prefix string for the temporary file.
The prefix string needs to be at least three characters in
length.
To prevent existing files from being overwritten, it is
suggested to use a prefix string that is different from those
of any existing file names in the directory.
This option is available only when the Use Prefix check box
is selected.

Template Enter the temporary file name which should contain the
characters XXXX, such as talend_XXXX.
This option is unavailable when the Use Prefix check box is
selected.

Suffix Enter the filename extension of the temporary file.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.


Global Variables

Global Variables FILEPATH: the path where the file was created. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Connections Outgoing links (from this component to another):


Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Creating a temporary file and writing data into it


This scenario describes a Job that creates a temporary file in the default system temporary directory,
writes data into the file, and finally displays the data in the file on the console.


Adding and linking the components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tCreateTemporaryFile component, a tJava
component, a tRowGenerator component, a tFileOutputDelimited component, a
tFileInputDelimited component, and a tLogRow component.
2. Connect tRowGenerator to tFileOutputDelimited using a Row > Main connection.
3. Do the same to connect tFileInputDelimited to tLogRow.
4. Connect tCreateTemporaryFile to tJava using a Trigger > OnSubjobOk connection.
5. Do the same to connect tJava to tRowGenerator and connect tRowGenerator to
tFileInputDelimited.

Configuring the components


Creating the temporary file

Procedure
1. Double-click tCreateTemporaryFile to open its Basic settings view.


2. Select the Remove file when execution is over check box to delete the created temporary file after
the Job execution.
3. Select the Use default temporary system directory check box to create the file in the default
system temporary directory.
4. In the Template field, enter the temporary file name which should contain the characters XXXX. In
this example, it is talend_XXXX.
5. In the Suffix field, enter the filename extension of the temporary file. In this example, it is dat.
6. Double-click tJava to open its Basic settings view.

7. In the Code field, enter the following code to display the default system temporary directory and
the path to the temporary file that will be created on the console:

System.out.println("The default system temporary directory is:\r" + (String)System


.getProperty("java.io.tmpdir"));
System.out.println("The path to the temporary file is:\r" + (String)global
Map.get("tCreateTemporaryFile_1_FILEPATH"));

Writing the data into the file

Procedure
1. Double-click tRowGenerator to open its RowGenerator Editor.


2. Click the [+] button to add two columns: id of Integer type and name of String type. Then in the
Functions column, select the predefined function Numeric.sequence(String,int,int) for id and
TalendDataGenerator.getFirstName() for name.
3. In the Number of Rows for RowGenerator field, enter 5 to generate five rows.
4. Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
5. Double-click tFileOutputDelimited to open its Basic settings view.

6. In the File Name field, press Ctrl+Space and from the global variable list displayed select
((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")).

Reading the data from the file

Procedure
1. Double-click tFileInputDelimited to open its Basic settings view.


2. In the File name/Stream field, press Ctrl+Space and from the global variable list displayed select
((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")).
3. Click the [...] button next to Edit schema and in the dialog box displayed define the schema by
adding two columns: id of Integer type and name of String type.

4. Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
5. Double-click tLogRow to open its Basic settings view.

6. In the Mode area, select Table (print values in cells of a table) to display the output data in a
better way.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.


2. Press F6 or click Run on the Run tab to run the Job.

The file talend_MHTI.dat is created under the default system temporary directory C:\Users\lena_li
\AppData\Local\Temp\ during the Job execution, the five generated rows of data are written into it,
and then the file is deleted after the Job execution.


tDB2BulkExec
Executes the Insert action on the provided data and gains in performance during Insert operations to
a DB2 database.

tDB2BulkExec Standard properties


These properties are used to configure tDB2BulkExec running in the Standard Job framework.
The Standard tDB2BulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database


Table Schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: You create the schema and store it locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository, hence can reuse it. Related topic:
see Talend Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Use Ingest Command Select this check box to populate data into DB2 using the
INGEST command. For more information about the INGEST
command, see http://www.ibm.com/developerworks/
data/library/techarticle/dm-1304ingestcmd and https://
www-01.ibm.com/support/knowledgecenter/SSEPGG_10


.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0057198.html?
cp=SSEPGG_10.1.0%2F3-5-2-4-59.
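For reference, a minimal INGEST statement of the kind this option relies on might look like the following (a sketch adapted from the IBM documentation linked above; the script actually generated depends on your Load From, Action on Data, Content Format and Mapping settings):

INGEST FROM FILE my_file.csv
  FORMAT DELIMITED BY ','
  (
    $id INTEGER EXTERNAL,
    $name CHAR(32)
  )
  INSERT INTO my_table
    VALUES($id, $name);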

Load From Select the source of the data to be populated.


• FILE: loads data from a file.
• PIPE: loads data from a pipe.
• FOLDER: loads data from multiple files in a folder.
This list is available only when the Use Ingest Command
check box is selected.

Data File Name of the file to be loaded.

Warning:
This file is located on the machine specified by the URI
in the Host field so it should be on the same machine as
the database server.

This field is not visible when PIPE or FOLDER is selected


from the Load From drop-down list.

Pipe Name Enter the name of the pipe.


This field is available only when PIPE is selected from the
Load From drop-down list.

Folder Specify the path to the folder holding the files to be loaded.
This field is available only when FOLDER is selected from
the Load From drop-down list.

Action on Data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new records to the table. If duplicates are
found, Job stops.
• Replace: Add new records to the table. If an old record
in the table has the same value as a new record for
a PRIMARY KEY or a UNIQUE index, the old record is
deleted before the new record is inserted.
• Update: Make changes to existing records.
• Delete: Remove the records that match the input data.
• Merge: Merge the input data to the table.
Delete and Merge are available only when the Use Ingest
Command check box is selected.

File Glob Pattern Specify the glob pattern for the files to be loaded.
This field is available only when FOLDER is selected from
the Load From drop-down list.

Where Clause Enter the WHERE clause to filter the data to be processed.
This field is available only when update or delete is
selected from the Action on Data drop-down list.

Custom Insert Values Clause Select this check box and in the Insert Values Clause field
displayed enter the VALUES clause for the insert operation.
This check box is available only when the Use Ingest
Command check box is selected and insert is selected from
the Action on Data drop-down list.


Custom Update Set Clause Select this check box and specify the SET clause for the
update operation by completing the Set Mapping table.
This check box is available only when the Use Ingest
Command check box is selected and update is selected from
the Action on Data drop-down list.

Set Mapping Complete this table to specify the SET clause for the update
operation.
• Column: the name of the column. By default, the fields
in the Column column are same as what they are in the
schema.
• Expression: the expression for the corresponding
column.
This table is available only when the Custom Update Set
Clause check box is selected.

Merge Clause Specify the MERGE clause for the merge operation.
This table is available only when the Use Ingest Command
check box is selected and merge is selected from the Action
on Data drop-down list.

Content Format Select the format of the input file, either Delimited or
Positional.
This list is available only when the Use Ingest Command
check box is selected.

Delimited By Enter the character that separates the fields in the


delimited file.
This field is available only when Delimited is selected from
the Content Format drop-down list.

Optionally Enclosed By Enter the character that encloses the string in the delimited
file.
This field is available only when Delimited is selected from
the Content Format drop-down list.

Fixed Length Enter the length (in bytes) of the record in the positional
file.
This field is available only when Positional is selected from
the Content Format drop-down list.

Mapping Complete this table to specify the mapping relationship between the source column and the DB2 table column.
• Column: the name of the column. By default, the fields in the Column column are the same as those in the schema.
• Is Table Column: select the check box if the
corresponding column is a table column.
• Start Position: the starting position of the
corresponding column.
• End Position: the ending position of the corresponding
column.
The Start Position and End Position columns are
available only when Positional is selected from the
Content Format drop-down list.


This table is available only when the Use Ingest Command check box is selected.
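As an illustrative example only (column names and positions are assumptions, and the end positions are taken as inclusive), a positional file with 26-byte records could be mapped as follows, with Fixed Length set to 26:
    Column: ID, Is Table Column: selected, Start Position: 1, End Position: 6
    Column: NAME, Is Table Column: selected, Start Position: 7, End Position: 26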

Script Generated Folder Specify the directory under which the script file will be created.
This field is available only when the Use Ingest Command
check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB connection you are creating.

Note:
You can set the encoding parameters through this field.

Field terminated by Character, string or regular expression to separate fields.

Date Format Use this field to define the way months and days are ordered.

Time Format Use this field to define the way hours, minutes and seconds
are ordered.

Timestamp Format Use this field to define the way date and time are ordered.

Remove load pending When this check box is selected, tables blocked in "pending" status following a bulk load are unblocked.

Load options Click + to add data loading options:
• Parameter: select a loading parameter from the list.
• Value: enter a value for the parameter selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
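These counters can be read in a downstream component such as tJava once the bulk load has finished, using the standard globalMap mechanism. A minimal sketch (the instance name tDB2BulkExec_1 is an assumption and depends on your Job):
    // Read the After variables exposed by the bulk-load component.
    Integer inserted = (Integer) globalMap.get("tDB2BulkExec_1_NB_LINE_INSERTED");
    Integer updated = (Integer) globalMap.get("tDB2BulkExec_1_NB_LINE_UPDATED");
    System.out.println("Rows inserted: " + inserted + ", rows updated: " + updated);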

Usage

Usage rule This dedicated component offers performance and flexibility of DB2 query handling.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
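A minimal sketch of a typical setup (all names below are assumptions): define a String context variable such as context.db2Connection, enter it in the Code field of the Dynamic settings table, and give it the name of the connection component to use at run time, for example tDB2Connection_1 or tDB2Connection_2. The value can then be changed per context group or passed at execution time, for example with --context_param db2Connection=tDB2Connection_2 on an exported Job.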

Related scenarios
For tDB2BulkExec related topics, see:
• Inserting transformed data in MySQL database on page 2482.
• Truncating and inserting file data into an Oracle database on page 2681.


tDB2Close
Closes a transaction committed in the connected DB.

tDB2Close Standard properties


These properties are used to configure tDB2Close running in the Standard Job framework.
The Standard tDB2Close component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with DB2 components, especially with tDB2Connection and tDB2Commit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tDB2Commit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.
tDB2Commit validates the data processed through the Job into the connected DB.

tDB2Commit Standard properties


These properties are used to configure tDB2Commit running in the Standard Job framework.
The Standard tDB2Commit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tDB2Commit to your Job, your data will be committed row
by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tDB2*
components, especially with the tDB2Connection and
tDB2Rollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tDB2Commit related scenario, see Inserting data in mother/daughter tables on page 2426


tDB2Connection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.

tDB2Connection Standard properties


These properties are used to configure tDB2Connection running in the Standard Job framework.
The Standard tDB2Connection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host name Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Table Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.


This option is incompatible with the Use dynamic job and


Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
This check box is not available when the Specify a data
source alias check box is selected.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box is not visible when the Use or register a
shared DB Connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
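In plain JDBC terms, the difference described above can be sketched as follows. This is only an illustration of the behavior, not the code generated by the Studio, and the connection details are placeholders:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AutoCommitSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:db2://dbserver:50000/SAMPLE", "user", "password");

            // Auto Commit selected: each statement is committed as its own transaction.
            conn.setAutoCommit(true);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO T1 VALUES (1)"); // committed immediately
            }

            // Auto Commit cleared: changes are held until an explicit commit,
            // which is what a commit component issues at the end of the subjob.
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO T1 VALUES (2)");
                st.executeUpdate("INSERT INTO T1 VALUES (3)");
            }
            conn.commit(); // both rows become visible together
            conn.close();
        }
    }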

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other


tDB2* components, especially with the tDB2Commit and
tDB2Rollback components.


Related scenarios
For tDB2Connection related scenario, see tMysqlConnection on page 2425


tDB2Input
Executes a DB query with a strictly defined order which must correspond to the schema definition.
Then tDB2Input passes on the field list to the next component via a Row > Main link.
If double quotes exist in the column names of a table, the double quotation marks cannot be retrieved
when retrieving the column. Therefore, it is recommended not to use double quotes in column names
in a DB2 database table.

tDB2Input Standard properties


These properties are used to configure tDB2Input running in the Standard Job framework.
The Standard tDB2Input component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Table name Select the source table from which to capture any changes made to the data.

Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
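For example, if the schema defines the columns ID, NAME and CITY in that order (illustrative names), the query must return the fields in the same order. As elsewhere in the Studio, the query is entered as a double-quoted string:
    "SELECT ID, NAME, CITY FROM MYSCHEMA.CUSTOMERS"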

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box is not available when the Use an existing
connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component covers all possible SQL queries for DB2 databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see Reading data from different MySQL databases using dynamically loaded connection parameters on page 497.


tDB2Output
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tDB2Output writes, updates, makes changes or suppresses entries in a database.

tDB2Output Standard properties


These properties are used to configure tDB2Output running in the Standard Job framework.
The Standard tDB2Output component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.


For more information about setting up and storing database


connection parameters, see Talend Studio User Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Table schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
Default: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. You do not
have the possibility to rollback the operation.
Truncate table with reuse storage: The table content is
deleted. You do not have the possibility to rollback the
operation. However, you can reuse the existing storage
allocated to the table, even if the storage is considered
empty.

Warning:
If you select the Use an existing connection check
box, and then select Truncate table or Truncate table
with reuse storage from the Action on table list, a
commit statement will be invoked before the truncate
operation because the truncate statement must be the
first statement in a transaction.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.


Update or insert: Update the record with the given


reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .


This check box is not available when the Use an existing


connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not insert, update, or delete actions, or actions that require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.
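As an illustration only (the column names are assumptions), a load timestamp could be added next to an existing column by filling the table as follows:
    Name: LAST_LOADED
    SQL expression: "CURRENT TIMESTAMP"
    Position: After
    Reference column: NAME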

Use field options Select this check box to customize a request, especially
when there is double action on data.


Convert columns and table names to uppercase Select this check box to uppercase the names of the
columns and the name of the table.

Debug query mode Select this check box to display each step during processing
entries in a database.

Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.

Note:
Make sure the Nullable check box is selected for the corresponding columns in the schema.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, the Update or the Delete option in the Action
on data field.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use Batch check box is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in a DB2 database. It also allows you to create a reject
flow using a Row > Rejects link to filter data in error. For
an example of tMysqlOutput in use, see Retrieving data in
error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tDB2Output related topics, see
• Inserting a column and altering data using tMysqlOutput on page 2466.


tDB2Rollback
Avoids committing part of a transaction involuntarily.
tDB2Rollback cancels the transaction committed in the connected DB.

tDB2Rollback Standard properties


These properties are used to configure tDB2Rollback running in the Standard Job framework.
The Standard tDB2Rollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tDB2*
components, especially with the tDB2Connection and
tDB2Commit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection


parameters on page 497. For more information on Dynamic


settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tDB2Rollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429 of the tMysqlRollback.


tDB2Row
Acts on the actual DB structure or on the data (although without handling data) depending on
the nature of the query and the database. The SQLBuilder tool helps you easily write your SQL statements.
tDB2Row is the specific component for this database query. It executes the SQL query stated onto
the specified database. The row suffix means the component implements a flow in the job design
although it doesn't provide output.

tDB2Row Standard properties


These properties are used to configure tDB2Row running in the Standard Job framework.
The Standard tDB2Row component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.


Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to


properly sequence the fields in order to match the schema
definition.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box is not available when the Use an existing
connection check box is selected.


Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the same query several times, as performance levels are increased.
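A minimal sketch, assuming an ORDERS table (names, types and values below are illustrative): with a query such as "INSERT INTO ORDERS (ORDER_ID, AMOUNT) VALUES (?, ?)" entered in the Query field, the Set PreparedStatement Parameter table could be filled as follows:
    Parameter Index: 1, Parameter Type: Int, Parameter Value: 1001
    Parameter Index: 2, Parameter Type: Double, Parameter Value: 250.75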

tStatCatcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tDB2Row related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622
• Removing and regenerating a MySQL table index on page 2497.


tDB2SCD
Addresses Slowly Changing Dimension needs by regularly reading a source of data and logging the changes into a dedicated SCD table.
tDB2SCD reflects and tracks changes in a dedicated DB2 SCD table.

tDB2SCD Standard properties


These properties are used to configure tDB2SCD running in the Standard Job framework.
The Standard tDB2SCD component belongs to the Business Intelligence and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.


Database Name of the database.

Table Schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to maximize system performance.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.


Note:
You can set the encoding parameters through this field.

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step during
processing entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as an output component. It requires an input component and a Row main link as input.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the


Component List box in the Basic settings view becomes


unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenarios
For related topics, see tMysqlSCD on page 2508.


tDB2SCDELT
Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and
logs the changes into a dedicated DB2 SCD table.

tDB2SCDELT Standard properties


These properties are used to configure tDB2SCDELT running in the Standard Job framework.
The Standard tDB2SCDELT component belongs to the Business Intelligence and the Databases
families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally. Enter properties


manually.

  Repository: Select the repository file where Properties are


stored. The fields that come after are pre-filled in using the
fetched data.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host The IP address of the database server.

Port Listening port number of database server.


Database Name of the database

Username and Password User authentication data for a dedicated database.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Source table Name of the input DB2 SCD table.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table Select to perform one of the following operations on the


table defined:
None: No action carried out on the table.
Drop and create table: The table is removed and created
again
Create table: A new table gets created.
Create table if not exists: A table gets created if it does not
exist.
Clear table: The table content is deleted. You have the
possibility to rollback the operation.
Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Surrogate Key Select the surrogate key column from the list.

Creation Select the method to be used for the surrogate key generation.


For more information regarding the creation methods, see


SCD management methodology on page 2511.

Source Keys Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.

Source fields value include Null Select this check box to allow the source columns to have
Null values.

Note:
The source columns here refer to the fields defined in
the SCD type 1 fields and SCD type 2 fields tables.

Use SCD Type 1 fields Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.

Use SCD Type 2 fields Use type 2 if changes need to be tracked down. SCD Type
2 should be used to trace updates for example. Select the
columns of the schema that will be checked for changes.

SCD type 2 fields Click the [+] button to add as many rows as needed, each
row for a column. Click the arrow on the right side of
the cell and select the column whose value changes will
be tracked using Type 2 SCD from the drop-down list
displayed.
This table is available only when the Use SCD type 2 fields
option is selected.

Start date Specify the column that holds the start date for type 2 SCD.
This list is available only when the Use SCD type 2 fields
option is selected.

End date Specify the column that holds the end date for type 2 SCD.
This list is available only when the Use SCD type 2 fields
option is selected.

Note: To avoid duplicated change records, it is


recommended to select a column that can identify each
change for this field.

Log active status Select this check box and from the Active field drop-down
list displayed, select the column that holds the true or false
status value, which helps to spot the active record for type 2
SCD.
This option is available only when the Use SCD type 2 fields
option is selected.

Log versions Select this check box and from the Version field drop-down
list displayed, select the column that holds the version
number of the record for type 2 SCD.
This option is available only when the Use SCD type 2 fields
option is selected.
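To make the type 2 options above more concrete, a type 2 change typically closes the current version of a record (using the end date and active status columns) and inserts a new version (with a new start date and an incremented version number). The statements below are only a hypothetical illustration of that pattern, with invented table, column and value names; they are not the SQL generated by the component:

UPDATE dim_customer SET scd_end = CURRENT DATE, scd_active = 0 WHERE customer_id = 42 AND scd_active = 1;
INSERT INTO dim_customer (customer_id, city, scd_start, scd_end, scd_active, scd_version) VALUES (42, 'Paris', CURRENT DATE, NULL, 1, 2);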


Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB connection you are creating (see the example after this table).

Note:
You can set the encoding parameters through this field.

Debug mode Select this check box to display each step during
processing entries in a database.

tStat Catcher Statistics Select this check box to collect log data at the component
level.
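As an illustration of the Additional JDBC parameters field, DB2 JDBC properties are usually supplied as semicolon-terminated key=value pairs, for example (the property names below are examples only and depend on the JDBC driver you use):

currentSchema=MYSCHEMA;fullyMaterializeLobData=true;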

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
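As an illustration, in the Java code generated for the Job these variables are read from the globalMap. The sketch below could be placed in a tJava component executed after this one; the component name tDB2SCDELT_1 is a hypothetical unique name and depends on your Job:

// Read the After variable once the component has finished executing.
// globalMap is provided by the generated Job code.
String errorMessage = (String) globalMap.get("tDB2SCDELT_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("tDB2SCDELT_1 reported an error: " + errorMessage);
}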

Usage

Usage rule This component is used as an output component. It requires an input component and a Row main link as input.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
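As a hypothetical illustration of this mechanism, assume the Code field holds the context variable context.DbConnectionCode and that its run-time value must match the unique name of the connection component to be used; the exported Job could then be pointed at a given connection with a command-line context parameter such as:

--context_param DbConnectionCode=tDB2Connection_1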

Related Scenarios
For related scenarios, see:
• Tracking data changes in a Snowflake table using the tJDBCSCDELT component on page 1879.
• Tracking data changes in a PostgreSQL table using the tPostgreSQLSCDELT component on page
2948.


tDB2SP
Offers a convenient way to call the database stored procedures.

tDB2SP Standard properties


These properties are used to configure tDB2SP running in the Standard Job framework.
The Standard tDB2SP component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host Database server IP address.

Port Listening port number of the DB server.

Database Name of the database.

Username and Password DB user authentication data.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

SP Name Type in the exact name of the stored procedure.

Is Function / Return result in Select this check box if only a single value is to be returned.
From the list, select the schema column on which the returned value is based.

Parameters Click the plus button and select the various schema
columns that will be required by the procedure. Note
that the SP schema can hold more columns than there are
parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter to be returned as a value, likely
after modification through the procedure (function).
RECORDSET: Input parameter to be returned as a set of
values rather than a single value.
For a plain JDBC illustration of IN and OUT parameters, see the sketch after this table.

Note:
Check Inserting data in mother/daughter tables on page
2426 if you want to analyze a set of records from a
database table or DB query and return single records.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box is not available when the Use an existing
connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.
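For reference, the parameter types in the Parameters table map onto standard JDBC stored procedure calls. The sketch below is only a plain JDBC illustration of one IN and one OUT parameter (the connection details, procedure name GET_STATE_LABEL and values are hypothetical); the component generates the equivalent logic from the Parameters table:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class StoredProcedureSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical DB2 connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/SAMPLE", "db2admin", "secret");
        try (CallableStatement call = conn.prepareCall("{call GET_STATE_LABEL(?, ?)}")) {
            call.setInt(1, 42);                          // IN: value read from a schema column
            call.registerOutParameter(2, Types.VARCHAR); // OUT: value written back to a schema column
            call.execute();
            System.out.println("Label: " + call.getString(2));
        }
        conn.close();
    }
}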

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB connection you are creating.

Note:
You can set the encoding parameters through this field.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related scenarios, see:


• Retrieving personal information using a stored procedure on page 2404.


• Using tMysqlSP to find a State Label using a stored procedure on page 2528.
• Checking number format using a stored procedure on page 2735.
• Executing a stored procedure using tMDMSP on page 2180.
Check Inserting data in mother/daughter tables on page 2426 as well if you want to analyze a set of
records from a database table or DB query and return single records.


Dynamic database components

Talend provides a number of database components that allow you to change dynamically the type of
database you want to work on. These components are available in the Database Common group under
the Databases family of the Palette for standard data integration Jobs.
Each of these components has only one property, the Database list, on its Basic settings view for you
to select the type of database of your interest.
For more information on these dynamic database components, see:
• tDBBulkExec on page 596
• tDBClose on page 597
• tDBColumnList on page 598
• tDBCommit on page 599
• tDBConnection on page 600
• tDBInput on page 601
• tDBLastInsertId on page 603
• tDBOutput on page 604
• tDBOutputBulk on page 606
• tDBOutputBulkExec on page 607
• tDBRollback on page 608
• tDBRow on page 609
• tDBSCD on page 610
• tDBSCDELT on page 611
• tDBSP on page 612
• tDBTableList on page 613


tDBBulkExec
Offers gains in performance while executing the Insert operations on a database.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds a database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.

tDBBulkExec Standard properties


These properties are used to configure tDBBulkExec running in the Standard Job framework.
The Standard tDBBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessBulkExec on page 79)
• Amazon (tRedshiftBulkExec on page 2964)
• Greenplum (tGreenplumBulkExec on page 1311)
• IBM DB2 (tDB2BulkExec on page 553)
• Informix (tInformixBulkExec on page 1706)
• Ingres (tIngresBulkExec on page 1747)
• Microsoft SQL Server (tMSSqlBulkExec on page 2348)
• MySQL (tMysqlBulkExec on page 2412)
• Netezza (tNetezzaBulkExec on page 2616)
• Oracle (tOracleBulkExec on page 2676)
• ParAccel (tParAccelBulkExec on page 2803)
• PostgreSQL (tPostgresqlBulkExec on page 2906)
• PostgresPlus (tPostgresPlusBulkExec on page 2865)
• Snowflake (tSnowflakeBulkExec on page 3384)
• Sybase (ASE and IQ) (tSybaseBulkExec on page 3658)
• Sybase IQ (tSybaseIQBulkExec on page 3673)
• Vertica (tVerticaBulkExec on page 3822)


tDBClose
Closes the transaction committed in a connected database.
This component works with a variety of databases depending on your selection.

tDBClose Standard properties


These properties are used to configure tDBClose running in the Standard Job framework.
The Standard tDBClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessClose on page 82)
• Amazon Aurora (tAmazonAuroraClose on page 146)
• Amazon Mysql (tAmazonMysqlClose on page 185)
• Amazon Oracle (tAmazonOracleClose on page 207)
• Amazon Redshift (tRedshiftClose on page 2980)
• AS400 (tAS400Close on page 237)
• FireBird (tFirebirdClose on page 1179)
• Greenplum (tGreenplumClose on page 1315)
• IBM DB2 (tDB2Close on page 559)
• Exasol (tEXAClose on page 895)
• Informix (tInformixClose on page 1711)
• Ingres (tIngresClose on page 1751)
• Interbase (tInterbaseClose on page 1784)
• JDBC (tJDBCClose on page 1850)
• MemSQL (tMemSQLClose (deprecated))
• Microsoft SQL Server (tMSSqlClose on page 2353)
• MySQL (tMysqlClose on page 2416)
• Netezza (tNetezzaClose on page 2620)
• Oracle (tOracleClose on page 2684)
• ParAccel (tParAccelClose on page 2807)
• PostgreSQL (tPostgresqlClose on page 2910)
• PostgresPlus (tPostgresPlusClose on page 2869)
• SAPHana (tSAPHanaClose on page 3303)
• SQLite (tSQLiteClose on page 3504)
• Snowflake (tSnowflakeClose on page 3398)
• Sybase (ASE and IQ) (tSybaseClose on page 3663)
• Teradata (tTeradataClose on page 3726)
• Vertica (tVerticaClose on page 3828)


tDBColumnList
Iterates on all columns of a given database table and lists column names.
This component works with a variety of databases depending on your selection.

tDBColumnList Standard properties


These properties are used to configure tDBColumnList running in the Standard Job framework.
The Standard tDBColumnList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Microsoft SQL Server (tMSSqlColumnList on page 2355)
• MySQL (tMysqlColumnList on page 2418)


tDBCommit
Validates the data processed through the Job into the connected database.
This component works with a variety of databases depending on your selection.

tDBCommit Standard properties


These properties are used to configure tDBCommit running in the Standard Job framework.
The Standard tDBCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessCommit on page 84)
• Amazon Aurora (tAmazonAuroraCommit on page 148)
• Amazon Mysql (tAmazonMysqlCommit on page 187)
• Amazon Oracle (tAmazonOracleCommit on page 209)
• AS400 (tAS400Commit on page 239)
• Amazon Redshift (tRedshiftCommit on page 2982)
• FireBird (tFirebirdCommit on page 1181)
• Greenplum (tGreenplumCommit on page 1317)
• IBM DB2 (tDB2Commit on page 561)
• Exasol (tEXACommit on page 897)
• Informix (tInformixCommit on page 1713)
• Ingres (tIngresCommit on page 1753)
• Interbase (tInterbaseCommit on page 1786)
• JDBC (tJDBCCommit on page 1854)
• Microsoft SQL Server (tMSSqlCommit on page 2358)
• MySQL (tMysqlCommit on page 2423)
• Netezza (tNetezzaCommit on page 2622)
• Oracle (tOracleCommit on page 2686)
• ParAccel (tParAccelCommit on page 2809)
• PostgreSQL (tPostgresqlCommit on page 2912)
• PostgresPlus (tPostgresPlusCommit on page 2871)
• SAPHana (tSAPHanaCommit on page 3304)
• SQLite (tSQLiteCommit on page 3506)
• Sybase (ASE and IQ) (tSybaseCommit on page 3665)
• Teradata (tTeradataCommit on page 3728)
• VectorWise (tVectorWiseCommit on page 3803)
• Vertica (tVerticaCommit on page 3830)


tDBConnection
Opens a connection to a database to be reused in the subsequent subJob or subJobs.
This component works with a variety of databases depending on your selection.

tDBConnection Standard properties


These properties are used to configure tDBConnection running in the Standard Job framework.
The Standard tDBConnection component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessConnection on page 86)
• Amazon Aurora (tAmazonAuroraConnection on page 150)
• Amazon Mysql (tAmazonMysqlConnection on page 189)
• Amazon Oracle (tAmazonOracleConnection on page 211)
• Amazon Redshift (tRedshiftConnection on page 2984)
• AS400 (tAS400Connection on page 241)
• Exasol (tEXAConnection on page 899)
• FireBird (tFirebirdConnection on page 1183)
• Greenplum (tGreenplumConnection on page 1319)
• IBM DB2 (tDB2Connection on page 563)
• Informix (tInformixConnection on page 1715)
• Ingres (tIngresConnection on page 1755)
• Interbase (tInterbaseConnection on page 1788)
• JDBC (tJDBCConnection on page 1856)
• MemSQL (tMemSQLConnection (deprecated))
• Microsoft SQL Server (tMSSqlConnection on page 2360)
• MySQL (tMysqlConnection on page 2425)
• Netezza (tNetezzaConnection on page 2624)
• Oracle (tOracleConnection on page 2688)
• ParAccel (tParAccelConnection on page 2811)
• PostgreSQL (tPostgresqlConnection on page 2914)
• PostgresPlus (tPostgresPlusConnection on page 2873)
• SAPHana (tSAPHanaConnection on page 3306)
• SQLite (tSQLiteConnection on page 3508)
• Snowflake (tSnowflakeConnection on page 3401)
• Sybase (ASE and IQ) (tSybaseConnection on page 3667)
• Teradata (tTeradataConnection on page 3730)
• VectorWise (tVectorWiseConnection on page 3805)
• Vertica (tVerticaConnection on page 3832)


tDBInput
Extracts data from a database.
This component works with a variety of databases depending on your selection.

tDBInput Standard properties


These properties are used to configure tDBInput running in the Standard Job framework.
The Standard tDBInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessInput on page 91)
• Amazon Aurora (tAmazonAuroraInput on page 153)
• Amazon Mysql (tAmazonMysqlInput on page 192)
• Amazon Oracle (tAmazonOracleInput on page 214)
• Amazon Redshift (tRedshiftInput on page 2987)
• AS400 (tAS400Input on page 243)
• Exasol (tEXAInput on page 902)
• FireBird (tFirebirdInput on page 1185)
• Greenplum (tGreenplumInput on page 1327)
• IBM DB2 (tDB2Input on page 566)
• Informix (tInformixInput on page 1717)
• Ingres (tIngresInput on page 1757)
• Interbase (tInterbaseInput on page 1790)
• JDBC (tJDBCInput on page 1861)
• MemSQL (tMemSQLInput (deprecated))
• Microsoft SQL Server (tMSSqlInput on page 2368)
• MySQL (tMysqlInput on page 2437)
• Netezza (tNetezzaInput on page 2626)
• Oracle (tOracleInput on page 2692)
• ParAccel (tParAccelInput on page 2813)
• PostgreSQL (tPostgresqlInput on page 2916)
• PostgresPlus (tPostgresPlusInput on page 2875)
• SAPHana (tSAPHanaInput on page 3308)
• SAS (tSasInput (deprecated))
• SQLite (tSQLiteInput on page 3510)
• Snowflake (tSnowflakeInput on page 3404)
• Sybase (ASE and IQ) (tSybaseInput on page 3669)
• Teradata (tTeradataInput on page 3742)
• VectorWise (tVectorWiseInput on page 3807)


• Vertica (tVerticaInput on page 3834)


tDBLastInsertId
Obtains the primary key value of the record that was last inserted in a database table by a user.
This component works with a variety of databases depending on your selection.

tDBLastInsertId Standard properties


These properties are used to configure tDBLastInsertId running in the Standard Job framework.
The Standard tDBLastInsertId component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• AS400 (tAS400LastInsertId on page 250)
• Microsoft SQL Server (tMSSqlLastInsertId on page 2372)
• MySQL (tMysqlLastInsertId on page 2453)

603
tDBOutput

tDBOutput
Writes, updates, makes changes or suppresses entries in a database.
This component works with a variety of databases depending on your selection.

tDBOutput Standard properties


These properties are used to configure tDBOutput running in the Standard Job framework.
The Standard tDBOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutput on page 95)
• Amazon Aurora (tAmazonAuroraOutput on page 163)
• Amazon Mysql (tAmazonMysqlOutput on page 195)
• Amazon Oracle (tAmazonOracleOutput on page 218)
• Amazon Redshift (tRedshiftOutput on page 2996)
• AS400 (tAS400Output on page 252)
• Exasol (tEXAOutput on page 906)
• FireBird (tFirebirdOutput on page 1189)
• Greenplum (tGreenplumOutput on page 1330)
• IBM DB2 (tDB2Output on page 570)
• Informix (tInformixOutput on page 1720)
• Ingres (tIngresOutput on page 1761)
• Interbase (tInterbaseOutput on page 1794)
• JDBC (tJDBCOutput on page 1865)
• MemSQL (tMemSQLOutput (deprecated))
• Microsoft SQL Server (tMSSqlOutput on page 2375)
• MySQL (tMysqlOutput on page 2460)
• Netezza (tNetezzaOutput on page 2637)
• Oracle (tOracleOutput on page 2699)
• ParAccel (tParAccelOutput on page 2817)
• PostgreSQL (tPostgresqlOutput on page 2920)
• PostgresPlus (tPostgresPlusOutput on page 2879)
• SAPHana (tSAPHanaOutput on page 3312)
• SAS (tSasOutput (deprecated))
• SQLite (tSQLiteOutput on page 3515)
• Snowflake (tSnowflakeOutput on page 3412)
• Sybase (ASE and IQ) (tSybaseOutput on page 3689)
• Teradata (tTeradataOutput on page 3749)
• VectorWise (tVectorWiseOutput on page 3811)


• Vertica (tVerticaOutput on page 3838)


tDBOutputBulk
Writes a file with columns based on the defined delimiter and the standards of the selected database
type.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds a database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.

tDBOutputBulk Standard properties


These properties are used to configure tDBOutputBulk running in the Standard Job framework.
The Standard tDBOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutputBulk on page 101)
• Amazon Redshift (tRedshiftOutputBulk on page 3002)
• Greenplum (tGreenplumOutputBulk on page 1336)
• Informix (tInformixOutputBulk on page 1726)
• Ingres (tIngresOutputBulk on page 1766)
• Microsoft SQL Server (tMSSqlOutputBulk on page 2382)
• MySQL (tMysqlOutputBulk on page 2480)
• Oracle (tOracleOutputBulk on page 2706)
• ParAccel (tParAccelOutputBulk on page 2823)
• PostgreSQL (tPostgresqlOutputBulk on page 2927)
• PostgresPlus (tPostgresPlusOutputBulk on page 2885)
• Snowflake (tSnowflakeOutputBulk on page 3416)
• Sybase (ASE and IQ) (tSybaseOutputBulk on page 3695)
• Vertica (tVerticaOutputBulk on page 3844)


tDBOutputBulkExec
Executes the Insert action in a database.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds a database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.

tDBOutputBulkExec Standard properties


These properties are used to configure tDBOutputBulkExec running in the Standard Job framework.
The Standard tDBOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutputBulk on page 101)
• Amazon Redshift (tRedshiftOutputBulk on page 3002)
• Greenplum (tGreenplumOutputBulk on page 1336)
• Informix (tInformixOutputBulk on page 1726)
• Ingres (tIngresOutputBulk on page 1766)
• Microsoft SQL Server (tMSSqlOutputBulk on page 2382)
• MySQL (tMysqlOutputBulk on page 2480)
• Oracle (tOracleOutputBulk on page 2706)
• ParAccel (tParAccelOutputBulk on page 2823)
• PostgreSQL (tPostgresqlOutputBulk on page 2927)
• PostgresPlus (tPostgresPlusOutputBulk on page 2885)
• Snowflake (tSnowflakeOutputBulkExec on page 3423)
• Sybase (ASE and IQ) (tSybaseOutputBulk on page 3695)
• Vertica (tVerticaOutputBulk on page 3844)


tDBRollback
Cancels the transaction commit in a connected database to avoid committing part of a transaction
involuntarily.
This component works with a variety of databases depending on your selection.

tDBRollback Standard properties


These properties are used to configure tDBRollback running in the Standard Job framework.
The Standard tDBRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessRollback on page 108)
• Amazon Aurora (tAmazonAuroraRollback on page 170)
• Amazon Mysql (tAmazonMysqlRollback on page 201)
• Amazon Oracle (tAmazonOracleRollback on page 224)
• Amazon Redshift (tRedshiftRollback on page 3014)
• AS400 (tAS400Rollback on page 257)
• Exasol (tEXARollback on page 912)
• FireBird (tFirebirdRollback on page 1194)
• Greenplum (tGreenplumRollback on page 1342)
• IBM DB2 (tDB2Rollback on page 576)
• Informix (tInformixRollback on page 1733)
• Ingres (tIngresRollback on page 1775)
• Interbase (tInterbaseRollback on page 1800)
• JDBC (tJDBCRollback on page 1870)
• Microsoft SQL Server (tMSSqlRollback on page 2390)
• MySQL (tMysqlRollback on page 2491)
• Netezza (tNetezzaRollback on page 2643)
• Oracle (tOracleRollback on page 2715)
• ParAccel (tParAccelRollback on page 2830)
• PostgreSQL (tPostgresqlRollback on page 2934)
• PostgresPlus (tPostgresPlusRollback on page 2891)
• SAPHana (tSAPHanaRollback on page 3318)
• SQLite (tSQLiteRollback on page 3520)
• Sybase (ASE and IQ) (tSybaseRollback on page 3703)
• Teradata (tTeradataRollback on page 3755)
• VectorWise (tVectorWiseRollback on page 3816)
• Vertica (tVerticaRollback on page 3852)


tDBRow
Executes the stated SQL query onto a database.
This component works with a variety of databases depending on your selection.

tDBRow Standard properties


These properties are used to configure tDBRow running in the Standard Job framework.
The Standard tDBRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessRow on page 110)
• Amazon Mysql (tAmazonMysqlRow on page 203)
• Amazon Oracle (tAmazonOracleRow on page 226)
• Amazon Redshift (tRedshiftRow on page 3016)
• AS400 (tAS400Row on page 259)
• Exasol (tEXARow on page 914)
• FireBird (tFirebirdRow on page 1196)
• Greenplum (tGreenplumRow on page 1344)
• IBM DB2 (tDB2Row on page 578)
• Informix (tInformixRow on page 1735)
• Ingres (tIngresRow on page 1777)
• Interbase (tInterbaseRow on page 1802)
• JDBC (tJDBCRow on page 1872)
• MemSQL (tMemSQLRow (deprecated))
• Microsoft SQL Server (tMSSqlRow on page 2392)
• MySQL (tMysqlRow on page 2493)
• Netezza (tNetezzaRow on page 2645)
• Oracle (tOracleRow on page 2717)
• ParAccel (tParAccelRow on page 2832)
• PostgreSQL (tPostgresqlRow on page 2936)
• PostgresPlus (tPostgresPlusRow on page 2893)
• SAPHana (tSAPHanaRow on page 3319)
• SQLite (tSQLiteRow on page 3522)
• Snowflake (tSnowflakeRow on page 3440)
• Sybase (ASE and IQ) (tSybaseRow on page 3705)
• Teradata (tTeradataRow on page 3757)
• VectorWise (tVectorWiseRow on page 3818)
• Vertica (tVerticaRow on page 3854)


tDBSCD
Reflects and tracks changes in a dedicated database SCD table.
This component works with a variety of databases depending on your selection.

tDBSCD Standard properties


These properties are used to configure tDBSCD running in the Standard Job framework.
The Standard tDBSCD component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Greenplum (tGreenplumSCD on page 1348)
• IBM DB2 (tDB2SCD on page 582)
• Informix (tInformixSCD on page 1739)
• Ingres (tIngresSCD on page 1781)
• Microsoft SQL Server (tMSSqlSCD on page 2397)
• MySQL (tMysqlSCD on page 2508)
• Netezza (tNetezzaSCD on page 2649)
• Oracle (tOracleSCD on page 2722)
• ParAccel (tParAccelSCD on page 2836)
• PostgreSQL (tPostgresqlSCD on page 2940)
• PostgresPlus (tPostgresPlusSCD on page 2897)
• Sybase (ASE and IQ) (tSybaseSCD on page 3709)
• Teradata (tTeradataSCD on page 3762)
• Vertica (tVerticaSCD on page 3858)


tDBSCDELT
Reflects and tracks changes in a dedicated SCD table through SQL queries.
This component works with a variety of databases depending on your selection.

tDBSCDELT Standard properties


These properties are used to configure tDBSCDELT running in the Standard Job framework.
The Standard tDBSCDELT component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• IBM DB2 (tDB2SCDELT on page 586)
• MySQL (tMysqlSCDELT on page 2522)
• Oracle (tOracleSCDELT on page 2726)
• PostgreSQL (tPostgresqlSCDELT on page 2944)
• PostgresPlus (tPostgresPlusSCDELT on page 2901)
• Sybase (ASE and IQ) (tSybaseSCDELT on page 3713)
• Teradata (tTeradataSCDELT on page 3766)


tDBSP
Calls a database stored procedure.
This component works with a variety of databases depending on your selection.

tDBSP Standard properties


These properties are used to configure tDBSP running in the Standard Job framework.
The Standard tDBSP component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• IBM DB2 (tDB2SP on page 591)
• Informix (tInformixSP on page 1743)
• JDBC (tJDBCSP on page 1889)
• Microsoft SQL Server (tMSSqlSP on page 2401)
• MySQL (tMysqlSP on page 2526)
• Oracle (tOracleSP on page 2731)
• Sybase (ASE and IQ) (tSybaseSP on page 3718)


tDBTableList
Lists the names of specified database tables using a SELECT statement based on a WHERE clause.
This component works with a variety of databases depending on your selection.

tDBTableList Standard properties


These properties are used to configure tDBTableList running in the Standard Job framework.
The Standard tDBTableList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Microsoft SQL Server (tMSSqlTableList on page 2410)
• MySQL (tMysqlTableList on page 2532)
• Oracle (tOracleTableList on page 2739)


tDBFSConnection
Connects to a given DBFS (Databricks Filesystem) system so that the other DBFS components can
reuse the connection it creates to communicate with this DBFS.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.

tDBFSConnection Standard properties


These properties are used to configure tDBFSConnection running in the Standard Job framework.
The Standard tDBFSConnection component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.

Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component is generally used with other DBFS components.


tDBFSGet
Copies files from a given DBFS (Databricks Filesystem) system, pastes them in a user-defined directory
and, if need be, renames them.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.

tDBFSGet Standard properties


These properties are used to configure tDBFSGet running in the Standard Job framework.
The Standard tDBFSGet component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
DBFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.

Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.

DBFS directory In the DBFS directory field, enter the path pointing to the
data to be used in the DBFS file system.

Local directory Browse to, or enter the local directory to store the files
copied from DBFS.

Overwrite file Options to overwrite or not the existing file with the new
one.


Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from DBFS.
Regular expression is available.
- New name: give a new name to the obtained file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component combines DBFS connection and data extraction, and is thus used as a single-component subJob to copy data from DBFS to a user-defined local directory.
It runs standalone and does not generate input or output
flow for the other components. It is often connected to the
Job using OnSubjobOk or OnComponentOk link, depending
on the context.


tDBFSPut
Connects to a given DBFS (Databricks Filesystem) system, copies files from a user-defined directory,
pastes them in this system and, if need be, renames these files.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.

tDBFSPut Standard properties


These properties are used to configure tDBFSPut running in the Standard Job framework.
The Standard tDBFSPut component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
DBFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.

Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.

DBFS directory In the DBFS directory field, enter the path pointing to the
data to be used in the DBFS file system.

Local directory The local directory where the files to be loaded into DBFS are stored.

Overwrite file Options to overwrite or not the existing file with the new
one.


Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from the
local directory. Regular expression is available.
- New name: give a new name to the loaded file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component combines DBFS connection and data loading, and is thus usually used as a single-component subJob to copy data from a user-defined local directory to DBFS.
It runs standalone and does not generate input or output
flow for the other components. It is often connected to the
Job using OnSubjobOk or OnComponentOk link, depending
on the context.


tDBSQLRow
Acts on the actual DB structure or on the data (although without handling data) depending on
the nature of the query and the database. The SQLBuilder tool helps you write your SQL
statements easily.
tDBSQLRow is the generic component for database queries. It executes the SQL query stated on
the specified database. The Row suffix means the component implements a flow in the Job design
although it does not provide output. For performance reasons, a specific DB component should always
be preferred to this generic component.
To use this component, relevant DBMSs' ODBC drivers should be installed and the corresponding
ODBC connections should be configured via the database connection configuration wizard.

tDBSQLRow Standard properties


These properties are used to configure tDBSQLRow running in the Standard Job framework.
The Standard tDBSQLRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Datasource Name of the data source defined via the database connection configuration wizard.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:


• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Name of the source table where changes made to data
should be captured.

Query type Either Built-in or Repository.

  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.

  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition (see the example after this table).

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
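For instance, with a schema made of the two columns id and name (a hypothetical example), a matching query would list the fields in the same order:

SELECT id, name FROM person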

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.

Note:
You can set the encoding parameters through this field.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.


Note:
This option is very useful if you need to execute the same query several times, as performance levels are increased. See the sketch after this table for a plain JDBC illustration.

Commit every Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.
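The following plain JDBC sketch illustrates what the Use PreparedStatement option relies on: a query with two "?" placeholders whose positions, types and values correspond to two rows of the Set PreparedStatement Parameter table. The table, column and connection details are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PreparedStatementSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/SAMPLE", "db2admin", "secret");
        // Query with two placeholders; it is compiled once and can be executed many times.
        String query = "UPDATE person SET city = ? WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(query)) {
            ps.setString(1, "Paris"); // Parameter Index 1, Type String, Value "Paris"
            ps.setInt(2, 100);        // Parameter Index 2, Type Int, Value 100
            ps.executeUpdate();
        }
        conn.close();
    }
}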

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Note that the relevant DBRow component should be preferred for your DBMS; most DBMSs have their own specific DBRow components.

Resetting a DB auto-increment
This scenario describes a single-component Job which aims at re-initializing the DB auto-increment to
1. This Job has no output and is generally to be used before running a script.

Warning:
As a prerequisite of this Job, the relevant DBMS's ODBC driver must have been installed and the
corresponding ODBC connection must have been configured.


Procedure
1. Drag and drop a tDBSQLRow component from the Palette to the design workspace.

2. Double-click tDBSQLRow to open its Basic settings view.

3. Select Repository in the Property Type list as the ODBC connection has been configured and
saved in the Repository. The subsequent fields are filled in automatically.
For more information on storing DB connections in the Repository, see Talend Studio User Guide.
4. The Schema is built-in for this Job and it does not really matter in this example as the action is
made on the table auto-increment and not on data.
5. The Query type is also built-in. Click the [...] button next to the Query statement box to launch
the SQLBuilder editor, or else type directly in the statement box:
Alter table <TableName> auto_increment = 1
6. Press Ctrl+S to save the Job and F6 to run.
The database auto-increment is reset to 1.


tDenormalize
Denormalizes the input flow based on one column.

tDenormalize Standard properties


These properties are used to configure tDenormalize running in the Standard Job framework.
The Standard tDenormalize component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

To denormalize In this table, define the parameters used to denormalize your columns.
Column: Select the column to denormalize.
Delimiter: Type in the separator you want to use to
denormalize your data between double quotes.
Merge same value: Select this check box to merge identical
values.
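Functionally, denormalizing on one column amounts to grouping the rows on the other columns and joining the values of the selected column with the chosen delimiter. The Java sketch below shows that logic on hypothetical two-column rows; the LinkedHashSet plays the role of the Merge same value option by dropping duplicates:

import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class DenormalizeSketch {
    public static void main(String[] args) {
        // Hypothetical input rows: {grouping column, column to denormalize}.
        String[][] rows = { {"Smith", "Anna"}, {"Smith", "Ben"}, {"Jones", "Carl"}, {"Smith", "Anna"} };
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new LinkedHashSet<>()).add(row[1]);
        }
        // Join each group with the configured delimiter (a comma here).
        groups.forEach((key, values) -> System.out.println(key + ": " + String.join(",", values)));
    }
}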

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level. Note that this check box is not available in the Map/
Reduce version of the component.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as an intermediate step in a data flow.

Limitation Note that this component may change the order in the
incoming Java flow.

Denormalizing on one column


This scenario illustrates a Job denormalizing one column in a delimited file.

Denormalizing on one column


Procedure
1. Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to
the design workspace.
2. Connect the components using Row main connections.
3. On the tFileInputDelimited Component view, set the filepath to the file to be denormalized.


4. Define the Header, Row Separator and Field Separator parameters.


5. The input file schema is made of two columns, Fathers and Children.

6. In the Basic settings of tDenormalize, define the column that contains multiple values to be
grouped.
7. In this use case, the column to denormalize is Children.

8. Set the Delimiter to separate the grouped values. Note that only one column can be
denormalized.
9. Select the Merge same value check box, if you know that some values to be grouped are strictly
identical.
10. Save your Job and press F6 to execute it.


Results

All values from the column Children (set as column to denormalize) are grouped by their Fathers
column. Values are separated by a comma.

Denormalizing on multiple columns


This scenario illustrates a Job denormalizing two columns from a delimited file.

Denormalizing on multiple columns


Procedure
1. Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to
the design workspace.
2. Connect all components using a Row main connection.
3. On the tFileInputDelimited Basic settings panel, set the filepath to the file to be denormalized.

4. Define the Row and Field separators, the Header and other information if required.
5. The file schema is made of four columns including: Name, FirstName, HomeTown, WorkTown.


6. In the tDenormalize component Basic settings, select the columns that contain the repetition.
These are the columns which are meant to occur multiple times in the document. In this use
case, FirstName, HomeCity and WorkCity are the columns against which the denormalization is
performed.
7. Add as many line to the table as you need using the plus button. Then select the relevant columns
in the drop-down list.

8. In the Delimiter column, define the separator between double quotes, to split concatenated values.
For the FirstName column, type in "#", for HomeCity, type in "§", and for WorkCity, type in "¤".
9. Save your Job and press F6 to execute it.

The result shows the denormalized values concatenated using a comma.


10. Back in the tDenormalize component's Basic settings, in the To denormalize table, select the
Merge same value check box to remove the duplicate occurrences.
11. Save your Job again and press F6 to execute it.


Results

This time, the console shows the results with no duplicate instances.


tDenormalizeSortedRow
Synthesizes sorted input flow to save memory.
tDenormalizeSortedRow combines in a group all input sorted rows. Distinct values of the
denormalized sorted row are joined with item separators.
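The memory saving comes from the input already being sorted on the grouping column: a group can be emitted as soon as the key changes, so only the current group is kept in memory. The Java sketch below illustrates the idea on hypothetical sorted key/value rows:

import java.util.ArrayList;
import java.util.List;

public class SortedDenormalizeSketch {
    public static void main(String[] args) {
        // Hypothetical input, already sorted on the first column.
        String[][] sortedRows = { {"1", "a"}, {"1", "b"}, {"2", "c"}, {"2", "d"} };
        String currentKey = null;
        List<String> buffer = new ArrayList<>();
        for (String[] row : sortedRows) {
            if (currentKey != null && !currentKey.equals(row[0])) {
                // The key changed: the previous group is complete and can be emitted.
                System.out.println(currentKey + ";" + String.join(",", buffer));
                buffer.clear();
            }
            currentKey = row[0];
            buffer.add(row[1]);
        }
        if (currentKey != null) {
            System.out.println(currentKey + ";" + String.join(",", buffer));
        }
    }
}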

tDenormalizeSortedRow Standard properties


These properties are used to configure tDenormalizeSortedRow running in the Standard Job
framework.
The Standard tDenormalizeSortedRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component in the Job.

  Built-in: You create the schema and store it locally for the
relevant component. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Input rows count Enter the number of input rows.

To denormalize Enter the name of the column to denormalize.


Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles flows of data therefore it requires


input and output components.

Regrouping sorted rows


This Java scenario describes a four-component Job. It aims at reading a given delimited file row by
row, sorting input data by sort type and order, denormalizing all input sorted rows and displaying the
output on the Run log console.
• Drop the following components from the Palette onto the design workspace: tFileInputDelimited,
tSortRow, tDenormalizeSortedRow, and tLogRow.
• Connect the four components using Row Main links.

• In the design workspace, select tFileInputDelimited.


• Click the Component tab to define the basic settings for tFileInputDelimited.


• Set Property Type to Built-In.


• Fill in a path to the processed file in the File Name field. The name_list file used in this example
holds two columns, id and first name.

• If needed, define row and field separators, header and footer, and the number of processed rows.
• Set Schema to Built in and click the three-dot button next to Edit Schema to define the data to
pass on to the next component. The schema in this example consists of two columns, id and name.

• In the design workspace, select tSortRow.


• Click the Component tab to define the basic settings for tSortRow.


• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the
tFileInputDelimited component.
• In the Criteria panel, use the plus button to add a line and set the sorting parameters for the
schema column to be processed. In this example we want to sort the id column in ascending
order.
• In the design workspace, select tDenormalizeSortedRow.
• Click the Component tab to define the basic settings for tDenormalizeSortedRow.

• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow
component.
• In the Input rows count field, enter the number of the input rows to be processed or press
Ctrl+Space to access the context variable list and select the variable:
tFileInputDelimited_1_NB_LINE.
• In the To denormalize panel, use the plus button to add a line and set the parameters for the
column to be denormalized. In this example we want to denormalize the name column.
• In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information about tLogRow, see tLogRow on page 1977.
• Save your Job and press F6 to execute it.


The result displayed on the console shows how the name column was denormalized.
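For illustration only, with hypothetical sorted input rows such as 1;andrew, 1;anna and 2;bill, the console output might read:

1|andrew,anna
2|bill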


tDie
Triggers the tLogCatcher component for exhaustive log before killing the Job.
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally
make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated
and passed on to the output defined.
This component throws an error and kills the Job. If you simply want to throw a warning, see the
tWarn documentation.

tDie Standard properties


These properties are used to configure tDie running in the Standard Job framework.
The Standard tDie component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Die message Enter the message to be displayed before the Job is killed.

Error code Enter the error code if need be, as an integer.

Note:
Any value greater than 255 cannot be used as
an error code on Linux.

Priority Set the level of priority, as an integer.

Global Variables

Global Variables DIE_MESSAGES: the die message. This is an After variable


and it returns a string.
DIE_CODE: the error code of the die message. This is an
After variable and it returns an integer.
DIE_PRIORITY: the priority level of the die message. This is
an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
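For example, these variables can be reused elsewhere in the Job with the usual globalMap syntax; a minimal sketch, assuming the component is labeled tDie_1 (adapt the label to the one in your own Job):

((String)globalMap.get("tDie_1_DIE_MESSAGES"))
((Integer)globalMap.get("tDie_1_DIE_CODE"))
((Integer)globalMap.get("tDie_1_DIE_PRIORITY"))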


Usage

Usage rule This component cannot be used as a start component and it


is generally used with a tLogCatcher for the log purpose.

Related scenarios
For use cases in relation with tDie, see tLogCatcher scenarios:
• Catching messages triggered by a tWarn component on page 1971
• Catching the message triggered by a tDie component on page 1973


tDotNETInstantiate
Invokes the constructor of a .NET object that is intended for later reuse.
tDotNETInstantiate instantiates an object in .NET for later reuse.

tDotNETInstantiate Standard properties


These properties are used to configure tDotNETInstantiate running in the Standard Job framework.
The Standard tDotNETInstantiate component belongs to the DotNET family.
The component in this framework is available in all Talend products.

Basic settings

Dll to load Type in the path, or browse to the DLL library containing
the class(es) of interest or enter the assembly's name
to be used. For example, System.Data, Version=2.0.0.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089 for an
OleDb assembly.

 Fully qualified class name(i.e. ClassLibrary1. Enter a fully qualified name for the class of interest.
NameSpace2.Class1)

Value(s) to pass to the constructor Click the plus button to add one or more values to be
passed to the constructor for the object. Or, leave this table
empty to call a default constructor for the object.
The valid value(s) should be the parameters required by the
class to be used.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables INSTANCE: the instance of a .NET object. This is an After


variable and it returns an object.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component can be used as a start component in a flow


or an independent subJob.
To use this component, you must first install the runtime
DLLs, for example janet-win32.dll for Windows 32-bit
version and janet-win64.dll for Windows 64-bit version,
from the corresponding Microsoft Visual C++ Redistributable
Package. This allows you to avoid errors like the
UnsatisfiedLinkError on dependent DLL.
So ensure that the runtime and all of the other DLLs which
the DLL to be called depends on are installed and their
versions are consistent among one another.

Note: The required DLLs can be installed in the


System32 folder or in the bin folder of the Java runtime
to be used. If you need to export a Job using this
component to run it outside the Studio, you have to
specify the runtime container of interest by setting
the -Djava.library.path argument accordingly. For users
of Talend solutions with ESB, to run a Job using this
component in ESB Runtime, you need to copy the
runtime DLLs to the %KARAF_HOME%/lib/wrapper/
directory.

Related scenario
For a related scenario, see Utilizing .NET in Talend on page 643.


tDotNETRow
Facilitates data transform by utilizing custom or built-in .NET classes.
tDotNETRow sends data to and from libraries and classes within .NET or other custom DLL files.

tDotNETRow Standard properties


These properties are used to configure tDotNETRow running in the Standard Job framework.
The Standard tDotNETRow component belongs to the DotNET family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.

Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data

Use a static method Select this check box to invoke a static method in .NET and
this will disable Use an existing instance check box.

Propagate a data to output Select this check box to propagate the transformed data to the
output.

Use an existing instance Select this check box to reuse an existing instance of a .NET
object from the Existing instance to use list.
Existing instance to use: Select an existing instance of .NET
objects created by the other .NET components from the list.

Note: This check box will be disabled if you have


selected Use a static method and selecting this check
box will disable Dll to load, Fully qualified class
name(i.e. ClassLibrary1.NameSpace2.Class1) and
Value(s) to pass to the constructor.

Dll to load Type in the path, or browse to the DLL library containing
the class(es) of interest or enter the assembly's name
to be used. For example, System.Data, Version=2.0.0.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089 for an
OleDb assembly.

 Fully qualified class name(i.e. ClassLibrary1. Enter a fully qualified name for the class of interest.
NameSpace2.Class1)

Method name Fill this field with the name of the method to be invoked
in .NET.


Value(s) to pass to the constructor Click the plus button to add one or more lines for values to
be passed to the constructor for the object. Or, leave this
table empty to call a default constructor for the object.
The valid value(s) should be the parameters required by the
class to be used.

Method Parameters Click the plus button to add one or more lines for
parameters to be passed to the method.

Output value target column Select a column in the output row from the list to put value
into it.

Advanced settings

Create a new instance at each row Select this check box to create a new instance at each row
that passes through the component.

Method doesn't return a value Select this check box to invoke a method without returning
a value as a result of the processing.

Returns an instance of a .NET Object Select this check box to return an instance of a .NET object
as a result of an invoked method.

Store the returned value for later use Select this check box to store the returned value of a
method for later reuse in another tDotNETRow component.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is utilized to integrate with .NET objects.


To use this component, you must first install the runtime
DLLs, for example janet-win32.dll for Windows 32-bit
version and janet-win64.dll for Windows 64-bit version,
from the corresponding Microsoft Visual C++ Redistributable Package. This allows you to avoid
errors like the UnsatisfiedLinkError on dependent DLL.
So ensure that the runtime and all of the other DLLs which
the DLL to be called depends on are installed and their
versions are consistent among one another.

Note:
The required DLLs can be installed in the System32
folder or in the bin folder of the Java runtime to be used.
If you need to export a Job using this component to run
it outside the Studio, you have to specify the runtime
container of interest by setting the -Djava.library.path
argument accordingly. For users of Talend solutions
with ESB, to run a Job using this component in ESB
Runtime, you need to copy the runtime DLLs to the
%KARAF_HOME%/lib/wrapper/ directory.

Integrating .Net into Talend Studio: Introduction


This article describes the way to integrate .Net into Talend Studio, for example, invoking dll methods
in a Talend Studio Job.
Based on the runtime dlls (such as janet-win64.dll), Talend Studio provides the capability of
integrating .NET and Java, through which you can access C++ libraries and invoke their methods
easily in Java. Normally, for a Talend Studio user, this can be implemented in two ways: utilizing the
components in the DotNET family (that is, tDotNetInstantiate and tDotNetRow) in Talend Studio and
custom code. This article discusses the first method.
In a Talend Studio Job, the tDotNetInstantiate component can be used as a start component in a
flow or an independent subJob. It loads a system assembly or a custom dll by creating a .NET object.
The object can then be used by the subsequent tDotNetRow components for invoking the methods.
You also need to specify the class and set the parameters of the constructor for a tDotNetInstantiate
component.
The tDotNetRow component references a .NET object created by a tDotNetInstantiate component.
It can be used mid-flow, at the start of the flow, or at the end of the flow. You need to specify the method to be
invoked and set the parameters for the method. This component also passes the output of the method
to a specified column defined in the schema. So, you need to add columns in the schema of the
component and specify the column which the output values are passed to.

Note: For information about configuring the tDotNetInstantiate and tDotNetRow components, see
Talend Components Reference Guide.

This article shows the way to invoke dll methods in a Talend Studio Job, which uses the two DotNet
family components.

Integrating .Net into Talend Studio: Prerequisites


The prerequisites for invoking dll methods in a Talend Studio Job:
• Obtain the janet dll (that is, janet-win64.dll) for the .NET version you use (.NET 3.5 or .NET 4.0).


• Place the file in a directory that the system variable Path points to (for example, %JAVA_HOME%
\bin, C:\Windows\System32, etc). You can also place it in another directory. In this case, you
need to add the directory as a library path using
-Djava.library.path=path_to_directory_containing_the_dll (see the sketch after this list).
• The system assembly or the dll to integrate already exists.
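As a minimal sketch only (the directory and Job names below are placeholders, not values from this article), the library path can be passed to the JVM that runs the exported Job by adding the argument to the java command, for example:

java -Djava.library.path="C:\dotnet\runtime" -cp <exported_Job_classpath> <Job_main_class>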

Integrating .Net into Talend Studio: configuring the Job

Configuring tDotNetInstantiate

About this task


In the Basic settings of the tDotNetInstantiate component, take the following steps.

Procedure
1. Specify the DLL to load in the Dll to load field. The DLL can be a system assembly or a custom
DLL.

For system assemblies, you can specify the name of the desired system assembly (for example,
“System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken
=b77a5c561934e089”); for custom dlls, you need to provide the absolute path to the dll (for
example, "C:\\WINDOWS\\system32\\ClassLibrary1.dll").

2. Specify the class name and the namespace in the Fully qualified class name field.
3. Set parameter values for the constructor in the Value(s) to pass to the constructor field.

Configuring tDotNetRow

About this task


The tDotNetRow component invokes methods of a .Net object created by a tDotNetInstantiate
component and passes the output (if any) to the next component. This component can also
create .Net objects, which can also be reused by subsequent components.
In the Basic settings of the tDotNetRow component, take the following steps.


Procedure
1. Add columns in the schema by clicking the Edit schema button or using the schema propagated
to this component. You need to specify one of the columns of the schema for holding the output
value (if any) using the Output value target column drop-down list.

2. Select Propagate data to output to pass the data from input to output.
3. Take either of the following two options.
• If you have deployed a tDotNetInstantiate component for creating the .Net object, select Use
an existing instance and select the component from the Existing instance to use drop-down
list to refer the corresponding .Net object.
• You can also create a new .Net object for use. To achieve this, make sure Use an existing
instance is not selected, set Dll to load, Fully qualified class name, Method Name, and Value(s)
to pass to the constructor options as needed.
4. Provide the name of the method to invoke in the Method Name field.
5. Provide the parameter values for the method in rows of the Method Parameters field. As
prompted, you can use input row values as parameter values (for example, input_row.colu
mn_name).


Note:
• For information about other options of this component, refer to Talend Components
Reference Guide.
• See Utilizing .NET in Talend section in Talend Components Reference Guide for an example of
this article.

Utilizing .NET in Talend


This scenario describes a three-component Job that uses a DLL library containing a class called
Test1.Class1 and invokes a method on it that processes the value and outputs the result onto the
console.

Prerequisites
Before replicating this scenario, you need first to build up your runtime environment.
• Create the DLL to be loaded by tDotNETInstantiate
This example class built into .NET reads as follows:
using System;
using System.Collections.Generic;
using System.Text;

namespace Test1
{
    public class Class1
    {
        // Value passed in through the constructor and echoed back by getValue()
        string s = null;

        public Class1(string s)
        {
            this.s = s;
        }

        public string getValue()
        {
            return "Return Value from Class1: " + s;
        }
    }
}
This class reads the input value and adds the text Return Value from Class1: in front of this value. It
is compiled using the latest .NET.
• Install the runtime DLL from the latest .NET. In this scenario, we use janet-win32.dll on Windows
32-bit version and place it in the System32 folder.
Thus the runtime DLL is compatible with the DLL to be loaded.

Connecting components
Procedure
1. Drop the following components from the Palette to the design workspace: tDotNETInstantiate,
tDotNETRow and tLogRow.
2. Connect tDotNETInstantiate to tDotNETRow using a Trigger On Subjob OK connection.
3. Connect tDotNETRow to tLogRow using a Row Main connection.

Configuring tDotNETInstantiate
Procedure
1. Double-click tDotNETInstantiate to display its Basic settings view and define the component
properties.

2. Click the three-dot button next to the Dll to load field and browse to the DLL file to be loaded.
Alternatively, you can fill the field with an assembly. In this example, we use:
"C:/Program Files/ClassLibrary1/bin/Debug/ClassLibrary1.dll"
3. Fill the Fully qualified class name field with a valid class name to be used. In this example, we
use:
"Test1.Class1"
4. Click the plus button beneath the Value(s) to pass to the constructor table to add a new line for
the value to be passed to the constructor.
In this example, we use:
"Hello world"


Configuring tDotNETRow
Procedure
1. Double-click tDotNETRow to display its Basic settings view and define the component properties.

2. Select the Propagate data to output check box.


3. Select the Use an existing instance check box and select tDotNETInstantiate_1 from the Existing
instance to use list on the right.
4. Fill the Method Name field with a method name to be used. In this example, we use "getValue", a
custom method.
5. Click the three-dot button next to Edit schema to add one column to the schema.

Click the plus button beneath the table to add a new column to the schema and click OK to save
the setting.
6. Select newColumn from the Output value target column list.

Configuring tLogRow
Procedure
1. Double-click tLogRow to display its Basic settings view and define the component properties.


2. Click the Sync columns button to retrieve the schema defined in the preceding component.
3. Select Table in the Mode area.

Results
Save your Job and press F6 to execute it.

From the result, you can read that the text Return Value from Class1 is added in front of the
retrieved value Hello world.


tDropboxConnection
Creates a Dropbox connection to a given account that the other Dropbox components can reuse.

tDropboxConnection Standard properties


These properties are used to configure tDropboxConnection running in the Standard Job framework.
The Standard tDropboxConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is used standalone as a subJob to create


the Dropbox connection to be used. In a Job design, it is
often connected to the other Dropbox components using
the Trigger links such as On Subjob Ok link.

Related scenario
See Uploading files to Dropbox on page 655


tDropboxDelete
Removes a given folder or file from Dropbox.

tDropboxDelete Standard properties


These properties are used to configure tDropboxDelete running in the Standard Job framework.
The Standard tDropboxDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Dropbox pointing to the folder or the file
you need to remove.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used standalone in a subJob to


remove data from Dropbox.


Related scenarios
No scenario is available for the Standard version of this component yet.


tDropboxGet
Downloads a selected file from a Dropbox account to a specified local directory.

tDropboxGet Standard properties


These properties are used to configure tDropboxGet running in the Standard Job framework.
The Standard tDropboxGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Dropbox pointing to the file you need to
download.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

Save As File Select this check box to display the File field and browse
to, or enter the local directory where you want to store the
downloaded file. The existing file, if any, is replaced.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
The schema of this component is read-only. You can click
the button next to Edit schema to view the predefined
schema that contains the following two columns:
• fileName: the name of the downloaded file.
• content: the content of the downloaded file.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used alone or along with other
components via the Iterate link or a trigger link such as On
Subjob OK.

Related scenarios
No scenario is available for the Standard version of this component yet.


tDropboxList
Lists the files stored in a specified directory on Dropbox.

tDropboxList Standard properties


These properties are used to configure tDropboxList running in the Standard Job framework.
The Standard tDropboxList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path pointing to the folder you need to list the
files from, or enter the path pointing to the exact file you
need to read.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

List Type Select the type of data you need to list from the specified
path.

Include subdirectories Select this check box to list files from any existing sub-
folders in addition to the files in the directory defined in
the Path field.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NAME The name of the remote file being processed. This is a Flow
variable and it returns a string.


PATH The path to the folder or the file being processed on


Dropbox. This is a Flow variable and it returns a string.

LAST_MODIFIED The timestamp of the last modification of the file being


processed. This is a Flow variable and it returns a long.

SIZE The volume of the file being processed. This is a Flow


variable and it returns a long.

IS_FILE The boolean result of the file listing. This is a Flow variable
and it returns a boolean. The result Yes indicates that the
listed data is of the type File; otherwise, the type is Folder.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
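For example, in an Iterate flow these variables can be read with the usual globalMap syntax; a minimal sketch, assuming the component is labeled tDropboxList_1 (adapt the label to the one in your own Job):

((String)globalMap.get("tDropboxList_1_NAME"))
((String)globalMap.get("tDropboxList_1_PATH"))
((Long)globalMap.get("tDropboxList_1_SIZE"))
((Boolean)globalMap.get("tDropboxList_1_IS_FILE"))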

Usage

Usage rule This component is typically used standalone.

Related scenarios
No scenario is available for the Standard version of this component yet.


tDropboxPut
Uploads data to Dropbox from either a local file or a given data flow.

tDropboxPut Standard properties


These properties are used to configure tDropboxPut running in the Standard Job framework.
The Standard tDropboxPut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path (File Only) Enter the path pointing to the file you need to write
contents in. This file will be created on the fly if it does not
exist.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

Upload Mode Select upload mode to be used:


• Rename if Existing: the uploaded file is automatically
renamed. For example, a file named test.txt might be
renamed to test (1).txt.
• Replace if Existing: the uploaded file replaces the
existing one.
• Update specified Revision: the file you are uploading
is used to update a specific revision of that file. If the
revision you specify is the latest revision, then the
existing file on Dropbox is replaced; if it is an older
revision, the file you are uploading is renamed to
indicate that a conflict is encountered; if the revision
does not exist, an error is returned.

Upload Incoming content as File Select this radio button to read data directly from the input
flow of the preceding component and write the data into
the file specified in the Path field.


Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
a single column named content and it receives data from
the content column of its input schema only. This means
that you must use a content column in the input data flow
to carry the data to be uploaded. This type of column is
typically provided by the tFileInputRaw component. For further
information, see tFileInputRaw on page 1085.
The Schema field is not available when you have selected
the Expose as OutputStream or the Upload local file radio
button.

Upload local file Select this radio button to upload a locally stored file to
Dropbox. In the File field that is displayed, you need to enter
the path or browse to this file.

Expose as OutputStream Select this check box to expose the output stream of this
component as a variable named OUTPUTSTREAM so that
the other components can reuse this variable to write the
contents to be uploaded into the exposed output stream.
For example, you can use the Use output stream feature
of the tFileOutputDelimited component to feed a given
tDropboxPut's exposed output stream. For further
information, see tFileOutputDelimited on page 1113.
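For example, as shown in the scenario below, a tFileOutputDelimited component configured with the Use output stream option can reference the exposed stream with the following expression, assuming the tDropboxPut component is labeled tDropboxPut_1:

((java.io.OutputStream)globalMap.get("tDropboxPut_1_OUTPUTSTREAM"))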

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is used either standalone in a subJob


to directly upload a local file to Dropbox or as an end
component of a Job flow to upload given data being
handled in this flow.

Uploading files to Dropbox


In this scenario, a six-component Job consisting of three subJobs is created to write data onto
Dropbox using different upload modes.


Before replicating this scenario, you need to create a Dropbox App under the Dropbox account to be
used. In this scenario, the Dropbox App to be used is named talenddrop and thus the root folder
in which files are uploaded is talenddrop, too. In addition, the access token to this folder has been
generated from the App console provided by Dropbox.
For further information about a Dropbox App, see https://www.dropbox.com/developers/apps/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job from the Job Designs node in
the Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2. In the workspace, enter the name of the component to be used and select this component from
the list that appears. In this scenario, the components are tDropboxConnection, tFixedFlowInput,
tFileOutputDelimited, tFileInputRaw and two tDropboxPut components.
The tFixedFlowInput component generates some data to be uploaded to Dropbox in this scenario.
In the real-world case, you can use other components such as tMysqlInput or tMap in the place of
tFixedFlowInput to design a sophisticated process to prepare your data to be handled.
3. Connect tFixedFlowInput to tFileOutputDelimited using the Row > Main link.
4. Do the same to connect tFileOutputDelimited to one of the two tDropboxPut components and
connect tFileInputRaw to the other tDropboxPut component.
5. Connect tDropboxConnection to tFixedFlowInput using the Trigger > On Subjob Ok link. Then
connect tFixedFlowInput to tFileInputRaw using the same type of link.

Connecting to Dropbox
Procedure
1. Double-click tDropboxConnection to open its Component view.


2. In the Access token field, paste the token that you have generated via the App console of Dropbox
for accessing the Dropbox App folder to be used.

Generating the output stream


Defining the input data

Procedure
1. Double-click tFixedFlowInput to open its Component view.

In this scenario, only three rows of sample data are created to indicate three countries and their
calling codes.

33;France
86;China
81;Japan

2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the [+] button twice to add two rows and in the Column column, rename them to code and
country, respectively.


4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
5. In the Mode area, select the Use Inline Table radio button. The code and the country column have
been automatically created in this table.
6. Enter the sample data mentioned above in this table.

Defining the output stream

Procedure
1. Double-click tFileOutputDelimited to open its Component view.

2. Select the Use output stream check box to write the data to be outputted into a given output
stream.
3. In the Output stream field, enter the code to define the output stream you need to write data
in. In this scenario, it is the output stream of the tDropboxPut_1 component linked with the
current component. Thus the code used to write the data reads as follows:
((java.io.OutputStream)globalMap.get("tDropboxPut_1_OUTPUTSTREAM"))
Note that in this example code, the tDropboxPut component has the number 1 as its affix, which
represents its component ID distributed automatically within this Job. If the tDropboxPut
component you are using has a different ID, you need to adapt the code to that ID number.
4. Click Edit schema to verify that the schema of this component is identical with that of the
preceding tFixedFlowInput component. If not so, click the Sync columns button to make both of
the schemas identical.
5. Navigate to the Advanced settings tab.


6. Select the Custom the flush buffer size check box. This automatically adds 1 in the Row number
field.

Exposing the tDropboxPut output stream


Procedure
1. Double-click the tDropboxPut component linked with tFileOutputDelimited to open its
Component view.

2. Select the Use existing connection check box to reuse the connection created by
tDropboxConnection.
3. In the Path field, enter the path pointing to the file you need to write data in, with a slash (/) at
the beginning of the path. For example, enter /calling_code.csv.
4. In the Upload mode area, select the Rename if Existing radio button.
5. Select the Expose As OutputStream radio button to expose the output stream of this component
so that the other component, tFileOutputDelimited in this scenario, can write data in the stream.

Defining the media data to be uploaded


Procedure
1. Double-click tFileInputRaw to open its Component view.

This component is used to read a picture named esb_architecture.png into the data flow. In the
real-world practice, this file can be of many other formats, such as pdf, xls, ppt or mp3.


2. In the Filename field, enter the path or browse to the file you need to upload.
3. In the Mode area, select the Read the file as a bytes array radio button.

Uploading the incoming contents


Procedure
1. Double-click the tDropboxPut component linked with tFileInputRaw to open its Component view.

2. Select the Use existing connection check box to reuse the connection created by
tDropboxConnection.
3. In the Path field, enter the path pointing to the file you need to write data in, with a slash (/) at
the beginning of the path. For example, enter /architecture.png.
4. In the Upload mode area, select Rename if existing.
5. Select the Upload incoming content as file radio button. This displays the Edit schema button to
allow you to view the read-only schema of this component.

Executing the Job


Then you can press F6 to run this Job.
Once done, check the uploaded files in the Dropbox App folder of your Dropbox, in this scenario, the
talenddrop folder.


tDTDValidator
Helps control the data and structure quality of the file to be processed.
Validates the XML input file against a DTD file and sends the validation log to the defined output.

tDTDValidator Standard properties


These properties are used to configure tDTDValidator running in the Standard Job framework.
The Standard tDTDValidator component belongs to the XML family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component.
The schema of this component is read-only. It contains
standard information regarding the file validation.

DTD file Filepath to the reference DTD file.

XML file Filepath to the XML file to be validated.

If XML is valid, display /
If XML is invalid, display Type in a message to be displayed in the Run console based
on the result of the comparison.

Print to console Select this check box to display the validation message.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
DIFFERENCE: the result of the validation. This is a Flow
variable and it returns a string.
VALID: the validation result. This is a Flow variable and it
returns a boolean.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component can be used as standalone component but


it is usually linked to an output component to gather the log
data.

Validating XML files


This scenario describes a Job that validates the specified type of files from a folder, displays the
validation result on the Run tab console, and outputs the log information for the invalid files into a
delimited file.

Validating XML files


Procedure
1. Drop the following components from the Palette to the design workspace: tFileList,
tDTDValidator, tMap, tFileOutputDelimited.
2. Connect the tFileList to the tDTDValidator with an Iterate link and the remaining components
using Row > Main connections.
3. Set the tFileList component properties, to fetch an XML file from a folder.

Click the plus button to add a filemask line and enter the filemask: *.xml. Remember Java code
requires double quotes.
Set the path of the XML files to be verified.
Select No from the Case Sensitive drop-down list.


4. In the tDTDValidator Component view, the schema is read-only as it contains standard log
information related to the validation process.

In the Dtd file field, browse to the DTD file to be used as reference.
5. Click in the XML file field, press Ctrl+Space bar to access the variable list, and double-click the
current filepath global variable: tFileList.CURRENT_FILEPATH.
6. In the various messages to display in the Run tab console, use the jobName variable to recall
the job name tag. Recall the filename using the relevant global variable:
((String)globalMap.get("tFileList_1_CURRENT_FILE")). Remember Java code requires double quotes.
Select the Print to Console check box.
7. In the tMap component, drag and drop the information data from the standard schema that you
want to pass on to the output file.

8. Once the Output schema is defined as required, add a filter condition to select the log
information data only when the XML file is invalid.
As a best practice, type the expected value first, then the operator suited to the type of data
being filtered, then the variable that should meet the requirement. In this case: 0 ==
row1.validate.
9. Then connect (if not already done) the tMap to the tFileOutputDelimited component using a Row
> Main connection. Name it as relevant, in this example: log_errorsOnly.
10. In the tFileOutputDelimited Basic settings, define the destination filepath, the field delimiters and
the encoding.
11. Save your Job and press F6 to run it.


On the Run console the messages defined display for each of the files. At the same time the
output file is filled with the log data for invalid files.


tDynamoDBInput
Retrieves data from an Amazon DynamoDB table and sends them to the component that follows for
transformation.

tDynamoDBInput Standard properties


These properties are used to configure tDynamoDBInput running in the Standard Job framework.
The Standard tDynamoDBInput component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Access Key Enter the access key ID that uniquely identifies an AWS
Account. For further information about how to get your
Access Key and Secret Key, see Getting Your AWS Access
Keys.

Secret Key Enter the secret access key, constituting the security
credentials in combination with the access Key.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Use End Point Select this check box and in the Server Url field displayed,
specify the Web service URL of the DynamoDB database
service.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
AWS Region, see Regions and Endpoints.

Action Select the operation to be performed from the drop-down


list, either Query or Scan. For more information, see Query
and Scan Operations in DynamoDB.


Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
If a column stores JSON documents, select JSON from the
DB Type drop-down list.

Table Name Specify the name of the table to be queried or scanned.

Use advanced key condition expression Select this check box and in the Advanced key condition
expression field displayed, specify the key condition
expressions used to determine the items to be read from the
table or index.

Key condition expression Specify the key condition expressions used to determine the
items to be read. Click the [+] button to add as many rows
as needed, each row for a key condition expression, and set
the following attributes for each expression:
• Key Column: Enter the name of the key column.
• Function: Select the function for the key condition
expression.
• Value1: Specify the value used in the key condition
expression.
• Value2: Specify the second value used in the key
condition expression if needed, depending on the
function you selected.
Note that only the items that meet all the key conditions
defined in this table can be returned.
This table is not available when the Use advanced key
condition expression check box is selected.

Use filter expression Select this check box to use the filter expression for the
query or scan operation.

Use advanced filter expression Select this check box and in the Advanced filter expression
field displayed, specify the filter expressions used to refine
the data after it is queried or scanned and before it is
returned to you.


This check box is available when the Use filter expression


check box is selected.

Filter expression Specify the filter expressions used to refine the results
returned to you. Click the [+] button to add as many rows
as needed, each row for a filter expression, and set the
following attributes for each expression:
• Column: Enter the name of the column used to refine
the results.
• Function: Select the function for the filter expression.
• Value1: Specify the value used in the filter expression.
• Value2: Specify the second value used in the filter
expression if needed, depending on the function you
selected.
Note that only the items that meet all the filter conditions
defined in this table can be returned.
This table is available when the Use filter expression check
box is selected and the Use advanced filter expression
check box is cleared.

Value mapping Specify the placeholders for the expression attribute values.
• value: Enter the expression attribute value.
• placeholder: Specify the placeholder for the
corresponding value.
For more information, see Expression Attribute Values.

Name mapping Specify the placeholders for the attribute names that
conflict with the DynamoDB reserved words.
• name: Enter the name of the attribute that conflicts
with a DynamoDB reserved word.
• placeholder: Specify the placeholder for the
corresponding attribute name.
For more information, see Expression Attribute Names.
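As an illustration only (the attribute name and values below are hypothetical), a filter on an attribute named size, which is a DynamoDB reserved word, could be set up as follows:

Advanced filter expression: "#sz >= :minSize"
Name mapping: name: "size" placeholder: "#sz"
Value mapping: value: 100 placeholder: ":minSize"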

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this


check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Writing and extracting JSON documents from DynamoDB


Use tDynamoDBOutput to write a JSON document to a DynamoDB table and then use
tDynamoDBInput to extract a child element of this JSON document.
Prerequisites:
• A Talend Studio with Big Data
• Your AWS credentials that have been granted the access to your Amazon DynamoDB.
The sample data to be used reads like this:

21058;{"accountId" : "900" , "accountName" : "xxxxx" , "action" : "Create",


"customerOrderNumber" : { "deliveryCode" : "261" , "deliveryId" : "313"}}
21059;{"accountId" : "901" , "accountName" : "xxxxy" , "action" : "Delete",
"customerOrderNumber" : { "deliveryCode" : "262" , "deliveryId" : "314"}}

This data has two columns: DeliveryId and EventPayload, separated by a semicolon (;). The JSON
document itself is stored in the EventPayload column.


Designing the data flow around the DynamoDB components


Drop tFixedFlowInput, tDynamoDBOutput, tDynamoDBInput and tLogRow on the design workspace of your
Studio to create the Job.

Procedure
1. In the Integration perspective of the Studio, create an empty Standard Job from the Job Designs
node in the Repository tree view.
2. In the workspace, enter the name of the component to be used and select this component from
the list that appears. In this scenario, the components are tFixedFlowInput, tDynamoDBOutput,
tDynamoDBInput and tLogRow.
The tFixedFlowInput component is used to load the sample data into the data flow. In the real-
world practice, use the input component specific to the data format or the source system to be
used instead of tFixedFlowInput.
3. Connect tFixedFlowInput to tDynamoDBOutput and connect tDynamoDBInput to tLogRow using
the Row > Main link.
4. Connect tFixedFlowInput to tDynamoDBInput using the Trigger > On Subjob Ok link.

Writing the sample JSON documents to DynamoDB


Configure tFixedFlowInput to load the sample data in the data flow and configure tDynamoDBOutput
to write this data in a DynamoDB table.

About this task

Procedure
1. Double-click tFixedFlowInput in its Component view.

Example

2. Click the ... button next to Edit schema to open the schema editor.


Example

3. Click the + button twice to add two rows, each representing a column of the sample data, and in
the Column column, name these columns to DeliveryId and EventPayload, respectively.
4. On the row for the DeliveryId column, select the check box in the Key column to use
this DeliveryId column as the partition key column of the DynamoDB table to be used. A
DynamoDB table requires a partition key column.
5. Click OK to validate these changes and once prompted, accept the propagation of the schema to
the connected component, tDynamoDBOutput.
6. In the Mode area, select the Use Inline Content radio button and enter the sample data in the field
that is displayed:

Example

21058;{"accountId" : "900" , "accountName" : "xxxxx" , "action" : "Create",


"customerOrderNumber" : { "deliveryCode" : "261" , "deliveryId" : "313"}}
21059;{"accountId" : "901" , "accountName" : "xxxxy" , "action" : "Delete",
"customerOrderNumber" : { "deliveryCode" : "262" , "deliveryId" : "314"}}

7. Double-click tDynamoDBOutput to open its Component view.


Example

8. Click the ... button next to Edit schema to open the schema editor. This component should have
retrieved the schema from tFixedFlowInput.

Example

9. In the DB Type column, select JSON for the EventPayload column, as this is the column in
which the JSON documents are stored.
10. In the Access key and Secret key fields, enter the credentials of the AWS account to be used to
access your DynamoDB database.
11. From the Region drop-down list, select the AWS region to be used. If you do not know which
region to select, ask the administrator of your AWS system for more information.
12. From the Action on table drop-down list, select Drop table if exist and create.
13. From the Action on data drop-down list, select Insert.
14. In the Table name field, enter the name to be used for the DynamoDB table to be created.


15. In the Partition Key field, enter the name of the column to be used to provide partition keys. In this
example, it is DeliveryId.
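For reference only, the write side configured above corresponds roughly to the following calls with the AWS SDK for Java Document API: the table is created with DeliveryId as its partition key, and each EventPayload value is stored as a JSON document. The table name delivery_events, the region, the key type (string) and the credentials are placeholders for this sketch, not values taken from the component.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;

public class WriteJsonDocuments {
    public static void main(String[] args) throws Exception {
        // Placeholder client; the component builds it from the Access key/Secret key and Region settings.
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().withRegion("us-east-1").build();
        DynamoDB dynamoDB = new DynamoDB(client);

        // Create the table (the component can also drop an existing one first);
        // DeliveryId is the partition key. The key type S (string) is an assumption for this sketch.
        Table table = dynamoDB.createTable("delivery_events",
                java.util.Arrays.asList(new KeySchemaElement("DeliveryId", KeyType.HASH)),
                java.util.Arrays.asList(new AttributeDefinition("DeliveryId", ScalarAttributeType.S)),
                new ProvisionedThroughput(5L, 5L));
        table.waitForActive();

        // DB Type JSON: the EventPayload column is written as a JSON document (a DynamoDB map).
        String payload = "{\"accountId\" : \"900\", \"accountName\" : \"xxxxx\", \"action\" : \"Create\","
                + " \"customerOrderNumber\" : { \"deliveryCode\" : \"261\", \"deliveryId\" : \"313\"}}";
        table.putItem(new Item().withPrimaryKey("DeliveryId", "21058").withJSON("EventPayload", payload));
    }
}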

Extracting a JSON document using advanced filters


Configure tDynamoDBInput to use an advanced filter to read a JSON document from DynamoDB and
use tLogRow to output this document in the console of the Studio.

About this task

Procedure
1. Double-click tDynamoDBInput to open its Component view.

Example

2. Click the ... button next to Edit schema to open the schema editor.


Example

3. Click the + button twice to add two rows, each representing a column of the sample data, and in
the Column column, name these columns DeliveryId and EventPayload, respectively.
4. On the row for the DeliveryId column, select the check box in the Key column to use
this DeliveryId column as the partition key column of the DynamoDB table to be used. A
DynamoDB table requires a partition key column.
5. In the DB Type column, select JSON for the EventPayload column, as this is the column in
which the JSON documents are stored.
6. In the Access key and Secret key fields, enter the credentials of the AWS account to be used to
access your DynamoDB database.
7. From the Region drop-down list, select the same region as you selected in the previous steps for
tDynamoDBOutput.
8. From the Action drop-down list, select Scan.
9. In the Table Name field, enter the name of the DynamoDB table to be created by
tDynamoDBOutput.
10. Select the Use filter expression check box and then the Use advanced filter expression check box.
11. In the Advanced filter expression field, enter the filter to be used to select JSON documents.

Example

"EventPayload.customerOrderNumber.deliveryCode = :value"

The part on the left of the equals sign reflects the structure within a JSON document of the
sample data, in the EventPayload column. The purpose is to use the value of the deliveryCode
element to filter the document to be read (see the SDK sketch at the end of this scenario).
You need to define the :value placeholder in the Value mapping table.
12. Under the Value mapping table, click the + button to add one row and do the following:
a) In the value column, enter the value of the JSON element to be used as a filter.

Example
In this example, this element is deliveryCode and you need to extract the JSON document
in which the value of the deliveryCode element is 261. As this value is a string, enter 261
within double quotation marks.


If this value is an integer, do not use any quotation marks.


b) In the Placeholder column, enter the name of the placeholder to be defined, without any
quotation marks. In this example, it is :value, as you have put in the Advanced filter
expression.
A placeholder name must start with a colon (:).
13. Double-click tLogRow to open its Component view and select the Table radio box to display the
extracted data in a table in the console of the Studio.
14. Press Ctrl+S to save the Job and press F6 to run it.

Results
Once done, the retrieved JSON document is displayed in the console of the Run view of the Studio.

In the created DynamoDB table, you can see both of the sample JSON documents.
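The advanced filter used in this scenario behaves like a DynamoDB Scan with a filter expression. The sketch below shows roughly the equivalent call with the AWS SDK for Java; the table name delivery_events and the region are placeholders, and the actual request is assembled by tDynamoDBInput from the component settings.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import java.util.HashMap;
import java.util.Map;

public class ScanByDeliveryCode {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().withRegion("us-east-1").build();

        // The :value placeholder from the Advanced filter expression is bound here,
        // exactly like the row added to the Value mapping table in the component.
        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":value", new AttributeValue().withS("261"));

        ScanRequest request = new ScanRequest()
                .withTableName("delivery_events")
                .withFilterExpression("EventPayload.customerOrderNumber.deliveryCode = :value")
                .withExpressionAttributeValues(values);

        // Only the items whose nested deliveryCode element equals "261" are returned.
        ScanResult result = client.scan(request);
        result.getItems().forEach(System.out::println);
    }
}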


tDynamoDBOutput
Creates, updates or deletes data in an Amazon DynamoDB table.

tDynamoDBOutput Standard properties


These properties are used to configure tDynamoDBOutput running in the Standard Job framework.
The Standard tDynamoDBOutput component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Access Key Enter the access key ID that uniquely identifies an AWS
Account. For further information about how to get your
Access Key and Secret Key, see Getting Your AWS Access
Keys.

Secret Key Enter the secret access key, constituting the security
credentials in combination with the access Key.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


to an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Use End Point Select this check box and in the Server Url field displayed,
specify the Web service URL of the DynamoDB database
service.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
AWS Region, see Regions and Endpoints.

Action on table Select an operation to be performed on the table defined.


• Default: No operation is carried out.
• Drop and create table: The table is removed and
created again.


• Create table: The table does not exist and gets created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exist and create: The table is removed if it
already exists and created again.

Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Insert new items from the input flow.
• Update: Update existing items according to the input
flow.
• Delete: Remove existing items according to the input
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
If a column stores JSON documents, select JSON from the
DB Type drop-down list.

Table Name Specify the name of the table to be written.

Partition Key Specify the partition key of the specified table.

Sort Key Specify the sort key of the specified table.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.


Read Capacity Unit Specify the number of read capacity units. For more
information, see Amazon DynamoDB Provisioned
Throughput.

Write Capacity Unit Specify the number of write capacity units. For more
information, see Amazon DynamoDB Provisioned
Throughput.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as an end component of a


Job or subJob and it always needs an input link.

Related scenarios
No scenario is available for the Standard version of this component yet.


tEDIFACTtoXML
Transforms an EDIFACT message file into the XML format for better readability to users and
compatibility with processing tools.
This component reads a United Nations/Electronic Data Interchange For Administration, Commerce
and Transport (UN/EDIFACT) message and transforms it into the XML format according to the
EDIFACT version and the EDIFACT family.

tEDIFACTtoXML Standard properties


These properties are used to configure tEDIFACTtoXML running in the Standard Job framework.
The Standard tEDIFACTtoXML component belongs to the XML family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component.
The schema of this component is fixed and read-only, with
only one column: document.

EDI filename Filepath to the EDIFACT message file to be transformed.

EDI version Select the EDIFACT version of the input file.

Ignore new line Select this check box to skip carriage returns in the input
file.

Die on error Select this check box to stop Job execution when an error
is encountered. By default, this check box is cleared, and
therefore illegal rows are skipped and the process is
completed for the error free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually linked to an output component to


gather the transformation result.

Reading an EDIFACT message file and saving it to XML


This scenario describes a simple Job that reads a UN/EDIFACT Customs Cargo (CUSCAR) message file
and saves it as an XML file.

Adding and linking the components


Procedure
1. Drop the tEDIFACTtoXML component and the tFileOutputXML component from the Palette to the
design workspace.
2. Connect the tEDIFACTtoXML component and the tFileOutputXML component using a Row > Main
connection.

Results

Configuring the components


Procedure
1. Double-click the tEDIFACTtoXML component to show its Basic settings view.

2. Fill the EDI filename field with the full path to the input EDIFACT message file.
In this use case, the input file is 99a_cuscar.edi.


3. From the EDI version list, select the EDIFACT version of the input file, D99A in this use case.
4. Select the Ignore new line check box to skip the carriage return characters in the input file during
the transformation.
5. Leave the other parameters as they are.
6. Double-click the tFileOutputXML component to show its Basic settings view.

7. Fill the File Name field with the full path to the output XML file you want to generate.
In this use case, the output XML is 99a_cuscar.xml.
8. Leave the other parameters as they are.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

Results
The input EDIFACT CUSCAR message file is transformed into the XML format and the output XML file
is generated as defined.


tELTGreenplumInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Provides the table schema to be used for the SQL statement to execute.

tELTGreenplumInput Standard properties


These properties are used to configure tELTGreenplumInput running in the Standard Job framework.
The Standard tELTGreenplumInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTGreenplumInput is to be used along with the


tELTGreenplumMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTGreenplumMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.

tELTGreenplumMap Standard properties


These properties are used to configure tELTGreenplumMap running in the Standard Job framework.
The Standard tELTGreenplumMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Greenplum Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTGreenplumMap is used along with tELTGreenplumInput


and tELTGreenplumOutput. Note that the Output link to be
used with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Mapping data using a simple implicit join


In this scenario, a tELTGreenplumMap component is deployed to retrieve the data from the source
table employee_by_statecode, compare its statecode column against the table statecode, and then map
the desired columns from the two tables to the output table employee_by_state.
Before the Job execution, the three tables, employee_by_statecode, statecode and employee_by_state
look like:


Dropping components

Procedure
1. Add the following components from the Palette to the workspace:
• tGreenplumConnection
• two tELTGreenplumInput
• tELTGreenplumMap
• tELTGreenplumOutput
• tGreenplumCommit
• tGreenplumInput
• tLogRow
2. Rename the following components:
• tGreenplumConnection to connect_to_greenplum_host
• two tELTGreenplumInput to employee+statecode and statecode
• tELTGreenplumMap to match+map
• tELTGreenplumOutput to map_data_output
• tGreenplumCommit to commit_to_host
• tGreenplumInput to read_map_output_table
• tLogRow to show_map_data
3. Connect the components in the Job:
• link tGreenplumConnection to tELTGreenplumMap using an OnSubjobOk trigger
• link tELTGreenplumMap to tGreenplumCommit using an OnSubjobOk trigger
• link tGreenplumCommit to tGreenplumInput using an OnSubjobOk trigger
• link tGreenplumInput to tLogRow using a Row > Main connection
The two tELTGreenplumInput components and tELTGreenplumOutput will be linked to
tELTGreenplumMap later once the relevant tables have been defined.


Configuring the components


Procedure
1. Double-click tGreenplumConnection to open its Basic settings view in the Component tab.

a) In the Host and Port fields, enter the context variables for the Greenplum server.
b) In the Database field, enter the context variable for the Greenplum database.
c) In the Username and Password fields, enter the context variables for the authentication
credentials.
For more information on context variables, see Talend Studio User Guide.
2. Double-click employee+statecode to open its Basic settings view in the Component tab.

a) In the Default table name field, enter the name of the source table, namely
employee_by_statecode.
b) Click the [...] button next to the Edit schema field to open the schema editor.

c) Click the [+] button to add three columns, namely id, name and statecode, with the data type as
INT4, VARCHAR, and INT4 respectively.
d) Click OK to close the schema editor.


e) Link employee+statecode to tELTGreenplumMap using the output employee_by_statecode.


3. Double-click statecode to open its Basic settings view in the Component tab.

a) In the Default table name field, enter the name of the lookup table, namely statecode.
4. Click the [...] button next to the Edit schema field to open the schema editor.

a) Click the [+] button to add two columns, namely state and statecode, with the data type as
VARCHAR and INT4 respectively.
b) Click OK to close the schema editor.
c) Link statecode to tELTGreenplumMap using the output statecode.
5. Click tELTGreenplumMap to open its Basic settings view in the Component tab.

a) Select the Use an existing connection check box.


6. Click the [...] button next to the ELT Greenplum Map Editor field to open the map editor.


7. Click the [+] button on the upper left corner to open the table selection box.

a) Select tables employee_by_statecode and statecode in sequence and click Ok. The tables appear
on the left panel of the editor.
8. On the upper right corner, click the [+] button to add an output table, namely employee_by_state.
a) Click Ok to close the map editor.
9. Double-click tELTGreenplumOutput to open its Basic settings view in the Component tab.


a) In the Default table name field, enter the name of the output table, namely employee_by_state.
10. Click the [...] button next to the Edit schema field to open the schema editor.

a) Click the [+] button to add three columns, namely id, name and state, with the data type as
INT4, VARCHAR, and VARCHAR respectively.
b) Click OK to close the schema editor.
c) Link tELTGreenplumMap to tELTGreenplumOutput using the table output employee_by_state.
d) Click OK on the pop-up window below to retrieve the schema of tELTGreenplumOutput.

Now the map editor's output table employee_by_state shares the same schema as that of
tELTGreenplumOutput.
11. Double-click tELTGreenplumMap to open the map editor.
a) Drop the column statecode from table employee_by_statecode to its counterpart of the table
statecode, looking for the records in the two tables that have the same statecode values.
b) Drop the columns id and name from table employee_by_statecode as well as the column statecode
from table statecode to their counterparts in the output table employee_by_state.
c) Click Ok to close the map editor.
12. Double-click tGreenplumInput to open its Basic settings view in the Component tab.

a) Select the Use an existing connection check box.


b) In the Table name field, enter the name of the source table, namely employee_by_state.
c) In the Query field, enter the query statement, namely "SELECT * FROM \"employee_by_state\"".
13. Double-click tLogRow to open its Basic settings view in the Component tab.

a) In the Mode area, select Table (print values in cells of a table) for a better display.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the desired employee records have been written to the table employee_by_state,
presenting clearer geographical information about the employees.
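For reference, the mappings defined above make tELTGreenplumMap assemble a single INSERT ... SELECT statement with an implicit join on the statecode columns, which the ELT components push to Greenplum for execution. The sketch below shows roughly what that statement looks like if run directly over JDBC; the connection URL and credentials are placeholders, and the exact SQL generated by the Studio may differ slightly.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ImplicitJoinInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; in the Job they come from tGreenplumConnection.
        // Greenplum is reachable through the PostgreSQL JDBC driver.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://greenplum-host:5432/sampledb", "gpadmin", "secret");

        // Roughly the statement built in the ELT Greenplum Map editor:
        // an INSERT ... SELECT with an implicit join on the statecode columns.
        String insert =
                "INSERT INTO employee_by_state (id, name, state) "
              + "SELECT employee_by_statecode.id, employee_by_statecode.name, statecode.state "
              + "FROM employee_by_statecode, statecode "
              + "WHERE employee_by_statecode.statecode = statecode.statecode";

        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(insert);
        }
        conn.close();
    }
}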


Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTGreenplumOutput
Executes the SQL Insert, Update and Delete statements on the Greenplum database.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.

tELTGreenplumOutput Standard properties


These properties are used to configure tELTGreenplumOutput running in the Standard Job framework.
The Standard tELTGreenplumOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTGreenplumOutput is to be used along with the


tELTGreenplumMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTHiveInput
Replicates the schema of the input Hive table, which the tELTHiveMap component that follows
will use.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component provides, for the tELTHiveMap component that follows, the input schema of the Hive
table to be used.

tELTHiveInput Standard properties


These properties are used to configure tELTHiveInput running in the Standard Job framework.
The Standard tELTHiveInput component belongs to the ELT family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema A schema is a row description. It defines the number


of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

Edit schema Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Default table name Enter the name of the input table to be used.

Default schema name Enter the name of the database schema to which the input
table to be used is related.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTHiveMap is used along with a tELTHiveInput and


tELTHiveOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Note:
The ELT components do not handle actual data flow but
only schema information.

Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.

Related scenarios
• Joining table columns and writing them into Hive on page 710
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTHiveMap
Builds graphically the Hive QL statement in order to transform data.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component uses the tables provided as input, to feed the parameter in the built statement. The
statement can include inner or outer joins to be implemented between tables or between one table
and its aliases.
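To make the idea concrete, the statement built in the ELT Hive Map editor is ordinary Hive QL, typically an INSERT ... SELECT with a join, executed on the cluster rather than in the Job itself. The sketch below, with hypothetical table names and a hypothetical HiveServer2 endpoint, shows the kind of statement the component generates and pushes down; it is an illustration, not the component's actual code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveQlJoinSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint, database and credentials.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "hive", "");

        // The kind of Hive QL an ELT map produces: an INSERT ... SELECT with an inner join.
        String hiveQl =
                "INSERT INTO TABLE customer_enriched "
              + "SELECT c.id, c.name, r.region_name "
              + "FROM customers c JOIN regions r ON (c.region_id = r.id)";

        try (Statement stmt = conn.createStatement()) {
            stmt.execute(hiveQl);
        }
        conn.close();
    }
}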

tELTHiveMap Standard properties


These properties are used to configure tELTHiveMap running in the Standard Job framework.
The Standard tELTHiveMap component belongs to the ELT family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear,
the default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine on which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information


of the Microsoft HD Insight cluster to be used. For
example, the address could be
your_hdinsight_cluster_name.azurehdinsight.net and the
authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.


In the Hostname field, enter the Primary Blob Service


Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used such as HortonWorks Data
Platform V1.2.0 (Bimota) supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave


both the Force MapR ticket authentication check box


and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of
hive.metastore.kerberos.principal. This is the service
principal of the Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of
javax.jdo.option.ConnectionURL. This is the JDBC
connection string to the Hive Metastore.
5. Driver class uses the value of
javax.jdo.option.ConnectionDriverName. This is the
name of the driver for the JDBC connection.
6. Username uses the value of
javax.jdo.option.ConnectionUserName. This, as well as
the Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of
javax.jdo.option.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog


box enter the password between double quotes and


click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the
dfs.client.use.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you


have chosen a machine called masternode as the


NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Hive Map editor The ELT Map editor helps you to define the output schema
as well as build graphically the Hive QL statement to be
executed. The column names of schema can be different
from the column names in the database.
If you use context variables in the Expression column in the
Map editor to map the input and the output schemas, put
single quotation marks around these context variables, for
example, 'context.v_erpName'.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component


you are using. Among these options, the following ones


requires specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insightcluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.


Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you connect to a
distribution supporting Tez but not officially supported
by Talend .
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

When you need to enable Hive components to access HBase:


These parameters are available only when the Use an existing connection check box is clear.

Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, by default,
registered along with your Hive installation.

Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.
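As an illustration only, if your cluster administrator has already deployed the Tez archive to HDFS,
the Lib path (folder or file) field could point to a location such as /apps/tez/tez.tar.gz; this path
is hypothetical and should be replaced with the actual value of the tez.lib.uris property defined in
the tez-site.xml file of your cluster.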

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
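As an illustration only, to lower the replication factor of the files written by the Job, you could
add a row with dfs.replication as the property and 1 as its value (both between double quotation
marks, as for any value in this table); dfs.replication is one of the standard properties documented
in hdfs-default.xml and is used here purely as an example.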

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information about
Hive dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce computations
by selecting the Set memory check box to set proper memory
allocations for the computations to be performed by the
Hadoop system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.

Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
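As a minimal illustration, assuming the component instance is named tELTHiveMap_1, this variable can
be read in a downstream tJava component as follows (the instance name and the tJava usage are
assumptions made for this example only):

// read the After variable of tELTHiveMap_1 and print it if an error occurred
String msg = (String) globalMap.get("tELTHiveMap_1_ERROR_MESSAGE");
if (msg != null) {
    System.out.println("tELTHiveMap_1 reported: " + msg);
}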


Usage

Usage rule tELTHiveMap is used along with a tELTHiveInput and
tELTHiveOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Note:
The ELT components do not handle actual data flow but
only schema information.

Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Joining table columns and writing them into Hive


This scenario applies only to Talend products with Big Data.
This scenario uses a four-component Job to join the columns selected from two Hive tables and write
them into another Hive table.

Preparing the Hive tables


Procedure
1. Create the Hive table you want to write data in. In this scenario, this table is named agg_result,
and you can create it using the following statement in tHiveRow: create table agg_result
(id int, name string, address string, sum1 string, postal string,
state string, capital string, mostpopulouscity string) partitioned by
(type string) row format delimited fields terminated by ';' location
'/user/ychen/hive/table/agg_result'
In this statement, '/user/ychen/hive/table/agg_result' is the directory used in this scenario to store
this created table in HDFS. You need to replace it with the directory you want to use in your
environment.
For further information about tHiveRow, see tHiveRow on page 1634.


2. Create two input Hive tables containing the columns you want to join and aggregate these
columns into the output Hive table, agg_result. The statements to be used are: create table
customer (id int, name string, address string, idState int, id2 int,
regTime string, registerTime string, sum1 string, sum2 string) row
format delimited fields terminated by ';' location '/user/ychen/
hive/table/customer' and create table state_city (id int, postal
string, state string, capital int, mostpopulouscity string) row format
delimited fields terminated by ';' location '/user/ychen/hive/table/
state_city'
3. Use tHiveRow to load data into the two input tables, customer and state_city. The statements to
be used are: "LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO
TABLE customer" and "LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv'
OVERWRITE INTO TABLE state_city"
The two files, customer.csv and State_City.csv, are two local files we created for this scenario. You
need to create your own files to provide data to the input Hive tables. The data schema of each
file should be identical with their corresponding table.
You can use tRowGenerator and tFileOutputDelimited to create these two files easily. For
further information about these two components, see tRowGenerator on page 3134 and
tFileOutputDelimited on page 1113.
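As a purely illustrative example, the first lines of customer.csv could look like the following, with
one field per column of the customer table and the fields separated by semicolons to match the
declared row format (the values themselves are invented for this example):

1;Griffith;Lincoln Street Los Angeles;22;101;2011-06-01;2011-06-01;500;600
2;Collins;Pike Place Seattle;21;102;2012-01-15;2012-01-15;700;800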
For further information about the Hive query language, see https://cwiki.apache.org/confluence/
display/Hive/LanguageManual.

Linking the components


Procedure
1. In the Integration perspective of Talend Studio, create an empty Job from the Job Designs node
in the Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2. Drop two tELTHiveInput components, a tELTHiveMap and a tELTHiveOutput onto the workspace.
3. Connect them using the Row > Main link.
Each time you connect two components, a wizard pops up to prompt you to name the
link you are creating. This name must be the same as that of the Hive table you want the active
component to process. In this scenario, the input tables the two tELTHiveInput components will
handle are customer and state_city and the output table tELTHiveOutput will handle is agg_result.

Configuring the input schemas


Procedure
1. Double-click the tELTHiveInput component using the customer link to open its Component view.


2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the button as many times as required to add columns and rename them to replicate the
schema of the customer table we created earlier in Hive.

4. In the Default table name field, enter the name of the input table, customer, to be processed by
this component.
5. Double-click the other tELTHiveInput component using the state_city link to open its Component
view.

6. Click the [...] button next to Edit schema to open the schema editor.


7. Click the button as many times as required to add columns and rename them to replicate the
schema of the state_city table we created earlier in Hive.

8. In the Default table name field, enter the name of the input table, state_city, to be processed by
this component.

Mapping the input and the output schemas


Configuring the connection to Hive

Procedure
1. Click tELTHiveMap, then, click Component to open its Component view.

2. In the Version area, select the Hadoop distribution you are using and the Hive version.
3. In the Connection mode list, select the connection mode you want to use. If your distribution is
HortonWorks, this mode is Embedded only.


4. In the Host field and the Port field, enter the authentication information for the component to
connect to Hive. For example, the host is talend-hdp-all and the port is 9083.
5. Select the Set Jobtracker URI check box and enter the location of the Jobtracker. For example,
talend-hdp-all:50300.
6. Select the Set NameNode URI check box and enter the location of the NameNode. For example,
hdfs://talend-hdp-all:8020. If you are using WebHDFS, the location should be
webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.

Mapping the schemas

Procedure
1. Click ELT Hive Map Editor to map the schemas.

2. On the input side (left in the figure), click the Add alias button to add the table to be used.

3. In the pop-up window, select the customer table, then click OK.
4. Repeat the operations to select the state_city table.
5. Drag and drop the idstate column from the customer table onto the id column of the state_city
table. Thus an inner join is created automatically.
6. On the output side (the right side in the figure), the agg_result table is empty at first. Click the
[+] button at the bottom of this side to add as many columns as required and rename them to replicate
the schema of the agg_result table you created earlier in Hive.


Note:
The type column is the partition column of the agg_result table and should not be replicated in
this schema. For further information about the partition column of the Hive table, see the Hive
manual.

7. From the customer table, drop id, name, address, and sum1 to the corresponding columns in the
agg_result table.
8. From the state_city table, drop postal, state, capital and mostpopulouscity to the corresponding
columns in the agg_result table.
In this scenario, context variables are not used in the Expression column in the Map editor. If you
use context variables, put them in single quotation marks. For example:
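For instance, a hypothetical context variable named operator would be entered as 'context.operator' in
the Expression column.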

9. Click OK to validate these changes.

Configuring the output schema


Procedure
1. Double-click tELTHiveOutput to open its Component view.


2. If this component does not have the same schema as the preceding component, a warning icon
appears. In this case, click the Sync columns button to retrieve the schema from the preceding one
and once done, the warning icon disappears.
3. In the Default table name field, enter the output table you want to write data in. In this example,
it is agg_result.
4. In the Field partition table, click the [+] button to add one row. This allows you to write data in the partition
column of the agg_result table.
This partition column was defined the moment we created the agg_result table using
partitioned by (type string) in the Create statement presented earlier. This partition
column is type, which describes the type of a customer.
5. In Partition column, enter type without any quotation marks and in Partition value, enter
prospective in single quotation marks.
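With this configuration, the statement generated by the ELT components conceptually targets the
prospective partition. A simplified Hive QL sketch of such a statement (the exact text generated by
the Job may differ) is:

INSERT INTO TABLE agg_result PARTITION (type='prospective')
SELECT customer.id, customer.name, customer.address, customer.sum1,
state_city.postal, state_city.state, state_city.capital, state_city.mostpopulouscity
FROM customer JOIN state_city ON (customer.idState = state_city.id);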

Executing the Job


Procedure
Press F6 to run this Job.

Results
Once done, verify agg_result in Hive using, for example,

select * from agg_result;


This figure presents only a part of the table. You can see that the selected input columns have been
aggregated and written into the agg_result table and that the partition column is filled with the value
prospective.

Related scenarios
• Joining table columns and writing them into Hive on page 710
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTHiveOutput
Works alongside tELTHiveMap to write data into the Hive table.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component executes the query built by the preceding tELTHiveMap component to write data into
the specified Hive table.

tELTHiveOutput Standard properties


These properties are used to configure tELTHiveOutput running in the Standard Job framework.
The Standard tELTHiveOutput component belongs to the ELT family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Action on data Select the action to be performed on the data to be written
in the Hive table.
With the Insert option, the data to be written in the Hive
table will be appended to the existing data if there is any.

Schema A schema is a row description. It defines the number


of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.

  Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

Edit schema Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Default table name Enter the default name of the output table you want to
write data in.


Default schema name Enter the name of the default database schema to which the
output table to be used is related.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field that appears.
If this table is related to a different database schema from
the default one, you also need to enter the name of that
database schema. The syntax is schema_name.table_name.

The target table uses the Parquet format If the table in which you need to write data is a PARQUET
table, select this check box.
Then from the Compression list that appears, select the
compression mode you need to use to handle the PARQUET
file. The default mode is Uncompressed.

Field Partition In Partition Column, enter the name, without any quotation
marks, of the partition column of the Hive table you want to
write data in.
In Partition Value, enter the value you want to use, in single
quotation marks, for its corresponding partition column.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTHiveMap is used along with a tELTHiveInput and
tELTHiveOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Note:
The ELT components do not handle actual data flow but
only schema information.

Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.

Related scenarios
• Joining table columns and writing them into Hive on page 710.
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTInput
Adds as many Input tables as required for the SQL statement to be executed.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of database
(if any) instead of these generic ELT components. For example, for Teradata, it is recommended to use the
tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.

tELTInput Standard properties


These properties are used to configure tELTInput running in the Standard Job framework.
The Standard tELTInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
to the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Mapping Specify the metadata mapping file for the database to
be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTInput is to be used along with the tELTJDBCMap. Note


that the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMap
Uses the tables provided as input to feed the parameter in the built SQL statement. The statement
can include inner or outer joins to be implemented between tables or between one table and its
aliases.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of database
(if any) instead of these generic ELT components. For example, for Teradata, it is recommended to use the
tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.

tELTMap Standard properties


These properties are used to configure tELTMap running in the Standard Job framework.
The Standard tELTMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.
Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curves.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file where the
properties are stored.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Driver JAR Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.

Class name Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.D
river.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMap is used along with tELTInput and tELTOutput.
Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Aggregating Snowflake data using context variables as table and connection names

This scenario shows you an example of aggregating Snowflake data from two source tables STUDENT
and TEACHER to one target table FULLINFO using the ELT components. In this example, all input
and output table names and connection names are set to context variables.


Creating the Job

Before you begin


• A new Job has been created and the context variables SourceTableS with the value STUDENT,
SourceTableT with the value TEACHER, and TargetTable with the value FULLINFO have
been added to the Job. For more information about how to use context variables, see the related
documentation about using contexts and variables.
• The source table STUDENT with three columns, SID and TID of NUMBER(38,0) type and SNAME
of VARCHAR(50) type, has been created in Snowflake, and the following data has been written
into the table.

#SID;SNAME;TID
11;Alex;22
12;Mark;23
13;Stephane;21
14;Cedric;22
15;Bill;21
16;Jack;23
17;John;22
18;Andrew;23

• The source table TEACHER with three columns, TID of NUMBER(38,0) type and TNAME and
TPHONE of VARCHAR(50) type, has been created in Snowflake, and the following data has been
written into the table.

#TID;TNAME;TPHONE
21;Peter;+86 15812343456
22;Michael;+86 13178964532
23;Candice;+86 13923187456

Procedure
1. Add a tSnowflakeConnection component, a tSnowflakeClose component, two tELTInput
components, a tELTMap component, and a tELTOutput component to your Job.
2. On the Basic setting view of the first tELTInput component, enter the name of the first
source table in the Default Table Name field. In this example, it is the context variable
context.SourceTableS.


3. Repeat step 2 to set the value of the default table name for the second tELTInput component
and the tELTOutput component to context.SourceTableT and context.TargetTable
respectively.
4. Link the first tELTInput component to the tELTMap component using the Link > context.Source
TableS (Table) connection.
5. Link the second tELTInput component to the tELTMap component using the Link > context.Source
TableT (Table) connection.
6. Link the tELTMap component to the tELTOutput component using the Link > *New Output*
(Table) connection. The link will be renamed automatically to context.TargetTable
(Table).
7. Link the tSnowflakeConnection component to the tELTMap component using a Trigger > On
Subjob Ok connection.
8. Link the tELTMap component to the tSnowflakeClose component.

Connecting to Snowflake
Configure the tSnowflakeConnection component to connect to Snowflake.

Procedure
1. Double-click the tSnowflakeConnection component to open its Basic settings view.
2. In the Account field, enter the account name assigned by Snowflake.
3. In the Snowflake Region field, select the region where the Snowflake database is located.
4. In the User Id and the Password fields, enter the authentication information accordingly.
Note that this user ID is your user login name. If you do not know your user login name yet, ask
the administrator of your Snowflake system for details.
5. In the Warehouse field, enter the name of the data warehouse to be used in Snowflake.
6. In the Schema field, enter the name of the database schema to be used.
7. In the Database field, enter the name of the database to be used.

Configuring the input components

Procedure
1. Double-click the first tELTInput component to open its Basic settings view.


2. Click the [...] button next to Edit schema and in the schema dialog box displayed, define the
schema by adding three columns, SID and TID of INT type and SNAME of VARCHAR type.
3. Select Mapping Snowflake from the Mapping drop-down list.
4. Repeat the previous steps to configure the second tELTInput component, and define its schema by
adding three columns, TID of INT type and TNAME and TPHONE of VARCHAR type.

Configuring the output component

Procedure
1. Double-click the tELTOutput component to open the Basic settings view.
2. Select Create table from the Action on table drop-down list to create the target table.
3. Select the Table name from connection name is variable check box.
4. Select Mapping Snowflake from the Mapping drop-down list.

Configuring the map component for aggregating Snowflake data

Procedure
1. Click the tELTMap component to open its Basic settings view.

2. Select the Use an existing connection check box and from the Component List displayed, select
the connection component you have configured to open the Snowflake connection.
3. Select Mapping Snowflake from the Mapping drop-down list.
4. Click the [...] button next to ELT Map Editor to open its map editor.
5. Add the first input table context.SourceTableS by clicking the [+] button in the upper left
corner of the map editor and then selecting the relevant table name from the drop-down list in
the pop-up dialog box.
6. Do the same to add the second input table context.SourceTableT.
7. Drag the column TID from the first input table context.SourceTableS and drop it onto the
corresponding column TID in the second input table context.SourceTableT.
8. Drag all columns from the input table context.SourceTableS and drop them onto the output
table context.TargetTable in the upper right panel.
9. Do the same to drag two columns TNAME and TPHONE from the input table context.Source
TableT and drop them onto the bottom of the output table. When done, click OK to close the
map editor.
10. Click the Sync columns button on the Basic settings view of the tELTOutput component to set its
schema.


Closing the Snowflake connection


Configure the tSnowflakeClose component to close the connection to Snowflake.

Procedure
1. Double-click the tSnowflakeClose component to open the Component tab.
2. From the Connection Component drop-down list, select the component that opens the connection
you need to close, tSnowflakeConnection_1 in this example.

Executing the Job

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, Talend Studio executes the Job successfully and inserts eight rows into the
target table.
You can then create and run another Job to retrieve data from the target table by using the
tSnowflakeInput component and the tLogRow component. You will find that the aggregated data
are displayed on the console as shown in the screenshot below.
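For reference, the expected result can also be derived from the sample data above: an inner join of
STUDENT and TEACHER on the TID column produces the following eight rows (shown here in the same
semicolon-separated form as the input data; the actual column order depends on how you defined the
output schema):

11;Alex;22;Michael;+86 13178964532
12;Mark;23;Candice;+86 13923187456
13;Stephane;21;Peter;+86 15812343456
14;Cedric;22;Michael;+86 13178964532
15;Bill;21;Peter;+86 15812343456
16;Jack;23;Candice;+86 13923187456
17;John;22;Michael;+86 13178964532
18;Andrew;23;Candice;+86 13923187456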

For more information about how to retrieve data from Snowflake, see Writing data into and
reading data from a Snowflake table on page 3407.

Related scenarios
• Aggregating table columns and filtering on page 745.
• Mapping data using an Alias table on page 749.
• Mapping data using a subquery on page 800, a related scenario using subquery


tELTOutput
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of database
(if any) instead of these generic ELT components. For example, for Teradata, it is recommended to use the
tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.

tELTOutput Standard properties


These properties are used to configure tELTOutput running in the Standard Job framework.
The Standard tELTOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on table Select an operation to be performed on the table defined.


• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exist and create: The table is removed if it
already exists and created again.
• Clear table: The table content is deleted. You have the
possibility to roll back the operation.
• Truncate table: The table content is deleted. You do
not have the possibility to roll back the operation.

Action on data On the data of the table defined, you can perform the
following operation:
• Insert: Adds new entries to the table. If duplicates are
found, the Job stops.
• Update: Updates entries in the table.
• Delete: Deletes the entries which correspond to the
entry flow.

Schema and Edit schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.


• Change to built-in property: choose this option to


change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
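For example, a hypothetical clause such as mytable.id > 1000 (the table and column names are purely
illustrative) would restrict the update or delete operation to the matching rows.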

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Use update statement without subqueries Select this option to generate an UPDATE statement for the
database.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view.

Clause SET Select the column names that will be used to generate the
SET clauses.
SET clauses will not be generated for the columns that are
not selected.
This field appears when Update is selected from the Action
on data drop-down list in the Basic settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
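As a minimal illustration, assuming the component instance is named tELTOutput_1, the insert counter
can be read once the subJob has finished, for example in a tJava component connected with an
OnSubjobOk trigger (both the instance name and this wiring are assumptions made for the example):

// read the After variable of tELTOutput_1 holding the number of inserted rows
Integer inserted = (Integer) globalMap.get("tELTOutput_1_NB_LINE_INSERTED");
System.out.println("Rows inserted: " + inserted);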

Usage

Usage rule tELTOutput is to be used along with the tELTMap. Note that
the Output link to be used with these components must
correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data
flow but only schema information.

Limitation Avoid using any keyword for the database as the table/
column name or using any special character in the table/
column name. If you want to, you can enclose the table/
column name in a pair of \" to see whether it works. For
example, when you want to use the keyword number as an
Oracle database column name, you can have the Db Column
value in the schema editor set to \"number\". But note
that this solution does not always work.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMSSqlInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Provides the table schema to be used for the SQL statement to execute.

tELTMSSqlInput Standard properties


These properties are used to configure tELTMSSqlInput running in the Standard Job framework.
The Standard tELTMSSqlInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
to the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMSSqlInput is to be used along with the
tELTMSSqlMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMSSqlMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.

tELTMSSqlMap Standard properties


These properties are used to configure tELTMSSqlMap running in the Standard Job framework.
The Standard tELTMSSqlMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT MSSql Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.
Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curves.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.
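As an illustration only, for a SQL Server instance listening on its default port you would typically
enter the server host name or IP address (for example 192.168.1.15) in Host, 1433 in Port, and the
name of your database in Database; these values are hypothetical and must be replaced with your own.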

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMSSqlMap is used along with a tELTMSSqlInput and


tELTMSSqlOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMSSqlOutput
Executes SQL Insert, Update and Delete statements on the MSSql database.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.

tELTMSSqlOutput Standard properties


These properties are used to configure tELTMSSqlOutput running in the Standard Job framework.
The Standard tELTMSSqlOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Use update statement without subqueries Select this option to generate an UPDATE statement for the
MSSql database.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view. For an
illustration of the difference, see the sketch after this table.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
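As an illustration of the Use update statement without subqueries option above, the two sketches
below show the general shapes an ELT-generated update can take on an MSSql database. The table and
column names (dbo.target, dbo.source, id, col1) are hypothetical, and the exact SQL produced by a
Job may differ.

    -- Option cleared: the update is expressed with correlated subqueries
    UPDATE dbo.target
    SET col1 = (SELECT s.col1 FROM dbo.source s WHERE s.id = dbo.target.id)
    WHERE EXISTS (SELECT 1 FROM dbo.source s WHERE s.id = dbo.target.id);

    -- Option selected: a join-style Transact-SQL update, without subqueries
    UPDATE t
    SET t.col1 = s.col1
    FROM dbo.target t
    INNER JOIN dbo.source s ON s.id = t.id;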

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTMSSqlOutput is to be used along with the


tELTMSSqlMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but only schema information.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMysqlInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlInput provides the table schema to be used for the SQL statement to execute.

tELTMysqlInput Standard properties


These properties are used to configure tELTMysqlInput running in the Standard Job framework.
The Standard tELTMysqlInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter the default table name, between double quotation
marks.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMysqlInput is to be used along with the tELTMysqlMap.


Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMysqlMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlMap helps to graphically build the SQL statement using the table provided as input.

tELTMysqlMap Standard properties


These properties are used to configure tELTMysqlMap running in the Standard Job framework.
The Standard tELTMysqlMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Mysql Map editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database connection created.
This property is not available when the Use an existing connection check box in the Basic settings
view is selected.

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMysqlMap is used along with a tELTMysqlInput and


tELTMysqlOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Aggregating table columns and filtering


This scenario describes a Job that gathers together several input DB table schemas and implements
a clause to filter the output using an SQL statement.

Building a Job

Procedure
1. Add the following components from the Palette onto the design workspace. Label these
components to best describe their functionality.
• three tELTMysqlInput components
• a tELTMysqlMap
• a tELTMysqlOutput
2. Double-click the first tELTMysqlInput component to display its Basic settings view.


3. Select Repository from the Schema list, click the three dot button preceding Edit schema, and
select your DB connection and the desired schema from the Repository Content dialog box.
The selected schema name appears in the Default Table Name field automatically.
In this use case, the DB connection is Talend_MySQL and the schema for the first input component
is owners.
4. Set the second and third tELTMysqlInput components in the same way but select cars and resellers
respectively as their schema names.

Note: In this use case, all the involved schemas are stored in the Metadata node of the
Repository tree view for easy retrieval. For further information concerning metadata, see
Talend Studio User Guide.
You can also select the three input components by dropping the relevant schemas from
the Metadata area onto the design workspace and double-clicking tELTMysqlInput from
the Components dialog box. Doing so allows you to skip the steps of labeling the input
components and defining their schemas manually.

5. Connect the three tELTMysqlInput components to the tELTMysqlMap component using links
named strictly after the actual DB table names: owners, cars, and resellers.
6. Connect the tELTMysqlMap component to the tELTMysqlOutput component and name the link
agg_result, which is the name of the database table you will save the aggregation result to.
7. Click the tELTMysqlMap component to display its Basic settings view.

8. Select Repository from the Property Type list, and select the same DB connection that you use for
the input components.
All the database details are automatically retrieved.

Tip: Leave all the other settings as they are.

9. Double-click the tELTMysqlMap component to launch the ELT Map editor to set up joins between
the input tables and define the output flow.


10. Add the input tables by clicking the green plus button at the upper left corner of the ELT Map
editor and selecting the relevant table names in the Add a new alias dialog box.
11. Drop the ID_Owner column from the owners table to the corresponding column of the cars table.
12. In the cars table, select the Explicit join check box in front of the ID_Owner column.
As the default join type, INNER JOIN is displayed on the Join list.
13. Drop the ID_Reseller column from the cars table to the corresponding column of the resellers
table to set up the second join, and define the join as an inner join in the same way.
14. Select the columns to be aggregated into the output table, agg_result.
15. Drop the ID_Owner, Name, and ID_Insurance columns from the owners table to the output table.
16. Drop the Registration, Make, and Color columns from the cars table to the output table.
17. Drop the Name_Reseller and City columns from the resellers table to the output table.
With the relevant columns selected, the mappings are displayed in yellow and the joins are
displayed in dark violet.
18. Set up a filter in the output table. Click the Add filter row button on top of the output table to
display the Additional clauses expression field, drop the City column from the resellers table to the
expression field, and complete a WHERE clause that reads resellers.City ='Augusta'.


19. Click the Generated SQL Select query tab to display the corresponding SQL statement.

20. Click OK to save the ELT Map settings.


21. Double-click the tELTMysqlOutput component to display its Basic settings view.

22. Select an action from the Action on data list as needed.


23. Select Repository as the schema type, and define the output schema in the same way as you
defined the input schemas. In this use case, select agg_result as the output schema, which is the
name of the database table used to store the mapping result.


Note: You can also use a built-in output schema and retrieve the schema structure from the
preceding component; however, make sure that you specify an existing target table having the
same data structure in your database.

Tip: Leave all the other settings as they are.

Running the Job

Procedure
1. Save your Job.
2. Press F6 to launch it.
All selected data is inserted in the agg_result table as specified in the SQL statement.
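For reference, the query built in the ELT Map editor for this Job should be close to the following
sketch: the SELECT part is what appears on the Generated SQL Select query tab, and the
tELTMysqlOutput component wraps it in an INSERT into agg_result. It is an approximation based on
the joins, columns, and filter defined above, assuming the agg_result columns carry the same names
as the mapped input columns; the exact aliasing and formatting generated by the Studio may differ.

    INSERT INTO agg_result
      (ID_Owner, Name, ID_Insurance, Registration, Make, Color, Name_Reseller, City)
    SELECT owners.ID_Owner, owners.Name, owners.ID_Insurance,
           cars.Registration, cars.Make, cars.Color,
           resellers.Name_Reseller, resellers.City
    FROM owners
    INNER JOIN cars ON owners.ID_Owner = cars.ID_Owner
    INNER JOIN resellers ON cars.ID_Reseller = resellers.ID_Reseller
    WHERE resellers.City = 'Augusta';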

Mapping data using an Alias table


This scenario describes a Job that maps information from two input tables and an alias table, serving
as a virtual input table, to an output table. The employees table contains employees' IDs, their
department numbers, their names, and the IDs of their respective managers. The managers are also
considered as employees and hence included in the employees table. The dept table contains the
department information. The alias table retrieves the names of the managers from the employees
table.


Building a Job

Procedure
1. Drop two tELTMysqlInput components, a tELTMysqlMap component, and a tELTMysqlOutput
component to the design workspace, and label them to best describe their functionality.
2. Double-click the first tELTMysqlInput component to display its Basic settings view.

3. Select Repository from the Schema list, and define the DB connection and schema by clicking the
three dot button preceding Edit schema.
The DB connection is Talend_MySQL and the schema for the first input component is employees.

Note:
In this use case, all the involved schemas are stored in the Metadata node of the Repository
tree view for easy retrieval. For further information concerning metadata, see Talend Studio
User Guide.

4. Set the second tELTMysqlInput component in the same way but select dept as its schema.
5. Double-click the tELTMysqlOutput component to display its Basic settings view.

6. Select an action from the Action on data list as needed, Insert in this use case.
7. Select Repository as the schema type, and define the output schema in the same way as you
defined the input schemas. In this use case, select result as the output schema, which is the name
of the database table used to store the mapping result.
The output schema contains all the columns of the input schemas plus a ManagerName column.

Note: Leave all the other parameters as they are.


Connecting the components

Procedure
1. Connect the two tELTMysqlInput components to the tELTMysqlMap component using Link
connections named strictly after the actual input table names, employees and dept in this use case.
2. Connect the tELTMysqlMap component to the tELTMysqlOutput component using a Link
connection. When prompted, click Yes to allow the ELT Mapper to retrieve the output table
structure from the output schema.
3. Click the tELTMysqlMap component and select the Component tab to display its Basic settings
view.

4. Select Repository from the Property Type list, and select the same DB connection that you use for
the input components.
All the DB connection details are automatically retrieved.

Note: Leave all the other parameters as they are.

Configuring the Job

Procedure
1. Click the three-dot button next to ELT Mysql Map Editor or double-click the tELTMysqlMap
component on the design workspace to launch the ELT Map editor.
With the tELTMysqlMap component connected to the output component, the output table is
displayed in the output area.
2. Add the input tables, employees and dept, in the input area by clicking the green plus button and
selecting the relevant table names in the Add a new alias dialog box.
3. Create an alias table based on the employees table by selecting employees from the Select the
table to use list and typing in Managers in the Type in a valid alias field in the Add a new alias
dialog box.


4. Drop the DeptNo column from the employees table to the dept table.
5. Select the Explicit join check box in front of the DeptNo column of the dept table to set up an
inner join.
6. Drop the ManagerID column from the employees table to the ID column of the Managers table.
7. Select the Explicit join check box in front of the ID column of the Managers table and select LEFT
OUTER JOIN from the Join list to allow the output rows to contain Null values.

8. Drop all the columns from the employees table to the corresponding columns of the output table.
9. Drop the DeptName and Location columns from the dept table to the corresponding columns of
the output table.
10. Drop the Name column from the Managers table to the ManagerName column of the output table.


11. Click on the Generated SQL Select query tab to display the SQL query statement to be executed.

Running the Job

Procedure
1. Save your Job.
2. Press F6 to run it.
The output database table result contains all the information about the employees, including the
names of their respective managers.
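For reference, the query shown on the Generated SQL Select query tab should resemble the sketch
below, which the tELTMysqlOutput component then inserts into the result table. It is an
approximation: only a representative set of employees columns (ID, Name, DeptNo, ManagerID) is
listed, and the exact column order and aliasing follow the schemas defined in the Job.

    SELECT employees.ID, employees.Name, employees.DeptNo, employees.ManagerID,
           dept.DeptName, dept.Location,
           Managers.Name AS ManagerName
    FROM employees
    INNER JOIN dept ON employees.DeptNo = dept.DeptNo
    LEFT OUTER JOIN employees Managers ON employees.ManagerID = Managers.ID;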

Related scenarios
• Mapping data using a subquery on page 800, a related scenario using a subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMysqlOutput
tELTMysqlOutput executes SQL Insert, Update, and Delete statements on the Mysql database.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlOutput carries out the action on the specified table and inserts the data according to the
output schema defined in the ELT Mapper.

tELTMysqlOutput Standard properties


These properties are used to configure tELTMysqlOutput running in the Standard Job framework.
The Standard tELTMysqlOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between inverted commas.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMysqlOutput is to be used along with the


tELTMysqlMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTNetezzaInput
Allows you to add as many Input tables as required for the most complicated Insert statement.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Provides the table schema to be used for the SQL statement to execute.

tELTNetezzaInput Standard properties


These properties are used to configure tELTNetezzaInput running in the Standard Job framework.
The Standard tELTNetezzaInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTNetezzaInput is to be used along with the


tELTNetezzaMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name

Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTNetezzaMap
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.

tELTNetezzaMap Standard properties


These properties are used to configure tELTNetezzaMap running in the Standard Job framework.
The Standard tELTNetezzaMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Netezza Map Editor The ELT Map editor allows you to define the output schema
and graphically build the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line (fastest): Links between the schema and the Web
service parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are filled in using fetched data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTNetezzaMap is used along with tELTNetezzaInput and


tELTNetezzaOutput. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using a subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTNetezzaOutput
Performs the action (insert, update or delete) on data in the specified Netezza table through the SQL
statement generated by the tELTNetezzaMap component.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTNetezzaOutput Standard properties


These properties are used to configure tELTNetezzaOutput running in the Standard Job framework.
The Standard tELTNetezzaOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field that appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTNetezzaOutput is to be used along with the


tELTNetezzaMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTOracleInput
Provides the Oracle table schema that will be used by the tELTOracleMap component to generate the
SQL SELECT statement.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTOracleInput Standard properties


These properties are used to configure tELTOracleInput running in the Standard Job framework.
The Standard tELTOracleInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation marks.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTOracleInput is to be used along with the


tELTOracleMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Updating Oracle database entries on page 769
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTOracleMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTOracleInput
components.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTOracleMap Standard properties


These properties are used to configure tELTOracleMap running in the Standard Job framework.
The Standard tELTOracleMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Oracle Map Editor The ELT Map editor allows you to define the output schema
and graphically build the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.

Property type Either Built-in or Repository.


  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Connection type Drop-down list of the available drivers.

DB Version Select the Oracle version you are using.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Mapping Automatically set mapping parameter.

Advanced settings

Additional JDBC Parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Use Hint Options Select this check box to activate the hint configuration
area to help you optimize a query's execution. In this area,
parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.
For an example of a hint applied to a generated statement, see the sketch after this table.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
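For example, assuming the hint /*+ PARALLEL(4) */ is to be placed right after the SELECT keyword of
the generated SELECT statement, the executed query takes a shape like the sketch below. The hint,
the table, and the filter are illustrative only; use the hint and position that suit your own query.

    SELECT /*+ PARALLEL(4) */ owners.ID_OWNER, owners.NAME
    FROM owners
    WHERE owners.ID_OWNER > 0;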

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.

768
tELTOracleMap

For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tELTOracleMap is used along with a tELTOracleInput and


tELTOracleOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Updating Oracle database entries


This scenario is based on the data aggregation scenario, Aggregating table columns and filtering on
page 745. As the data update action is available in Oracle database, this scenario describes a Job that
updates particular data in the Agg_Result table.


Adding components
As described in Aggregating table columns and filtering on page 745, configure a Job for data
aggregation using the corresponding ELT components for Oracle database - tELTOracleInput,
tELTOracleMap, and tELTOracleOutput. Execute the Job to save the aggregation result in a database
table named Agg_Result.

Note:
When defining filters in the ELT Map editor, note that strings are case sensitive in Oracle database.

Procedure
1. Launch the ELT Map editor and add a new output table named update_data.
2. Add a filter row to the update_data table to set up a relationship between input and output tables:
owners.ID_OWNER = agg_result.ID_OWNER.
3. Drop the MAKE column from the cars table to the update_data table.
4. Drop the NAME_RESELLER column from the resellers table to the update_data table.
5. Add a model enclosed in single quotation marks, 'A8' in this use case, to the MAKE column from
the cars table, preceded by a double pipe.

6. Add Sold by enclosed in single quotation marks in front of the NAME_RESELLER column from
the resellers table, with a double pipe in between.


7. Check the Generated SQL select query tab to be executed.

8. Click OK to validate the changes in the ELT Mapper.


9. Deactivate the tELTOracleOutput component labeled Agg_Result by right-clicking it and selecting
Deactivate Agg_Result from the contextual menu.
10. Drop a new tELTOracleOutput component from the Palette to the design workspace, and label it
Update_Data to better identify its functionality.
11. Connect the tELTOracleMap component to the new tELTOracleOutput component using the link
corresponding to the new output table defined in the ELT Mapper, update_data in this use
case.
12. Double-click the new tELTOracleOutput component to display its Basic settings view.

13. From the Action on data list, select Update.


14. Check the schema, and click Sync columns to retrieve the schema structure from the preceding
component if necessary.
15. In the WHERE clauses area, add a clause that reads agg_result.MAKE = 'Audi' to update
data relating to the make of Audi in the database table agg_result.
16. Fill the Default Table Name field with the name of the output link, update_data in this use
case.
17. Select the Use different table name check box, and fill the Table name field with the name of the
database table to be updated, agg_result in this use case. Leave the other parameters as they
are.

Running the Job

Procedure
1. Save your Job.


2. Click Run to execute the Job.


The relevant data in the database table is updated as defined.
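Based on the expressions and clauses defined above, the UPDATE statement run by this Job should be
close to the sketch below. This is only an approximation: the join conditions between owners, cars,
and resellers are assumed to be the ones inherited from the aggregation scenario, and the exact
statement generated by the Studio may be structured differently.

    UPDATE agg_result
    SET (MAKE, NAME_RESELLER) =
        (SELECT cars.MAKE || 'A8', 'Sold by ' || resellers.NAME_RESELLER
         FROM owners, cars, resellers
         WHERE owners.ID_OWNER = agg_result.ID_OWNER
           AND owners.ID_OWNER = cars.ID_OWNER
           AND cars.ID_RESELLER = resellers.ID_RESELLER)
    WHERE agg_result.MAKE = 'Audi';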

Related scenarios
• Updating Oracle database entries on page 769
• Mapping data using a subquery on page 800, a related scenario using a subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTOracleOutput
Performs the action (insert, update, delete, or merge) on data in the specified Oracle table through the
SQL statement generated by the tELTOracleMap component.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTOracleOutput Standard properties


These properties are used to configure tELTOracleOutput running in the Standard Job framework.
The Standard tELTOracleOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic Settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
MERGE: Updates and/or adds data to the table. Note
that the options available for the MERGE operation are
different to those available for the Insert, Update or Delete
operations.

Note:
Following global variables are available:
• NB_LINE_INSERTED: Number of lines inserted
during the Insert operation.
• NB_LINE_UPDATED: Number of lines updated during
the Update operation.
• NB_LINE_DELETED: Number of lines deleted during
the Delete operation.
• NB_LINE_MERGED: Number of lines inserted and/or
updated during the MERGE operation.

Schema and Edit schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.


• Update repository connection: choose this option


to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Use Merge Update (for MERGE) Select this check box to update the data in the output table.
Column: Lists the columns in the entry flow.
Update: Select the check box which corresponds to the name of the column you want to update.
Use Merge Update Where Clause: Select this check box and
enter the WHERE clause required to filter the data to be
updated, if necessary.
Use Merge Update Delete Clause: Select this check box and
enter the WHERE clause required to filter the data to be
deleted and updated, if necessary.

Use Merge Insert (for MERGE) Select this check box to insert the data in the table.
Column: Lists the entry flow columns.
Check All: Select the check box corresponding to the name
of the column you want to insert.
Use Merge Insert Where Clause: Select this check box and
enter the WHERE clause required to filter the data to be
inserted.

Default Table Name Enter a default name for the table, between double
quotation marks.

Default Schema Name Enter a name for the default Oracle schema, between
double quotation marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.


Advanced settings

Use Hint Options Select this check box to activate the hint configuration
area when you want to use a hint to optimize a query's
execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL
statement.
- SQL STMT: select the SQL statement you need to use.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTOracleOutput is to be used along with the


tELTOracleInput and tELTOracleMap components. Note that
the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Managing data using the Oracle MERGE function


The sample Job described in this scenario allows you to add new customer information and update
existing customer information in a database table using the Oracle MERGE command.


Linking the components


Procedure
1. Add the following components from the Palette to the design workspace: tELTOracleInput,
tELTOracleMap, and tELTOracleOutput.
2. Label tELTOracleInput as new_customer, tELTOracleMap as ELT Mapper, and tELTOracleOutput
as merge_data.
3. Link tELTOracleInput to tELTOracleMap using a Row > New Output (table) connection.
4. When prompted, enter NEW_CUSTOMER as the table name, which should be the actual database
table name.
5. Link tELTOracleMap to tELTOracleOutput using a Row > New Output (table) connection.
6. When prompted, enter customers_merge as the name of the database table, which holds the
merge results.

Configuring the components


Procedure
1. Double-click the tELTOracleInput component to display its Basic settings view.

2. Select Repository from the Schema list and click the [...] button preceding Edit schema.
3. Select your database connection and the desired schema from the Repository Content dialog box.

The selected schema name appears in the Default Table Name field automatically.
• In this use case, the database connection is Talend_Oracle and the schema is
new_customers.
• In this use case, the input schema is stored in the Metadata node of the Repository tree view
for easy retrieval. For further information concerning metadata, see Talend Studio User Guide.


• You can also select the input component by dropping the relevant schema from the Metadata
area onto the design workspace and double-clicking tELTOracleInput from the Components
dialog box. Doing so allows you to skip the steps of labeling the input component and
defining its schema manually.
4. Click the tELTOracleMap component to display its Basic settings view.

5. Select Repository from the Property Type list, and select the same database connection that you
use for the input components.

Remember: All the database details are automatically retrieved. Leave the other settings as
they are.

6. Double-click the tELTOracleMap component to launch the ELT Map editor for setting up the data
transformation flow.
Display the input table by clicking the green plus button at the upper left corner of the ELT Map
editor and selecting the relevant table name in the Add a new alias dialog box.
In this use case, the only input table is new_customers.


7. Select all the columns in the input table and drop them to the output table.

8. Click the Generated SQL Select query tab to display the query statement to be executed.

Click OK to validate the ELT Map settings and close the ELT Map editor.
9. Double-click the tELTOracleOutput component to display its Basic settings view.
a) From the Action on data list, select MERGE.
b) Click the Sync columns button to retrieve the schema from the preceding component.
c) Select the Use Merge Update check box to update the data using Oracle's MERGE function.
10. In the table that appears, select the check boxes for the columns you want to update.
In this use case, you update all the data according to the customer ID. Therefore, select all the
check boxes except the one for the ID column.

Warning: The columns defined as the primary key cannot and must not be made subject to
updates.

11. Select the Use Merge Insert check box to insert new data while updating the existing data by
leveraging the Oracle MERGE function.
12. In the table that appears, select the check boxes for the columns into which you want to insert
new data.


In this use case, insert all the new customer data. Therefore, select all the check boxes by clicking
the Check All check box.
13. Fill the Default Table Name field with the name of the target table already existing in your
database. In this example, fill in customers_merge.
14. Leave the other parameters as they are.

Executing the Job


Procedure
1. Save the Job.
2. Click Run to execute the Job.
The data is updated and inserted in the database. The query used is displayed on the console.
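
The exact statement depends on the schema of the input table, but with ID as the matching key and NAME standing in for any other column (hypothetical names), the MERGE statement generated by this configuration has roughly the following shape; the columns listed in the UPDATE SET and INSERT parts correspond to the check boxes selected in the Use Merge Update and Use Merge Insert tables:

MERGE INTO customers_merge
USING NEW_CUSTOMER
ON (customers_merge.ID = NEW_CUSTOMER.ID)
WHEN MATCHED THEN
  UPDATE SET customers_merge.NAME = NEW_CUSTOMER.NAME
WHEN NOT MATCHED THEN
  INSERT (customers_merge.ID, customers_merge.NAME)
  VALUES (NEW_CUSTOMER.ID, NEW_CUSTOMER.NAME)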


tELTPostgresqlInput
Provides the Postgresql table schema that will be used by the tELTPostgresqlMap component to
generate the SQL SELECT statement.
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql database table schemas to generate SQL
statements, including clauses, which are to be executed in the database output table defined.

tELTPostgresqlInput Standard properties


These properties are used to configure tELTPostgresqlInput running in the Standard Job framework.
The Standard tELTPostgresqlInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTPostgresqlInput is to be used along with the


tELTPostgresqlMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTPostgresqlMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTPostgresql
Input components.
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql database table schemas to generate SQL
statements, including clauses, which are to be executed in the database output table defined.

tELTPostgresqlMap Standard properties


These properties are used to configure tELTPostgresqlMap running in the Standard Job framework.
The Standard tELTPostgresqlMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Postgresql Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTPostgresqlMap is used along with a tELTPostgresql


Input and tELTPostgresqlOutput. Note that the Output link
to be used with these components must correspond strictly
to the syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTPostgresqlOutput
Performs the action (insert, update or delete) on data in the specified Postgresql table through the
SQL statement generated by the tELTPostgresqlMap component.
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql database table schemas to generate SQL
statements, including clauses, which are to be executed in the database output table defined.

tELTPostgresqlOutput Standard properties


These properties are used to configure tELTPostgresqlOutput running in the Standard Job framework.
The Standard tELTPostgresqlOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description, that is to say, it defines the
number of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name between double quotation
marks.

Default Schema Name Enter the default schema name between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Use update statement without subqueries Select this option to generate an UPDATE statement that
does not use subqueries.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view.
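
As a general illustration of the difference (not the component's literal output; the CUSTOMERS and STAGING table names are hypothetical), an update that relies on a subquery versus one that does not could look like:

-- Update written with a correlated subquery
UPDATE CUSTOMERS
SET NAME = (SELECT STAGING.NAME FROM STAGING WHERE STAGING.ID = CUSTOMERS.ID)
WHERE CUSTOMERS.ID IN (SELECT STAGING.ID FROM STAGING);

-- Update written without subqueries, using the PostgreSQL UPDATE ... FROM form
UPDATE CUSTOMERS
SET NAME = STAGING.NAME
FROM STAGING
WHERE CUSTOMERS.ID = STAGING.ID;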

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTPostgresqlOutput is to be used along with the


tELTPostgresqlMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTSybaseInput
Provides the Sybase table schema that will be used by the tELTSybaseMap component to generate the
SQL SELECT statement.
The three ELT Sybase components are closely related, in terms of their operating conditions. These
components should be used to handle Sybase database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTSybaseInput Standard properties


These properties are used to configure tELTSybaseInput running in the Standard Job framework.
The Standard tELTSybaseInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number and
nature of the fields to be processed. The schema is either
built-in (local) or stored remotely in the Repository. The
Schema defined is then passed on to the ELT Mapper for
inclusion in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository. Hence, it can be re-used for other projects and
Jobs. Related topic: see Talend Studio User Guide.

Default Table Name Enter a default name for the table, between double
quotation marks.

Default Schema Name Enter a default name for the Sybase schema, between
double quotation marks.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTSybaseInput is intended for use with tELTSybaseMap.


Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
ELT components only handle schema information. They
do not handle actual data flow.

Limitation This component requires installation of its related jar files.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTSybaseMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTSybaseInpu
t components.
The three ELT Sybase components are closely related, in terms of their operating conditions. These
components should be used to handle Sybase database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTSybaseMap Standard properties


These properties are used to configure tELTSybaseMap running in the Standard Job framework.
The Standard tELTSybaseMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Sybase Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Can be either Built-in or Repository.

  Built-in : No property data is stored centrally.

  Repository : Select the Repository file where the component


properties are stored. The following fields are pre-filled
using collected data.

DB Version Select the version of the Sybase database to be used from


the drop-down list.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTSybaseMap is intended for use with tELTSybaseInpu


t and tELTSybaseOutput. Note that the Output link to be
used with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components only handle schema information.
They do not handle actual data flow.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTSybaseOutput
Performs the action (insert, update or delete) on data in the specified Sybase table through the SQL
statement generated by the tELTSybaseMap component.
The three ELT Sybase components are closely related, in terms of their operating conditions. These
components should be used to handle Sybase database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTSybaseOutput Standard properties


These properties are used to configure tELTSybaseOutput running in the Standard Job framework.
The Standard tELTSybaseOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description, that is to say, it defines the
number and nature of the fields to be processed and passed
on to the next component. The schema is either Built-in
(local) or stored remotely in the Repository . The Schema
defined is then passed on to the ELT Mapper for inclusion in
the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository. Hence, it can be re-used for other projects and
Jobs. Related topic: see Talend Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter a default name for the table, between double
quotation marks.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.

Default Schema Name Enter a default name for the Sybase schema, between
double quotation marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTSybaseOutput is intended for use with the


tELTSybaseInput and tELTSybaseMap components. Note
that the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
ELT components only handle schema information. They
do not handle actual data flow.

Limitation This component requires installation of its related jar files.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTTeradataInput
Provides the Teradata table schema that will be used by the tELTTeradataMap component to generate
the SQL SELECT statement.
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTTeradataInput Standard properties


These properties are used to configure tELTTeradataInput running in the Standard Job framework.
The Standard tELTTeradataInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, that is to say, it defines the
nature and number of fields to be processed. The schema
is either built-in or remotely stored in the Repository. The
Schema defined is then passed on to the ELT Mapper to be
included in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter a default name for the table, between double
quotation marks.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTTeradataInput is to be used along with the


tELTTeradataMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTTeradataMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTTeradataIn
put components.
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTTeradataMap Standard properties


These properties are used to configure tELTTeradataMap running in the Standard Job framework.
The Standard tELTTeradataMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Teradata Map editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Query band Select this check box to use the Teradata Query Banding
feature to add metadata to the query to be processed,
such as the user running the query. This can help you, for
example, identify the origin of this query.
Once you select the check box, the Query Band parameters
table is displayed, in which you need to enter the metadata
information to be added. This information takes the form of
key/value pairs, for example, DpID in the Key column and
Finance in the Value column.
This check box actually generates the SET QUERY_BAND
FOR SESSION statement with the key/value pairs declared
in the Query Band parameters table. For further information
about this statement, see https://docs.teradata.com/search/
all?query=End+logging+syntax.
This check box is not available when you have selected
the Use an existing connection check box. In this
situation, if you need to use the Query Band feature, set it
in the Advanced settings tab of the Teradata connection
component to be used.
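
With the DpID/Finance pair from the example above, the statement generated at the start of the session would look like the following (standard Teradata syntax):

SET QUERY_BAND = 'DpID=Finance;' FOR SESSION;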

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tELTTeradataMap is used along with a tELTTeradataInput


and tELTTeradataOutput. Note that the Output link to be
used with these components must faithfully reflect the
name of the tables.

Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Mapping data using a subquery


The sample Job described in this scenario maps the data from two input tables, PreferredSubject and
CourseScore, to the output table, TotalScoreOfPreferredSubject, using a subquery.

Prerequisite
Ensure that you have added an Oracle database connection in the Metadata > Db Connections section
prior to creating the Job. For more information, see the Centralizing database metadata section of the
Talend Data Integration Studio User Guide.

The Standard Job and the Prejob design


In this scenario, design the Standard Job such as the following:


Design the Prejob that includes the data in this scenario as follows:

The PreferredSubject table contains the student's preferred subject data. To reproduce this scenario,
you can load the following data to the Oracle table from a CSV file:

SeqID;StuName;Subject;Detail
1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art.


The CourseScore table contains the student's subject score data. To reproduce this scenario, you can
load the following data to the Oracle table from a CSV file:

SeqID;StuName;Subject;Course;Score;Detail
1;Amanda;science;math;85;science score
2;Amanda;science;physics;75;science score
3;Amanda;science;chemistry;80;science score
4;Amanda;art;chinese;85;art score
5;Amanda;art;history;95;art score
6;Amanda;art;geography;80;art score
7;Ford;science;math;95;science score
8;Ford;science;physics;85;science score
9;Ford;science;chemistry;80;science score
10;Ford;art;chinese;75;art score
11;Ford;art;history;80;art score
12;Ford;art;geography;85;art score
13;Kate;science;math;65;science score
14;Kate;science;physics;75;science score
15;Kate;science;chemistry;80;science score
16;Kate;art;chinese;85;art score
17;Kate;art;history;80;art score
18;Kate;art;geography;95;art score

Before the Job execution, the output table TotalScoreOfPreferredSubject does not contain any data:

SeqID;StuName;PreferredSubject;TotalScore
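
For reference, given the sample data above and the mapping built in this scenario (the total score of each student in his or her preferred subject), the output table can be expected to contain rows similar to the following once the Job has run:

SeqID;StuName;PreferredSubject;TotalScore
1;Amanda;art;260
2;Ford;science;260
3;Kate;art;260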

Creating the Prejob


Create the Prejob that contains the data that you wish to load to the Oracle table.
See the Prejob design image in The Standard Job and the Prejob design section.

Procedure
1. Create a Standard Job.
2. Add the following components:
• Prejob
• two tFixedFlowInput components
• two tOracleOutput components
• two tOracleInput components
• one tCreateTable component
• two tLogRow components
3. Configure the first tFixedFlowInput component:
a) Select the tFixedFlowInput component to display the Basic settings view.
b) Select Use Inline Content(delimited file) from the Mode options.
c) Add the following data to the Content field:

1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art.

d) Click ... next to the Edit Schema field to open the Schema Editor.
e) Add four columns with the following names and corresponding parameters:


4. Configure the second tFixedFlowInput component:


a) Repeat steps 3a and 3b.
b) Add the following data to the Content field:

1;Amanda;science;math;85;science score
2;Amanda;science;physics;75;science score
3;Amanda;science;chemistry;80;science score
4;Amanda;art;chinese;85;art score
5;Amanda;art;history;95;art score
6;Amanda;art;geography;80;art score
7;Ford;science;math;95;science score
8;Ford;science;physics;85;science score
9;Ford;science;chemistry;80;science score
10;Ford;art;chinese;75;art score
11;Ford;art;history;80;art score
12;Ford;art;geography;85;art score
13;Kate;science;math;65;science score
14;Kate;science;physics;75;science score
15;Kate;science;chemistry;80;science score
16;Kate;art;chinese;85;art score
17;Kate;art;history;80;art score
18;Kate;art;geography;95;art score

c) Click ... next to the Edit Schema field to open the Schema Editor.
d) Add six columns with the following names and corresponding parameters:

5. Select the first tOracleOutput component to open the Basic settings view.
a) Select Repository from the Property Type drop-down list.
b) Specify the Oracle database connection that you have previously added by clicking .... This
automatically populates the database information in the fields provided.
Repeat this step and its substeps to configure the second tOracleOutput component.
6. Select the tCreateTable component to open the Basic settings view.
a) Select Oracle from the Database Type drop-down list.


b) Select Repository from the Property Type drop-down list.


c) Specify the Oracle database connection that you have previously added by clicking .... This
automatically populates the database information in the fields provided.
d) Enter TotalScoreOfPreferredSubject in the Table Name field.
e) Select Drop table if exists and create from the Table Action drop-down list.
f) Click ... next to the Edit schema field to open the Schema editor.
g) Add four columns with the following corresponding names and parameters:
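
The column parameters are shown in the Studio schema editor and not reproduced here. Based on the output schema used later in this scenario (SeqID and TotalScore as INTEGER, StuName and PreferredSubject as VARCHAR), the table created by tCreateTable should be roughly equivalent to the following DDL (the VARCHAR2 lengths are assumptions):

CREATE TABLE TotalScoreOfPreferredSubject (
    SeqID INTEGER,
    StuName VARCHAR2(100),
    PreferredSubject VARCHAR2(100),
    TotalScore INTEGER
)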

Adding the components


Procedure
1. Add the following components by typing their names in the design workspace or dropping them
from the Palette:
• two tELTOracleInput components
• two tELTOracleMap components
• one tELTOracleOutput component
• one tOracleInput component
• one tLogRow component
2. Rename the tELTOracleMap components to SubqueryMap and ELTMap.

Configuring the input components


Procedure
1. Select the first tELTOracleInput component to display the Basic settings tab.
2. Enter "PreferredSubject" in the Default Table Name field.
3. Click [...] next to Edit schema to define the schema of the input table PreferredSubject in the
schema editor.
4. Click [+] to add four columns:
• SeqID with the DB Type set to INTEGER
• StuName, Subject, and Detail with the DB Type set to VARCHAR


Click OK to validate these changes and close the schema editor.


5. Connect the first tELTOracleInput component to the second tELTOracleMap component using the
Link > PreferredSubject(Table).
6. Select the second tELTOracleInput component to display the Basic settings tab.
7. Enter "CourseScore" in the Default Table Name field.
8. Click [...] next to Edit schema to define the schema of the input table CourseScore in the schema
editor.
9. Click the [+] button to add six columns:
• SeqID and Score with the DB Type set to INTEGER
• StuName, Subject, Course, and Detail with the DB Type set to VARCHAR

Click OK to validate these changes and close the schema editor.


10. Connect the second tELTOracleInput component to the first tELTOracleMap component using the
Link > CourseScore(Table).


Configuring the output component


Procedure
1. Select the tELTOracleOutput component to display the Basic settings view.

2. Enter "TotalScore OfPreferredSubject" in the Default Table Name field.


3. Click [...] next to Edit schema to define the schema of the output table in the schema editor.
4. Click [+] to add four columns:
• SeqID and TotalScore with the DB Type set to INTEGER
• StuName and PreferredSubject with the DB Type set to VARCHAR

Click OK to validate these changes and close the schema editor.


5. Click Sync columns to synchronize the Input and Output tables of the tELTOracleOutput
component.


Configuring data mapping to generate a subquery


Procedure
1. Click the SubqueryMap component (next to the second tELTOracleInput) to open its Basic settings
view.

Note: Specify the Oracle database connection information in the second ELTMap component in
the Job.

2. Click [...] next to ELT Oracle Map Editor to open its map editor.

3. Add the input table CourseScore by clicking [+] in the upper left corner of the map editor and
then selecting the relevant table name from the drop-down list in the pop-up dialog box.
4. Add an output table by clicking [+] in the upper right corner of the map editor and then entering
the table name TotalScore in the corresponding field in the pop-up dialog box.
5. Drag StuName, Subject, and Score columns in the input table and then drop them to the output
table.
6. Click the Add filter row button in the upper right corner of the output table and select Add an
other(GROUP...) clause from the pop-up menu. Then in the Additional other clauses (GROUP/
ORDER BY...) field displayed, enter the clause GROUP BY CourseScore.StuName,
CourseScore.Subject.
Add the aggregate function SUM for the column Score of the output table by changing the
expression of this column to SUM(CourseScore.Score).
7. Click the Generated SQL Select query for 'table1' output tab at the bottom of the map editor to
display the corresponding generated SQL statement.

This SQL query will appear as a subquery in the SQL query generated by the ELTMap component.
8. Click OK to validate these changes and close the map editor.
9. Connect SubqueryMap to ELTMap using the Link > TotalScore (table1) link. Note that
the link is renamed automatically to TotalScore (Table_ref) since the output table TotalScore is a
reference table.
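
The query displayed in step 7 is shown only in the Studio; based on the mapping above (the three dragged columns, the SUM expression, and the GROUP BY clause), it should be close to the following (formatting and aliases may differ):

SELECT
    CourseScore.StuName,
    CourseScore.Subject,
    SUM(CourseScore.Score)
FROM CourseScore
GROUP BY CourseScore.StuName, CourseScore.Subject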

Mapping the input and output schemas


Procedure
1. Right-click ELTMap and select Link > *New Output* (Table) from the contextual menu.
2. Click TotalScoreOfPreferredSubject. In the pop-up dialog box, click Yes to get the schema from
the target component.
3. Click ELTMap to open its Basic settings view.
4. Select Repository from the Property Type drop-down list. Specify the Oracle database you
previously added to automatically propagate the database connection information.

5. Click [...] next to ELT Oracle Map Editor to open its map editor.
6. Add the input table PreferredSubject by clicking the [+] button in the upper left corner
of the map editor and selecting the relevant table name from the drop-down list in the pop-up
dialog box.
Repeat the step to add another input table TotalScore.


7. Drag the StuName column in the input table PreferredSubject and drop it to the corresponding
column in the input table TotalScore. Then select the Explicit join check box for the StuName
column in the input table TotalScore.
Repeat the step for the Subject column.
8. Drag the SeqID column in the input table PreferredSubject and drop it to the corresponding
column in the output table.
Repeat the step to drag the StuName and Subject columns in the input table PreferredSubject and
the Score column in the input table TotalScore and drop them to the corresponding column in the
output table.
9. Click the Generated SQL Select query for "table2" output tab at the bottom of the map editor to
display the corresponding generated SQL statement.

The SQL query generated in the SubqueryMap component appears as a subquery in the SQL query
generated by this component. Aliases are automatically added for the selected columns in the
subquery.
10. Click OK to validate these changes and close the map editor.
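
Based on the joins and mappings described in this procedure, the statement displayed in step 9 should resemble the following, with the SubqueryMap query embedded as a derived table (the TotalScore alias and the exact formatting may differ):

SELECT
    PreferredSubject.SeqID,
    PreferredSubject.StuName,
    PreferredSubject.Subject,
    TotalScore.Score
FROM PreferredSubject, (
    SELECT
        CourseScore.StuName AS StuName,
        CourseScore.Subject AS Subject,
        SUM(CourseScore.Score) AS Score
    FROM CourseScore
    GROUP BY CourseScore.StuName, CourseScore.Subject
) TotalScore
WHERE PreferredSubject.StuName = TotalScore.StuName
  AND PreferredSubject.Subject = TotalScore.Subject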

Executing the Job


Procedure
Click Run to execute the Job.

The SELECT statement is generated and the mapped data is written into the output table.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTTeradataOutput
Performs the action (insert, update or delete) on data in the specified Teradata table through the SQL
statement generated by the tELTTeradataMap component.
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTTeradataOutput Standard properties


These properties are used to configure tELTTeradataOutput running in the Standard Job framework.
The Standard tELTTeradataOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter a default name for the table, between double
quotation marks.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Clause SET Select the column names that will be used to generate the
SET clauses.
SET clauses will not be generated for the columns that are
not selected.
This field appears when Update is selected from the Action
on data drop-down list in the Basic settings view.
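
A minimal sketch, assuming a hypothetical CUSTOMERS target table in which only the NAME and CITY columns are selected in Clause SET: the generated UPDATE then contains SET entries for those two columns only, along these lines (the values and the WHERE condition depend on the mapping and on the Where clauses setting):

UPDATE CUSTOMERS
SET NAME = 'Amanda',
    CITY = 'Paris'
WHERE CUSTOMERS.ID = 1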

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tELTTeradataOutput is to be used along with the


tELTTeradataMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
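As an illustration, if the target table is named customers (a hypothetical name), the output link
coming from tELTTeradataMap should also be named customers so that the generated statement
addresses the right table.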

Note:
The ELT components do not handle actual data flow but
only schema information.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTVerticaInput
Provides the Vertica table schema that will be used by the tELTVerticaMap component to generate the
SQL SELECT statement.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTVerticaInput Standard properties


These properties are used to configure tELTVerticaInput running in the Standard Job framework.
The Standard tELTVerticaInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTVerticaInput is used along with tELTVerticaMap. Note


that the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTVerticaMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTVerticaInput
components.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTVerticaMap Standard properties


These properties are used to configure tELTVerticaMap running in the Standard Job framework.
The Standard tELTVerticaMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

DB Version Select the version of the Vertica database being used.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Vertica Map Editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of the schema can be different
from the column names in the database.

Style link Select a way in which links are displayed.


• Auto: By default, the links between the input and
output schemas and the Web service parameters are in
the form of curves.
• Bezier curve: The links between the schema and the
Web service parameters are in the form of curve.
• Line (fastest): The links between the schema and the
Web service parameters are in the form of straight
lines. This option slightly optimizes performance.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The database connection fields that
follow are completed automatically using the data retrieved.

Host Type in the IP address or hostname of the database.

Port Type in the listening port number of the database.

Database Type in the name of the database you want to use.

Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTVerticaMap is used along with tELTVerticaInput and


tELTVerticaOutput. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTVerticaOutput
Performs the action (insert, update or delete) on data in the specified Vertica table through the SQL
statement generated by the tELTVerticaMap component.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTVerticaOutput Standard properties


These properties are used to configure tELTVerticaOutput running in the Standard Job framework.
The Standard tELTVerticaOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new entries to the table. If duplicates are
found, Job stops.
• Update: Updates entries in the table.
• Delete: Deletes entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Sync columns Click this button to retrieve the schema from the previous
component connected in the Job.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operation.
This field is available only when Update or Delete is
selected from the Action on data drop-down list.
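As an illustration, a clause such as "customers.status = 'inactive'" (hypothetical table and column
names) would limit the update or delete operation to the matching rows only.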

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to use a different output table name.

Table name Type in the output table name.


This field is available only when the Use different table
name check box is selected.

Advanced settings

Direct Select this check box to write the data directly to disk,
bypassing memory.
This check box is not visible when the Set SQL Label check
box is selected.

Set SQL Label Select this check box and specify the label that identifies
the query. For more information, see How to label queries
for profiling.
This check box is not visible when the Direct check box is
selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTVerticaOutput is used along with the tELTVerticaMap.


Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tESBConsumer
Calls the defined method from the invoked Web service and returns the class as defined, based on the
given parameters.

tESBConsumer Standard properties


These properties are used to configure tESBConsumer running in the Standard Job framework.
The Standard tESBConsumer component belongs to the ESB family.
The component in this framework is available in all Talend products.

Basic settings

Service configuration Description of Web service bindings and configuration. The


Endpoint field gets filled in automatically upon completion
of the service configuration.

Input Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Response Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.


• Change to built-in property: choose this option to


change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Fault Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Use Service Registry This option is only available if you subscribed to Talend
Enterprise ESB solutions.
Select this check box to enable the Service Registry. It
provides dynamic endpoint lookup and allows services to
be redirected based upon information retrieved from the
registry. It works in runtime only.
Enter the authentication credentials in the Username and
Password field.
If SAML token is registered in the service registry, you need
to specify the client's role in the Role field. You can also
select the Propagate Credentials check box to make the call
on behalf of an already authenticated user by propagating
the existing credentials. You can enter the username and
the password to authenticate via STS to propagate using
username and password, or provide the alias, username


and the password to propagate using certificate. For more


information, see the Use Authentication option. Select
the Encryption/Signature body check box to enable XML
Encryption/XML Signature. For more information, see the
chapter about XKMS Service in the Talend ESB Infrastructure
Services Configuration Guide.
In the Correlation Value field, specify a correlation ID or
leave this field empty. For more information, see the Use
Business Correlation option.
For more information about how to set up and use the
Service Registry, see the Talend Administration Center User
Guide and Talend ESB Infrastructure Services Configuration
Guide.

Use Service Locator Maintains the availability of the service to help meet
demands and service level agreements (SLAs).
This option will not show if the Use Service Registry check
box is selected.

 Use Service Activity Monitor Captures events and stores this information to facilitate
in-depth analysis of service activity and track-and-trace
of messages throughout a business transaction. This can
be used to analyze service response times, identify traffic
patterns, perform root cause analysis and more.
This option is disabled when the Use Service Registry check
box is selected if you subscribed to Talend Enterprise ESB
solutions.

 Use Authentication Select this check box to enable the authentication option.
Select from Basic HTTP, HTTP Digest, Username Token,
and SAML Token (ESB runtime only). Enter a username
and a password in the corresponding fields as required.
Authentication with Basic HTTP, HTTP Digest, and
Username Token works in both the studio and runtime.
Authentication with the SAML Token works in runtime only.
When SAML Token (ESB runtime only) is selected, you can
either provide the user credentials to send the request or
make the call on behalf of an already authenticated user by
propagating the existing credentials. Select from:
-: Enter the username and the password in the
corresponding fields to access the service.
Propagate using U/P: Enter the user name and the password
used to authenticate against STS.
Propagate using Certificate: Enter the alias and the
password used to authenticate against STS.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This option will not show if the Use Service Registry check
box is selected.

Use Business Correlation Select this check box to create a correlation ID in this
component.
You can specify a correlation ID in the Correlation Value
field. In this case the correlation ID will be passed on to the
service it calls so that chained service calls will be grouped


under this correlation ID. If you leave this field empty, this
value will be generated automatically at runtime.
When this option is enabled, tESBConsumer will also extract
the correlation ID from the response header and store it in
the component variable for further use in the flow.
This option will be enabled automatically when the Use
Service Registry check box is selected.

Use GZip Compress Select this check box to compress the incoming messages
into GZip format before sending.

Die on error Select this check box to kill the Job when an error occurs.

Advanced settings

Log messages Select this check box to log the message exchange
between the service provider and the consumer.

Service Locator Custom Properties This table appears when Use Service Locator is selected.
You can add as many lines as needed in the table to
customize the relevant properties. Enter the name and the
value of each property between double quotation marks
in the Property Name field and the Property Value field
respectively.

Service Activity Custom Properties This table appears when Use Service Activity Monitor is
selected. You can add as many lines as needed in the table
to customize the relevant properties. Enter the name and
the value of each property between double quotation marks
in the Property Name field and the Property Value field
respectively.

Connection time out(second) Set a value in seconds for Web service connection time out.
This option only works in the studio. To use it after the
component is deployed in runtime:
1. Create a configuration file with the name
org.apache.cxf.http.conduits-<endpoint_name>.cfg in the
<TalendRuntimePath>/container/etc/ folder.
2. Specify the url of the Web service and the
client.ConnectionTimeout parameter in
milliseconds in the configuration file. If you need
to use the Receive time out option, specify the
client.ReceiveTimeout in milliseconds too.
The url can be a full endpoint address or a regular
expression containing wildcards, for example:

url = http://localhost:8040/*
client.ConnectionTimeout=10000000
client.ReceiveTimeout=20000000

where http://localhost:8040/* matches all URLs starting
with http://localhost:8040/.

Receive time out(second) Set a value in seconds to wait for the server answer.


This option only works in the studio. For how to use it after
the component is deployed in runtime, see the Connection
time out option.


Disable Chunking Select this check box to disable encoding the payload
as chunks. In general, chunking will perform better as
the streaming can take place directly. But sometimes the
payload is truncated with chunking enabled. If you are
getting strange errors when trying to interact with a service,
try turning off chunking to see if that helps.

Trust server with SSL/TrustStore file and TrustStore password Select this check box to validate the server
certificate to the client via an SSL protocol and fill in the
corresponding fields:
TrustStore file: Enter the path (including filename) to
the certificate TrustStore file that contains the list of
certificates that the client trusts.
TrustStore password: Enter the password used to check the
integrity of the TrustStore data.

Use http proxy/Proxy host, Proxy port, Proxy user, and Proxy password Select this check box if you are using a
proxy server and fill in the necessary information.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

HTTP Headers Click [+] as many times as required to add the name-value
pair(s) for HTTP headers to define the parameters of the
requested HTTP operation.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
CORRELATION_ID: the correlation ID by which chained
service calls will be grouped. This is a Flow variable and it
returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
HTTP_RESPONSE_CODE: HTTP response status code. This is
an After variable and it returns an Integer.
HTTP_HEADERS: the set of HTTP headers from the response.
This is a Flow variable and it returns a map object of type
java.util.Map<String, java.util.List<?>>.
The header name is the map key and the header values are
represented by java.util.List<?>.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.
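As an illustration, the snippet below is a minimal sketch of reading these variables from the
globalMap in a tJava or tJavaRow component; the instance name tESBConsumer_1 is an assumption and
must match the label of the component in your own Job. Remember that Flow variables such as
CORRELATION_ID and HTTP_HEADERS have to be read while the flow is still running, for example in a
tJavaRow placed on the Response flow.

// Minimal sketch (hypothetical instance name tESBConsumer_1).
// HTTP_RESPONSE_CODE is an After variable; CORRELATION_ID and HTTP_HEADERS are Flow variables.
Integer status = (Integer) globalMap.get("tESBConsumer_1_HTTP_RESPONSE_CODE");
String correlationId = (String) globalMap.get("tESBConsumer_1_CORRELATION_ID");
java.util.Map<String, java.util.List<?>> headers =
        (java.util.Map<String, java.util.List<?>>) globalMap.get("tESBConsumer_1_HTTP_HEADERS");

System.out.println("HTTP status: " + status + ", correlation ID: " + correlationId);
if (headers != null) {
    for (java.util.Map.Entry<String, java.util.List<?>> header : headers.entrySet()) {
        System.out.println(header.getKey() + " = " + header.getValue());
    }
}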

Usage

Usage rule This component can be used as an intermediate component.
It must be linked to an output component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Use
Authentication or Use HTTP proxy option dynamically at
runtime. You can add two rows in the table to set both
options.
Once a dynamic parameter is defined, the corresponding
option becomes highlighted and unusable in the Basic
settings view or Advanced settings view.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
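For example, the Code field could be filled with a hypothetical Boolean context variable such as
context.useProxy so that the Use HTTP proxy option is switched on or off according to the value the
variable holds at runtime.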

Limitation A JDK is required for this component to operate.

Using tESBConsumer to retrieve the valid email


This scenario describes a Job that uses a tESBConsumer component to retrieve the valid email.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tESBConsumer, two tXMLMap, and two tLogRow components.
2. Right-click the tFixedFlowInput component, select Row > Main from the contextual menu and
click the first tXMLMap component.
3. Right-click the tXMLMap component, select Row > *New Output* (Main) from the contextual
menu and click the tESBConsumer component. Enter payload in the popup dialog box to name


this row and accept the propagation that prompts you to get the schema from the tESBConsumer
component.
4. Right-click the tESBConsumer component, select Row > Response from the contextual menu and
click the second tXMLMap component.
5. Right-click the second tXMLMap component, select Row > *New Output* (Main) from the
contextual menu and click the second tLogRow component. Enter response in the popup dialog
box to name this row.
6. Right-click the tESBConsumer component again, select Row > Fault from the contextual menu
and click the other tLogRow component.

Configuring the components


The tLogRow components will monitor the exchanges from the response and fault messages and do
not need any configuration. Press Ctrl+S to save your Job.

Configuring the tESBConsumer component

About this task


In this scenario, a public web service which is available at http://www.webservicex.net/ValidateEmail.asmx
will be called by the tESBConsumer component to return true or false for an email address. You can
view the WSDL definition of the service at http://www.webservicex.net/ValidateEmail.asmx?WSDL
for the service description.

Procedure
1. In the design workspace, double-click the tESBConsumer component to open its Basic settings
view in the Component tab.

2. Click the three-dot button next to Service configuration.


3. In the dialog box that appears, type in: http://www.webservicex.net/ValidateEmail.asmx?WSDL in


the WSDL field and click the refresh button to retrieve port name and operation name. In the Port
Name list, select the port you want to use, ValidateEmailSoap in this example.
Select the Populate schema to repository on finish to retrieve the schema from the WSDL
definition, which will be used by the tFixedFlowInput component. This option is only available to
users of Talend Studio with ESB. If you don't have this option, please ignore it. The schema can
be created manually in the tFixedFlowInput component, which will be shown later.
Click Finish to validate your settings and close the dialog box.
4. Click the Advanced settings view in the Component tab.

5. Select the Log messages check box to show the exchange log in the execution console.

Configuring the tFixedFlowInput component

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view in the Component
tab.


2. For users of Talend Studio with ESB who have retrieved the schema from the service WSDL
definition in the configuration of the tESBConsumer component, select Repository from the
Schema list. Then click the [...] of the next field to show the Repository Content dialog box. Select
the metadata under the IsValidEmail node to use it as the schema of the input message. Click OK
to close the dialog box.
For users of Talend Studio without ESB, please go to the next step.


3. For users of Talend Studio without ESB, the schema needs to be created manually. Select Built-In
from the Schema list.

Click the three-dot button next to Edit Schema. In the schema dialog box, click the plus button to
add a new line of String type and name it Email. Click OK to close the dialog box.

4. In the Number of rows field, set the number of rows as 1.


5. In the Mode area, select Use Single Table and input the following request in double quotation
marks into the Value field:
[email protected]

Configuring the tXMLMap component in the input flow

About this task


Talend data integration uses schemas based on rows and columns since it has roots in relational data
warehouse integration. But SOAP messages use the XML format. XML is hierarchical and supports a
richer structure than rows or columns. So we need tXMLMap to convert from the relational row/
column structure to the schema expected by the SOAP service.
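To make the target structure concrete, the sketch below shows in plain Java the kind of payload
that the mapping steps in the following procedure produce for one input row. It is illustrative
only: the element names and the namespace come from the procedure, while the email value is a
placeholder, and in the actual Job the document is built by tXMLMap rather than by hand.

// Illustrative only: the request payload that the tXMLMap mapping described below builds.
// The email value is a placeholder; in the Job it comes from the Email column of the input row.
String email = "user@example.com";
String payload =
      "<IsValidEmail xmlns=\"http://www.webservicex.net\">"
    + "<Email>" + email + "</Email>"
    + "</IsValidEmail>";
System.out.println(payload);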


Procedure
1. In the design workspace, double-click the tXMLMap component to open the Map Editor.
2. In the output table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmail in the dialog box that appears.
3. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter
http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu.
Enter Email in the dialog box that appears.
5. Right-click the Email node and select As loop element from the contextual menu.
6. Click the Email node in the input table and drop it to the Expression column in the row of the
Email node in the output table.

7. Click OK to validate the mapping and close the Map Editor.

Configuring the tXMLMap component in the output flow

About this task


The tXMLMap in the output flow will convert the response message from the XML format to the row/
column structure.

Procedure
1. In the design workspace, double-click the tXMLMap component in the output flow to open the
Map Editor.
2. In the input table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmailResponse in the dialog box that appears.
3. Right-click the IsValidEmailResponse node and select Set A Namespace from the contextual menu.
Enter http://www.webservicex.net in the dialog box that appears.


4. Right-click the IsValidEmailResponse node again and select Create Sub-Element from the
contextual menu. Enter IsValidEmailResult in the dialog box that appears.
5. Right-click the IsValidEmailResult node and select As loop element from the contextual menu.
6. On the lower right part of the map editor, click [+] to add a row of String type to the output
table and name it response.
7. Click the IsValidEmailResult node in the input table and drop it to the Expression column in the
row of the response node in the output table.

8. Click OK to validate the mapping and close the Map Editor.

Executing the Job


Click the Run view to display it and click the Run button to launch the execution of your Job. You can
also press F6 to execute it. In the execution log you will see:


The email address [email protected] is returned as false. The input and output SOAP
messages in XML are also shown in the console.

Using tESBConsumer with custom SOAP Headers


This scenario is similar to the previous one. It describes a Job that uses a tESBConsumer component to
retrieve a valid email address with custom SOAP headers in the request message.


Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: a tESBConsumer, a
tMap, two tFixedFlowInput, three tXMLMap, and two tLogRow.
2. Connect each of the tFixedFlowInput with a tXMLMap using the Row > Main connection.
3. Right-click the first tXMLMap, select Row > *New Output* (Main) from the contextual menu and
click tMap. Enter payload in the popup dialog box to name this row.
Repeat this operation to connect another tXMLMap to tMap and name the output row header.
4. Right-click the tMap component, select Row > *New Output* (Main) from the contextual menu and
click the tESBConsumer component. Enter request in the popup dialog box to name this row and
accept the propagation that prompts you to get the schema from the tESBConsumer component.
5. Right-click the tESBConsumer component, select Row > Response from the contextual menu and
click the third tXMLMap component.
6. Right-click the third tXMLMap component, select Row > *New Output* (Main) from the contextual
menu and click one of the tLogRow components. Enter response in the popup dialog box to name
this row.
7. Right-click the tESBConsumer component again, select Row > Fault from the contextual menu
and click the other tLogRow component.

Configuring the components


The tLogRow components will monitor the exchanges from the response and fault messages and do
not need any configuration. Press Ctrl+S to save your Job.

Configuring the tESBConsumer component

About this task


In this scenario, a public web service which is available at http://www.webservicex.net/ValidateEmail.asmx
will be called by the tESBConsumer component to return true or false for an email address. You can
view the WSDL definition of the service at http://www.webservicex.net/ValidateEmail.asmx?WSDL
for the service description.


Procedure
1. In the design workspace, double-click the tESBConsumer component to open its Basic settings
view in the Component tab.

2. Click the [...] button next to Service configuration.

3. In the dialog box that appears, type in: http://www.webservicex.net/ValidateEmail.asmx?WSDL in


the WSDL field and click the refresh button to retrieve port name and operation name. In the Port
Name list, select the port you want to use, ValidateEmailSoap in this example.
Select the Populate schema to repository on finish to retrieve the schema from the WSDL
definition, which will be used by the tFixedFlowInput component. This option is only available to
users of Talend Studio with ESB. If you don't have this option, please ignore it. The schema can
be created manually in the tFixedFlowInput component, which will be shown later.


Click Finish to validate your settings and close the dialog box.
4. In the Advanced settings view, select the Log messages check box to log the content of the
messages.

Configuring the tFixedFlowInput components

Procedure
1. Double-click the first tFixedFlowInput component to open its Basic settings view in the
Component tab.

2. For users of Talend Studio with ESB who have retrieved the schema from the service WSDL
definition in the configuration of the tESBConsumer component, select Repository from the
Schema list. Then click the [...] of the next field to show the Repository Content dialog box. Select
the metadata under the IsValidEmail node to use it as the schema of the input message. Click OK
to close the dialog box.
For users of Talend Studio without ESB, please go to the next step.


3. For users of Talend Studio without ESB, the schema needs to be created manually. Select Built-In
from the Schema list.

Click the [...] button next to Edit Schema. In the schema dialog box, click the [+] button to add a
new line of String type and name it Email. Click OK to close the dialog box.


4. In the Number of rows field, set the number of rows as 1.


5. In the Mode area, select Use Single Table and enter "[email protected]" into the Value
field, which is the payload of the request message.
6. Configure the second tFixedFlowInput as the first one, except for its schema.
Add two rows of String type to the schema and name them id and company respectively.

Give the value Hello world! to id and Talend to company, which are the headers of the request
message.


Configuring the tXMLMap components in the input flow

About this task


Talend data integration uses schemas based on rows and columns since it has roots in relational data
warehouse integration. But SOAP messages use the XML format. XML is hierarchical and supports a
richer structure than rows or columns. So we need tXMLMap to convert from the relational row/
column structure to the schema expected by the SOAP service.

Procedure
1. In the design workspace, double-click the first tXMLMap component to open the Map Editor.
2. In the output table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmail in the dialog box that appears.
3. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter
http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu.
Enter Email in the dialog box that appears.
5. Right-click the Email node and select As loop element from the contextual menu.
6. Click the Email node in the input table and drop it to the Expression column in the row of the
Email node in the output table.


7. Click OK to validate the mapping and close the Map Editor.


8. Configure the other tXMLMap in the same way. Add a row of Document type to the output table
and name it header. Create two sub-elements to it, id and company. Map the id and the company
nodes in the input table to the corresponding nodes in the output table.


Configuring the tMap component

Procedure
1. In the design workspace, double-click tMap to open the Map Editor.

2. On the lower right part of the map editor, click [+] to add two rows of Document type to the
output table and name them payload and headers respectively.
3. Click the payload node in the input table and drop it to the Expression column in the row of the
payload node in the output table.
4. Click the header node in the input table and drop it to the Expression column in the row of the
headers node in the output table.

Configuring the tXMLMap component in the output flow

About this task


The tXMLMap in the output flow will convert the response message from the XML format to the row/
column structure.

Procedure
1. In the design workspace, double-click the tXMLMap component in the output flow to open the
Map Editor.
2. In the input table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmailResponse in the dialog box that appears.
3. Right-click the IsValidEmailResponse node and select Set A Namespace from the contextual menu.
Enter http://www.webservicex.net in the dialog box that appears.


4. Right-click the IsValidEmailResponse node again and select Create Sub-Element from the
contextual menu. Enter IsValidEmailResult in the dialog box that appears.
5. Right-click the IsValidEmailResult node and select As loop element from the contextual menu.
6. On the lower right part of the map editor, click [+] to add a row of String type to the output
table and name it response.
7. Click the IsValidEmailResult node in the input table and drop it to the Expression column in the
row of the response node in the output table.

8. Click OK to validate the mapping and close the Map Editor.

Executing the Job


Click the Run view to display it and click the Run button to launch the execution of your Job. You can
also press F6 to execute it.


As shown in the execution log, the email address [email protected] is returned as false. The
input and output SOAP messages in XML are also shown in the console. The SOAP header is sent with
the request to the service.


tESBProviderFault
Serves a Talend Job cycle result as a Fault message of the Web service in case of a request-response
communication style.
It acts as the Fault message of the Web Service response at the end of a Talend Job cycle.

tESBProviderFault Standard properties


These properties are used to configure tESBProviderFault running in the Standard Job framework.
The Standard tESBProviderFault component belongs to the ESB family.
This component is relevant only when used with one of the Talend solutions with ESB, as it should be
used with the Service Repository node and the Data Service creation related wizard(s).

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

ESB service settings Fault title: Value of the faultString column in the Fault
message.

Note:
The Row > Fault flow of tESBConsumer has a pre-defined
schema whose column, faultString, is filled up with the
content of the field Fault title of tESBProviderFault.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component should only be used with the


tESBProviderRequest component.

Limitation A JDK is required for this component to operate.

Requesting airport names based on country codes


This scenario applies only to Talend Open Studio for ESB, Talend Data Services Platform and Talend
Data Fabric.
This scenario involves two Jobs, one as the data service provider and the other as the data service
consumer. The former listens to the requests from the consumer via tESBProviderRequest, matches
the country code wrapped in the request against a MySQL database table that holds the country code/
airport pairs via tXMLMap, and finally returns the correct airport name via tESBProviderResponse or,
if no match is found, an error message via tESBProviderFault. The consumer sends requests to the
provider and receives the airport information or error messages via tESBConsumer.

Building the data service provider to publish a service


The data service airport has already been defined under the Services node of the Repository tree view.
Its schema has three major elements as shown below:


For how to define a service in the Studio, see Talend Studio User Guide.

Assigning a Job to the defined service

Procedure
1. Right-click getAirportInformationByISOCountryCode under the Web service airport and from the
contextual menu, select Assign Job.
2. In the Operation Choice window, select Create a new Job and Assign it to this Service Operation.

3. Click Next to open the Job description window. The Job name
airportSoap_getAirportInformationByISOCountryCode is automatically filled in.


4. Click Finish to create the Job and open it in the workspace. Three components are already
available.

Adding components to arrange the data flow

Procedure
1. Drop tXMLMap and tMysqlInput from the Palette to the workspace.
2. Link tESBProviderRequest to tXMLMap using a Row > Main connection.
3. Link tMysqlInput to tXMLMap using a Row > Main connection.
4. Link tXMLMap to tESBProviderResponse using a Row > *New Output*(Main) connection.
In the new Output name pop-up window, enter the output table name, airport_response.
Click OK in the pop-up window that asks whether to get the schema of the target component.


5. Link tXMLMap to tESBProviderFault using a Row > *New Output*(Main) connection.


In the new Output name pop-up window, enter the output table name, fault_message.
Click OK in the pop-up window that asks whether to get the schema of the target component.

Configuring how requests are processed

Procedure
1. Double-click tMysqlInput to display its Basic settings view.

2. Fill in the basic settings for the MySQL connection and database table.
Click the [...] button to open the schema editor.


3. Click the [+] button to add two columns, id and name, with the type of string.
Click OK to close the editor.
Click Guess Query to retrieve the SQL query.
4. Double-click tXMLMap to open its mapper.

5. In the main : row1 table of the input flow side (left), right-click the column name payload and from
the contextual menu, select Import from Repository. Then the Metadata wizard is opened.


Select the schema of the request message and click OK to validate this selection. In this example,
the schema is getAirportInformationByISOCountryCode.
6. Do the same to import the hierarchical schemas for the response/fault messages (right). In this
example, these schemas are getAirportInformationByISOCountryCodeResponse and
getAirportInformationByISOCountryCodeFault respectively.
7. Then to create the join to the lookup data, drop the CountryAbbreviation node from the main flow
onto the id column of the lookup flow.
8. On the lookup flow table, click the wrench icon on the upper right corner to open the setting
panel.
Set Lookup Model as Reload at each row, Match Model as All matches and Join Model as Inner
join.
9. On the airport_response output flow table, click the wrench icon on the upper right corner to open
the setting panel.
Set the All in one option as true. This ensures that only one response is returned for each request
if multiple airport matches are found in the database.
10. On the fault_message output flow table, click the wrench icon on the upper right corner to open
the setting panel.
Set the Catch Lookup Inner Join Reject option as true to monitor the mismatches between the
country code in the request and the records in the database table. Once such a situation occurs, a
fault message will be generated by tESBConsumer and outputted via its Row > Fault flow.


Note:
The Row > Fault flow of tESBConsumer has a predefined schema in which the faultString
column is filled with the content of the field Fault title of tESBProviderFault.

11. Drop the name column in the lookup flow onto the Expression area next to the
tns:getAirportInformationByISOCountryCodeResult node in the airport_response output flow.
Drop the tns:CountryAbbreviation node in the main flow onto the Expression area next to the
tns:getAirportInformationByISOCountryCodeFaultString node in the fault_message output flow. This
way, the incorrect country code in the request will be shown in the faultDetail column of the Row
> Fault flow of tESBConsumer.
Click OK to close the editor and validate this configuration.
12. Double-click tESBProviderFault to display its Basic settings view:

13. In the field Fault title, enter the context variable context.fault_message.
For how to define context variables, see Talend Studio User Guide.

Publishing the service to listen to requests

Procedure
1. Press Ctrl +S to save the Job.
2. Press F6 to run this Job.

Results
The data service is published and will listen to all the requests until you click the Kill button to stop it
as by default, the Keep listening option of tESBProviderRequest is selected automatically.
Now is the time to configure the consumer Job that interacts with the data service.


Building the data service consumer to request the service


Built upon tESBConsumer, the consumer Job sends two requests that contain the country codes to the
Web service for the relevant airport names. If a wrong country code is wrapped in the request, an error
message is returned. The country codes and the MySQL database records are as follows:

Dropping and linking the components

Procedure
1. Drop a tFileInputDelimited, a tXMLMap, a tESBConsumer and two tLogRow from the Palette to
the workspace.
2. Rename one tLogRow as response and the other as fault_message.
3. Link tFileInputDelimited to tXMLMap using a Row > Main connection.
4. Link tXMLMap to tESBConsumer using a Row > *New Output*(Main) connection.
In the new Output name pop-up window, enter the output table name, for example request.
Click OK in the pop-up window that asks whether to get the schema of the target component.
5. Link tESBConsumer to response using the Row > Response connection.
6. Link tESBConsumer to fault_message using the Row > Fault connection.

Configuring the components

Procedure
1. Double-click tFileInputDelimited to open its Basic settings view.


2. In the File name/stream field, enter the context variable for the file that has the country codes,
context.filepath.
3. Click the [...] button to open the schema editor.

4. Click the [+] button to add a column, country_code, for example, with the type of string.
Click OK to close the editor.
5. Double-click tXMLMap to open its Map editor.


6. In the request table of the output flow side, right-click the column name payload and from the
contextual menu, select Import from Repository. Then the Metadata wizard is opened.

Select the schema of the request message and click OK to validate this selection. In this example,
the schema is getAirportInformationByISOCountryCode.
7. Drop the country_code column in the main flow onto the Expression area next to the
tns:CountryAbbreviation node in the request output flow.
Click OK to close the editor and validate this configuration.
8. Double-click tESBConsumer to open its service configuration wizard:


9. Click the Browse... button to select the desired WSDL file. The Port name and Operation are
automatically filled up once the WSDL file is selected.
Click OK to close the wizard.
10. Double-click response to open its Basic settings view:

11. Select Vertical (each row is a key/value list) and then Print label for a better view of the results.
Do the same to the other tLogRow, fault_message.

Executing the Job

Procedure
1. Press Ctrl +S to save the Job.
2. Press F6 to run this Job.


As shown above, two messages are returned, one giving the airport name that matches the
country code CN and the other giving the error details caused by the country code CC.


tESBProviderRequest
Wraps a Talend Job as a web service.
It waits for a request message from a consumer and passes it to the next component.

tESBProviderRequest Standard properties


These properties are used to configure tESBProviderRequest running in the Standard Job framework.
The Standard tESBProviderRequest component belongs to the ESB family.
This component is relevant only when used with one of the Talend solutions with ESB, as it should be
used with the Service Repository node and the Data Service creation related wizard(s).

Basic settings

Property Type Either Built-in or Repository.

  Built-in: No WSDL file is configured for the Job.

  Repository: Select the desired web service from the


Repository, to the granularity of the port name and
operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema is created and stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Keep listening Check this box when you want to ensure that the provider
(and therefore Talend Job) will continue listening for
requests after processing the first incoming request.


Advanced settings

Log messages (Studio only) Select this check box to log the message exchange
between the service provider and the consumer. This option
works in the Studio only.

Response timeout, sec Specify the time limit in seconds for sending response to
the consumer. This parameter is necessary to avoid locking
of message exchanges.

Request processing queue size Specify the maximum number of received requests that
can be processed in parallel by the components between
tESBProviderRequest and tESBProviderResponse. Note that
this parameter is different from the queueSize in the
<TalendRuntimePath>\etc\org.apache.cxf.workqueues-default.cfg
which defines pool configuration
for incoming requests on CXF level.

Request processing timeout, sec Specify the time limit in seconds for requests to be
processed by the components between the tESBProviderRequest
and the tESBProviderResponse.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
CORRELATION_ID: the correlation ID by which chained
service calls will be grouped. This is a Flow variable and it
returns a string.
SECURITY_TOKEN: the user identity information in the
request header. This is a Flow variable and it returns an
XML node.
HEADERS_SOAP: the headers of the SOAP request. This is a
Flow variable and it returns all SOAP request headers.
HEADERS_HTTP: the headers of the HTTP request. This is a
Flow variable and it returns all HTTP request headers.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component allows a Talend Job to be wrapped as a
service: a request to the service is passed into the Job and
the Job result is returned as the service response.
The tESBProviderResponse component can both deliver the
payload of a SOAP message and also access the HTTP and
SOAP headers of a service.
The tESBProviderRequest component should be used
with the tESBProviderResponse component to provide
a Job result as a response, in case of a request-response
communication style.
When the SAML Token or the Service Registry is enabled in
the service runtime options and if the SAML Token exists
in the request header, the tESBProviderRequest component
will get and store the SAML Token in the component
variable for further use in the flow.
The tESBProviderRequest component will get the
Correlation Value in the request header if it exists and
store it in the component variable. When the Business
Correlation or the Service Registry is enabled in the service
runtime options, the Correlation Value will also be added to
the response. In this case, tESBProviderRequest will create a
Correlation Value if it does not exist.
Note that the Service Registry option is only available if you
subscribed to Talend Enterprise ESB solutions. For more
information about how to set the runtime options, see the
corresponding section in the Talend Studio User Guide.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Keep
listening option dynamically at runtime.
When a dynamic parameter is defined, the corresponding
Keep listening option in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation A JDK is required for this component to operate.
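
As an illustration of the Dynamic settings option described above, one possible setup is to define a
Boolean (or true/false string) context variable and reference it in the Code field; the variable name
used here is only an example:

context variable: keepListening = true
Code field value: context.keepListening

Changing the value of keepListening in the selected context at runtime then turns the Keep listening
option on or off without editing the component.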

Sending a message without expecting a response


This scenario applies only to Talend Open Studio for ESB, Talend Data Services Platform and Talend
Data Fabric.
The Jobs, which are built upon the components under the ESB/Web Services family, act as the
implementations of web services defined in the Services node of the Repository. They require the
creation of and association with relevant services. For more information about services, see the
related topics in the Talend Studio User Guide.


In this scenario, a provider Job and a consumer Job are needed. In the meantime, the related
service should already exist in the Services node, with the WSDL URI being http://127.0.0.1:8088/
esb/provider/?WSDL, the port name being TEST_ProviderJobSoapBinding and the operation being
invoke(anyType):anyType.
The provider Job consists of a tESBProviderRequest, a tXMLMap, and two tLogRow components.

• Drop the following components from the Palette onto the design workspace: a tESBProviderRequest,
a tXMLMap, and two tLogRow.
• Double-click tESBProviderRequest_1 in the design workspace to display its Component view and
set its Basic settings.

• Select Repository from the Property Type list and click the three-dot button to choose the service,
to the granularity of port name and operation.

• Click OK.
• Click the three-dot button next to Edit schema to view the schema of tESBProviderRequest_1.


• Click OK.
• Connect tESBProviderRequest_1 to tLogRow_1.
• Double-click tLogRow_1 in the design workspace to display its Component view and set its Basic
settings.

• Click the three-dot button next to Edit schema and define the schema as follows.

• Connect tLogRow_1 to tXMLMap_1.


• Connect tXMLMap_1 to tLogRow_2 and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.


• On the lower right part of the map editor, click the plus button to add one row to the payload
table and name this row as payload.
• In the Type column of this payload row, select Document as the data type. The corresponding XML
root is added automatically to the top table on the right side which represents the output flow.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in response in the popup dialog
box.
• Right-click the response node and select As loop element from the contextual menu.
• Repeat this operation to create a sub-element request of the root node in the input table and set
the request node as loop element.
• Click the request node in the input table and drop it to the Expression column in the row of the
response node in the output table.

• Click OK to validate the mapping and close the map editor.


• Double-click tLogRow_2 in the design workspace to display its Component view and set its Basic
settings.


• Click the three-dot button next to Edit Schema and define the schema as follows.

• Save the Job.


The consumer Job consists of a tFixedFlowInput, a tXMLMap, a tESBConsumer, and two tLogRow
components.

• Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tXMLMap, a tESBConsumer, and two tLogRow.
• Double-click tFixedFlowInput_1 in the design workspace to display its Component view and set its
Basic settings.


• Edit the schema of the tFixedFlowInput_1 component.

• Click the plus button to add a new line of string type and name it payloadString.
• Click OK.
• In the Number of rows field, set the number of rows as 1.
• In the Mode area, select Use Single Table and enter "world" (in double quotation marks) in the Value field.
• Connect tFixedFlowInput_1 to tXMLMap_1.
• Connect tXMLMap_1 to tESBConsumer_1 and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.
• In the output table, right-click the root node to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in request in the popup dialog
box.
• Right-click the request node and select As loop element from the contextual menu.
• Click the payloadString node in the input table and drop it to the Expression column in the row of
the request node in the output table.


• Click OK to validate the mapping and close the Map Editor.


• Start the provider Job. In the execution log you can see:

...
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
...

• In the tESBConsumer_1 Component view, set its Basic settings.

• Click the three-dot button next to the Service Configuration to open the editor.


• In the WSDL field, type in: http://127.0.0.1:8088/esb/provider?WSDL.


• Click the Refresh button to retrieve port name and operation name.
• Click OK.
• In the Basic settings of the tESBConsumer, set the Input Schema as follows:

• Set the Response Schema as follows:

• Set the Fault Schema as follows:

• Connect tESBConsumer_1 to tLogRow_1 and tLogRow_2.


• In the design workspace, double-click the tLogRow_1 component to display its Component view
and set its Basic settings.


• Click the three-dot button next to Edit Schema and define the schema as follows:

• In the Job Design, double-click tLogRow_2 to display its Component view and set its Basic
settings.

• Click the three-dot button next to Edit Schema and define the schema as follows.

• Save the Job.


• Run the provider Job. In the execution log you will see:

INFO: Setting the server's publish address to be http://127.0.0.1:8088/esb/provider


2011-04-21 14:14:36.793:INFO::jetty-7.2.2.v20101205
2011-04-21 14:14:37.856:INFO::Started
[email protected]:8088
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
• Run the consumer Job. In the execution log of the Job you will see:

Starting job CallProvider at 14:15 21/04/2011.

[statistics] connecting to socket on port 3942


[statistics] connected
TEST_ESBProvider2
TEST_ESBProvider2SoapBingding
|
[tLogRow_2] payloadString: <request>world</request>
{http://talend.org/esb/service/job}TEST_ESBProvider2
{http://talend.org/esb/service/job}TEST_ESBProvider2SoapBinding
invoke
[tLogRow_1] payload: null
[statistics] disconnected
Job CallProvider2 ended at 14:16 21/04/2011. [exit code=0]

• In the provider's log you will see the trace log:

web service [endpoint: http://127.0.0.1:8088/esb/provider]


published
[tLogRow_1] payload: <?xml version="1.0" encoding="UTF-8"?>
<request>world</request>
### world
[tLogRow_2] content: world
[tLogRow_3] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
web service [endpoint: http://127.0.0.1:8088/esb/provider] unpublished
[statistics] disconnected
Job ESBProvider2 ended at 14:16 21/04/2011. [exit code=0]


tESBProviderResponse
Serves a Talend Job cycle result as a response message.
It acts as a service provider response builder at the end of each Talend Job cycle.

tESBProviderResponse Standard properties


These properties are used to configure tESBProviderResponse running in the Standard Job framework.
The Standard tESBProviderResponse component belongs to the ESB family.
This component is relevant only when used with one of the Talend solutions with ESB, as it should be
used with the Service Repository node and the Data Service creation related wizard(s).

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule The tESBProviderResponse component should only be used
with the tESBProviderRequest component to provide a Job
result as a response for a web service provider, in case of a
request-response communication style.

Limitation A JDK is required for this component to operate.

Returning Hello world response


This scenario applies only to Talend Open Studio for ESB, Talend Data Services Platform and Talend
Data Fabric.
The Jobs, which are built upon the components under the ESB/Web Services family, act as the
implementations of web services defined in the Services node of the Repository. They require the
creation of and association with relevant services. For more information about services, see the
related topics in the Talend Studio User Guide.
In this scenario, a provider Job and a consumer Job are needed. In the meantime, the related
service should already exist in the Services node, with the WSDL URI being http://127.0.0.1:8088/
esb/provider/?WSDL, the port name being TEST_ProviderJobSoapBinding and the operation being
invoke(anyType):anyType.
The provider Job consists of a tESBProviderRequest, a tESBProviderResponse, a tXMLMap, and two
tLogRow components.


• Drop the following components from the Palette onto the design workspace: a tESBProviderRequest,
a tESBProviderResponse, a tXMLMap, and two tLogRow.
• In the design workspace, double-click tESBProviderRequest_1 to display its Component view and
set its Basic settings.

• Select Repository from the Property Type list and click the three-dot button to choose the service,
to the granularity of port name and operation.

• Click OK.
• Click the three-dot button next to Edit schema to view its schema.


• Connect tESBProviderRequest_1 to tLogRow_1.


• Double-click tLogRow_1 to display its Component view and set its Basic settings.

• Click the three-dot button next to Edit schema and define the schema as follows.

• Connect tLogRow_1 to tXMLMap_1.


• Connect tXMLMap_1 to tLogRow_2 and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.
• On the lower right part of the map editor, click the plus button to add one row to the payload
table and name this row as payload.


• In the Type column of this payload row, select Document as the data type. The corresponding XML
root is added automatically to the top table on the right side which represents the output flow.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in response in the popup dialog
box.
• Right-click the response node and select As loop element from the contextual menu.
• Repeat this operation to create a sub-element request of the root node in the input table and set
the request node as loop element.
• Click the request node in the input table and drop it to the Expression column in the row of the
response node in the output table.

• Click OK to validate the mapping and close the map editor.


• In the design workspace, double-click tLogRow_2 to display its Component view and set its Basic
settings.


• Click the three-dot button next to Edit schema and define the schema as follows.

• Connect tLogRow_2 to tESBProviderResponse_1.


• In the design workspace, double-click tESBProviderResponse_1 to open its Component view and
set its Basic settings.

• Click the three-dot button next to Edit schema and define the schema as follows.

• Save the provider Job.


The consumer Job consists of a tFixedFlowInput, a tXMLMap, a tESBConsumer, and two tLogRow
components.


• Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tXMLMap, a tESBConsumer, and two tLogRow.
• Double-click tFixedFlowInput_1 in the design workspace to display its Component view and set its
Basic settings.

• Click the three-dot button next to Edit schema.

• Click the plus button to add a new line of string type and name it payloadString.
• Click OK.


• In the Number of rows field, set the number of rows as 1.


• In the Mode area, select Use Single Table and enter "world" (in double quotation marks) in the Value field.
• Connect tFixedFlowInput to tXMLMap.
• Connect tXMLMap to tESBConsumer and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in request in the popup dialog
box.
• Right-click the request node and select As loop element from the contextual menu.
• Click the payloadString node in the input table and drop it to the Expression column in the row of
the request node in the output table.

• Click OK to validate the mapping and close the Map Editor.


• Start the provider Job. In the execution log you can see:

...
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
...

• In the tESBConsumer_1 Component view, set its Basic settings.


• Click the three-dot button next to the Service Configuration to open the editor.

• In the WSDL field, type in: http://127.0.0.1:8088/esb/provider/?WSDL


• Click the Refresh button to retrieve port name and operation name.
• Click OK.
• In the Basic settings of the tESBConsumer, set the Input Schema as follows:

• Set the Response Schema as follows:


• Set the Fault Schema as follows:

• Connect tESBConsumer_1 to tLogRow_1 and tLogRow_2.


• In the design workspace, double-click tLogRow_1 to display its Component view and set its Basic
settings.

• Click the three-dot button next to Edit Schema and define the schema as follows.

• In the Job Design, double-click tLogRow_2 to display its Component view and set its Basic
settings.


• Click the three-dot button next to Edit Schema and define the schema as follows:

• Save the consumer Job.


• Run the provider Job. In the execution log you will see:

2011-04-21 15:28:26.874:INFO::jetty-7.2.2.v20101205
2011-04-21 15:28:27.108:INFO::Started
[email protected]:8088
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
• Run the consumer Job. In the execution log of the Job you will see:

Starting job CallProvider at 15:29 21/04/2011.

[statistics] connecting to socket on port 3690


[statistics] connected
TEST_ProviderJob
TEST_ProviderJobSoapBingding
|
{http://talend.org/esb/service/job}TEST_ProviderJob
{http://talend.org/esb/service/job}TEST_ProviderJobSoapBinding
invoke
[tLogRow_2] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
[statistics] disconnected
Job ConsumerJob ended at 15:29 21/04/2011. [exit code=0]

• In the provider's log you will see the trace log:


[tLogRow_1] payload: <?xml version="1.0" encoding="UTF-8"?>


<request>world</request>
### world
[tLogRow_2] content: world
[tLogRow_3] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
web service [endpoint: http://127.0.0.1:8088/esb/provider] unpublished
[statistics] disconnected
Job ProviderJob ended at 15:29 21/04/2011. [exit code=0]


tEXABulkExec
Imports data into an EXASolution database table quickly, using the IMPORT command provided by the
EXASolution database.
The import will be cancelled after a configurable number of records fail to import. Erroneous records
can be sent to a log table in the same database or to a local log file.
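
The statement issued by the component is an EXASolution IMPORT command. A simplified sketch of what
such a statement can look like for a CSV source is shown below; the schema, table and file names are
placeholders, the exact clauses depend on the component settings, and the full syntax is described in
the EXASolution User Manual.

IMPORT INTO my_schema.my_table
FROM CSV AT 'http://192.168.0.10:4580/' FILE 'data.csv'
COLUMN SEPARATOR = ';'
SKIP = 1
ERRORS INTO my_schema.import_errors REJECT LIMIT 2;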

tEXABulkExec Standard properties


These properties are used to configure tEXABulkExec running in the Standard Job framework.
The Standard tEXABulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the data
retrieved.

Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution
database cluster.


Schema Enter the name of the schema you want to use.

User and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Enter the name of the table to be written.

Note:
Typically the table names are stored in upper case. If you
need mixed case identifiers, you have to enter the name
in double quotes. For example, "\"TEST_data_LOAD\"".

Action on table On the table defined, you can perform one of the following
operations before running the import:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets created.
• Create table if not exists: The table is created if it does
not exist.
• Truncate table: The table content is deleted. You do
not have the possibility to rollback the operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

Note:
The columns in the schema must be in the same order
as they are in the CSV file. It is not necessary to fill all
columns of the defined table unless the use case or table
definition expects that.

  Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Column Formats Specify the format for Date and numeric columns if the
default cannot be applied.
• Column: The cells in this column are automatically
filled with the defined schema column names.
• Has Thousand Delimiters: Select this check box if
the value of the corresponding numeric column (only
for numeric column) in the file contains thousand
separators.
• Alternative Format: Specify the necessary format
as String value if a special format is expected. The
necessary format will be created from the schema
column length and precision. For more information
about format models, see EXASolution User Manual.

Source table columns If the source is a database, configure the mapping between
the source columns and the target columns in this table.
Specifically configuring the mapping is optional. If you set
nothing here, it is assumed that the source table has the
same structure as the target table.
• Column: The schema column in the target table.
• Source column name: The name of the column in the
source table.

Column Separator Enter the separator for the columns of a row in the local
file.

Column Delimiter Enter the delimiter that encapsulates the field content in
the local file.

Row Separator Enter the character used to separate the rows in the local file.

Null representation Enter the string that represents a NULL value in the local
file. If not specified, NULL values are represented as the
empty string.

Skip rows Enter the number of rows (for example, header or any other
prefix rows) to be omitted.

Encoding Enter the character set used in the local file. By default, it is
UTF8.

Trim column values Specify whether spaces are deleted at the border of CSV
columns.
• No trim: no spaces are trimmed.
• Trim: spaces from both left and right sides are
trimmed.


• Trim only left: spaces from only the left side are
trimmed.
• Trim only right: spaces from only the right side are
trimmed.

Default Date Format Specify the format for datetime values. By default, it is
YYYY-MM-DD.

Default Timestamp Format Specify the timestamp format used. By default, it is YYYY-
MM-DD HH24:MI:SS.FF3.

Thousands Separator Specify the character used to separate thousand groups in a
numeric text value. In the numeric format, the character will
be applied to the placeholder G. If the text values contain
this char, you have to configure it also in the Column
Formats table.
Note that this setting affects the connection property
NLS_NUMERIC_CHARACTERS that defines the decimal and
group characters used for representing numbers.

Decimal Separator Specify the character used to separate the integer part
of a number from the fraction. In the numeric format, the
character will be applied to the placeholder D.
Note that this setting affects the connection property
NLS_NUMERIC_CHARACTERS that defines the decimal and
group characters used for representing numbers.

Minimal number errors to reject the transfer Specify the maximum number of invalid rows allowed
during the data loading process. For example, the value 2
means the loading process will stop if the third error occurs.

Log Error Destination Specify the location where error messages will be stored.
• No Logging: error messages will not be saved.
• Local Log File: error messages will be stored in a
specified local file.
• Local Error Log File: specify the path to the local
file that stores error messages.
• Add current timestamp to log file name (before
extension): select this check box to add the
current timestamp before the extension of the file
name for identification reasons in case you use
the same file multiple times.
• Logging Table: error messages will be stored in a
specified table. The table will be created if it does not
exist.
• Error Log Table: enter the name of the table that
stores error messages.
• Use current timestamp to build log table: select
this check box to use the current timestamp to
build the log table for identification reasons in
case you use the same table multiple times.

Transfer files secure Select this check box to transfer the file over HTTPS instead
of HTTP.

Test mode (no statements are executed) Select this check box to have the component running in test
mode, where no statements are executed.


Use precision and length from schema Select this check box to check column values that are of
numeric types (that is, Double, Float, BigDecimal, Integer,
Long, and Short) against the Length setting (which sets the
number of integer digits) and the Precision setting (which
sets the number of decimal digits) in the schema. Only
the values with neither their number of integer digits nor
number of decimal digits larger than the Length setting and
the Precision setting are loaded.
For example, with Length set to 4 and Precision set to 3,
the values 8888.8888 and 88888.888 will be dropped;
the values 8888.88 and 888.888 will be loaded.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
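
The Thousands Separator and Decimal Separator options above are passed to the connection property
NLS_NUMERIC_CHARACTERS. As a rough illustration only, the equivalent SQL statement would look similar
to the following (the separator characters are examples):

ALTER SESSION SET NLS_NUMERIC_CHARACTERS = ',.';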

Global Variables

Global Variables NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_LOG_FILE: the path to the local log file. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as a standalone component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Settings for different sources of import data


The settings for this component change depending on the source of your import data.
The component handles data coming from any of the following sources:
• Local file
• Remote file
• EXASol database
• Oracle database
• JDBC-compliant database

Local file
The local file is not transferred by uploading the file. Instead, the driver starts a (secure) local
web service, sends its URL to the database, and the database retrieves the file from this local web
service. Because the port of this service cannot be explicitly defined, this method requires a
transparent network between the local Talend Job and the EXASolution database.

File name  Specify the path to the local file to be imported.

Remote file
This method works with a file that is accessible on a server through the following protocols: SCP,
SFTP, FTP, HTTP, or HTTPS.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The connection must contain a URL with one of the following protocols: SCP,
SFTP, FTP, HTTP, or HTTPS.
The URL must not contain the file name. The file name is always dynamic and
must be provided by the component configuration.

Remote file server URL Specify the URL to the file server, without the file name itself.

File name Specify the name of the file you want to fetch from the server.

Query parameters If the web service depends on query parameters, specify them here.
For example, if you want to get a file from an HDFS file system via the web
service, you need to add some additional parameters such as open=true.


Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the web server.

Remote user and Remote users password Enter the user name and password needed to access the web server.
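
The named connections referred to by the Use predefined connection option above are created in the
EXASolution database itself. A rough sketch is shown below; the connection name, URL and credentials
are placeholders, and the exact syntax is described in the EXASolution User Manual.

CREATE CONNECTION ftp_files TO 'ftp://192.168.0.20/import/' USER 'ftpuser' IDENTIFIED BY 'ftppassword';
SELECT * FROM SYS.EXA_DBA_CONNECTIONS;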

EXASol database
An EXASolution database can also serve as a remote source for the data. The source can be a table or
a specific query.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The username and password must be provided by the component and not as part
of the predefined connection.

EXASol database host Specify the host of the remote EXASolution database.
This field can also be used to access a cluster.

Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.

Source query If you want to use a specific query, enter the query in this field.

Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.

Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.

Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.

Remote user and Remote users password Enter the user name and password needed to access the source database.
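
For example, a self defined source query for the settings above could filter and join the source data
before the import; the schema, table and column names below are placeholders.

SELECT e.employee_id, e.employee_name, t.team_name
FROM src_schema.employee e
JOIN src_schema.team t ON t.team_id = e.team_id
WHERE t.team_name = 'Dev Team'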

Oracle database
An Oracle database can also serve as remote source for the data. Access to an Oracle database
requires an Enterprise license for the EXASolution database and does not work with the free edition.
The source can be a table or a specific query.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The username and password must be provided by the component and not as part
of the predefined connection.

Oracle database URL Specify the JDBC URL to the Oracle database.


Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.

Source query If you want to use a specific query, enter the query in this field.

Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.

Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.

Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.

Remote user and Remote users password Enter the user name and password needed to access the source database.
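
For the Oracle database URL field above, a typical JDBC thin-driver URL has the following form, where
the host, port and service name are placeholders:

jdbc:oracle:thin:@//dbhost:1521/ORCL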

JDBC-compliant database
The free edition of the EXASolution database supports MySQL and PostgreSQL databases, and others
are available in the Enterprise edition. The source can be a table or a self defined query.
Nearly all enterprise-grade databases provide a JDBC interface.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The username and password must be provided by the component and not as part
of the predefined connection.

JDBC database URL Specify the JDBC URL to the source database.

Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.

Source query If you want to use a specific query, enter the query in this field.

Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.

Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.

Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.

Remote user and Remote users password Enter the user name and password needed to access the source database.
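
For the JDBC database URL field above, typical URLs for the databases mentioned look like the
following, where the host, port and database names are placeholders:

jdbc:mysql://dbhost:3306/sourcedb
jdbc:postgresql://dbhost:5432/sourcedb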


Importing data into an EXASolution database table from a


local CSV file
This scenario describes a Job that writes employee information into a CSV file, then loads the
data from this local file into a newly created EXASolution database table using the tEXABulkExec
component, and finally retrieves the data from the table and displays it on the console.

Dropping and linking the components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tFileOutputDelimited
component, a tEXABulkExec component, a tEXAInput component, and a tLogRow
component.
2. Connect the tFixedFlowInput component to the tFileOutputDelimited component using a Row >
Main connection.
3. Do the same to connect the tEXAInput component to the tLogRow component.
4. Connect the tFixedFlowInput component to the tEXABulkExec component using a Trigger > On
Subjob Ok connection.
5. Do the same to connect the tEXABulkExec component to the tEXAInput component.

Configuring the components


Preparing the source data

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view.


2. Click the [...] button next to Edit schema to open the Schema dialog box.

3. Click the [+] button to add six columns: EmployeeID of the Integer type, EmployeeName, OrgTeam
and JobTitle of the String type, OnboardDate of the Date type with the yyyy-MM-dd date pattern,
and MonthSalary of the Double type.
4. Click OK to close the dialog box and accept schema propagation to the next component.


5. In the Mode area, select Use Inline Content (delimited file) and enter the following employee data
in the Content field.

12000;James;Dev Team;Developer;2008-01-01;15000.01
12001;Jimmy;Dev Team;Developer;2008-11-22;13000.11
12002;Herbert;QA Team;Tester;2008-05-12;12000.22
12003;Harry;Doc Team;Technical Writer;2009-03-10;12000.33
12004;Ronald;QA Team;Tester;2009-06-20;12500.44
12005;Mike;Dev Team;Developer;2009-10-15;14000.55
12006;Jack;QA Team;Tester;2009-03-25;13500.66
12007;Thomas;Dev Team;Developer;2010-02-20;16000.77
12008;Michael;Dev Team;Developer;2010-07-15;14000.88
12009;Peter;Doc Team;Technical Writer;2011-02-10;12500.99

6. Double-click the tFileOutputDelimited component to open its Basic settings view.

7. In the File Name field, specify the file into which the input data will be written. In this example, it
is "E:/employee.csv".
8. Click Advanced settings to open the Advanced settings view of the tFileOutputDelimited
component.

9. Select the Advanced separator (for numbers) check box and in the Thousands separator and
Decimal separator fields displayed, specify the separators for thousands and decimal. In this
example, the default values "," and "." are used.
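
With these separators, a row written to "E:/employee.csv" would look similar to the following line,
shown only as an illustration; the exact output depends on how the advanced separators are applied to
each numeric column:

12,000;James;Dev Team;Developer;2008-01-01;15,000.01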

Loading the source data into a newly created EXASolution database table

Procedure
1. Double-click the tEXABulkExec component to open its Basic settings view.


2. Fill in the Host, Port, Schema, User and Password fields with your EXASolution database
connection details.
3. In the Table field, enter the name of the table into which the source data will be written. In this
example, the target database table is named "employee" and it does not exist.
4. Select Create table from the Action on table list to create the specified table.
5. In the Source area, select Local file as the source for the input data, and then specify the file that
contains the source data. In this example, it is "E:/employee.csv".
6. Click the [...] button next to Edit schema to open the Schema dialog box and define the schema,
which should be the same as that of the tFixedFlowInput component.
Then click OK to validate these changes and close the dialog box.
7. Click Advanced settings to open the Advanced settings view of the tEXABulkExec component.

8. In the Column Formats table, for the two numeric fields EmployeeID and MonthSalary, select the
corresponding check boxes in the Has Thousand Delimiters column, and then define their format
model strings in the corresponding fields of the Alternative Format column. In this example,
"99G999" for EmployeeID and "99G999D99" for MonthSalary.
9. Make sure that the Thousands Separator and Decimal Separator fields have values identical to
those of the tFileOutputDelimited component and keep the default settings for the other options.

Retrieving data from the EXASolution database table

Procedure
1. Double-click the tEXAInput component to open its Basic settings view.

2. Fill in the Host name, Port, Schema name, Username and Password fields with your EXASolution
database connection details.
3. In the Table Name field, enter the name of the table from which the data will be retrieved. In this
example, it is "employee".
4. Click the [...] button next to Edit schema to open the Schema dialog box and define the schema,
which should be the same as that of the tFixedFlowInput component.
Then click OK to close the dialog box and accept schema propagation to the next component.
5. Click the Guess Query button to fill the Query field with the following auto-generated SQL
statement that will be executed on the specified table.

SELECT employee.EmployeeID,
employee.EmployeeName,
employee.OrgTeam,
employee.JobTitle,
employee.OnboardDate,
employee.MonthSalary
FROM employee

6. Double-click the tLogRow component to open its Basic settings view.


7. In the Mode area, select the Table (print values in cells of a table) option for better readability of
the output.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the employee data is written into the specified EXASolution database table and
is then retrieved and displayed on the console.


tEXAClose
Closes an active connection to an EXASolution database instance to release the occupied resources.

tEXAClose Standard properties


These properties are used to configure tEXAClose running in the Standard Job framework.
The Standard tEXAClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tEXAConnection component that opens the
connection you need to close from the list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with
other EXASolution components, especially with the
tEXAConnection and tEXACommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
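
As an illustration of the usage rule above, a typical Job layout chains the EXASolution components with
Trigger > On Subjob Ok connections, for example as follows; the subjob content is a placeholder, and
tEXAClose is only needed when the connection has not already been closed by tEXACommit.

tEXAConnection_1 --> (subjob using the shared connection) --> tEXACommit_1 --> tEXAClose_1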

Related scenario
No scenario is available for the Standard version of this component yet.


tEXACommit
Validates the data processed through the Job into the connected EXASolution database.
Using a unique connection, this component commits a global transaction in one go, instead of committing
on every row or every batch, and thus improves performance.

tEXACommit Standard properties


These properties are used to configure tEXACommit running in the Standard Job framework.
The Standard tEXACommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tEXAConnection component for which you want
the commit action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the commit is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tEXACommit to your Job, your data will be committed row
by row. In this case, do not select the Close Connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with
other EXASolution components, especially with the
tEXAConnection and tEXARollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.


tEXAConnection
Opens a connection to an EXASolution database instance that can then be reused by other
EXASolution components.

tEXAConnection Standard properties


These properties are used to configure tEXAConnection running in the Standard Job framework.
The Standard tEXAConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the data
retrieved.

Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution
database cluster.

Schema Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other
EXASolution components, especially with the tEXACommit
and tEXARollback components.

Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.


tEXAInput
Retrieves data from an EXASolution database based on a query with a strictly defined order which
corresponds to the schema definition, and passes the data to the next component.

tEXAInput Standard properties


These properties are used to configure tEXAInput running in the Standard Job framework.
The Standard tEXAInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the
data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host name Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution database cluster.

Schema name Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Enter the name of the table to be queried.

Query Type and Query Enter the database query, paying particular attention to the proper sequence of the fields in order to match the schema definition.

Guess Query Click the button to generate the query that corresponds to
the table schema in the Query field.

Guess schema Click the button to retrieve the schema from the table.
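As an illustration only (the schema, table, and column names below are assumptions, not values from this guide), for a component schema defined with the columns id, name, and city, the Query field could contain:

"SELECT id, name, city FROM my_schema.customers"

The columns in the SELECT clause must appear in the same order as the columns of the component schema.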


Advanced settings

Change fetch size Select this check box to change the fetch size which
specifies the amount of resultset data sent during one
single communication step with the database. In the Fetch
size field displayed, you need to enter the size in KB.

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespaces from all the String/Char columns.

Trim column Select the check box in the Trim column to remove leading
and trailing whitespaces from the corresponding field.
This table is not available if the Trim all the String/Char
columns check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
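As a minimal sketch (the component instance name tEXAInput_1 and the use of a tJava component are assumptions for illustration), an After variable can be read in a downstream component through the globalMap object:

// In a tJava component triggered with OnComponentOk after tEXAInput_1
Integer rowCount = (Integer) globalMap.get("tEXAInput_1_NB_LINE");
System.out.println("Rows read: " + rowCount);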

Usage

Usage rule This component is usually used as a start component of a Job or subJob and it needs an output link.

Dynamic settings Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
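As a hedged illustration (the variable and component instance names are assumptions, not values from this guide), the Code field of the Dynamic settings table could be filled with an expression such as context.exaConnection, where the context variable exaConnection holds at runtime the name of the connection component to use, for example tEXAConnection_1 or tEXAConnection_2, depending on the context group loaded when the Job is executed.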

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a related scenario, see Importing data into an EXASolution database table from a local CSV file on
page 889.
For similar scenarios using other databases, see:


tEXAOutput
Writes, updates, modifies or deletes data in an EXASolution database by executing the action
defined on the table and/or on the data in the table, based on the flow incoming from the preceding
component.

tEXAOutput Standard properties


These properties are used to configure tEXAOutput running in the Standard Job framework.
The Standard tEXAOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the
data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution database cluster.

Schema name Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Enter the name of the table to be written. Note that only one table can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets
created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exists and create: The table is removed if
it already exists and created again.
• Clear table: The table content is deleted.
• Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.

Action on data On the data of the table defined, you can perform:
• Insert: Add new entries to the table. If duplicates are found, the Job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record. If the record with
the given reference already exists, an update would be
made.
• Update or insert: Update the record with the given
reference. If the record does not exist, a new record
would be inserted.
• Delete: Remove entries corresponding to the input
flow.


Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You
can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define
primary keys for the update and delete operations. To
do that: Select the Use field options check box and then
in the Key in update column, select the check boxes
next to the column name on which you want to base
the update operation. Do the same in the Key in delete
column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values in retrieved schema in Talend Help Center (https://help.talend.com).

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Use commit control Select this box to display the Commit every field in which
you can define the number of rows to be processed before
committing.


Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Additional Columns This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update, or delete actions, or actions that require particular preprocessing.
• Name: Enter the name of the column to be modified or
inserted.
• SQL expression: Enter the SQL expression to be
executed to modify or insert data in the corresponding
columns.
• Position: Select Before, After or Replace, depending on
the action to be carried out on the reference column.
• Reference column: Type in a column of reference that
can be used to place or replace the new or altered
column.
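As an illustration only (the column names and SQL function below are assumptions, not values from this guide), the Additional Columns table described above could contain a row with Name set to name_upper, SQL expression set to "UPPER(name)", Position set to After, and Reference column set to name, so that an extra column holding the uppercase value of name is written after that column.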

Use field options Select this check box to customize a request for the
corresponding column, particularly if multiple actions are
being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which the data is
updated.
• Key in delete: Select the check box for the
corresponding column based on which the data is
deleted.
• Updatable: Select the check box if the data in the
corresponding column can be updated.
• Insertable: Select the check box if the data in the
corresponding column can be inserted.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use batch mode Select this check box to activate the batch mode for data
processing, and in the Batch Size field displayed enter the
number of records to be processed in each batch.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an EXASolution database. It also allows you to
create a reject flow using a Row > Rejects link to filter data
in error. For a related scenario, see Retrieving data in error
with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).


Related scenario
For similar scenarios using other databases, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tEXARollback
Cancels the transaction commit in the connected EXASolution database.
It allows you to roll back any changes made in the EXASolution database to prevent partial
transaction commit if an error occurs.

tEXARollback Standard properties


These properties are used to configure tEXARollback running in the Standard Job framework.
The Standard tEXARollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tEXAConnection component for which you want
the rollback action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the rollback is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other EXASolution components, especially with the tEXAConnection and tEXACommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related Scenario
For a similar scenario using another database, see Rollback from inserting data in mother/daughter tables on page 2429.


tEXARow
Executes SQL queries on an EXASolution database.
Depending on the nature of the query and the database, tEXARow acts on the actual structure of the
database, or indeed the data, although without modifying them. The Row suffix indicates that it is
used to channel a flow in a Job although it does not produce any output data.

tEXARow Standard properties


These properties are used to configure tEXARow running in the Standard Job framework.
The Standard tEXARow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the
data retrieved.

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Enter the host or host list of the EXASol database servers. EXASol can run in a cluster environment. The valid value can be a simple IP address (for example, 172.16.173.128), an IP range list (for example, 172.16.173.128..130 that represents three servers 172.16.173.128, 172.16.173.129, and 172.16.173.130), or a comma-separated host list (for example, server1,server2,server3) of the EXASolution database cluster.

Port Enter the listening port number of the EXASolution database cluster.

Schema name Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Enter the name of the table to be processed.

Query Type Either Built-In or Repository.
• Built-In: Enter the query manually or with the help of
the SQLBuilder.
• Repository: Select the appropriate query from the
Repository. The Query field is then completed
automatically.

Guess Query Click the Guess Query button to generate the query that
corresponds to the table schema in the Query field.

Query Enter the database query, paying particular attention to the proper sequence of the fields in order to match the schema definition.


Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Propagate QUERY's recordset Select this check box to insert the query results in one of
the flow columns. Select the particular column from the use
column list.

Use PreparedStatement Select this check box to use prepared statements and in
the Set PreparedStatement Parameters table displayed,
add as many parameters as needed and set the following
attributes for each parameter:
• Parameter Index: enter the index of the prepared
statement parameter.
• Parameter Type: click in the cell and select the type of
the parameter from the list.
• Parameter Value: enter the value of the parameter.
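As a minimal sketch (the table and column names are assumptions, not values from this guide), the Query field could contain a parameterized statement such as:

"INSERT INTO employees (id, name) VALUES (?, ?)"

with two rows in the Set PreparedStatement Parameters table, for example Parameter Index 1 / Parameter Type Int / Parameter Value 42, and Parameter Index 2 / Parameter Type String / Parameter Value "Smith".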

Commit every Enter the number of rows to be included in each batch before the data is written. This option guarantees the quality of the transaction (although there is no rollback option) and improves performance.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers query flexibility as it covers all possible SQL query requirements.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related Scenario
For similar scenarios using other databases, see:
• Procedure on page 622,
• Removing and regenerating a MySQL table index on page 2497.


tEXistConnection
Opens a connection to an eXist database in order that a transaction may be carried out.

tEXistConnection Standard properties


These properties are used to configure tEXistConnection running in the Standard Job framework.
The Standard tEXistConnection component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other tEXist*
components, especially with the tEXistGet and tEXistPut
components. If you set the connection properties in the
tEXistConnection component, you can reuse the connection
for other tEXist* components in the same Job.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.


Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For a tEXistConnection related scenario, see tMysqlConnection on page 2425.


tEXistDelete
Deletes specified resources from a remote eXist database.

tEXistDelete Standard properties


These properties are used to configure tEXistDelete running in the Standard Job framework.
The Standard tEXistDelete component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Target Type Either Resource, Collection, or All.

Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component subJob but can also be used as an output or end object.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tEXistGet
Retrieves selected resources from a remote eXist database to a defined local directory.

tEXistGet Standard properties


These properties are used to configure tEXistGet running in the Standard Job framework.
The Standard tEXistGet component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory Path to the file's destination location.

Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component subJob but can also be used as an output or end object.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Retrieving resources from a remote eXist DB server


This is a single-component Job that retrieves data from a remote eXist DB server and downloads the data to a defined local directory.
This simple Job requires one component: tEXistGet.


Procedure
1. Drop the tEXistGet component from the Palette into the design workspace.
2. Double-click the tEXistGet component to open the Component view and define the properties in
its Basic settings view.

3. Fill in the URI field with the URI of the eXist database you want to connect to.
In this scenario, the URI is xmldb:exist://192.168.0.165:8080/exist/xmlrpc. Note that the URI used in
this use case is for demonstration purposes only and is not an active address.
4. Fill in the Collection field with the path to the collection of interest on the database server, /db/
talend in this scenario.
5. Fill in the Driver field with the driver for the XML database, org.exist.xmldb.DatabaseImpl in this
scenario.
6. Fill in the Username and Password fields by typing in admin and talend respectively in this
scenario.
7. Click the three-dot button next to the Local directory field to set a path for saving the XML file
downloaded from the remote database server.
In this scenario, set the path to your desktop, for example C:/Documents and Settings/galano/Desktop/ExistGet.
8. In the Files field, click the plus button to add a new line in the Filemask area, and fill it with a
complete file name to retrieve data from a particular file on the server, or a filemask to retrieve
data from a set of files. In this scenario, fill in dictionary_en.xml.
9. Save your Job and press F6 to execute it.

The XML file dictionary_en.xml is retrieved and downloaded to the defined local directory.


tEXistList
Lists the resources stored on a remote eXist database.

tEXistList Standard properties


These properties are used to configure tEXistList running in the Standard Job framework.
The Standard tEXistList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password Server authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Files Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Target Type Either Resource, Collection, or All contents.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files iterated upon. This is an After
variable, and it returns an integer.
CURRENT_FILE: the current file name. This is a Flow
variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used along with a tEXistGet component to retrieve the files listed, for example.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a related scenario, see Listing and getting files/folders on an FTP directory on page 1230.


tEXistPut
Uploads specified files from a defined local directory to a remote eXist database.

tEXistPut Standard properties


These properties are used to configure tEXistPut running in the Standard Job framework.
The Standard tEXistPut component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter a path to indicate where the resource is to be saved on the server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory Path to the source location of the file(s).

Files Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component subJob but can also be used as an output or end object.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see http://exist-db.org/exist/apps/doc/documentation.xml.
For further information about the XQuery update extension, see http://exist-db.org/exist/apps/doc/update_ext.xml.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tEXistXQuery
Queries XML files located on remote databases using local files containing XPath queries and outputs
the results to an XML file stored locally.

tEXistXQuery Standard properties


These properties are used to configure tEXistXQuery running in the Standard Job framework.
The Standard tEXistXQuery component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the XML file location on the database.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password DB server authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

XQuery Input File Browse to the local file containing the query to be executed.

Local Output Browse to the directory in which the query results should be
saved.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component Job but can also be used as part of a more complex Job.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tEXistXUpdate
Processes XML file records and updates the existing records on the database server.

tEXistXUpdate Standard properties


These properties are used to configure tEXistXUpdate running in the Standard Job framework.
The Standard tEXistXUpdate component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection and file of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password DB server authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Update File Browse to the local file in the local directory to be used to
update the records on the database.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component Job but can also be used as part of a more complex Job.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tExternalSortRow
Sorts input data based on one or several columns, by sort type and order, using an external sort
application.

tExternalSortRow Standard properties


These properties are used to configure tExternalSortRow running in the Standard Job framework.
The Standard tExternalSortRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.

File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Field separator Character, string or regular expression to separate fields.

External command "sort" path Enter the path to the external file containing the sorting
algorithm to use.


Criteria Click the plus button to add as many lines as required for
the sort to be complete. By default the first column defined
in your schema is selected.

  Schema column: Select the column label from your schema,


which the sort will be based on. Note that the order is
essential as it determines the sorting priority.

  Sort type: Numerical and Alphabetical order are proposed.


More sorting types to come.

  Order: Ascending or descending order.

Advanced settings

Maximum memory Type in the size of physical memory you want to allocate to
sort processing.

Temporary directory Specify the temporary directory to process the sorting


command.

Set temporary input file directory Select the check box to activate the field in which you can
specify the directory to handle your temporary input file.

Add a dummy EOF line Select this check box when using the tAggregateSortedRow
component.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles flow of data, therefore it requires


input and output components and is defined as an intermediary step.


Related scenario
For related use case, see tSortRow on page 3465.


tExtractDelimitedFields
Generates multiple columns from a delimited string column.
The extracted fields are written in new columns of the output schema. If you need to keep the original
columns in the output of this component, define these columns in the output schema using the same
column names as the original ones.

tExtractDelimitedFields Standard properties


These properties are used to configure tExtractDelimitedFields running in the Standard Job
framework.
The Standard tExtractDelimitedFields component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Field to split Select an incoming field from the Field to split list to split.

Ignore NULL as the source data Select this check box to ignore the Null value in the source
data.
Clear this check box to generate the Null records that
correspond to the Null value in the source data.

Field separator Enter character, string or regular expression to separate


fields for the transferred data.

Note:
Since this component uses regex to split a field and the
regex syntax uses special characters as operators, make
sure to precede the regex operator you use as a field
separator by a double backslash. For example, you have
to use "\\|" instead of "|". A short Java sketch at the end
of these Basic settings illustrates this behavior.

Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
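
To illustrate the note about escaping the field separator, the following plain Java sketch shows why
an unescaped regex operator cuts a record in the wrong places. It only mimics the regex-based
splitting described above; it is not the code generated by the component, and the record used is a
made-up example.

public class FieldSeparatorDemo {
    public static void main(String[] args) {
        String record = "32|Component Team|Developer";

        // "|" alone is the regex alternation operator: it matches the empty string
        // at every position, so the record is cut between every character.
        System.out.println(record.split("|").length);   // many one-character fields

        // "\\|" escapes the operator, so the split happens on the literal pipe.
        System.out.println(record.split("\\|").length); // 3 fields
    }
}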

Advanced settings

Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).

Trim column Select this check box to remove leading and trailing
whitespace from all columns.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

Validate date Select this check box to check the date format strictly
against the input schema.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component handles flow of data therefore it requires


input and output components. It allows you to extract data
from a delimited field, using a Row > Main link, and enables
you to create a reject flow filtering out data whose type does not
match the defined type.

Extracting a delimited string column of a database table


This scenario describes a Job that writes data including a delimited string column into a MySQL
database table and displays the data on the console, then extracts the delimited string column into
multiple columns and displays the data after extraction on the console.

Adding and linking components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tMysqlOutput
component, a tMysqlInput component, a tExtractDelimitedFields component, two tLogRow
components.
2. Link tFixedFlowInput to tMysqlOutput using a Row > Main connection.
3. Do the same to link tMysqlOutput to the first tLogRow, link tMysqlInput to tExtractDelimitedFields,
link tExtractDelimitedFields to the second tLogRow.
4. Link tFixedFlowInput to tMysqlInput using a Trigger > On Subjob Ok connection.

Configuring the components


Populating data in a MySQL database table

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.


2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: Id of Integer type, and Name and DelimitedField of String type.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
3. In the Mode area, select Use Inline Content(delimited file). Then in the Content field displayed,
enter the data to write to the database. This input data includes a delimited string column. In this
example, the input data is as follows:

1;Adam;32,Component Team,Developer
2;Bill;28,Component Team,Tester
3;Chris;30,Doc Team,Writer
4;David;35,Doc Team,Leader
5;Eddie;33,QA Team,Tester

4. Double-click tMysqlOutput to open its Basic settings view.


5. Fill the Host, Port, Database, Username, Password fields with the MySQL database connection
information.
6. Fill the Table field with the name of the table to be written. In this example, it is employee.
7. Select Drop table if exists and create from the Action on table list.
8. Double-click the first tLogRow to open its Basic settings view.

In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Extracting the delimited string column in the database table into multiple columns

Procedure
1. Double-click tMysqlInput to open its Basic settings view.


2. Fill the Host, Port, Database, Username, Password fields with the MySQL database connection
information.
3. Click the [...] button next to Edit schema and in the pop-up window define the schema of the
tMysqlInput component same as the schema of the tMysqlOutput component.

4. In the Table Name field, enter the name of the table into which the data was written. In this
example, it is employee.
5. Click the Guess Query button to fill the Query field with the SQL query statement to be executed
on the specified table. In this example, it is as follows:

SELECT
`employee`.`Id`,
`employee`.`Name`,
`employee`.`DelimitedField`
FROM `employee`

6. Double-click tExtractDelimitedFields to open its Basic settings view.


7. In the Field to split list, select the delimited string column to be extracted. In this example, it is
DelimitedField.
In the Field separator field, enter the separator used to separate the fields in the delimited string
column. In this example, it is a comma (,).
8. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
five columns: Id of Integer type, and Name, Age, Team, Title of String type.
In this example, the delimited string column DelimitedField is split into three columns Age, Team
and Title, and the Id and Name columns are kept as well.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
9. Double-click the second tLogRow to open its Basic settings view.

In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


As shown above, the original input data and the data after extraction are displayed on the
console, and the delimited string column DelimitedField is extracted into three columns Age, Team,
and Title.


tExtractJSONFields
Extracts the desired data from JSON fields based on the JSONPath or XPath query.

tExtractJSONFields Standard properties


These properties are used to configure tExtractJSONFields running in the Standard Job framework.
The Standard tExtractJSONFields component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Read By Select a way of extracting JSON data in the file.


• JsonPath: Extracts JSON data based on the JSONPath
query. With this option selected, you need to select a
JSONPath API version from the API version drop-down
list. It is recommended to read data by JSONPath in
order to gain better performance.
• Xpath: Extracts JSON data based on the XPath query.

JSON field List of the JSON fields to be extracted.


Loop Jsonpath query Enter the path pointing to the node within the JSON field,
on which the loop is based.
Note that if you have selected Xpath from the Read by drop-
down list, the Loop Xpath query field is displayed instead.

Mapping Complete this table to map the columns defined in the


schema to the corresponding JSON nodes.
• Column: The Column cells are automatically filled with
the defined schema column names.
• Json query/JSONPath query: Specify the JSONPath
node that holds the desired data. For more information
about JSONPath expressions, see http://goessner.net/articles/JsonPath/.
A standalone sketch at the end of these Basic settings
illustrates this kind of query.
This column is available only when JsonPath is
selected from the Read By list.
• XPath query: Specify the XPath node that holds the
desired data.
This column is available only when Xpath is selected
from the Read By list.
• Get Nodes: Select this check box to extract the JSON
data of all the nodes or select the check box next to a
specific node to extract the data of that node.
This column is available only when Xpath is selected
from the Read By list.
• Is Array: select this check box when the JSON field to
be extracted is an array instead of an object.
This column is available only when Xpath is selected
from the Read By list.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
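
As a standalone illustration of the Json query column described above, the sketch below runs a
couple of JSONPath expressions in plain Java. It assumes the Jayway json-path library is on the
classpath (an assumption made for this sketch only, not necessarily the exact library shipped with
the component), and it is not the component's generated code.

import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class JsonQueryDemo {
    public static void main(String[] args) {
        String json = "{\"staff\":[{\"firstname\":\"Andrew\",\"dept\":\"Doc\"},"
                    + "{\"firstname\":\"John\",\"dept\":\"R&D\"}]}";

        // A loop query selects the elements that become output rows.
        List<Object> rows = JsonPath.read(json, "$.staff[*]");

        // Column queries pick the values that feed the schema columns.
        List<String> names = JsonPath.read(json, "$.staff[*].firstname");

        System.out.println(rows.size() + " rows, names: " + names);
    }
}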

Advanced settings

Use the loop node as root Select this check box to use the loop node as the root for
querying the file.
The loop node is set in the Loop Json query text frame in
the Basic Settings view. If this option is checked, only the
child elements of the loop node are available for querying;
otherwise, both the parent elements and the child elements
of the loop node can be queried. You can specify a parent
element through JSON path syntax.
This check box is available only when JsonPath is selected
in the Read By drop-down list of the Basic settings view.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
This variable functions only if the Die on error check box is
cleared.

Usage

Usage rule This component is an intermediate component. It needs an


input and an output components.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Retrieving error messages while extracting data from JSON


fields
In this scenario, tWriteJSONField wraps the incoming data into JSON fields, data of which is then
extracted by tExtractJSONFields. Meanwhile, the error messages generated due to extraction failure,
which include the concerned JSON fields and errors, are retrieved via a Row > Reject link.

Linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tWriteJSONField, tExtractJSONFields, and tLogRow (X2). The two tLogRow components are
renamed as data_extracted and reject_info.
2. Link tFixedFlowInput and tWriteJSONField using a Row > Main connection.
3. Link tWriteJSONField and tExtractJSONFields using a Row > Main connection.
4. Link tExtractJSONFields and data_extracted using a Row > Main connection.
5. Link tExtractJSONFields and reject_info using a Row > Reject connection.


Configuring the components


Setting up the tFixedFlowInput

Procedure
1. Double-click tFixedFlowInput to display its Basic settings view.

2. Click Edit schema to open the schema editor.


Click the [+] button to add three columns, namely firstname, lastname and dept, with the type of
string.
Click OK to close the editor.
3. Select Use Inline Content and enter the data below in the Content box:

Andrew;Wallace;Doc
John;Smith;R&D
Christian;Dior;Sales

Setting up the tWriteJSONField

Procedure
1. Click tWriteJSONField to display its Basic settings view.

2. Click Configure JSON Tree to open the XML tree editor.

The schema of tFixedFlowInput appears in the Linker source panel.


3. In the Linker target panel, click the default rootTag and type in staff, which is the root node of the
JSON field to be generated.
4. Right-click staff and select Add Sub-element from the context menu.
5. In the pop-up box, enter the sub-node name, namely firstname.


Repeat the steps to add two more sub-nodes, namely lastname and dept.
6. Right-click firstname and select Set As Loop Element from the context menu.
7. Drop firstname from the Linker source panel to its counterpart in the Linker target panel.
In the pop-up dialog box, select Add linker to target node.

Click OK to close the dialog box.


8. Repeat the steps to link the two other items.
Click OK to close the XML tree editor.
9. Click Edit schema to open the schema editor.

10. Click the [+] button in the right panel to add one column, namely staff, which will hold the JSON
data generated.
Click OK to close the editor.


Setting up the tExtractJSONFields

Procedure
1. Double-click tExtractJSONFields to display its Basic settings view.

2. Click Edit schema to open the schema editor.

3. Click the [+] button in the right panel to add three columns, namely firstname, lastname and dept,
which will hold the data of their counterpart nodes in the JSON field staff.
Click OK to close the editor.
4. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.


5. In the Loop XPath query field, enter "/staff", which is the root node of the JSON data.
6. In the Mapping area, type in the node name of the JSON data under the XPath query part. The
data of those nodes will be extracted and passed to their counterpart columns defined in the
output schema.
7. Specifically, define the XPath query "firstname" for the column firstname, "lastname" for the column
lastname, and "" for the column dept. Note that "" is not a valid XPath query and will lead to
execution errors.

Setting up the tLogRow components

Procedure
1. Double-click data_extracted to display its Basic settings view.

2. Select Table (print values in cells of a table) for a better display of the results.
3. Perform the same setup on the other tLogRow component, namely reject_info.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Click F6 to execute the Job.

As shown above, the reject row offers such details as the data extracted, the JSON fields whose
data is not extracted and the cause of the extraction failure.

Collecting data from your favorite online social network


In this scenario, tFileInputJSON retrieves the friends node from a JSON file that contains the data of a
Facebook user and tExtractJSONFields extracts the data from the friends node for flat data output.


Linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileInputJSON,
tExtractJSONFields and tLogRow.
2. Link tFileInputJSON and tExtractJSONFields using a Row > Main connection.
3. Link tExtractJSONFields and tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tFileInputJSON to display its Basic settings view.

2. Click Edit schema to open the schema editor.


Click the [+] button to add one column, namely friends, of the String type.
Click OK to close the editor.
3. Click the [...] button to browse for the JSON file, facebook.json in this case:

{ "user": { "id": "9999912398",


"name": "Kelly Clarkson",
"friends": [
{ "name": "Tom Cruise",
"id": "55555555555555",
"likes": {
"data": [
{ "category": "Movie",
"name": "The Shawshank Redemption",
"id": "103636093053996",
"created_time": "2012-11-20T15:52:07+0000"
},
{ "category": "Community",
"name": "Positiveretribution",
"id": "471389562899413",
"created_time": "2012-12-16T21:13:26+0000"
}
]
}
},
{ "name": "Tom Hanks",
"id": "88888888888888"
"likes": {
"data": [
{ "category": "Journalist",
"name": "Janelle Wang",
"id": "136009823148851",
"created_time": "2013-01-01T08:22:17+0000"
},
{ "category": "Tv show",
"name": "Now With Alex Wagner",
"id": "305948749433410",
"created_time": "2012-11-20T06:14:10+0000"
}
]
}
}
]
}
}

4. Clear the Read by XPath check box.


In the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends column,
retrieving the entire friends node from the source file.
5. Double-click tExtractJSONFields to display its Basic settings view.


6. Click Edit schema to open the schema editor.

7. Click the [+] button in the right panel to add five columns, namely id, name, like_id, like_name and
like_category, which will hold the data of relevant nodes in the JSON field friends.
Click OK to close the editor.
8. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.

9. In the Loop XPath query field, enter "/likes/data".


10. In the Mapping area, type in the queries of the JSON nodes in the XPath query column. The data
of those nodes will be extracted and passed to their counterpart columns defined in the output
schema.
11. Specifically, define the XPath query "../../id" (querying the "/friends/id" node) for the column id, "../../
name" (querying the "/friends/name" node) for the column name, "id" for the column like_id, "name"
for the column like_name, and "category" for the column like_category.
12. Double-click tLogRow to display its Basic settings view.

13. Select Table (print values in cells of a table) for a better display of the results.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Click F6 to execute the Job.

As shown above, the friends data of the Facebook user Kelly Clarkson is extracted correctly.

Extracting data from a JSON file through looping


This scenario describes a Job that extracts data from a JSON file through multiple loops and displays
the data on the console.


The following lists the content of the JSON file, sample.json.

{
"Guid": "a2hdge9-5517-4e12-b9j6-887ft29e1711",
"Transactions": [
{
"TransactionId": 1,
"Products": [
{
"ProductId": "A1",
"Packs": [
{
"Quantity": 20,
"Price": 40.00,
"Due_Date": "2019/03/01"
}
]
}
]
},
{
"TransactionId": 2,
"Products": [
{
"ProductId": "B1",
"Packs": [
{
"Quantity": 1,
"Price": 15.00,
"Due_Date": "2019/01/01"
},
{
"Quantity": 21,
"Price": 315.00,
"Due_Date": "2019/02/14"
}
]
}
]
},
{
"TransactionId": 3,
"Products": [
{
"ProductId": "C1",
"Packs": [
{
"Quantity": 2,
"Price": 5.00,
"Due_Date": "2019/02/19"
},
{
"Quantity": 3,
"Price": 7.50,
"Due_Date": "2019/05/21"
}
]
}
]
}
]
}

This Job extracts the values of the following elements.


• Guid
• TransactionId
• ProductId
• Quantity
• Price
• Due_Date


Establishing the tExtractJSONFields looping Job

Procedure
1. Create a Job and add a tFileInputJSON component, three tExtractJSONFields components, and a
tLogRow component.
2. Connect the components using Row > Main connections.

Configuring tExtractJSONFields looping input

About this task


This task assumes that you know the structure of the JSON file.

Procedure
1. In the Basic settings view of the tFileInputJSON component, select JsonPath from the Read By
drop-down list.

2. In the Filename field, specify the input JSON file, sample.json in this example.
3. In the schema editor, add two columns, Guid (type String) and Transactions (type Object).


4. Click Yes in the subsequent dialog box to propagate the schema to the next component.
The columns just added appear in the Mapping table of the Basic settings view.
5. In the Basic settings view, enter "$" in the Loop Json query text box to loop the elements within
the root elements.
6. In the Json query column of the Mapping table, enter the following Json query expressions in
double quotation marks.
• $.Guid to extract the value of the Guid element;
• $.Transactions to extract the content of the Transactions element.

Configuring the tExtractJSONFields components for looping

Procedure
1. In the schema editor of the first tExtractJSONFields component, add the following columns in the
output table.
• Guid, type String;
• TransactionId, type Integer;
• Products, type Object

2. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.


3. Set the other options in the Basic settings view as follows.


• JSON field: Transactions;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: "TransactionId" (in double quotation marks);
• Products: "Products" (in double quotation marks);
• Others: unchanged

The settings loop all the elements within the Transactions element and extract the values of the
TransactionId and the Products elements.
4. In the schema editor of the second tExtractJSONFields component, add the following columns in
the output table.
• Guid, type String;
• TransactionId, type Integer;
• ProductId, type String;
• Packs, type Object
5. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.
6. Set the other options in the Basic settings view as follows.
• JSON field: Products;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: empty, for receiving the TransactionId from the previous component;
• ProductId: "ProductId" (in double quotation marks);
• Packs: "Packs" (in double quotation marks);
• Others: unchanged
These settings loop all the elements within the Products element and extract
the values of the ProductId and the Packs elements.


7. In the schema editor of the third tExtractJSONFields component, add the following columns in the
output table.
• Guid, type String;
• TransactionId, type Integer;
• ProductId, type String;
• Quantity, type Integer;
• Price, type Float;
• Due_Date, type Date
8. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.
9. Set the other options in the Basic settings view as follows.
• JSON field: Packs;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: empty, for receiving the TransactionId value from the previous component;
• ProductId: empty, for receiving the ProductId value from the previous component;
• Quantity: "Quantity" (in double quotation marks);
• Price: "Price" (in double quotation marks);
• Due_Date: "Due_Date" (in double quotation marks);
• Others: unchanged
These settings loop all the elements within the Packs element and extract the
values of the Quantity, the Price, and the Due_Date elements.

Setting the display for tExtractJSONFields values

Procedure
1. Open the Basic settings view of the tLogRow component.

2. Select the preferred option in the Mode section.


Executing tExtractJSONFields loop Job

Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to execute the Job and check the result on the console.

The values of the Guid element, the TransactionId element, the ProductId element, the Quantity
element, the Price element, and the Due_date element are extracted from the source JSON file
and displayed.
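
For reference, the flattened rows derived from sample.json should carry values along the following
lines (this is only a sketch: the exact console layout depends on the tLogRow mode selected, and
the rendering of Due_Date depends on the date pattern set for that schema column):

a2hdge9-5517-4e12-b9j6-887ft29e1711|1|A1|20|40.0|2019/03/01
a2hdge9-5517-4e12-b9j6-887ft29e1711|2|B1|1|15.0|2019/01/01
a2hdge9-5517-4e12-b9j6-887ft29e1711|2|B1|21|315.0|2019/02/14
a2hdge9-5517-4e12-b9j6-887ft29e1711|3|C1|2|5.0|2019/02/19
a2hdge9-5517-4e12-b9j6-887ft29e1711|3|C1|3|7.5|2019/05/21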


tExtractPositionalFields
Extracts data and generates multiple columns from a formatted string using positional fields.
tExtractPositionalFields generates multiple columns from one column using positional fields.

tExtractPositionalFields Standard properties


These properties are used to configure tExtractPositionalFields running in the Standard Job
framework.
The Standard tExtractPositionalFields component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Field Select an incoming field from the Field list to extract.

Ignore NULL as the source data Select this check box to ignore the Null value in the source
data.
Clear this check box to generate the Null records that
correspond to the Null value in the source data.

Customize Select this check box to customize the data format of the
positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in between inverted commas the
padding character used, in order for it to be removed from
the field. A space by default.
Alignment: Select the appropriate alignment parameter.

Pattern Enter the pattern to use as basis for the extraction.


A pattern is a series of length values separated by commas,
interpreted as a string between quotes. Make sure the
values entered in this field are consistent with the schema
defined. A Java sketch at the end of these Basic settings
shows how such a pattern cuts a record.

Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.


• Update repository connection: choose this option


to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
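
To make the Pattern property above more concrete, here is a minimal Java sketch showing how a
hypothetical pattern such as "2,14,9" cuts a fixed-width record into fields. It only illustrates the
idea of comma-separated length values; it is not the component's generated code, and the record
shown is a made-up example.

public class PositionalPatternDemo {
    public static void main(String[] args) {
        // Hypothetical record cut with the pattern "2,14,9":
        // 2 characters for an id, 14 for a team, 9 for a title.
        String record = "01Component TeamDeveloper";
        int[] lengths = {2, 14, 9};

        int offset = 0;
        for (int length : lengths) {
            System.out.println("[" + record.substring(offset, offset + length).trim() + "]");
            offset += length;
        }
        // Prints [01], [Component Team] and [Developer]
    }
}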

Advanced settings

Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).

Trim Column Select this check box to remove leading and trailing
whitespace from all columns.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
The NB_LINE variable is not available to the Map/Reduce
version.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component handles flow of data therefore it requires


input and output components. It allows you to extract data
from a positional field, using a Row > Main link, and enables
you to create a reject flow filtering out data whose type does not
match the defined type.

Related scenario
For a related scenario, see Extracting name, domain and TLD from e-mail addresses on page 967.


tExtractRegexFields
Extracts data and generates multiple columns from a formatted string using regex matching.

tExtractRegexFields Standard properties


These properties are used to configure tExtractRegexFields running in the Standard Job framework.
The Standard tExtractRegexFields component belongs to the Data Quality and the Processing
families.
The component in this framework is available in all Talend products.

Basic settings

Field to split Select an incoming field from the Field to split list to split.

Regex Enter a regular expression according to the programming


language you are using.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

Warning:
Make sure that the output schema does not contain any
column with the same name as the input column to be
split. Otherwise, the regular expression will not work as
expected.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.


Advanced settings

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles flow of data therefore it requires


input and output components. It allows you to extract data
from a formatted string field, using a Row > Main link, and enables
you to create a reject flow filtering out data whose type does not
match the defined type.

Extracting name, domain and TLD from e-mail addresses


This scenario describes a three-component Job where tExtractRegexFields is used to specify a
regular expression that corresponds to one column in the input data, email. The tExtractRegexFields
component is used to perform the actual regular expression matching. This regular expression
includes field identifiers for user name, domain name and Top-Level Domain (TLD) name portions in
each e-mail address. If the given e-mail address is valid, the name, domain and TLD are extracted and
displayed on the console in three separate columns. Data in the other two input columns, id and age, is
extracted and routed to the destination as well.


Setting up the Job


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileInputDelimited,
tExtractRegexFields, and tLogRow.
2. Connect tFileInputDelimited to tExtractRegexFields using a Row > Main link, and do the same to
connect tExtractRegexFields to tLogRow.

Configuring the components


Procedure
1. Double-click the tFileInputDelimited component to open its Basic settings view in the Component
tab.

2. Click the [...] button next to the File name/Stream field to browse to the file where you want to
extract information from.
The input file used in this scenario is called test4. It is a text file that holds three columns: id,
email, and age.

id;email;age
1;[email protected];24
2;[email protected];31
3;[email protected];20

For more information, see tFileInputDelimited on page 1015.


3. Click Edit schema to define the data structure of this input file.
4. Double-click the tExtractRegexFields component to open its Basic settings view.


5. Select the column to split from the Field to split list: email in this scenario.
6. Enter the regular expression you want to use to perform data matching in the Regex panel. In
this scenario, the regular expression "([a-z]*)@([a-z]*).([a-z]*)" is used to match the
three parts of an email address: user name, domain name and TLD name. A standalone Java
sketch of this matching is given after this procedure.
For more information about the regular expression, see http://en.wikipedia.org/wiki/
Regular_expression.
7. Click Edit schema to open the Schema of tExtractRegexFields dialog box, and click the plus button
to add five columns for the output schema.
In this scenario, we want to split the input email column into three columns in the output flow,
name, domain, and tld. The two other input columns will be extracted as they are.

8. Double-click the tLogRow component to open its Component view.


9. In the Mode area, select Table (print values in cells of a table).
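
The following standalone Java sketch shows how the regular expression from step 6 captures the
three parts of an address. The address used here is hypothetical; the sketch is only an illustration
of the matching, not the component's generated code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexDemo {
    public static void main(String[] args) {
        // The three capturing groups mirror the name, domain and tld output columns.
        Pattern pattern = Pattern.compile("([a-z]*)@([a-z]*).([a-z]*)");
        Matcher matcher = pattern.matcher("john@company.com"); // hypothetical address
        if (matcher.matches()) {
            System.out.println(matcher.group(1)); // john
            System.out.println(matcher.group(2)); // company
            System.out.println(matcher.group(3)); // com
        }
    }
}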

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


Results
The tExtractRegexFields component matches all given e-mail addresses with the defined regular
expression and extracts the name, domain, and TLD names and displays them on the console in three
separate columns. The two other columns, id and age, are extracted as they are.


tExtractXMLField
Reads the XML structured data from an XML field and sends the data as defined in the schema to the
following component.

tExtractXMLField Standard properties


These properties are used to configure tExtractXMLField running in the Standard Job framework.
The Standard tExtractXMLField component belongs to the Processing and the XML families.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.


Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.
When this file is selected, the fields that follow are pre-
filled in using fetched data.

Schema type and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

XML field Name of the XML field to be processed.


Related topic: see Talend Studio User Guide.

Loop XPath query Node of the XML tree, which the loop is based on.


Mapping Column: reflects the schema as defined by the Schema type


field.
XPath Query: Enter the fields to be extracted from the
structured input.
Get nodes: Select this check box to retrieve the XML
content of all current nodes specified in the Xpath query
list, or select the check box next to specific XML nodes to
retrieve only the content of the selected nodes.

Limit Maximum number of rows to be processed. If Limit is 0, no


rows are read or processed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Ignore the namespaces Select this check box to ignore namespaces when reading
and extracting the XML data.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediate component. It needs an


input and an output components.


Extracting XML data from a field in a database table


This three-component scenario reads the XML structure included in the fields of a database
table and then extracts the data.

Procedure
1. Drop the following components from the Palette onto the design workspace: tMysqlInput,
tExtractXMLField, and tFileOutputDelimited.
Connect the three components using Main links.

2. Double-click tMysqlInput to display its Basic settings view and define its properties.

3. If you have already stored the input schema in the Repository tree view, select Repository first
from the Property Type list and then from the Schema list to display the Repository Content
dialog box where you can select the relevant metadata.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.
If you have not stored the input schema locally, select Built-in in the Property Type and Schema
fields and enter the database connection and the data structure information manually. For more
information about tMysqlInput properties, see tMysqlInput on page 2437.
4. In the Table Name field, enter the name of the table holding the XML data, customerdetails in this
example.
Click Guess Query to display the query corresponding to your schema.
5. Double-click tExtractXMLField to display its Basic settings view and define its properties.


6. Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7. In the Xml field list, select the column from which you want to extract the XML data. In this
example, the field holding the XML data is called CustomerDetails.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
In the Xpath query column, enter between inverted commas the node of the XML field holding the
data you want to extract, CustomerName in this example. A hypothetical example of such an XML
field value is shown after this procedure.
8. Double-click tFileOutputDelimited to display its Basic settings view and define its properties.

9. In the File Name field, define or browse to the path of the output file you want to write the
extracted data in.
Click Sync columns to retrieve the schema from the preceding component. If needed, click the
three-dot button next to Edit schema to view the schema.
10. Save your Job and click F6 to execute it.
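
The content of the customerdetails table is not shown in this guide; for orientation only, a
hypothetical value of the CustomerDetails field could look like the fragment below. In such a case
the loop would be set on the element holding each customer, and the Xpath query "CustomerName"
would pick the name out of every looped element. The customer names are invented for this sketch.

<CustomerDetails>
  <Customer>
    <CustomerName>Griffith Paving</CustomerName>
  </Customer>
  <Customer>
    <CustomerName>Saint Vincent Hospital</CustomerName>
  </Customer>
</CustomerDetails>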


Results

tExtractXMLField read and extracted the client names under the node CustomerName of the
CustomerDetails field of the defined database table.

Extracting correct and erroneous data from an XML field in


a delimited file
This scenario describes a four-component Job that reads an XML structure from a delimited file,
outputs the main data and rejects the erroneous data.

Procedure
1. Drop the following components from the Palette to the design workspace: tFileInputDelimited,
tExtractXMLField, tFileOutputDelimited and tLogRow.
Connect the first three components using Row Main links.
Connect tExtractXMLField to tLogRow using a Row Reject link.

2. Double-click tFileInputDelimited to open its Basic settings view and define the component
properties.


3. Select Built-in in the Schema list and fill in the file metadata manually in the corresponding
fields.
Click the three-dot button next to Edit schema to display a dialog box where you can define the
structure of your data.
Click the plus button to add as many columns as needed to your data structure. In this example,
we have one column in the schema: xmlStr.
Click OK to validate your changes and close the dialog box.

Note:
If you have already stored the schema in the Metadata folder under File delimited, select
Repository from the Schema list and click the three-dot button next to the field to display the
Repository Content dialog box where you can select the relevant schema from the list. Click Ok
to close the dialog box and have the fields automatically filled in with the schema metadata.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.

4. In the File Name field, click the three-dot button and browse to the input delimited file you want
to process, CustomerDetails_Error in this example.
This delimited file holds a number of simple XML lines separated by double carriage return.
Set the row and field separators used in the input file in the corresponding fields, double carriage
return for the first and nothing for the second in this example.
If needed, set Header, Footer and Limit. None is used in this example.
5. In the design workspace, double-click tExtractXMLField to display its Basic settings view and
define the component properties.


6. Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7. In the Xml field list, select the column from which you want to extract the XML data. In this
example, the field holding the XML data is called xmlStr.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
8. In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and
display the component properties.

9. In the File Name field, define or browse to the output file you want to write the correct data in,
CustomerNames_right.csv in this example.
Click Sync columns to retrieve the schema of the preceding component. You can click the three-
dot button next to Edit schema to view/modify the schema.
10. In the design workspace, double-click tLogRow to display its Basic settings view and define the
component properties.
Click Sync Columns to retrieve the schema of the preceding component. For more information on
this component, see tLogRow on page 1977.
11. Save your Job and press F6 to execute it.


Results

tExtractXMLField reads and extracts, in the output delimited file CustomerNames_right, the client
information for which the XML structure is correct, and also displays the erroneous data on the console
of the Run view.


tFileArchive
Creates a new zip, gzip, or tar.gz archive file from one or more files or folders.
The archive file can be compressed using different compression methods.

tFileArchive Standard properties


These properties are used to configure tFileArchive running in the Standard Job framework.
The Standard tFileArchive component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Directory Specify the directory that contains the files to be added to


the archive file.
This field is available when zip or tar.gz is selected from the
Archive format list.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Subdirectories Select this check box if you want to add the files in the
subdirectories to the archive file.
This field is available only when zip is selected from the
Archive format list.

Source File Specify the path to the file that you want to add to the
archive file.
This field is available only when gzip is selected from the
Archive format list.

Archive file Specify the path to the archive file to be created.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Create directory if does not exist Select this check box to create the destination folder if it
does not exist.

Archive format Select an archive file format from the list: zip, gzip, or tar.gz.

Compress level Select the compression level you want to apply.


• Best: the compression quality will be optimum, but the
compression time will be long.
• Normal: the compression quality and time will be
average.
• Fast (no compression): the compression will be fast, but
the quality will be lower.

All files Select this check box if all files in the specified directory
will be added to the archive file. Clear it to specify the file(s)
you want to add to the archive file in the Files table.


Filemask: type in the file name or the file mask using a


special character or a regular expression.
This check box is available when zip or tar.gz is selected
from the Archive format list.

Encoding Select an encoding type from the list or select CUSTOM


and define it manually. This field is compulsory for DB data
handling.
This list is available when zip is selected from the Archive
format list.

Overwrite Existing Archive This check box is selected by default. This allows you to
save an archive by replacing the existing one. But if you
clear the check box, an error is reported, the replacement
fails and the new archive cannot be saved.

Note:
When the replacement fails, the Job still runs.

Encrypt files Select this check box if you want the archive file to be
password protected.
Encrypt method: select an encrypt method from the list, Java
Encrypt, Zip4j AES, or Zip4j STANDARD.
AES Key Strength: select a key strength for the Zip4j AES
method, either AES 128 or AES 256.
Enter Password: enter the encryption password.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This check box is available only when zip is selected
from the Archive format list. With this check box selected,
the compressed archive file can be decompressed only
by the tFileUnarchive component and not by a common
archiver. For more information about tFileUnarchive, see
tFileUnarchive on page 1168.

ZIP64 mode This option allows for archives with the .zip64 extension to
be created, with three modes available:
• ASNEEDED: archives with the .zip64 extension will be
automatically created based on the file size.
• ALWAYS: archives with the .zip64 extension will be
created, no matter what size the file may be.
• NEVER: no archives with the .zip64 extension will be
created, no matter what size the file may be.
Note that if the file size or the total size of the archive
exceeds 4GB or there are more than 65536 files inside the
archive, you need to set the mode to ALWAYS.

Advanced settings

Use sync flush Select this check box to flush the compressor before
flushing the output stream. Clear this check box to flush
only the output stream.


This check box is available when gzip or tar.gz is selected


from the Archive format list.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ARCHIVE_FILEPATH: the path to the archive file. This is an


After variable and it returns a string.
ARCHIVE_FILENAME: the name of the archive file. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
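For example, a tJava component linked to the tFileArchive subJob with an OnSubjobOk trigger could log the archive that has just been created. The snippet below is only an illustration and assumes the component is named tFileArchive_1; adapt the name to your Job:

   // Retrieve the After variables set by tFileArchive_1 once the archive is created.
   String archivePath = (String) globalMap.get("tFileArchive_1_ARCHIVE_FILEPATH");
   String archiveName = (String) globalMap.get("tFileArchive_1_ARCHIVE_FILENAME");
   System.out.println("Created archive " + archiveName + " at " + archivePath);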

Usage

Usage rule This component must be used as a standalone component.

Connections Outgoing links (from this component to another):
Row: Main; Reject; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Zipping files using a tFileArchive

This scenario creates a single-component Job that zips files and stores the resulting archive in the selected directory.

Procedure
1. Drop the tFileArchive component from the Palette onto the workspace.
2. Double-click it to display its Component view.

3. In the Directory field, click the [...] button, browse your directory and select the directory or the
file you want to compress.
4. Select the Subdirectories check box if you want to include the subfolders and their files in the
archive.
5. Then set the Archive file field by entering the destination path and the name of your archive file.
6. Select the Create directory if not exists check box if you do not have a destination directory yet
and you want to create it.
7. In the Compress level list, select the compression level you want to apply to your archive. In this
example, we use the normal level.
8. Clear the All Files check box if you only want to zip specific files.

9. Add a row to the table by clicking the [+] button and click the name that appears. Between two
star symbols (for example, *RG*), type part of the name of the file that you want to compress.
10. Press F6 to execute your Job.

Results
The tFileArchive component has compressed the selected file(s) and created the archive in the selected directory.

tFileCompare
Compares two files and provides comparison data based on a read-only schema.

tFileCompare Standard properties


These properties are used to configure tFileCompare running in the Standard Job framework.
The Standard tFileCompare component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component.
The schema of this component is read-only.

File to compare Filepath to the file to be checked.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Reference file Filepath to the file the comparison is based on.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

If differences are detected, display and If no difference detected, display Type in a message to be displayed in the Run console based on the result of the comparison.

Print to console Select this check box to display the message.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DIFFERENCE: the result of the comparison. This is a Flow variable and it returns a boolean.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
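For instance, the DIFFERENCE variable can be used in the condition of a Run if trigger coming from the tFileCompare component, so that a follow-up subJob is executed only when the two files differ. This is only a sketch and assumes the component is named tFileCompare_1:

   // Run if condition: true when tFileCompare_1 has detected differences.
   ((Boolean)globalMap.get("tFileCompare_1_DIFFERENCE"))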

Usage

Usage rule This component can be used as a standalone component but it is usually linked to an output component to gather the log
data.

Connections Outgoing links (from this component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Comparing unzipped files

This scenario describes a Job that unarchives a file and compares it to a reference file to make sure it did not change. The output of the comparison is stored in a delimited file and a message is displayed in the console.

Procedure
1. Drag and drop the following components: tFileUnarchive, tFileCompare, and tFileOutputDelimited.
2. Link the tFileUnarchive to the tFileCompare with Iterate connection.
3. Connect the tFileCompare to the output component, using a Main row link.
4. In the tFileUnarchive component Basic settings, fill in the path to the archive to unzip.
5. In the Extraction Directory field, fill in the destination folder for the unarchived file.

6. In the tFileCompare Basic settings, set the File to compare. Press Ctrl+Space to display the list of global variables. Select $_globals{tFileUnarchive_1}{CURRENT_FILEPATH} or "((String)globalMap.get("tFileUnarchive_1_CURRENT_FILEPATH"))" according to the language you work with, to fetch the file path from the tFileUnarchive component.

7. Then set the Reference file on which the comparison is based.


8. In the messages fields, set the messages you want to see if the files differ or if the files are
identical, for example: "[job " + JobName + "] Files differ".
9. Select the Print to Console check box so that the defined message is displayed at the end of the execution.
10. The schema is read-only and contains standard information data. Click Edit schema to have a look at it.

11. Then set the output component as usual with semi-colon as data separators.
12. Save your Job and press F6 to run it.

The message set is displayed to the console and the output shows the schema information data.

tFileCopy
Copies a source file or folder into a target directory.

tFileCopy Standard properties


These properties are used to configure tFileCopy running in the Standard Job framework.
The Standard tFileCopy component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Specify the path to the file to be copied.


This field does not appear when the Copy a directory check
box is selected.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Copy a directory Select this check box to copy a directory including all subdirectories and files in it.

Source directory Specify the source directory to copy.


This field appears only when the Copy a directory check box
is selected.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Destination directory Specify the directory to copy the source file or directory to.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Rename Select this check box if you want to rename the file copied
to the destination.
This field does not appear when the Copy a directory check
box is selected.

Destination filename Specify a new name for the file to be copied.


This field appears only when the Rename check box is
selected.

Remove source file Select this check box to remove the source file after it is
copied to the destination directory.
This field does not appear when the Copy a directory check
box is selected.

Replace existing file Select this check box to overwrite any existing file with the
newly copied file.

This field does not appear when the Copy a directory check
box is selected.

Create the directory if it doesn't exist Select this check box to create the specified destination
directory if it does not exist.
This field does not appear when the Copy a directory check
box is selected.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DESTINATION_FILENAME: the destination file name. This is an After variable and it returns a string.
DESTINATION_FILEPATH: the destination file path. This is
an After variable and it returns a string.
SOURCE_DIRECTORY: the source directory. This is an After
variable and it returns a string.
DESTINATION_DIRECTORY: the destination directory. This is
an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
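As an illustration, a tJava component triggered with OnSubjobOk after the copy could log where the file ended up. The component name tFileCopy_1 below is an assumption; adapt it to your Job:

   // Print the destination of the file copied by tFileCopy_1.
   String copiedTo = (String) globalMap.get("tFileCopy_1_DESTINATION_FILEPATH");
   System.out.println("File copied to " + copiedTo);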

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Restoring files from bin


This scenario describes a Job that iterates on a list of files in a directory, copies each file to a defined
target directory, and then removes the copied files from the source directory.

Procedure
1. Create a new Job and add a tFileList component and a tFileCopy component by typing their names
in the design workspace or dropping them from the Palette.
2. Connect tFileList to tFileCopy using a Row > Iterate link.
3. Double-click tFileList to open its Basic settings view.

4. In the Directory field, browse to or type in the directory to iterate upon.


5. Double-click tFileCopy to open its Basic settings view.

6. In the File Name field, press Ctrl+Space to access the global variable list and select the tFileList_1.CURRENT_FILEPATH variable from the list to fill the field with ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
7. In the Destination directory field, browse to or type in the directory to copy each file to.
8. Select the Remove source file check box to get rid of the files that have been copied.
9. Press Ctrl+S to save your Job and press F6 to execute it.
All the files in the defined source directory are copied to the destination directory and are
removed from the source directory.

tFileDelete
Deletes files from a given directory.

tFileDelete Standard properties


These properties are used to configure tFileDelete running in the Standard Job framework.
The Standard tFileDelete component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Path to the file to be deleted. This field is hidden when
you select the Delete folder check box or the Delete file or
folder check box.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Directory Path to the folder to be deleted. This field is available only when you select the Delete folder check box.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

File or directory to delete Enter the path to the file or to the folder you want to
delete. This field is available only when you select the
Delete file or folder check box.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Fail on error Select this check box to prevent the main Job from being
executed if an error occurs, for example, if the file to be
deleted does not exist.

Delete Folder Select this check box to display the Directory field, where you can indicate the path to the folder to be deleted.

Delete file or folder Select this check box to display the File or directory to delete field, where you can indicate the path to the file or to the folder you want to delete.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DELETE_PATH: the path to the deleted file or folder. This is
an After variable and it returns a string.

CURRENT_STATUS: the execution result of the component.


This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Deleting files
This very simple scenario describes a Job deleting files from a given directory.

Procedure
1. Drop the following components: tFileList, tFileDelete, tJava from the Palette to the design
workspace.
2. In the tFileList Basic settings, set the directory to loop on in the Directory field.

3. The filemask is "*.txt" and no case check is carried out.


4. In the tFileDelete Basic settings panel, set the File Name field so that the current file selected by the tFileList component is deleted. This deletes all files contained in the directory, as specified earlier.

5. Press Ctrl+Space to access the list of global variables. In Java, the relevant variable to collect the current file is: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
6. Then, in the tJava component, define the message to be displayed in the standard output (Run console). In this Java use case, type the following script in the Code field: System.out.println(((String)globalMap.get("tFileList_1_CURRENT_FILE")) + " has been deleted!");
7. Then save your Job and press F6 to run it.
7. Then save your Job and press F6 to run it.

Results
The message set in the tJava component displays in the log, for each file that has been deleted
through the tFileDelete component.

tFileExist
Checks if a file exists or not.

tFileExist Standard properties


These properties are used to configure tFileExist running in the Standard Job framework.
The Standard tFileExist component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File name/Stream Path to the file you want to check if it exists or not.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Checking for the presence of a file and creating it if it does not exist

This scenario describes a simple Job that checks if a given file exists, displays a graphical message to confirm that the file does not exist, reads the input data from another given file and writes it to an output delimited file.
A dialog box appears to confirm that the file does not exist.
Click OK to close the dialog box and continue the Job execution process. The missing file, file1 in this scenario, is then written to a delimited file in the defined place.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileExist,
tFileInputDelimited, tFileOutputDelimited, and tMsgBox.
2. Connect tFileExist to tFileInputDelimited using an OnSubjobOk and to tMsgBox using a Run If
link.

3. Connect tFileInputDelimited to tFileOutputDelimited using a Row Main link.

Configuring the components


Procedure
1. In the design workspace, select tFileExist and click the Component tab to define its basic settings.

2. In the File name field, enter the file path or browse to the file you want to check if it exists or not.
3. In the design workspace, select tFileInputDelimited and click the Component tab to define its
basic settings.

4. Browse to the input file you want to read to fill out the File Name field.

Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.

5. Set the row and field separators in their corresponding fields.


6. Set the header, footer and number of processed rows as needed. In this scenario, there is one
header in our table.
7. Set Schema to Built-in and click the Edit schema button to define the data to pass on to the
tFileOutputDelimited component. Define the data present in the file to read, file2 in this scenario.
For more information about schema types, see Talend Studio User Guide.

The schema in file2 consists of five columns: Num, Ref, Price, Quant, and tax.
8. In the design workspace, select the tFileOutputDelimited component.
9. Click the Component tab to define the basic settings of tFileOutputDelimited.

10. Set property type to Built-in.


11. In the File name field, press Ctrl+Space to access the variable list and select the global variable
FILENAME.
12. Set the row and field separators in their corresponding fields.
13. Select the Include Header check box as file2 in this scenario includes a header.
14. Set Schema to Built-in and click Sync columns to synchronize the output file schema (file1) with
the input file schema (file2).

15. In the design workspace, select the tMsgBox component.


16. Click the Component tab to define the basic settings of tMsgBox.

17. Click the If link to display its properties in the Basic settings view.
18. In the Condition panel, press Ctrl+Space to access the variable list and select the global variable
EXISTS. Type an exclamation mark before the variable to negate the meaning of the variable.
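With Java as the generation language and assuming the component is named tFileExist_1, the resulting Run if condition typically reads:

   !((Boolean)globalMap.get("tFileExist_1_EXISTS"))

so that tMsgBox is triggered only when the file does not exist.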

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click the Run button in the Run tab to execute it.

tFileFetch
Retrieves a file through the given protocol (HTTP, HTTPS, FTP, or SMB).

tFileFetch Standard properties


These properties are used to configure tFileFetch running in the Standard Job framework.
The Standard tFileFetch component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Protocol Select the protocol you want to use from the list and fill in
the corresponding fields: http, https, ftp, smb.
The properties differ slightly depending on the type of
protocol selected. The additional fields are defined in this
table, after the basic settings.

URI Type in the URI of the site from which the file is to be
fetched.

Use cache to save resource Select this check box to save the data in the cache.
This option allows you to process the file data flow (in
streaming mode) without saving it on your drive. This is
faster and improves performance.

Domain Enter the Microsoft server domain name.


Available for the smb protocol.

Username and Password Enter the authentication information required to access the
server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available for the smb protocol.

Destination Directory Browse to the destination folder where the file fetched is to
be placed.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Destination Filename Enter a new name for the file fetched.


If the Upload file option in the Advanced settings view is
selected, the upload response will be saved in this file.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Create full path according to URI It allows you to reproduce the URI directory path. To save
the file at the root of your destination directory, clear the
check box.
Available for the http, https and ftp protocols.

Add header Select this check box if you want to add one or more HTTP
request headers as fetch conditions. In the Headers table,
enter the name(s) of the HTTP header parameter(s) in the
Name field and the corresponding value(s) in the Value
field.
Available for the http and https protocols.

POST method This check box is selected by default. It allows you to use
the POST method. In the Parameters table, enter the name
of the variable(s) in the Name field and the corresponding
value in the Value field.
Clear the check box if you want to use the GET method.
Available for the http and https protocols.

Die on error Clear this check box to skip the rows in error and to complete the process for the error-free rows.
Available for the http, https and ftp protocols.

Read Cookie Select this check box for tFileFetch to load a web
authentication cookie.
Available for the http, https, ftp and smb protocols.

Save Cookie Select this check box to save the web page authentication
cookie. This means you will not have to log on to the same
web site in the future.
Available for the http, https, ftp and smb protocols.

Cookie file Type in the full path to the file which you want to use to
save the cookie or click [...] and browse to the desired file to
save the cookie.
Available for the http, https, ftp and smb protocols.

Cookie policy Choose a cookie policy from this drop-down list. Four
options are available, BROWSER_COMPATIBILITY, DEFAULT,
NETSCAPE and RFC_2109.
Available for the http, https, ftp and smb protocols.

Single cookie header Check this box to put all cookies into one request header for
maximum compatibility among different servers.
Available for the http, https, ftp and smb protocols.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at each
component level.

Timeout Enter the number of milliseconds after which the protocol connection should close.
Available for the http and https protocols.

Print response to console Select this check box to print the server response in the
console.
Available for the http and https protocols.

Upload file Select this check box to upload one or more files to the
server. For each file to be uploaded, click the [+] button
beneath the table displayed and set the following fields:
• Name: the value of the name attribute of the <input type="file"> field in the original HTML form.
• File: the full path of the file to upload, e.g. "D:/filefetch.txt".
• Content-Type: the content type of the file to upload. The default value is "application/octet-stream".
• Charset: the character set of the file to upload. The default value is "ISO-8859-1".
This option is available for the http and https protocols, with the POST method option in the Basic settings view selected.
With this option selected, the upload response will be saved
in the file specified in the Destination filename field in the
Basic settings view.

Enable proxy server Select this check box if you are connecting via a proxy
and complete the fields which follow with the relevant
information.
Available for the http, https and ftp protocols.

Enable NTLM Credentials Select this check box if you are using an NTLM
authentication protocol.
Domain: The client domain name.
Host: The client's IP address.
Available for the http and https protocols.

Need authentication Select this check box and enter the username and password
in the relevant fields, if they are required to access the
protocol.
Available for the http and https protocols.

Support redirection Select this check box to repeat the redirection request until
redirection is successful and the file can be retrieved.
Available for the http, https and ftp protocols.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
INPUT_STREAM: the content of the file being fetched. This
is a Flow variable and it returns an InputStream.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
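As a sketch of how INPUT_STREAM is typically consumed (see also Reading data from a remote file in streaming mode on page 1020), a downstream tFileInputDelimited component can read the fetched content directly by entering the following expression in its File name/Stream field; the component name tFileFetch_1 is illustrative:

   ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM"))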

Usage

Usage rule This component is generally used as a start component to feed the input flow of a Job and is often connected to
the Job using an OnSubjobOk or OnComponentOk link,
depending on the context.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Fetching data through HTTP


This scenario describes a three-component Job which retrieves a file from an HTTP website, reads
data from the fetched file and displays the data on the console.

Dropping and linking components


Procedure
1. Drop a tFileFetch, a tFileInputDelimited and a tLogRow onto your design workspace.
2. Link tFileFetch to tFileInputDelimited using a Trigger > On Subjob Ok or On Component Ok
connection.
3. Link tFileInputDelimited to tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tFileFetch to open its Basic settings view.

2. Select the protocol you want to use from the list. Here, http is selected.
3. In the URI field, type in the URI where the file to be fetched can be retrieved from. You can paste
the URI directly in your browser to view the data in the file.
4. In the Destination directory field, browse to the folder where the fetched file is to be stored. In
this example, it is D:/Output.
5. In the Destination filename field, type in a new name for the file if you want it to be changed. In
this example, new.txt.
6. If needed, select the Add header check box and define one or more HTTP request headers as fetch
conditions. For example, to fetch the file only if it has been modified since 19:43:31 GMT, October
29, 1994, fill in the Name and Value fields with "If-Modified-Since" and "Sat, 29 Oct 1994 19:43:31
GMT" respectively in the Headers table. For details about HTTP request header definitions, see
Header Field Definitions.
7. Double-click tFileInputDelimited to open its Basic settings view.

8. In the File name field, type in the full path to the fetched file which had been stored locally.

9. Click the [...] button next to Edit schema to open the Schema dialog box. In
this example, add one column output to store the data from the fetched file.

10. Leave other settings as they are.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.

The data of the fetched file is displayed on the console.

Reusing stored cookie to fetch files through HTTP


This scenario describes a two-component Job which logs in to a given HTTP website and then, using a cookie stored in a user-defined local directory, fetches data from this website.

Dropping and linking components


Procedure
1. Drop two tFileFetch components onto your design workspace.
2. Link the two components as subJobs using a Trigger > On Subjob Ok connection.

Configuring the components


Configuring the first subJob

Procedure
1. Double click tFileFetch_1 to open its component view.

2. Select the protocol you want to use from the Protocol list. Here, we use the https protocol.
3. In the URI field, type in the URI through which you can log in the website and fetch the web page accordingly. In this example, the URI is https://www.codeproject.com/script/Membership/LogOn.aspx?download=true.
4. In the Destination directory field, browse to the folder where the fetched web page is to be stored.
This folder will be created on the fly if it does not exist. In this example, type in D:/download.
5. In the Destination Filename field, type in a new name for the file if you want it to be changed. In
this example, codeproject.html.
6. Under the Parameters table, click the plus button to add two rows and fill in the credentials for accessing the desired website.
In the Name column, type in a new name respectively for the two rows. In this example, they are
Email and Password, which are required by the website you are logging in.
In the Value column, type in the authentication information.
7. Select the Save cookie check box.
8. In the Cookie file field, type in the full path to the file which you want to use to save the cookie.
In this example, it is D:/download/cookie.
9. Click Advanced settings to open its view.
10. Select the Support redirection check box so that the redirection request will be repeated until the
redirection is successful.

Configuring the second subJob

Procedure
1. Double-click tFileFetch_2 to open its Component view.

2. From the Protocol list, select http.


3. In the URI field, type in the address from which you fetch the files of your interest. In this example, the address is http://www.codeproject.com/script/articles/download.aspx?file=/KB/DLL/File_List_Downloader/FLD02June2011_Source.zip&rp=http://www.codeproject.com/Articles/203991/File-List-Downloader.
4. In the Destination directory field, type in the directory or browse to the folder where you want to
store the fetched files. This folder can be automatically created if it does not exist yet during the
execution process. In this example, type in D:/download.
5. In the Destination Filename field, type in a new name for the file if you want it to be changed. In
this example, source.zip.
6. Clear the POST method check box to deactivate the Parameters table.
7. Select the Read cookie check box.
8. In the Cookie file field, browse to the file which is used to save the cookie. In this example, it is
D:/download/cookie.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
Then, go to the local directory D:/download to check the downloaded file.

Related scenario
For an example of transferring data in streaming mode, see Reading data from a remote file in
streaming mode on page 1020

tFileInputARFF
Reads an ARFF file row by row to split them up into fields and then sends the fields as defined in the
schema to the next component.

tFileInputARFF Standard properties


These properties are used to configure tFileInputARFF running in the Standard Job framework.
The Standard tFileInputARFF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a connection wizard and store the file connection parameters you set in the component's
Basic settings view.
For more information about setting up and storing file
connection parameters, see Talend Studio User Guide.

File Name Name and path of the ARFF file and/or variable to be
processed.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Schema and Edit Schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a file and separate the fields
with the specified separator.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Displaying the content of an ARFF file

This scenario describes a two-component Job in which the rows of an ARFF file are read, the delimited data is selected and the output is displayed in the Run view.

An ARFF file looks like the following:
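For example, a minimal ARFF file (with invented attribute names and values) could look like this:

   @relation weather
   @attribute outlook {sunny, overcast, rainy}
   @attribute temperature numeric
   @attribute play {yes, no}
   @data
   sunny,85,no
   overcast,64,yes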

It is generally made of two parts: the first part describes the data structure, that is to say the rows that begin with @attribute, and the second part comprises the raw data, which follows the @data expression.

Dropping and linking components


Procedure
1. Drop the tFileInputARFF component from the Palette onto the workspace.
2. In the same way, drop the tLogRow component.
3. Right-click the tFileInputARFF component and select Row > Main from the menu. Then drag the link to the tLogRow component and click it. The link is created.

Configuring the components


Procedure
1. Double-click the tFileInputARFF.
2. In the Component view, in the File Name field, browse your directory in order to select your .arff
file.
3. In the Schema field, select Built-In.
4. Click the [...] button next to Edit schema to add column descriptions corresponding to the file to
be read.

5. Click the [+] button as many times as required to create the number of columns required, according to the source file. Name the columns as follows.

6. For every column, the Nullable check box is selected by default. Leave the check boxes selected,
for all of the columns.
7. Click OK.
8. In the workspace, double-click the tLogRow to display its Component view.

9. Click the [...] button next to Edit schema to check that the schema has been propagated. If not,
click the Sync columns button.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to execute your Job.

The console displays the data contained in the ARFF file, delimited using a vertical line (the
default separator).

tFileInputDelimited
Reads a delimited file row by row to split them up into fields and then sends the fields as defined in
the schema to the next component.

tFileInputDelimited Standard properties


These properties are used to configure tFileInputDelimited running in the Standard Job framework.
The Standard tFileInputDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties are stored.

File Name/Stream File name: Name and path of the file to be processed.
Stream: The data flow to be processed. The data must be
added to the flow in order for tFileInputDelimited to fetch
these data via the corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing it by hand, you can select the variable of interest from the auto-completion list (Ctrl+Space) to fill the current field, on condition that this variable has been properly defined.
Related topic to the available variables: see Talend Studio
User Guide

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Row separator The separator used to identify the end of a row.

Field separator Enter character, string or regular expression to separate fields for the transferred data.

CSV options Select this check box to specify the following CSV
parameters:
• Escape char: enter the escape character between
double quotation marks.
• Text enclosure: enter the enclosure character (only
one character) between double quotation marks.
For example, """ needs to be entered when double
quotation marks (") are used as the enclosure character.

It is recommended to use the standard escape character, that is "\". Otherwise, you should set the same character for Escape char and Text enclosure. For example, if the escape character is set to "\", the text enclosure can be set to any other character. On the other hand, if the escape character is set to a character other than "\", the text enclosure can be set to any other character. However, the escape character will be changed to the same character as the text enclosure. For instance, if the escape character is set to "#" and the text enclosure is set to "@", the escape character will be changed to "@", not "#".
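As an illustration (not taken from the guide), suppose the Field separator is ";", the Text enclosure is """ and the Escape char is "\". A row such as

   1;"Dupont;Jean";"He said \"hello\""

would then be read as three fields: 1, Dupont;Jean and He said "hello", because the enclosure keeps the embedded separator inside one field and the escape character protects the inner quotation marks.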

Header Enter the number of rows to be skipped at the beginning of the file.

Footer Number of rows to be skipped at the end of the file.

Limit Maximum number of rows to be processed. If Limit = 0, no row is read or processed.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Note that if the input value of any non-nullable primitive
field is null, the row of data including that field will be
rejected.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and
Job designs.

Skip empty rows Select this check box to skip the empty rows.

Uncompress as zip file Select this check box to uncompress the input file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

To catch the FileNotFoundException, you also need to select this check box.

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).

Extract lines at random Select this check box to set the number of lines to be
extracted randomly.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Trim all column Select this check box to remove the leading and trailing
whitespaces from all columns. When this check box is
cleared, the Check column to trim table is displayed, which
lets you select particular columns to trim.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

Check date Select this check box to check the date format strictly
against the input schema.

Check columns to trim This table is filled automatically with the schema being
used. Select the check box(es) corresponding to the
column(s) to be trimmed.

Split row before field Select this check box to split rows before splitting fields.

Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric types - it will act the opposite for Byte Select this check box if any of your numeric types (long, integer, short, or byte type) will be parsed from a hexadecimal or octal string.
In the table that appears, select the check box next to the
column or columns of interest to transform the input string
of each selected column to the type defined in the schema.
Select the Permit hexadecimal or octal check box to select
all the columns.
This table appears only when the Permit hexadecimal
(0xNNN) or octal (0NNNN) for numeric types - it will act the
opposite for Byte check box is selected.
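For example (an illustration, not from the guide), with this option enabled for an Integer column, an input string such as 0x1A would be parsed as the decimal value 26 and 017 as 15, a behavior comparable to Java's Integer.decode("0x1A").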

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a file and separate fields
contained in this file using a defined separator. It allows
you to create a data flow using a Row > Main link or via a
Row > Reject link in which case the data is filtered by data
that does not correspond to the type defined. For further
information, please see Procedure on page 975.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Reading data from a delimited file and displaying the output

The following scenario creates a two-component Job, which aims at reading each row of a file, selecting delimited data and displaying the output in the Run log console.

Dropping and linking components


Procedure
1. Drop a tFileInputDelimited component and a tLogRow component from the Palette to the design
workspace.
2. Right-click on the tFileInputDelimited component and select Row > Main. Then drag it onto the
tLogRow component and release when the plug symbol shows up.

Configuring the components


Procedure
1. Select the tFileInputDelimited component again, and define its Basic settings:

2. Fill in a path to the file in the File Name field. This field is mandatory.

Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.

3. Define the Row separator used to identify the end of a row. Then define the Field separator used to delimit fields in a row.
4. In this scenario, the header and footer limits are not set, and the Limit number of processed rows is set to 50.
5. Set the Schema as either a local (Built-in) or a remotely managed (Repository) to define the data
to pass on to the tLogRow component.
6. You can load and/or edit the schema via the Edit Schema function.
Related topics: see Talend Studio User Guide.
7. Enter the encoding standard the input file is encoded in. This setting is meant to ensure encoding
consistency throughout all input and output files.
8. Select the tLogRow and define the Field separator to use for the output display. Related topic:
tLogRow on page 1977.
9. Select the Print schema column name in front of each value check box to retrieve the column
labels in the output displayed.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Go to Run tab, and click on Run to execute the Job.
The file is read row by row and the extracted fields are displayed on the Run log as defined in
both components Basic settings.

The Log sums up all parameters in a header followed by the result of the Job.

Reading data from a remote file in streaming mode


This scenario describes a four-component Job used to fetch data from a voluminous file almost as soon as it has been read. The data is displayed in the Run view. The advantage of this technique is that you do not have to wait for the entire file to be downloaded before viewing the data.

Dropping and linking components


Procedure
1. Drop the following components onto the workspace: tFileFetch, tSleep, tFileInputDelimited, and
tLogRow.
2. Connect tSleep and tFileInputDelimited using a Trigger > OnComponentOk link and connect
tFileInputDelimited to tLogRow using a Row > Main link.

Configuring the components


Procedure
1. Double-click tFileFetch to display the Basic settings tab in the Component view and set the
properties.

2. From the Protocol list, select the appropriate protocol to access the server on which your data is
stored.
3. In the URI field, enter the URI required to access the server on which your file is stored.
4. Select the Use cache to save the resource check box to add your file data to the cache memory.
This option allows you to use the streaming mode to transfer the data.
5. In the workspace, click tSleep to display the Basic settings tab in the Component view and set the
properties.
By default, tSleep's Pause field is set to 1 second. Do not change this setting. It pauses the second
Job in order to give the first Job, containing tFileFetch, the time to read the file data.
6. In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the
Component view and set the properties.

7. In the File name/Stream field:


- Delete the default content.
- Press Ctrl+Space to view the variables available for this component.
- Select tFileFetch_1_INPUT_STREAM from the auto-completion list, to add the following variable to the Filename field: ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM")).

8. From the Schema list, select Built-in and click [...] next to the Edit schema field to describe the
structure of the file that you want to fetch. The US_Employees file is composed of six columns: ID,
Employee, Age, Address, State, EntryDate.
Click [+] to add the six columns and set them as indicated in the above screenshot. Click OK.

9. In the workspace, double-click tLogRow to display its Basic settings in the Component view and
click Sync Columns to ensure that the schema structure is properly retrieved from the preceding
component.

Configuring Job execution and executing the Job


Procedure
1. Click the Job tab and then on the Extra view.

2. Select the Multi thread execution check box in order to run the two Jobs at the same time. Bear in mind that the second Job has a one-second delay according to the properties set in tSleep. This option allows you to fetch the data almost as soon as it is read by tFileFetch, thanks to the tFileInputDelimited component.
3. Save the Job and press F6 to run it.

The data is displayed in the console almost as soon as it is read.

tFileInputExcel
Reads an Excel file row by row to split them up into fields using regular expressions and then sends
the fields as defined in the schema to the next component.

tFileInputExcel Standard properties


These properties are used to configure tFileInputExcel running in the Standard Job framework.
The Standard tFileInputExcel component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a connection wizard and store the Excel file connection parameters you set in the component
Basic settings view.
For more information about setting up and storing file
connection parameters, see Talend Studio User Guide.

Read excel2007 file format (xlsx / xlsm) Select this check box to read the .xlsx or .xlsm file of Excel
2007.

File Name/Stream File name: Name of the file and/or the variable to be
processed.
Stream: Data flow to be processed. The data must be added
to the flow in order to be collected by tFileInputExcel via
the INPUT_STREAM variable in the auto-completion list
(Ctrl+Space).
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Password Provide the password set for the Excel file in double
quotation marks by clicking the three-dot button to the
right of this frame.
This field is for Excel 2007 (and higher versions) files
protected by passwords and is available when Read
excel2007 file format(xlsx) is selected.
This component supports standard encryption and agile
encryption.

All sheets Select this check box to process all sheets of the Excel file.

Sheet list Click the plus button to add as many lines as needed to the
list of the excel sheets to be processed:
Sheet (name or position): enter the name or position of the
excel sheet to be processed.
Use Regex: select this check box if you want to use a regular
expression to filter the sheets to process.
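For example (an illustrative value, not from the guide), entering ".*2019.*" in the Sheet (name or position) column with Use Regex selected would process every sheet whose name contains 2019.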

Header Enter the number of rows to be skipped at the beginning of the file.

Footer Number of records to be skipped at the end of the file.

Limit Maximum number of lines to be processed.

Affect each sheet(header&footer) Select this check box if you want to apply the parameters
set in the Header and Footer fields to all excel sheets to be
processed.

Note: This option is only available when you select Memory-consuming (User mode) from the Generation mode drop-down list in the Advanced settings view.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

First column and Last column Define the range of the columns to be processed through
setting the first and last columns in the First column and
Last column fields respectively.

Schema and Edit Schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for
this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.


Advanced settings

Advanced separator Select this check box to change the used data separators.

Trim all columns Select this check box to remove the leading and trailing
whitespaces from all columns. When this check box is
cleared, the Check column to trim table is displayed, which
lets you select particular columns to trim.

Check column to trim This table is filled automatically with the schema being
used. Select the check box(es) corresponding to the
column(s) to be trimmed.

Convert date column to string Available when Read excel2007 file format (xlsx) is
selected in the Basic settings view.
Select this check box to show the table Check need convert
date column. Here you can parse the string columns that
contain date values based on the given date pattern.
Column: all the columns available in the schema of the
source .xlsx file.
Convert: select this check box to choose all the columns for
conversion (only if they are all of the string type). You can
also select the individual check box next to each column for
conversion.
Date pattern: set the date format here.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Read real values for numbers Select this check box to read numbers in real values. This
check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.

Stop reading on encountering empty rows Select this check box to ignore the empty line encountered
and, if there are any, the lines that follow this empty line.
This check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.

Generation mode Available when Read excel2007 file format (xlsx) is
selected in the Basic settings view. Select the mode used to
read the Excel 2007 file.
• Less memory consumed for large excel (Event mode):
used for large files. This is a memory-saving mode to
read the Excel 2007 file as a flow. This option helps
prevent Job failure with an out-of-memory error due to
high memory consumption when reading large Excel
files.
With this mode selected, the data will be extracted
with the format symbol, for example, the percent
symbol % and the currency symbol $. Moreover, the
Include phonetic runs check box is selected by default
to allow you to use phonetic strings at index.
• Memory-consuming (User mode): used for small files. It
needs more memory. With this mode selected, the pure
data without the format symbol will be extracted. A
standalone sketch contrasting the two modes follows.
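The two modes correspond roughly to the in-memory (user model) and streaming (event model) APIs of Apache POI, a common Java library for reading .xlsx files; this correspondence and the file path below are assumptions made for illustration only, not a description of the generated Job code.

// Illustrative sketch: contrasts an in-memory (user model) read with a
// streaming (event model) read of an .xlsx file using Apache POI.
import java.io.File;
import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelReadModes {

    // Comparable to Memory-consuming (User mode): the whole workbook is
    // materialized in memory, convenient for small files but heavy for large ones.
    static void userMode(File xlsx) throws Exception {
        try (XSSFWorkbook workbook = new XSSFWorkbook(xlsx)) {
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                System.out.println("row " + row.getRowNum() + " has " + row.getLastCellNum() + " cells");
            }
        }
    }

    // Comparable to Less memory consumed for large excel (Event mode): each sheet
    // is exposed as a raw XML stream, so rows can be parsed as a flow (for example
    // with a SAX handler) instead of being loaded all at once.
    static void eventMode(File xlsx) throws Exception {
        try (OPCPackage pkg = OPCPackage.open(xlsx)) {
            XSSFReader reader = new XSSFReader(pkg);
            Iterator<InputStream> sheets = reader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheetXml = sheets.next()) {
                    System.out.println("streaming one sheet as XML, first byte: " + sheetXml.read());
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File xlsx = new File("E:/input.xlsx"); // hypothetical path
        userMode(xlsx);
        eventMode(xlsx);
    }
}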


Don't validate the cells Select this check box to skip data validation. This
check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.

Ignore the warning Select this check box to ignore all warnings generated to
indicate errors in the Excel file. This check box becomes
unavailable when you select Read excel2007 file format
(xlsx) in the Basic settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
CURRENT_SHEET: the name of the sheet being processed.
This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read an Excel file and to output the
data separately depending on the schemas identified in the
file. You can use a Row > Reject link to filter the data which
doesn't correspond to the type defined. For an example of
how to use these two links, see Procedure on page 975.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileInputFullRow
Reads a file row by row and sends complete rows of data as defined in the schema to the next
component via a Row link.

tFileInputFullRow Standard properties


These properties are used to configure tFileInputFullRow running in the Standard Job framework.
The Standard tFileInputFullRow component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

File Name Specify the path to the file to be processed.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Row separator The separator used to identify the end of a row.

Header Enter the number of rows to be skipped at the beginning of the file.

Footer Enter the number of rows to be skipped at the end of the file.


Limit Enter the maximum number of rows to be processed. If the
value is set to 0, no row is read or processed.

Skip empty rows Select this check box to skip the empty rows.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Extract lines at random Select this check box to set the number of lines to be
extracted randomly.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read full rows in delimited files that
can get very large.

Reading full rows in a delimited file


The following scenario creates a two-component Job that aims at reading complete rows in the
delimited file states.csv and displaying the rows on the console.
The content of the file states.csv that holds ten rows of data is as follows:
StateID;StateName
1;Alabama
2;Alaska
3;Arizona
4;Arkansas


5;California
6;Colorado
7;Connecticut
8;Delaware
9;Florida
10;Georgia

Reading full rows in a delimited file


Procedure
1. Create a new Job and add a tFileInputFullRow component and a tLogRow component by typing
their names in the design workspace or dropping them from the Palette.
2. Link the tFileInputFullRow component to the tLogRow component using a Row > Main
connection.

3. Double-click the tFileInputFullRow component to open its Basic settings view on the Component
tab.

4. Click the [...] button next to Edit schema to view the data to be passed onto the tLogRow
component. Note that the schema is read-only and it consists of only one column line.

5. In the File Name field, browse to or enter the path to the file to be processed. In this scenario, it is
E:/states.csv.
6. In the Row Separator field, enter the separator used to identify the end of a row. In this example,
it is the default value \n.


7. In the Header field, enter 1 to skip the header row at the beginning of the file.
8. Double-click the tLogRow component to open its Basic settings view on the Component tab.

In the Mode area, select Table (print values in cells of a table) for better readability of the result.
9. Press Ctrl+S to save your Job and then F6 to execute it.

As shown above, ten rows of data in the delimited file states.csv are read one by one, ignoring
field separators, and the complete rows of data are displayed on the console.
To extract fields from rows, you must use tExtractDelimitedFields, tExtractPositionalFields,
or tExtractRegexFields. For more information, see tExtractDelimitedFields on page 937,
tExtractPositionalFields on page 963 and tExtractRegexFields on page 966.
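For comparison outside the Studio, the plain Java sketch below reads the same states.csv file line by line, skips one header row and prints each complete row as a single value. It only illustrates what the component does with the settings above; it is not the code the Job generates.

// Minimal illustration of "full row" reading: each line is handled as a single
// field and field separators are not interpreted.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFullRows {
    public static void main(String[] args) throws IOException {
        int header = 1; // rows to skip at the beginning of the file, as in step 7
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("E:/states.csv"))) {
            String line;
            int skipped = 0;
            while ((line = reader.readLine()) != null) {
                if (skipped < header) {
                    skipped++;
                    continue;
                }
                System.out.println(line); // the single schema column, named line
            }
        }
    }
}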


tFileInputJSON
Extracts JSON data from a file and transfers the data to a file, a database table, etc.

tFileInputJSON Standard properties


These properties are used to configure tFileInputJSON running in the Standard Job framework.
The Standard tFileInputJSON component belongs to the Internet and the File families.
The component in this framework is available in all Talend products.

Basic settings

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties
are stored.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically
becomes built-in.

• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Read By Select a way of extracting JSON data in the file.


• JsonPath: Extracts JSON data based on the JSONPath
query. With this option selected, you need to select a
JSONPath API version from the API version drop-down
list. It is recommended to read data by JSONPath in
order to gain better performance.
• Xpath: Extracts JSON data based on the XPath query.
• JsonPath without loop: Extracts JSON data based on the
JSONPath query without setting a loop node.


Use Url Select this check box to retrieve data directly from the Web.

URL Enter the URL path from which you will retrieve data.
This field is available only when the Use Url check box is
selected.

Filename Specify the file from which you will retrieve data.
This field is not visible if the Use Url check box is selected.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Loop Jsonpath query Enter the path pointing to the node within the JSON field,
on which the loop is based.
Note that if you have selected Xpath from the Read By drop-
down list, the Loop Xpath query field is displayed instead.

Mapping Complete this table to map the columns defined in the
schema to the corresponding JSON nodes.
• Column: The Column cells are automatically filled with
the defined schema column names.
• Json query/JSONPath query: Specify the JSONPath
node that holds the desired data. For more information
about JSONPath expressions, see http://goessner.net/
articles/JsonPath/.
This column is available only when JsonPath is
selected from the Read By list.
• XPath query: Specify the XPath node that holds the
desired data.
This column is available only when Xpath is selected
from the Read By list.
• Get Nodes: Select this check box to extract the JSON
data of all the nodes or select the check box next to a
specific node to extract the data of that node.
This column is available only when Xpath is selected
from the Read By list.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Use the loop node as root Select this check box to use the loop node as the root for
querying the file.
The loop node is set in the Loop Json query text frame in
the Basic Settings view. If this option is checked, only the
child elements of the loop node are available for querying;
otherwise, both the parent elements and the child elements
of the loop node can be queried. You can specify a parent
element through JSON path syntax.
This check box is available only when JsonPath is selected
in the Read By drop-down list of the Basic settings view.

Validate date Select this check box to check the date format strictly
against the input schema.
This check box is available only if Xpath is selected from the
Read By drop-down list.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is a start component of a Job and always
needs an output link.

Extracting JSON data from a file using JSONPath without setting a loop node

This scenario describes a two-component Job that extracts data from the JSON file Store.json by
specifying the complete JSON path for each node of interest and displays the flat data extracted on
the console.


The JSON file Store.json contains information about a department store and the content of the file is
as follows:

{"store": {
"name": "Sunshine Department Store",
"address": "Wangfujing Street",
"goods": {
"book": [
{
"category": "Reference",
"title": "Sayings of the Century",
"author": "Nigel Rees",
"price": 8.88
},
{
"category": "Fiction",
"title": "Sword of Honour",
"author": "Evelyn Waugh",
"price": 12.66
}
],
"bicycle": {
"type": "GIANT OCR2600",
"color": "White",
"price": 276
}
}
}}

In the following example, we will extract the store name, the store address, and the bicycle
information from this file.

Adding and linking the components


Procedure
1. Create a new Job and add a tFileInputJSON component and a tLogRow component by typing their
names in the design workspace or dropping them from the Palette.
2. Link the tFileInputJSON component to the tLogRow component using a Row > Main connection.

Configuring the components


Procedure
1. Double-click the tFileInputJSON component to open its Basic settings view.


2. Select JsonPath without loop from the Read By drop-down list. With this option, you need to
specify the complete JSON path for each node of interest in the JSONPath query fields of the
Mapping table.
3. Click the [...] button next to Edit schema to open the schema editor.

4. Click the [+] button to add five columns, store_name, store_address, bicycle_type, and bicycle_color
of String type, and bicycle_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
5. In the Filename field, specify the path to the JSON file that contains the data to be extracted. In
this example, it is "E:/Store.json".
6. In the Mapping table, the Column fields are automatically filled with the schema columns you
have defined.
In the JSONPath query fields, enter the JSONPath query expressions between double quotation
marks to specify the nodes that hold the desired data.


• For the columns store_name and store_address, enter the JSONPath query expressions
"$.store.name" and "$.store.address" relative to the nodes name and address respectively.
• For the columns bicycle_type, bicycle_color, and bicycle_price, enter the JSONPath query
expressions "$.store.goods.bicycle.type", "$.store.goods.bicycle.color", and
"$.store.goods.bicycle.price" relative to the child nodes type, color, and price of the bicycle node respectively.
7. Double-click the tLogRow component to display its Basic settings view.

8. In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to execute the Job.

As shown above, the store name, the store address, and the bicycle information are extracted
from the source JSON data and displayed in a flat table on the console.
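The JSONPath expressions used in the Mapping table can also be checked with a small standalone sketch. The snippet below assumes the Jayway json-path library (com.jayway.jsonpath) on the classpath and the same E:/Store.json file; it is illustrative only and not the code generated by the Job.

// Evaluates the complete JSONPath queries used above, one per schema column.
import java.nio.file.Files;
import java.nio.file.Paths;

import com.jayway.jsonpath.JsonPath;

public class StoreQueries {
    public static void main(String[] args) throws Exception {
        String json = new String(Files.readAllBytes(Paths.get("E:/Store.json")));

        String storeName    = JsonPath.read(json, "$.store.name");
        String storeAddress = JsonPath.read(json, "$.store.address");
        String bicycleType  = JsonPath.read(json, "$.store.goods.bicycle.type");
        String bicycleColor = JsonPath.read(json, "$.store.goods.bicycle.color");
        Number bicyclePrice = JsonPath.read(json, "$.store.goods.bicycle.price");

        System.out.println(storeName + " | " + storeAddress + " | "
                + bicycleType + " | " + bicycleColor + " | " + bicyclePrice);
    }
}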

Extracting JSON data from a file using JSONPath


Based on Extracting JSON data from a file using JSONPath without setting a loop node on page 1034,
this scenario extracts data under the book array of the JSON file Store.json by specifying a loop node
and the relative JSON path for each node of interest, and then displays the flat data extracted on the
console.

Procedure
1. In the Studio, open the Job used in Extracting JSON data from a file using JSONPath without
setting a loop node on page 1034 to display it in the design workspace.


2. Double-click the tFileInputJSON component to open its Basic settings view.

3. Select JsonPath from the Read By drop-down list.


4. In the Loop Json query field, enter the JSONPath query expression between double quotation
marks to specify the node on which the loop is based. In this example, it is "$.store.goods.book[*]".
5. Click the [...] button next to Edit schema to open the schema editor.

Select the five columns added previously and click the x button to remove all of them.
Click the [+] button to add four columns, book_title, book_category, and book_author of String type,
and book_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
6. In the Json query fields of the Mapping table, enter the JSONPath query expressions between
double quotation marks to specify the nodes that hold the desired data. In this example, enter the
JSONPath query expressions "title", "category", "author", and "price" relative to the four child nodes
of the book node respectively.
7. Press Ctrl+S to save the Job.


8. Press F6 to execute the Job.

As shown above, the book information is extracted from the source JSON data and displayed in a
flat table on the console.
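With a loop node, each match of the loop query becomes one output row and the relative queries become lookups inside that match. A standalone equivalent, again assuming the Jayway json-path library and given here for illustration only:

// One map per element of the book array; the relative queries used in the
// Mapping table ("title", "category", "author", "price") become map lookups.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;

import com.jayway.jsonpath.JsonPath;

public class BookLoop {
    public static void main(String[] args) throws Exception {
        String json = new String(Files.readAllBytes(Paths.get("E:/Store.json")));

        // Same expression as the Loop Json query field.
        List<Map<String, Object>> books = JsonPath.read(json, "$.store.goods.book[*]");

        for (Map<String, Object> book : books) {
            System.out.println(book.get("title") + " | " + book.get("category")
                    + " | " + book.get("author") + " | " + book.get("price"));
        }
    }
}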

Extracting JSON data from a file using XPath


Based on Extracting JSON data from a file using JSONPath without setting a loop node on page 1034,
this scenario extracts the store name and the book information from the JSON file Store.json using
XPath queries and displays the flat data extracted on the console.

Procedure
1. In the Studio, open the Job used in Extracting JSON data from a file using JSONPath without
setting a loop node on page 1034 to display it in the design workspace.
2. Double-click the tFileInputJSON component to open its Basic settings view.

3. Select Xpath from the Read By drop-down list.


4. Click the [...] button next to Edit schema to open the schema editor.


Select the five columns added previously and click the x button to remove all of them.
Click the [+] button to add five columns, store_name, book_title, book_category, and book_author of
String type, and book_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
5. In the Loop XPath query field, enter the XPath query expression between double quotation marks
to specify the node on which the loop is based. In this example, it is "/store/goods/book".
6. In the XPath query fields of the Mapping table, enter the XPath query expressions between
double quotation marks to specify the nodes that hold the desired data.
• For the column store_name, enter the XPath query "../../name" relative to the name node.
• For the columns book_title, book_category, book_author, and book_price, enter the XPath query
expressions "title", "category", "author", and "price" relative to the four child nodes of the book
node respectively.
7. Press Ctrl+S to save the Job.
8. Press F6 to execute the Job.

As shown above, the store name and the book information are extracted from the source JSON
data and displayed in a flat table on the console.
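The component evaluates these XPath expressions against an internal document representation of the JSON data. To show how the relative query ../../name resolves from each loop node, the sketch below runs the same loop and relative queries on a hand-written XML equivalent of Store.json with the standard javax.xml.xpath API; it is a conceptual illustration only, not what the component executes.

// Loop XPath query "/store/goods/book" selects the loop nodes; "../../name"
// climbs from each <book> to <store> and reads its <name>, while "title" is
// resolved relative to the current loop node.
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPath {
    public static void main(String[] args) throws Exception {
        String xml = "<store><name>Sunshine Department Store</name><goods>"
                + "<book><title>Sayings of the Century</title><author>Nigel Rees</author></book>"
                + "<book><title>Sword of Honour</title><author>Evelyn Waugh</author></book>"
                + "</goods></store>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        NodeList books = (NodeList) xpath.evaluate("/store/goods/book", doc, XPathConstants.NODESET);
        for (int i = 0; i < books.getLength(); i++) {
            Node book = books.item(i);
            String storeName = xpath.evaluate("../../name", book);
            String title = xpath.evaluate("title", book);
            System.out.println(storeName + " | " + title);
        }
    }
}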

Extracting JSON data from a URL


In this scenario, tFileInputJSON retrieves data of the friends node from the JSON file facebook.json on
the Web that contains the data of a Facebook user and tExtractJSONFields extracts the data from the
friends node for flat data output.


The JSON file facebook.json is deployed on the Tomcat server, specifically, located in the folder
<tomcat path>/webapps/docs, and the content of the file is as follows:

{"user": {
"id": "9999912398",
"name": "Kelly Clarkson",
"friends": [
{
"name": "Tom Cruise",
"id": "55555555555555",
"likes": {"data": [
{
"category": "Movie",
"name": "The Shawshank Redemption",
"id": "103636093053996",
"created_time": "2012-11-20T15:52:07+0000"
},
{
"category": "Community",
"name": "Positiveretribution",
"id": "471389562899413",
"created_time": "2012-12-16T21:13:26+0000"
}
]}
},
{
"name": "Tom Hanks",
"id": "88888888888888",
"likes": {"data": [
{
"category": "Journalist",
"name": "Janelle Wang",
"id": "136009823148851",
"created_time": "2013-01-01T08:22:17+0000"
},
{
"category": "Tv show",
"name": "Now With Alex Wagner",
"id": "305948749433410",
"created_time": "2012-11-20T06:14:10+0000"
}
]}
}
]
}}
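When the Use Url check box is selected, the component retrieves the document over HTTP instead of reading it from the file system. A standalone sketch of that retrieval and of the friends query used below, assuming Java 11 or later and the Jayway json-path library (illustrative only):

// Fetches facebook.json over HTTP and counts the entries matched by the
// "$.user.friends[*]" query used in the Mapping table.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

import com.jayway.jsonpath.JsonPath;

public class FetchFriends {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/docs/facebook.json")).build();
        String json = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        List<Object> friends = JsonPath.read(json, "$.user.friends[*]");
        System.out.println(friends.size() + " friend entries retrieved");
    }
}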

Adding and linking the components


Procedure
1. Create a new Job and add a tFileInputJSON component, a tExtractJSONFields component, and two
tLogRow components by typing their names in the design workspace or dropping them from the
Palette.
2. Link the tFileInputJSON component to the first tLogRow component using a Row > Main
connection.
3. Link the first tLogRow component to the tExtractJSONFields component using a Row > Main
connection.


4. Link the tExtractJSONFields component to the second tLogRow component using a Row > Main
connection.

Configuring the components


Procedure
1. Double-click the tFileInputJSON component to open its Basic settings view.

2. Select JsonPath without loop from the Read By drop-down list. Then select the Use Url check box
and in the URL field displayed enter the URL of the file facebook.json from which the data
will be retrieved. In this example, it is http://localhost:8080/docs/facebook.json.
3. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding one column friends of String type.

Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.
4. In the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends
column to retrieve the entire friends node from the source file.
5. Double-click tExtractJSONFields to open its Basic settings view.


6. Select Xpath from the Read By drop-down list.


7. In the Loop XPath query field, enter the XPath expression between double quotation marks to
specify the node on which the loop is based. In this example, it is "/likes/data".
8. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding five columns of String type, id, name, like_id, like_name, and like_category, which will hold
the data of relevant nodes under the JSON field friends.

Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.
9. In the XPath query fields of the Mapping table, type in the XPath query expressions between
double quotation marks to specify the JSON nodes that hold the desired data. In this example:
• "../../id" (querying the "/friends/id" node) for the column id,
• "../../name" (querying the "/friends/name" node) for the column name,
• "id" for the column like_id,
• "name" for the column like_name, and
• "category" for the column like_category.
10. Double-click the second tLogRow component to open its Basic settings view.


In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the friends data in the JSON file specified using the URL is extracted and then
the data from the node friends is extracted and displayed in a flat table.


tFileInputLDIF
Reads an LDIF file row by row to split each row into fields and sends the fields as defined in the
schema to the next component using a Row connection.

tFileInputLDIF Standard properties


These properties are used to configure tFileInputLDIF running in the Standard Job framework.
The Standard tFileInputLDIF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file and/or variable to be processed.


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

add operation as prefix when the entry is modify type Select this check box to display the operation mode.

Value separator Type in the separator required for parsing data in the given
file. By default, the separator used is ",".

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

Use field options (for Base64 decode checked) Select this check box to specify the Base64-encoded
columns of the input flow. Once selected, this check box
activates the Decode Base64 encoding values table to enable
you to specify the columns to be decoded from Base64.

Note:
The columns to be handled by this check box must be of the
byte type, as defined in the input schema editor.
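In LDIF, a value that is not plain text is written after a double colon and Base64-encoded. The minimal Java sketch below shows the kind of decoding this option performs; the attribute line is hypothetical and this is not the component's actual code.

// Decodes an LDIF attribute of the form "attributeName:: <Base64 value>".
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class LdifBase64 {
    public static void main(String[] args) {
        String ldifLine = "description:: QSBCYXNlNjQtZW5jb2RlZCB2YWx1ZQ=="; // hypothetical line
        String[] parts = ldifLine.split("::", 2);
        byte[] decoded = Base64.getDecoder().decode(parts[1].trim());
        System.out.println(parts[0] + " = " + new String(decoded, StandardCharsets.UTF_8));
    }
}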

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read full rows in a voluminous LDIF
file. This component enables you to create a data flow,
using a Row > Main link, and to create a reject flow with
a Row > Reject link filtering the data whose type does
not match the defined type. For an example of usage, see
Procedure on page 1096 from tFileInputXML.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a related scenario, see Writing data from a database table into an LDIF file on page 1133.


tFileInputMail
Reads the standard key data of a given MIME or MSG email file.

tFileInputMail Standard properties


These properties are used to configure tFileInputMail running in the Standard Job framework.
The Standard tFileInputMail component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Specify the email file to read and extract data from.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for
this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Mail type Select a type of email from the drop-down list, either MIME
or MSG.

Attachment export directory Specify the directory to which you want to export email
attachments.

Mail parts Specify the header fields to extract from the MIME email file
specified in the File Name field.
• Column: The Column cells are automatically filled with
the column names defined in the schema.


• Mail part: Type in the names of the header fields or
body parts to be extracted from the email file in double
quotation marks. Refer to https://tools.ietf.org/html/
rfc4021 for a list of MIME mail header fields.
• Multi value: Select this check box to allow multiple
field values.
• Separator: Enter a character as the separators for
multiple field values.
This table appears only when MIME is selected from the
Mail type drop-down list.

MSG Mail parts Specify what to extract from the defined MSG email file for
each schema column.
• Column: The Column cells are automatically filled with
the column name defined in the schema.
• Mail part: Click each cell and then select an email part
to be extracted.
This table appears only when MSG is selected from the Mail
type drop-down list.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables EXPORTED_FILE_PATH: the directory to export mail
attachments. This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data and therefore
requires an output link. It is defined as an intermediary step.


Extracting key fields from an email


This Java scenario describes a two-component Job that extracts some key standard fields and displays
the values on the Run console.

Procedure
1. Drop a tFileInputMail and a tLogRow component from the Palette to the design workspace.
2. Connect the two components together using a Main Row link.
3. Double-click tFileInputMail to display its Basic settings view and define the component
properties.

4. Click the three-dot button next to the File Name field and browse to the mail file to be processed.
5. Set schema type to Built-in and click the three-dot button next to Edit schema to open a dialog
box where you can define the schema including all columns you want to retrieve on your output.
6. Click the plus button in the dialog box to add as many columns as you want to include in the
output flow. In this example, the schema has four columns: Date, Author, Object and Status.
7. Once the schema is defined, click OK to close the dialog box and propagate the schema into the
Mail parts table.
8. Click the three-dot button next to Attachment export directory and browse to the directory in
which you want to export email attachments, if any.
9. In the Mail part column of the Mail parts table, type in the actual header or body standard keys
that will be used to retrieve the values to be displayed.
10. Select the Multi Value check box next to any of the standard keys if more than one value for the
relative standard key is present in the input file.


11. If needed, define a separator for the different values of the relative standard key in the Separator
field.
12. Double-click tLogRow to display its Basic settings view and define the component properties in
order for the values to be separated by a carriage return. On Windows OS, type in \n between
double quotes.
13. Save your Job and press F6 to execute it and display the output flow on the console.

Results
The header key values are extracted as defined in the Mail parts table. Mail reception date, author,
subject and status are displayed on the console.
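For reference, the same standard MIME header fields can be read with a few lines of Java. The sketch below assumes the JavaMail API (javax.mail) and a hypothetical .eml file; it only illustrates the header lookup the component performs and is not the generated Job code.

// Reads standard header keys from a MIME email file.
import java.io.FileInputStream;
import java.util.Properties;

import javax.mail.Session;
import javax.mail.internet.MimeMessage;

public class ReadMimeHeaders {
    public static void main(String[] args) throws Exception {
        Session session = Session.getDefaultInstance(new Properties());
        try (FileInputStream in = new FileInputStream("E:/input.eml")) { // hypothetical file
            MimeMessage message = new MimeMessage(session, in);
            System.out.println("Date: " + message.getHeader("Date", null));
            System.out.println("From: " + message.getHeader("From", ",")); // multiple values joined by a comma
            System.out.println("Subject: " + message.getSubject());
            System.out.println("Status: " + message.getHeader("Status", null));
        }
    }
}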


tFileInputMSDelimited
Reads the data structures (schemas) of a multi-structured delimited file and sends the fields as
defined in the different schemas to the next components using Row connections.

tFileInputMSDelimited Standard properties


These properties are used to configure tFileInputMSDelimited running in the Standard Job framework.
The Standard tFileInputMSDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Multi Schema Editor The Multi Schema Editor helps to build and configure the
data flow in a multi-structure delimited file to associate
one schema per output.
For more information, see The Multi Schema Editor on page
1053.

Output Lists all the schemas you define in the Multi Schema
Editor, along with the related record type and the field
separator that corresponds to every schema, if different field
separators are used.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.

Advanced settings

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read multi-structured delimited files
and separate fields contained in these files using a defined
separator.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

The Multi Schema Editor


The Multi Schema Editor enables you to:
• set the path to the source file,

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.
• define the source file properties,
• define data structure for each of the output schemas.
When you define data structure for each of the output schemas in the Multi Schema Editor, column
names in the different data structures automatically appear in the input schema lists of the
components that come after tFileInputMSDelimited. However, you can still define data structures
directly in the Basic settings view of each of these components.
The Multi Schema Editor also helps to declare the schema that should act as the source schema
(primary key) from the incoming data to ensure its uniqueness. The editor uses this mapping to associate all
schemas processed in the delimited file to the source schema in the same file.
The editor opens with the first column, that usually holds the record type indicator, selected by
default. However, once the editor is open, you can select the check box of any of the schema columns
to define it as a primary key.
The below figure illustrates an example of the Multi Schema Editor.


For detailed information about the usage of the Multi Schema Editor, see Reading a multi structure
delimited file on page 1054.

Reading a multi structure delimited file


The following scenario creates a Java Job which aims at reading three schemas in a delimited file and
displaying their data structure on the Run Job console.
The delimited file processed in this example contains rows of three schema types (A, B and C).


Dropping and linking components


Procedure
1. Drop a tFileInputMSDelimited component and three tLogRow components from the Palette onto
the design workspace.
2. In the design workspace, right-click tFileInputMSDelimited and connect it to tLogRow1,
tLogRow2, and tLogRow3 using the row_A_1, row_B_1, and row_C_1 links respectively.

Configuring the components


Procedure
1. Double-click tFileInputMSDelimited to open the Multi Schema Editor.


2. Click Browse... next to the File name field to locate the multi schema delimited file you need to
process.
3. In the File Settings area:
-Select from the list the encoding type the source file is encoded in. This setting is meant to
ensure encoding consistency throughout all input and output files.
-Select the field and row separators used in the source file.

Note:
Select the Use Multiple Separator check box and define the fields that follow accordingly if
different field separators are used to separate schemas in the source file.

A preview of the source file data displays automatically in the Preview panel.


Note:
Column 0 that usually holds the record type indicator is selected by default. However, you can
select the check box of any of the other columns to define it as a primary key.

4. Click Fetch Codes to the right of the Preview panel to list the type of schema and records you
have in the source file. In this scenario, the source file has three schema types (A, B, C).
Click each schema type in the Fetch Codes panel to display its data structure below the Preview
panel.
5. Click in the name cells and set column names for each of the selected schema.
In this scenario, column names read as the following:
-Schema A: Type, DiscName, Author, Date,
-Schema B: Type, SongName,


-Schema C: Type, LibraryName.


You now need to set the primary key from the incoming data to ensure its uniqueness (DiscName in this
scenario). To do that:
6. In the Fetch Codes panel, select the schema holding the column you want to set as the primary
key (schema A in this scenario) to display its data structure.
7. Click in the Key cell that corresponds to the DiscName column and select the check box that
appears.

8. Click anywhere in the editor and the false in the Key cell will become true.
You now need to declare the parent schema by which you want to group the other "children"
schemas (DiscName in this scenario). To do that:
9. In the Fetch Codes panel, select schema B and click the right arrow button to move it to the right.
Then, do the same with schema C.

Note:
The Cardinality field is not compulsory. It helps you to define the number (or range) of fields
in "children" schemas attached to the parent schema. However, if you set the wrong number or
range and try to execute the Job, an error message will display.

10. In the Multi Schema Editor, click OK to validate all the changes you did and close the editor.
The three defined schemas along with the corresponding record types and field separators display
automatically in the Basic settings view of tFileInputMSDelimited.


The three schemas you defined in the Multi Schema Editor are automatically passed to the three
tLogRow components.
11. If needed, click the Edit schema button in the Basic settings view of each of the tLogRow
components to view the input and output data structures you defined in the Multi Schema Editor
or to modify them.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The multi schema delimited file is read row by row and the extracted fields are displayed on the
Run Job console as defined in the Multi Schema Editor.
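Conceptually, the component routes each row to the output flow whose record type matches the row's first field. The standalone Java sketch below shows that dispatching for the three schema types of this scenario; the semicolon separator and the file path are assumptions, and the real separator is whatever you selected in the Multi Schema Editor.

// Routes each row to a "schema" according to its record type indicator.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MultiSchemaDispatch {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("E:/multischema_delimited.csv"));
        for (String line : lines) {
            String[] fields = line.split(";");
            switch (fields[0]) {
                case "A": // Type, DiscName, Author, Date -> row_A_1 flow
                    System.out.println("schema A: " + String.join(" | ", fields));
                    break;
                case "B": // Type, SongName -> row_B_1 flow
                    System.out.println("schema B: " + String.join(" | ", fields));
                    break;
                case "C": // Type, LibraryName -> row_C_1 flow
                    System.out.println("schema C: " + String.join(" | ", fields));
                    break;
                default:
                    System.out.println("unknown record type: " + line);
            }
        }
    }
}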


tFileInputMSPositional
Reads the data structures (schemas) of a multi-structured positional file and sends the fields as
defined in the different schemas to the next components using Row connections.

tFileInputMSPositional Standard properties


These properties are used to configure tFileInputMSPositional running in the Standard Job framework.
The Standard tFileInputMSPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

File name/Stream Name of the file and/or the variable to be processed


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Header Field Position Start-end position of the schema identifier.

Records Schema: define as many schemas as needed.


Header value: value in the row that identifies a schema.
Pattern: string which represents the length of each column
of the schema, separated by commas. Make sure the values
defined in this field are relevant with the defined schema.
Reject incorrect row size: select the check boxes of the
schemas for which rows of incorrect size should be rejected.
Parent row: Select the parent row from the drop-down list.
By default, it is <Empty>.
Parent key column: Type in the parent key column name. If
the parent row is not <Empty>, this field must be filled with
a column name of the parent row schema.
Key column: Type in the key column name.

Skip from header Number of rows to be skipped in the beginning of file.

Skip from footer Number of rows to be skipped at the end of the file.


Limit Maximum number of rows to be processed. If Limit = 0, no
row is read or processed.

Die on parse error Let the component die if a parsing error occurs.

Die on unknown header type Select this check box to stop the execution of the Job when
a row is read whose header value does not match any of the
schemas defined in the Records table.

Advanced settings

Process long rows (needed for processing rows longer than 100,000 characters wide) Select this
check box to process long rows (this is necessary to process rows longer than 100,000 characters).

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component
or transferred to an output component. This is a Flow
variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is a
Flow variable and it returns an integer.
NB_LINE_UNKOWN_HEADER_TYPES: the number of rows
with unknown header type. This is a Flow variable and it
returns an integer.
NB_LINE_PARSE_ERRORS: the number of rows with parse
errors. This is a Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a multi-schema positional file
and separate fields using a position separator value. You
can also create a rejection flow using a Row > Reject link
to filter the data which does not correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.

Reading data from a positional file


The following scenario reads data from a positional file, which contains two schemas. The positional
file is shown below:
schema_1 (car_owner):schema_id;car_make;owner;age
schema_2 (car-insurance):schema_id;car_owner;age;car_insurance
1bmw John 45
1bench Mike 30
2John 45 yes
2Mike 50 No

Dropping the components


Procedure
1. Drop one tFileInputMSPositional and two tLogRow from the Palette to the design workspace.
2. Rename the two tLogRow components as car_owner and car_insurance.

Configuring the components


Procedure
1. Double-click the tFileInputMSPositional component to show its Basic settings view and define its
properties.


2. In the File name/Stream field, type in the path to the input file. Also, you can click the [...] button
to browse and choose the file.
3. In the Header Field Position field, enter the start-end position for the schema identifier in the
input file, 0-1 in this case as the first character in each row is the schema identifier.
4. Click the [+] button twice to add two rows in the Records table.
5. Click the cell under the Schema column to show the [...] button.
Click the [...] button to show the schema naming box.

6. Enter the schema name and click OK.


The schema name appears in the cell and the schema editor opens.


7. Define the schema car_owner, which has four columns: schema_id, car_make, owner and age.
8. Repeat the steps to define the schema car_insurance, which has four columns: schema_id,
car_owner, age and car_insurance.

9. Connect tFileInputMSPositional to the car_owner component with the Row > car_owner link, and
the car_insurance component with the Row > car_insurance link.
10. In the Header value column, type in the schema identifier value for the schema, 1 for the schema
car_owner and 2 for the schema car_insurance in this case.
11. In the Pattern column, type in the length of each field in the schema (that is, the number of
characters per field), 1,8,10,3 for the schema car_owner and 1,10,3,3 for the schema
car_insurance in this case.
12. In the Skip from header field, type in the number of beginning rows to skip, 2 in this case as the
two rows at the beginning just describe the two schemas instead of holding values.
13. Choose Table (print values in cells of a table) in the Mode area of the components car_owner and
car_insurance.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 or click Run on the Run tab to execute the Job.


The file is read row by row based on the length values defined in the Pattern field and output in
two tables with different schemas.
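The Pattern value is simply a comma-separated list of field widths. The standalone Java sketch below cuts rows by those widths and dispatches them on the header character at position 0-1, mirroring this scenario; the sample rows are illustrative and this is not the generated Job code.

// Cuts fixed-width rows according to a Pattern string and dispatches them on
// the schema identifier found at position 0-1.
public class PositionalParse {

    static String[] cut(String row, String pattern) {
        String[] widths = pattern.split(",");
        String[] fields = new String[widths.length];
        int offset = 0;
        for (int i = 0; i < widths.length; i++) {
            int len = Integer.parseInt(widths[i]);
            int end = Math.min(offset + len, row.length());
            fields[i] = row.substring(offset, end).trim();
            offset = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] rows = {
            "1bmw     John      45 ",  // schema car_owner, pattern 1,8,10,3
            "2John      45 yes"        // schema car_insurance, pattern 1,10,3,3
        };
        for (String row : rows) {
            if (row.startsWith("1")) {
                System.out.println("car_owner:     " + String.join(" | ", cut(row, "1,8,10,3")));
            } else if (row.startsWith("2")) {
                System.out.println("car_insurance: " + String.join(" | ", cut(row, "1,10,3,3")));
            }
        }
    }
}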


tFileInputMSXML
Reads the data structures (schemas) of a multi-structured XML file and sends the fields as defined in
the different schemas to the next components using Row connections.

tFileInputMSXML Standard properties


These properties are used to configure tFileInputMSXML running in the Standard Job framework.
The Standard tFileInputMSXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

File Name Name of the file and/or the variable to be processed.


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Root XPath query The root of the XML tree, which the query is based on.

Enable XPath in column "Schema XPath loop" but lose the order Select this check box if you want to
define an XPath path in the Schema XPath loop field of the Outputs table while not
keeping the order of the data shown in the source XML file.

Warning:
This option takes effect only if you select the Dom4j
generation mode in the Advanced settings view.

Outputs Schema: Define as many schemas as needed.


Schema XPath loop: Enter the node of the XML tree or
XPath path which the loop is based on.
XPath Queries: Enter the fields to be extracted from the
structured input.
Create empty row: Select this check box if you want to
create empty rows for the empty field(s) in the schema.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.

Advanced settings

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.


Ignore DTD file Select this check box to ignore the DTD file indicated in the
XML file being processed.

Generation mode Select the appropriate generation mode according to your


memory availability. The available modes are:
• Slow and memory-consuming (Dom4j)

Note:
This option allows you to use dom4j to process the
XML files of high complexity.

• Fast with low memory consumption (SAX)

Encoding Select the encoding type from the list or select CUSTOM
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Reading a multi-structure XML file


The following scenario describes a Job which reads a multi-structure XML file, extracts the desired
fields and displays them on the console.

Designing the Job


Procedure
1. Drop a tFileInputMSXML component from the Palette onto the design workspace and double-click
the component to open its Basic settings view in the Component tab.


2. Browse to the XML file you want to process. In this example, it is D:/Input/multischema_xml.xml,
which contains the following data:

<root>
<toy>Cat</toy>
<record>We Belong Together</record>
<book>As You Like It</book>
<book>All's Well That Ends Well</book>
<record>When You Believe</record>
<toy>Dog</toy>
</root>

3. In the Root XPath query field, enter the root of the XML tree, which the query will be based on. In
this example, it is "/root".
4. Select the Enable XPath in column "Schema XPath loop" but lose the order check box.
In this example, to extract the desired fields, you need to define an XPath path in the Schema
XPath loop field in the Outputs table for each output flow while not keeping the order of the data
shown in the source XML file.
5. Click the plus button to add lines in the Outputs table where you can define the output schemas,
record and book in this example.
6. In the Outputs table, click in the Schema cell and then click a three-dot button to display a dialog
box where you can define the schema name.
Enter a name for the output schema and click OK to close the dialog box.

7. The tFileInputMSXML schema editor appears.


Define the schema according to your need.


8. Do the same to define the output schema record.


9. In the Schema XPath loop cell, enter the node of the XML tree, which the loop is based on. In this
example, enter "/book" and "/record" respectively.
10. In the XPath Queries cell, enter the fields to be extracted from the structured XML input. In this
example, enter the XPath query ".".
11. In the design workspace, drop two tLogRow components from the Palette and connect
tFileInputMSXML to tLogRow1 and tLogRow2 using the book and record links respectively.
Rename the two tLogRow components as book and record respectively.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.
The multi-structure XML file is read row by row and the extracted fields are displayed on the
console. The first two fields are for the book schema, and the last two fields are for the record
schema.
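Assuming the sample XML file shown above, the console output resembles the following illustration
(the exact layout depends on the tLogRow settings you use):

As You Like It
All's Well That Ends Well
We Belong Together
When You Believe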


tFileInputPositional
Reads a positional file row by row, splits each row into fields based on a given pattern, and then sends
the fields as defined in the schema to the next component.

tFileInputPositional Standard properties


These properties are used to configure tFileInputPositional running in the Standard Job framework.
The Standard tFileInputPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

File name/Stream File name: Name and path of the file to be processed.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Stream: The data flow to be processed. The data must be


added to the flow in order for tFileInputPositional to fetch
these data via the corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component, for example, the INPUT_STREAM
variable of tFileFetch; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
In order to avoid the inconvenience of hand writing, you
could select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field on condition that
this variable has been properly defined.
Related topic to the available variables: see Talend Studio
User Guide.
Related scenario to the input stream, see Reading data from
a remote file in streaming mode on page 1020.
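For example, assuming a tFileFetch component instance named tFileFetch_1 that saves the fetched
resource to the cache, the File name/Stream field could reference its stream with an expression such
as the following (the instance name depends on your own Job):

((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM"))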

Row separator The separator used to identify the end of a row.

Use byte length as the cardinality Select this check box to enable the support of double-byte
characters in this component. JDK 1.6 is required for this
feature.

Customize Select this check box to customize the data format of the
positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.


Padding char: Enter, between double quotation marks, the


padding character you need to remove from the field. A space
by default.
Alignment: Select the appropriate alignment parameter.

Pattern Length values separated by commas, interpreted as a string


between quotes. Make sure the values entered in this field
are consistent with the schema defined.

Pattern Units The unit of the length values specified in the Pattern field.
• Bytes: With this option selected, the length values in
the Pattern field should be the count of bytes that
represent symbols in original encoding of the input file.
• Symbols: With this option selected, the length values
in the Pattern field should be the count of regular
symbols, not including surrogate pairs.
• Symbols (including rare): With this option selected,
the length values in the Pattern field should be the
count of symbols, including rare symbols such as
surrogate pairs, and each surrogate pair counts as a
single symbol. Considering the performance factor, it is
not recommended to use this option when your input
data consists of only regular symbols.
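For illustration, a hypothetical Pattern value of "5,6,8" with Pattern Units set to Symbols splits each
19-character row into three consecutive fields of 5, 6 and 8 characters, so a row such as

00001  8200   50330

yields the fields 00001, 8200 and 50330 (together with their padding spaces, unless you trim them).
With Pattern Units set to Bytes, the same values are interpreted as byte counts in the original
encoding of the file, which matters for multi-byte encodings such as UTF-8.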

Skip empty rows Select this check box to skip the empty rows.

Uncompress as zip file Select this check box to uncompress the input file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Header Enter the number of rows to be skipped at the beginning of
the file.

Footer Number of rows to be skipped at the end of the file.

Limit Maximum number of rows to be processed. If Limit = 0, no


row is read or processed.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon


completion and choose this schema metadata again in


the Repository Content window.
This component must work with tSetDynamicSchema to
leverage the dynamic schema feature.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Advanced settings

Needed to process rows longer than 100 000 characters Select this check box if the rows to be processed in the
input file are longer than 100 000 characters.

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
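For example, once this component has finished executing, a hypothetical tJava component placed in a
following subJob could print the row count with an expression such as the one below, where
tFileInputPositional_1 stands for the instance name of the component in your Job:

System.out.println("Rows read: " + ((Integer)globalMap.get("tFileInputPositional_1_NB_LINE")));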


Usage

Usage rule Use this component to read a file and separate fields using
a position separator value. You can also create a rejection
flow using a Row > Reject link to filter the data which does
not correspond to the type defined. For an example of how
to use these two links, see Procedure on page 975.

Reading a Positional file and saving filtered results to XML


The following scenario describes a two-component Job, which aims at reading data from an input file
that contains contract numbers, customer references, and insurance numbers as shown below, and
outputting the selected data (according to the data position) into an XML file.
Contract CustomerRef InsuranceNr
00001 8200 50330
00001 8201 50331
00002 8202 50332
00002 8203 50333

Dropping and linking components


About this task

Procedure
1. Drop a tFileInputPositional component from the Palette to the design workspace.
2. Drop a tFileOutputXML component as well. This file is meant to receive the references in a
structured way.
3. Right-click the tFileInputPositional component and select Row > Main. Then drag it onto the
tFileOutputXML component and release when the plug symbol shows up.

Configuring data input


Procedure
1. Double-click the tFileInputPositional component to show its Basic settings view and define its
properties.


2. Define the Job Property type if needed. For this scenario, we use the built-in Property type.
As opposed to the Repository, this means that the properties are set for this Job only.
3. Fill in a path to the input file in the File Name field. This field is mandatory.
4. Define the Row separator identifying the end of a row if needed, by default, a carriage return.
5. If required, select the Use byte length as the cardinality check box to enable the support of
double-byte character.
6. Define the Pattern to delimit fields in a row. The pattern is a series of length values corresponding
to the values of your input files. The values should be entered between quotes, and separated by
a comma. Make sure the values you enter match the schema defined.
7. Fill in the Header, Footer and Limit fields according to your input file structure and your need. In
this scenario, we only need to skip the first row when reading the input file. To do this, fill the
Header field with 1 and leave the other fields as they are.
8. Next to Schema, select Repository if the input schema is stored in the Repository. In this use case,
we use a Built-In input schema to define the data to pass on to the tFileOutputXML component.
9. You can load and/or edit the schema via the Edit Schema function. For this schema, define three
columns, respectively Contract, CustomerRef and InsuranceNr matching the structure of the input
file. Then, click OK to close the Schema dialog box and propagate the changes.


Configuring data output


Procedure
1. Double-click tFileOutputXML to show its Basic settings view.

2. Enter the XML output file path.


3. Define the row tag that will wrap each row of data, in this use case ContractRef.
4. Click the three-dot button next to Edit schema to view the data structure, and click Sync columns
to retrieve the data structure from the input component if needed.
5. Switch to the Advanced settings tab view to define other settings for the XML output.

6. Click the plus button to add a line in the Root tags table, and enter a root tag (or more) to wrap
the XML output structure, in this case ContractsList.
7. Define parameters in the Output format table if needed. For example, select the As attribute
check box for a column if you want to use its name and value as an attribute for the parent XML
element, clear the Use schema column name check box for a column to reuse the column label
from the input schema as the tag label. In this use case, we keep all the default output format
settings as they are.
8. To group output rows according to the contract number, select the Use dynamic grouping check
box, add a line in the Group by table, select Contract from the Column list field, and enter an
attribute for it in the Attribute label field.


9. Leave all the other parameters as they are.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job to ensure that all the configured parameters take effect.
2. Press F6 or click Run on the Run tab to execute the Job.
The file is read row by row based on the length values defined in the Pattern field and output as
an XML file as defined in the output settings. You can open it using any standard XML editor.


tFileInputProperties
Reads a text file row by row and separates the fields according to the model key = value.

tFileInputProperties Standard properties


These properties are used to configure tFileInputProperties running in the Standard Job framework.
The Standard tFileInputProperties component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
For this component, the schema is read-only. It is made of
two columns, Key and Value, corresponding to the parameter
name and the parameter value to be copied.

File format Select from the list your file format, either: .properties or
.ini.

  .properties: data in the configuration file is written as key/value
pairs, structured in the following way: key = value.

  .ini: data in the configuration file is written as key/value pairs,
structured in the following way: key = value, and grouped in
sections.
Section Name: enter the section name on which the
iteration is based.
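For illustration, a hypothetical .properties input file contains lines such as:

MYSQL.HOST = localhost
MYSQL.PORT = 3306

while the equivalent hypothetical .ini file groups the same keys under a section, whose name
([mysql] below) is the value expected in the Section Name field:

[mysql]
HOST = localhost
PORT = 3306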

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Calculate MD5 Hash Select this check box to verify that the file to be processed
has been correctly downloaded.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a text file and separate data
according to the structure key = value.

Reading and matching the keys and the values of different .properties files and outputting the results
in a glossary
This four-component Job reads two .properties files, one in French and the other in English. The data
in the two input files is mapped to output a glossary matching the English and French terms.
The two input files used in this scenario hold localization strings for the tMysqlInput component in
Talend Studio.


The glossary displays on the console listing three columns holding: the key name in the first column,
the English term in the second, and the corresponding French term in the third.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileInputProperties
(x2), tMap, and tLogRow.
2. Connect the components together using Row > Main links. The second properties file, FR, is used as
a lookup flow.

Configuring the components


Procedure
1. Double-click the first tFileInputProperties component to open its Basic settings view and define
its properties.


2. In the File Format field, select your file format.


3. In the File Name field, click the three-dot button and browse to the input .properties file you want
to use.
4. Do the same with the second tFileInputProperties and browse to the French .properties file this
time.

5. Double-click the tMap component to open the tMap editor.


6. Select all columns from the English_terms table and drop them to the output table.
Select the key column from the English_terms table and drop it to the key column in the
French_terms table.
7. In the glossary table in the lower right corner of the tMap editor, rename the value field to EN
because it will hold the values of the English file.
8. Click the plus button to add a line to the glossary table and rename it to FR.
9. In the Length field, set the maximum length to 255.
10. In the upper left corner of the tMap editor, select the value column in the French_terms table and
drop it to the FR column in the glossary table. When done, click OK to validate your changes
and close the map editor and propagate the changes to the next component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click the Run button from the Run tab to execute it.


tFileInputRaw
Reads all data in a raw file and sends it to a single output column for subsequent processing by
another component.

tFileInputRaw Standard properties


These properties are used to configure tFileInputRaw running in the Standard Job framework.
The Standard tFileInputRaw component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Filename The name of and path of the input file to be processed,


which you can enter manually between double quotes or
browse and select by clicking the [...] button.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Mode Read the file as a string: The content of the file is read as a
string.
Read the file as a bytes array: The content of the file is read
as a bytes array.
Stream the file: As soon as the first character is entered in
the source file, it is read immediately.


Encoding If you are using the Read the file as a string mode, select
the encoding type from the list or select Custom and define
it manually.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
To catch the FileNotFoundException, you also need to
select this check box.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to provide input data for Jobs that
require a single column of data or that require a whole file
to be read as a single column.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related Scenario
For a related use case, see:
• Uploading files to Dropbox on page 655


tFileInputRegex
Reads a file row by row, splits each row into fields using regular expressions, and sends the fields as
defined in the schema to the next component.
This powerful component can replace a number of other components of the File family. It requires
some advanced knowledge of regular expression syntax.

tFileInputRegex Standard properties


These properties are used to configure tFileInputRegex running in the Standard Job framework.
The Standard tFileInputRegex component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

File name/Stream File name: Name of the file and/or the variable to be
processed.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Stream: Data flow to be processed. The data must be added


to the flow so that it can be collected by tFileInputRegex via
the INPUT_STREAM variable in the autocompletion list
(Ctrl+Space).
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Row separator The separator used to identify the end of a row.

Regex Type in your Java regular expression including the


subpattern matching the fields to be extracted. This field
can contain multiple lines.
Note: Backslashes need to be doubled in regular expressions.

Warning:
• The regular expression needs to be in double
quotes.
• To extract all the desired strings, make sure the
regular expression contains the corresponding
subpatterns that match the strings. Also, each
subpattern in the regular expression needs to be in
a pair of parentheses.
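For example, a hypothetical expression extracting three fields (an identifier, a name and a date) from
rows such as 00125;Smith;2011-06-21 could be written as:

"^(\\d+);(\\w+);(\\d{4}-\\d{2}-\\d{2})$"

Each pair of parentheses defines one subpattern, matched in order against the columns of the schema,
and every backslash is doubled because the expression is entered as a Java string.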


Header Enter the number of rows to be skipped at the beginning of
the file.

Footer Number of rows to be skipped at the end of the file.

Limit Maximum number of rows to be processed. If Limit = 0, no


row is read or processed.

Ignore error message for the unmatched record Select this check box to avoid outputting error messages for
records that do not match the specified regex. This check
box is cleared by default.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Skip empty rows Select this check box to skip the empty rows.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
In the Map/Reduce version of tFileInputRegex, you need to
select the Custom encoding check box to display this list.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a file and separate fields
contained in this file according to the defined Regex. You
can also create a rejection flow using a Row > Reject link
to filter the data which doesn't correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.

Reading data using a Regex and outputting the result to a Positional file

The following scenario creates a two-component Job, reading data from an input file using a regular
expression and outputting delimited data into a positional file.

Dropping and linking the components


Procedure
1. Drop a tFileInputRegex component from the Palette to the design workspace.
2. Drop a tFileOutputPositional component the same way.
3. Right-click on the tFileInputRegex component and select Row > Main. Drag this main row link
onto the tFileOutputPositional component and release when the plug symbol displays.


Configuring the components


Procedure
1. Select the tFileInputRegex again so the Component view shows up, and define the properties:

2. The Property type is Built-in for this scenario. Hence, the properties are set for this Job only.
3. Fill in a path to the file in File Name field. This field is mandatory.
4. Define the Row separator identifying the end of a row.
5. Then define the Regular expression in order to delimit fields of a row, which are to be passed on
to the next component. You can type in a regular expression using Java code, and on multiple lines
if needed.

Warning:
Regex syntax requires double quotes.

6. In this expression, make sure you include all subpatterns matching the fields to be extracted.
7. In this scenario, ignore the header, footer and limit fields.
8. Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional
component.
9. You can load or create the schema through the Edit Schema function.
10. Then define the second component properties:


11. Enter the Positional file output path.


12. Enter the encoding standard the output file is to be encoded in. Note that, for the time being, the
encoding consistency verification is not supported.
13. Select the Schema type. Click on Sync columns to automatically synchronize the schema with the
Input file schema.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Now go to the Run tab, and click on Run to execute the Job.
The file is read row by row and split up into fields based on the Regular Expression definition. You
can open it using any standard file editor.


tFileInputXML
Reads an XML structured file row by row, splits it up into fields, and sends the fields as defined in
the schema to the next component.

tFileInputXML Standard properties


These properties are used to configure tFileInputXML running in the Standard Job framework.
The Standard tFileInputXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

File name/Stream File name: Name and path of the file to be processed.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Stream: The data flow to be processed. The data must be


added to the flow in order for tFileInputXML to fetch these
data via the corresponding representative variable.


This variable could be already pre-defined in your Studio or


provided by the context or the components you are using
along with this component, for example, the INPUT_STREAM
variable of tFileFetch; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
In order to avoid the inconvenience of hand writing, you
could select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field on condition that
this variable has been properly defined.

Related topic to the available variables: see Talend Studio


User Guide. Related scenario to the input stream, see
Reading data from a remote file in streaming mode on page
1020.

Loop XPath query Node of the tree, which the loop is based on.

Mapping Column: Columns to map. They reflect the schema as


defined in the Schema type field.
XPath Query: Enter the fields to be extracted from the
structured input.
Get nodes: Select this check box to recuperate the XML
content of all current nodes specified in the Xpath query
list, or select the check box next to specific XML nodes
to recuperate only the content of the selected nodes.
These nodes are important when the output flow from this
component needs to use the XML structure, for example, the
Document data type.
For further information about the Document type, see
Talend Studio User Guide.

Note:
The Get Nodes option functions in the DOM4j and SAX
modes, although in SAX mode namespaces are not
supported. For further information concerning the DOM4j
and SAX modes, please see the properties noted in the
Generation mode list of the Advanced Settings tab.
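For example, assuming a hypothetical input file of the following shape:

<customers>
  <customer>
    <id>1</id>
    <name>Griffith Paving</name>
  </customer>
</customers>

the Loop XPath query would be "/customers/customer", and the XPath Query cells of the Mapping
table would contain "id" and "name" for the schema columns holding the customer identifier and name
respectively.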

Limit Maximum number of rows to be processed. If Limit = 0,


no row is read or processed. If Limit = -1, all rows are read
and processed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Ignore DTD file Select this check box to ignore the DTD file indicated in the
XML file being processed.

Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).


Thousands separator: define the separators to use for


thousands.
Decimal separator: define the separators to use for decimals.

Ignore the namespaces Select this check box to ignore namespaces.
Generate a temporary file: click the three-dot button to
browse to the XML temporary file and set its path in the
field.

Use Separator for mode Xerces Select this check box if you want to separate concatenated
children node values.

Note:
This field can only be used if the selected Generation
mode is Xerces.

The following field displays:


Field separator: Define the delimiter to be used to separate
the children node values.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Generation mode From the drop-down list select the generation mode for the
XML file, according to the memory available and the desired
speed:
• Slow and memory-consuming (Dom4j)

Note:
This option allows you to use dom4j to process the
XML files of high complexity.

• Memory-consuming (Xerces).
• Fast with low memory consumption (SAX)

Validate date Select this check box to check the date format strictly
against the input schema.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tFileInputXML is for use as an entry component. It allows


you to create a flow of XML data using a Row > Main link.
You can also create a rejection flow using a Row > Reject
link to filter the data which doesn't correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.

Reading and extracting data from an XML structure


This scenario describes a basic Job that reads a defined XML directory and extracts specific
information and outputs it on the Run console via a tLogRow component.

Procedure
1. Drop tFileInputXML and tLogRow from the Palette to the design workspace.
2. Connect both components together using a Main Row link.
3. Double-click tFileInputXML to open its Basic settings view and define the component properties.


4. As the street dir file used as input file has been previously defined in the Metadata area, select
Repository as Property type. This way, the properties are automatically leveraged and the rest
of the properties fields are filled in (apart from Schema). For more information regarding the
metadata creation wizards, see Talend Studio User Guide.
5. Select the same way the relevant schema in the Repository metadata list. Edit schema if you want
to make any change to the schema loaded.
6. The Filename field shows the structured file to be used as input.
7. In Loop XPath query, change if needed the node of the structure where the loop is based.
8. On the Mapping table, fill the fields to be extracted and displayed in the output.
9. If the file is large, fill in a Limit of rows to be read.
10. Enter the encoding if needed then double-click on tLogRow to define the separator character.
11. Save your Job and press F6 to execute it.

Results

The fields defined in the input properties are extracted from the XML structure and displayed on the
console.

Extracting erroneous XML data via a reject flow


This Java scenario describes a three-component Job that reads an XML file and:
1. first, returns correct XML data in an output XML file,
2. and second, displays on the console erroneous XML data whose type does not correspond to the
type defined in the schema.

Procedure
1. Drop the following components from the Palette to the design workspace: tFileInputXML,
tFileOutputXML and tLogRow.
Right-click tFileInputXML and select Row > Main in the contextual menu and then click
tFileOutputXML to connect the components together.
Right-click tFileInputXML and select Row > Reject in the contextual menu and then click tLogRow
to connect the components together using a reject link.


2. Double-click tFileInputXML to display the Basic settings view and define the component
properties.

3. In the Property Type list, select Repository and click the three-dot button next to the field to
display the Repository Content dialog box where you can select the metadata relative to the input
file if you have already stored it in the File xml node under the Metadata folder of the Repository
tree view. The fields that follow are automatically filled with the fetched data. If not, select
Built-in and fill in the fields that follow manually.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.
4. In the Schema Type list, select Repository and click the three-dot button to open the dialog box
where you can select the schema that describes the structure of the input file if you have already
stored it in the Repository tree view. If not, select Built-in and click the three-dot button next to
Edit schema to open a dialog box where you can define the schema manually.


The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState
and id2.
5. Click the three-dot button next to the Filename field and browse to the XML file you want to
process.
6. In the Loop XPath query, enter between inverted commas the path of the XML node on which to
loop in order to retrieve data.
In the Mapping table, Column is automatically populated with the defined schema.
In the XPath query column, enter between inverted commas the node of the XML file that holds
the data you want to extract from the corresponding column.
7. In the Limit field, enter the number of lines to be processed, the first 10 lines in this example.
8. Double-click tFileOutputXML to display its Basic settings view and define the component
properties.

9. Click the three-dot button next to the File Name field and browse to the output XML file you want
to collect data in, customer_data.xml in this example.
In the Row tag field, enter between inverted commas the name you want to give to the tag that
will hold the recuperated data.
Click Edit schema to display the schema dialog box and make sure that the schema matches that
of the preceding component. If not, click Sync columns to retrieve the schema from the preceding
component.
10. Double-click tLogRow to display its Basic settings view and define the component properties.
Click Edit schema to open the schema dialog box and make sure that the schema matches that
of the preceding component. If not, click Sync columns to retrieve the schema of the preceding
component.


In the Mode area, select the Vertical option.


11. Save your Job and press F6 to execute it.

Results

The output file customer_data.xml holding the correct XML data is created in the defined path and
erroneous XML data is displayed on the console of the Run view.


tFileList
Iterates a set of files or folders in a given directory based on a filemask pattern.

Note: This component iterates over every file in a directory, including system files, hidden files, zero-
byte files, and so on, as long as the file meets the conditions set in the Files field.

tFileList Standard properties


These properties are used to configure tFileList running in the Standard Job framework.
The Standard tFileList component belongs to the File and the Orchestration families.
The component in this framework is available in all Talend products.

Basic settings

Directory Path to the directory where the files are stored.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

FileList Type Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.

Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Case Sensitive Set the case mode from the list to either create or not
create case sensitive filter on filenames.

Generate Error if no file found Select this check box to generate an error message if no
files or directories are found.

Use Glob Expressions as Filemask This check box is selected by default. It filters the results
using glob expressions (global expressions).

Files Click the plus button to add as many filter lines as needed:
Filemask: in the added filter lines, type in a filename or a
filemask using special characters or regular expressions.
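For example, hypothetical filemasks such as "*.csv" (all files with the .csv extension) or
"customers_??.txt" (customers_01.txt, customers_02.txt, and so on) can be entered here, each
between double quotation marks.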

Order by The folders are listed first of all, then the files. You can
choose to prioritise the folder and file order either:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical
order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent
to most recent.


Note:
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then file
name takes precedence. If ordering by modified date,
in the event of identical dates then file name takes
precedence.

Order action Select a sort order by clicking one of the following radio
buttons:
ASC: ascending order;
DESC: descending order;

Advanced settings

Use Exclude Filemask Select this check box to enable the Exclude Filemask field and
define an exclusion condition based on file type:
Exclude Filemask: Fill in the field with file types to be
excluded from the Filemasks in the Basic settings view.

Note:
File types in this field should be quoted with double
quotation marks and separated by commas.

Format file path to slash(/) style(useful on Windows) Select this check box to format the file path to slash(/) style
which is useful on Windows.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables CURRENT_FILE: the current file name. This is a Flow


variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
CURRENT_FILEEXTENSION: the extension of the current file.
This is a Flow variable and it returns a string.
CURRENT_FILEDIRECTORY: the current file directory. This is
a Flow variable and it returns a string.
NB_FILE: the number of files iterated upon so far. This is a
Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tFileList provides a list of files or folders from a defined


directory on which it iterates.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Iterating on a file directory


The following scenario creates a three-component Job, which aims at listing files from a defined
directory, reading each file by iteration, selecting delimited data and displaying the output in the Run
log console.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette to the design workspace: tFileList,
tFileInputDelimited, and tLogRow.
2. Right-click the tFileList component, and pull an Iterate connection to the tFileInputDelimited
component. Then pull a Main row from the tFileInputDelimited to the tLogRow component.

Configuring the components


Procedure
1. Double-click tFileList to display its Basic settings view and define its properties.


2. Browse to the Directory that holds the files you want to process. To display the path on the Job
itself, use the label (__DIRECTORY__) that shows up when you put the pointer anywhere in the
Directory field. Type in this label in the Label Format field you can find if you click the View tab in
the Basic settings view.

3. In the Basic settings view and from the FileList Type list, select the source type you want to
process, Files in this example.
4. In the Case sensitive list, select a case mode, Yes in this example to create case sensitive filter on
file names.
5. Keep the Use Glob Expressions as Filemask check box selected if you want to use global
expressions to filter files, and define a file mask in the Filemask field.
6. Double-click tFileInputDelimited to display its Basic settings view and set its properties.

7. Enter the File Name field using a variable containing the current filename path, as you filled in
the Basic settings of tFileList. Press Ctrl+Space bar to access the autocomplete list of variables,
and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
This way, all files in the input directory can be processed.
8. Fill in all other fields as detailed in the tFileInputDelimited section. Related topic:
tFileInputDelimited on page 1015.
9. Select the last component, tLogRow, to display its Basic settings view and fill in the separator to
be used to distinguish field content displayed on the console. Related topic: tLogRow on page
1977.


Executing the Job


Press Ctrl + S to save your Job, and press F6 to run it.

The Job iterates on the defined directory, and reads all included files. Then delimited data is passed
on to the last component which displays it on the console.

Finding duplicate files between two folders


This scenario describes a Job that iterates on files in two folders, transforms the iteration results to
data flows to obtain a list of filenames, and then picks up all duplicates from the list and shows them
on the Run console, as a preparation step before merging the two folders, for example.


Dropping and linking the components


Procedure
1. From the Palette, drop two tFileList components, two tIterateToFlow components, two
tFileOutputDelimited components, a tFileInputDelimited component, a tUniqRow component, and
a tLogRow component onto the design workspace.
2. Link the first tFileList component to the first tIterateToFlow component using a Row > Iterate
connection, and then connect the first tIterateToFlow component to the first tFileOutputDelimited
component using a Row > Main connection to form the first subJob.
3. Link the second tFileList component to the second tIterateToFlow component using a Row
> Iterate connection, and then connect the second tIterateToFlow component to the second
tFileOutputDelimited component using a Row > Main connection to form the second subJob.
4. Link the tFileInputDelimited to the tUniqRow component using a Row > Main connection, and the
tUniqRow component to the tLogRow component using a Row > Duplicates connection to form
the third subJob.
5. Link the three subJobs using Trigger > On Subjob Ok connections so that they will be triggered
one after another, and label the components to better identify their roles in the Job.

Configuring the components


Procedure
1. In the Basic settings view of the first tFileList component, fill the Directory field with the path to
the first folder you want to read filenames from, E:/DataFiles/DI/images in this example, and leave
the other settings as they are.


2. Double-click the first tIterateToFlow component to show its Basic settings view.

3. Double-click the [...] button next to Edit schema to open the Schema dialog box and define the
schema of the text file the next component will write filenames to. When done, click OK to close
the dialog box and propagate the schema to the next component.
In this example, the schema contains only one column: Filename.


4. In the Value field of the Mapping table, press Ctrl+Space to access the autocomplete list of variables,
and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILE"))
to read the name of each file in the input directory, which will be put into a data
flow to pass to the next component.
5. In the Basic settings view of the first tFileOutputDelimited component, fill the File Name field
with the path of the text file that will store the filenames from the incoming flow,
D:/temp/tempdata.csv in this example. This completes the configuration of the first subJob.

6. Repeat the steps above to complete the configuration of the second subJob, but:
• fill the Directory field in the Basic settings view of the second tFileList component with the
other folder you want to read filenames from, E:/DataFiles/DQ/images in this example.
• select the Append check box in the Basic settings view of the second tFileOutputDelimited
component so that the filenames previously written to the text file will not be overwritten.
7. In the Basic settings view of the tFileInputDelimited component, fill the File name/Stream
field with the path of the text file that stores the list of filenames, D:/temp/tempdata.csv in this
example, and define the file schema, which contains only one column in this example, Filename.


8. In the Basic settings view of the tUniqRow component, select the Key attribute check box for the
only column, Filename in this example.

9. In the Basic settings view of the tLogRow component, select the Table (print values in cells of a
table) option for better display effect.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Click Run or press F6 to run the Job.
All the duplicate files between the selected folders are displayed on the console.


Results
For other scenarios using tFileList, see tFileCopy on page 988.


tFileOutputARFF
Writes an ARFF file that holds data organized according to the defined schema.

tFileOutputARFF Standard properties


These properties are used to configure tFileOutputARFF running in the Standard Job framework.
The Standard tFileOutputARFF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a connection wizard and store the
file connection parameters you set in the component
Basic settings view.
For more information about setting up and storing file
connection parameters, see Talend Studio User Guide.

File name Name or path to the output file and/or the variable to be
used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Attribute Define Displays the schema you defined in the Edit schema dialog
box.
Column: Name of the column.
Type: Data type.
Pattern: Enter the data model (pattern), if necessary.

Relation Enter the name of the relation.

Append Select this check box to add the new rows at the end of the
file.

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: You can create the schema and store it locally for
this component. Related topic: see Talend Studio User Guide.

  Repository: You have already created and stored the


schema in the Repository. You can reuse it in various
projects and Job flowcharts. Related topic: see Talend Studio
User Guide.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if it does not exist.

Advanced settings

Don't generate empty file Select this check box if you do not want to generate empty
files.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component along with a Row link to collect data
from another component and to re-write the data to an
ARFF file.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Connections Outgoing links (from this component to another):


Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if.

Incoming links (from one component to this one):


Row: Main; Reject; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error; Synchronize; Paralle
lize.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a tFileOutputARFF related scenario, see Displaying the content of an ARFF file on page 1011.


tFileOutputDelimited
Outputs the input data to a delimited file according to the defined schema.

tFileOutputDelimited Standard properties


These properties are used to configure tFileOutputDelimited running in the Standard Job framework.
The Standard tFileOutputDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Use Output Stream Select this check box to process the data flow of interest.
Once you have selected it, the Output Stream field displays
and you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing the variable by hand, you can select it from
the auto-completion list (Ctrl+Space) to fill the current
field, provided that the variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.

File Name Name or path to the output file and/or the variable to be
used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row Separator The separator used to identify the end of a row.

Field Separator Enter character, string or regular expression to separate


fields for the transferred data.


Append Select this check box to add the new rows at the end of the
file.

Include Header Select this check box to include the column header to the
file.

Compress as zip file Select this check box to compress the output file in zip
format.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

CSV options Select this check box to specify the following CSV
parameters:
• Escape char: enter the escape character between
double quotation marks.
• Text enclosure: enter the enclosure character (only
one character) between double quotation marks.
For example, """ needs to be entered when double
quotation marks (") are used as the enclosure character.
It is recommended to use the standard escape character,
that is, "\". Otherwise, you should set the same character
for Escape char and Text enclosure. For example, if the
escape character is set to "\", the text enclosure can be set
to any other character. However, if the escape character is
set to a character other than "\", it will be changed to the
same character as the text enclosure: for instance, if the
escape character is set to "#" and the text enclosure to "@",
the escape character actually used will be "@", not "#".

Create directory if not exists This check box is selected by default. It creates the directory
that holds the output delimited file, if it does not already
exist.

Split output in several files In case of very big output files, select this check box to
divide the output delimited file into several files.
Rows in each output file: set the number of lines in each of
the output files.

Custom the flush buffer size Select this check box to define the number of lines to write
before emptying the buffer.
Row Number: set the number of lines to write.

Output in row mode Select this check box to ensure atomicity of the flush so
that each row of data can remain consistent as a set and
incomplete rows of data are never written to a file.
This check box is mostly useful when using this component
in the multi-thread situation.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Don't generate empty file Select this check box if you do not want to generate empty
files.

Throw an error if the file already exist Select this check box to throw an exception if the output
file specified in the File Name field on the Basic settings
view already exists.
Clear this check box to overwrite the existing file.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
FILE_NAME: the name of the file being processed. This is a
Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
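As a minimal illustration of how these variables can be read, the following tJava code, run after
the subJob through an On Subjob Ok trigger, prints the number of rows written by a hypothetical
tFileOutputDelimited_1 component (the component name is an assumption; use the label shown in your
own Job):

// NB_LINE is an After variable, so it is only available once the subJob has finished.
System.out.println("Rows written: "
    + (Integer) globalMap.get("tFileOutputDelimited_1_NB_LINE"));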

Usage

Usage rule Use this component to write a delimited file and separate
fields using a field separator value.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Writing data in a delimited file


This scenario describes a three-component Job that extracts certain data from a file holding
information about clients (customers), and then writes the extracted data to a delimited file.
In the following example, we have already stored the input schema under the Metadata node in the
Repository tree view. For more information about storing schema metadata in the Repository, see
Talend Studio User Guide.

Dropping and linking components


Procedure
1. In the Repository tree view, expand Metadata and File delimited in succession and then browse to
your input schema, customers, and drop it on the design workspace. A dialog box displays where
you can select the component type you want to use.


2. Click tFileInputDelimited and then OK to close the dialog box. A tFileInputDelimited component
holding the name of your input schema appears on the design workspace.
3. Drop a tMap component and a tFileOutputDelimited component from the Palette to the design
workspace.
4. Link the components together using Row > Main connections.

Configuring the components


Configuring the input component

Procedure
1. Double-click tFileInputDelimited to open its Basic settings view. All its property fields are
automatically filled in because you defined your input file locally.

2. If you do not define your input file locally in the Repository tree view, fill in the details manually
after selecting Built-in in the Property type list.
3. Click the [...] button next to the File Name field and browse to the input file, customer.csv in this
example.


Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.

4. In the Row Separators and Field Separators fields, enter respectively "\n" and ";" as line and field
separators.
5. If needed, set the number of lines used as header and the number of lines used as footer in the
corresponding fields and then set a limit for the number of processed rows.
In this example, Header is set to 6 while Footer and Limit are not set.
6. In the Schema field, schema is automatically set to Repository and your schema is already defined
since you have stored your input file locally for this example. Otherwise, select Built-in and click
the [...] button next to Edit Schema to open the Schema dialog box where you can define the
input schema, and then click OK to close the dialog box.

Configuring the mapping component

Procedure
1. In the design workspace, double-click tMap to open its editor.


2. In the tMap editor, click the [+] button on top of the panel to the right to open the Add a new
output table dialog box.

3. Enter a name for the table you want to create, row2 in this example.
4. Click OK to validate your changes and close the dialog box.
5. In the table to the left, row1, select the first three lines (Id, CustomerName and CustomerAddress)
and drop them onto the table to the right.
6. In the Schema editor view situated in the lower left corner of the tMap editor, change the type of
RegisterTime to String in the table to the right.

7. Click OK to save your changes and close the editor.

Configuring the output component

Procedure
1. In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and
define the component properties.

2. In the Property Type field, set the type to Built-in and fill in the fields that follow manually.
3. Click the [...] button next to the File Name field and browse to the output file you want to write
data in, customerselection.txt in this example.
4. In the Row Separator and Field Separator fields, set "\n" and ";" respectively as row and field
separators.


5. Select the Include Header check box if you want to output the column headers as well.
6. Click Edit schema to open the schema dialog box and verify that the retrieved schema
corresponds to the input schema. If not, click Sync Columns to retrieve the schema from the
preceding component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.

The three specified columns Id, CustomerName and CustomerAddress are output in the defined
output file.

Utilizing Output Stream to save filtered data to a local file


Based on the preceding scenario, this scenario saves the filtered data to a local file using output
stream.

Dropping and linking components


Procedure
1. Drop tJava from the Palette to the design workspace.
2. Connect tJava to tFileInputDelimited using a Trigger > On Subjob OK connection.


Configuring the components


Procedure
1. Double-click tJava to open its Basic settings view.

2. In the Code area, type in the following command:

new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file",
    new java.io.FileOutputStream("C:/myFolder/customerselection.txt", false));

Note:
In this scenario, the command used in the Code area of tJava creates a new folder C:/myFolder
where the output file customerselection.txt will be saved. You can customize the command as
needed.

3. Double-click tFileOutputDelimited to open its Basic settings view.

4. Select the Use Output Stream check box to enable the Output Stream field, in which you can
define the output stream with a command.
Fill in the Output Stream field with the following command:

(java.io.OutputStream)globalMap.get("out_file")

Note:
You can fill in the Output Stream field by pressing Ctrl+Space to select a built-in variable from
the list, or by typing the command manually. In this scenario, the command in the Output Stream
field retrieves the java.io.OutputStream object stored in globalMap by the tJava component, so
the filtered data is written to the local file defined in its Code area.

5. Click Sync columns to retrieve the schema defined in the preceding component.


6. Leave the rest of the components as they were in the previous scenario.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The three specified columns Id, CustomerName and CustomerAddress are output in the defined
output file.
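The stream created in the tJava component stays referenced in globalMap after the subJob ends. If
you want to release it explicitly, a minimal sketch is to add one more tJava step, triggered with
On Subjob Ok after the output subJob (this extra step is not part of the original scenario):

// Hypothetical clean-up code: close the output stream opened in the first tJava.
((java.io.OutputStream) globalMap.get("out_file")).close();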


tFileOutputExcel
Writes an MS Excel file with separated data values according to a defined schema.

tFileOutputExcel Standard properties


These properties are used to configure tFileOutputExcel running in the Standard Job framework.
The Standard tFileOutputExcel component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Write excel 2007 file format (xlsx / xlsm) Select this check box to write the processed data into the
.xlsx or .xlsm format of Excel 2007.

Use Output Stream Select this check box to process the data flow of interest.
Once you have selected it, the Output Stream field displays
and you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing the variable by hand, you can select it from
the auto-completion list (Ctrl+Space) to fill the current
field, provided that the variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.

File Name Name or path to the output file.


This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Sheet name Name of the Excel sheet.

Warning: If a subJob contains multiple tFileOutputExcel
components that write the same Excel file (that is,
the File Name options of these components point to
the same file), these components overwrite the same
Excel sheet and only the data of the tFileOutputExcel
component that is the last one to write the Excel
file remains. To avoid data loss, make sure that these
tFileOutputExcel components are in different subJobs.

Include header Select this check box to include a header row to the output
file.

Append existing file Select this check box to add the new lines at the end of the
file.
Append existing sheet: Select this check box to add the new
lines at the end of the Excel sheet.

Is absolute Y pos. Select this check box to add information in specified cells:
First cell X: cell position on the X-axis (X-coordinate or
Abscissa).
First cell Y: cell position on the Y-axis (Y-coordinate).
Keep existing cell format: select this check box to retain the
original layout and format of the cell you want to write into.

Font Select in the list the font you want to use.

Define all columns auto size Select this check box if you want the size of all your
columns to be defined automatically. Otherwise, select the
Auto size check boxes next to the column names you want
their size to be defined automatically.

Protect file Select this check box and enter the password in the
Password field to protect the file using a password.
This component supports agile encryption.
This option is available when Write excel2007 file
format(xlsx) is selected and Use Output Stream is not
selected.

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Advanced settings

Create directory if not exists This check box is selected by default. This option creates
the directory that will hold the output files if it does not
already exist.

Custom the flush buffer size Available when Write excel 2007 file format (xlsx / xlsm) is
selected in the Basic settings view.
Select this check box and set, in the Row number field, the
maximum number of rows allowed in the buffer.

Advanced separator (for numbers) Select this check box to modify the separators you want to
use for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select the check box to avoid the generation of an empty
file.

Recalculate formula Select this check box if you need to recalculate formula(s) in
the specified Excel file.
This check box appears only when you select all these three
check boxes: Write excel2007 file format(xlsx), Append
existing file, and Append existing sheet.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write an MS Excel file with data
passed on from other components using a Row link.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a tFileOutputExcel related scenario, see tSugarCRMInput (deprecated).
For a scenario about the usage of the Use Output Stream check box, see Utilizing Output Stream to
save filtered data to a local file on page 1120.
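The Use Output Stream mechanism works the same way as for tFileOutputDelimited. As a minimal
sketch under assumed paths (C:/myFolder and report.xlsx are examples only), the stream could be
prepared in a tJava component placed before the Excel subJob:

// Hypothetical tJava code: create the target folder and register the stream
// that tFileOutputExcel will use when Use Output Stream is selected.
new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file",
    new java.io.FileOutputStream("C:/myFolder/report.xlsx", false));

The Output Stream field of tFileOutputExcel would then contain
(java.io.OutputStream)globalMap.get("out_file").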


tFileOutputJSON
Receives data and rewrites it in a JSON structured data block in an output file.

tFileOutputJSON Standard properties


These properties are used to configure tFileOutputJSON running in the Standard Job framework.
The Standard tFileOutputJSON component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path of the output file.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Generate an array json Select this check box to generate an array JSON file.

Name of data block Enter a name for the data block to be written, between
double quotation marks.
This field disappears when the Generate an array json check
box is selected.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.


Advanced settings

Create directory if not exists This check box is selected by default. This option creates
the directory that will hold the output files if it does not
already exist.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to rewrite received data in a JSON


structured output file.

Writing a JSON structured file


This is a two-component scenario in which a tRowGenerator component generates random data, which
a tFileOutputJSON component then writes to a JSON structured output file.

Procedure
1. Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the Palette.
2. Link the components using a Row > Main connection.
3. Double-click tRowGenerator to define its Basic Settings properties in the Component view.

4. Click [...] next to Edit Schema to display the corresponding dialog box and define the schema.

5. Click [+] to add the number of columns desired.


6. Under Columns type in the column names.
7. Under Type, select the data type from the list.
8. Click OK to close the dialog box.
9. Click [+] next to RowGenerator Editor to open the corresponding dialog box.


10. Under Functions, select pre-defined functions for the columns, if required, or select [...] to set
customized function parameters in the Function parameters tab.
11. Enter the number of rows to be generated in the corresponding field.
12. Click OK to close the dialog box.
13. Click tFileOutputJSON to set its Basic Settings properties in the Component view.

14. Click [...] to browse to where you want the output JSON file to be generated and enter the file
name.
15. Enter a name for the data block to be generated in the corresponding field, between double
quotation marks.
16. Select Built-In as the Schema type.
17. Click Sync Columns to retrieve the schema from the preceding component.
18. Press F6 to run the Job.

Results
The data from the input schema is written in a JSON structured data block in the output file.
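To quickly inspect the result, you can print the generated file, for example from a tJava
component. This is a hypothetical check that is not part of the scenario; replace the path with
the one set in the File Name field. With a data block named "results", the content generally has
the shape {"results":[{...},{...}]}.

// Hypothetical verification code: print the generated JSON file line by line.
java.io.BufferedReader reader =
    new java.io.BufferedReader(new java.io.FileReader("C:/out/result.json"));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();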


tFileOutputLDIF
Writes or modifies an LDIF file with data separated in respective entries based on the schema defined,
or else deletes content from an LDIF file.
tFileOutputLDIF outputs data to an LDIF type of file which can then be loaded into an LDAP directory.

tFileOutputLDIF Standard properties


These properties are used to configure tFileOutputLDIF running in the Standard Job framework.
The Standard tFileOutputLDIF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Specify the path to the LDIF output file.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Wrap Specify the number of characters at which the line will be


wrapped.

Change type Select a changetype that defines the operation you want to
perform on the entries in the output LDIF file.
• Add: the LDAP operation for adding the entry.
• Modify: the LDAP operation for modifying the entry.
• Delete: the LDAP operation for deleting the entry.
• Modrdn: the LDAP operation for modifying an entry's
RDN (Relative Distinguished Name).
• Default: the default LDAP operation.

Multi-Values / Modify Detail Specify the attributes for multi-value fields when Add or
Default is selected from the Change type list or provide the
detailed modification information when Modify is selected
from the Change type list.
• Column: The Column cells are automatically filled with
the defined schema column names.
• Operation: Select an operation to be performed on
the corresponding field. This column is available only
when Modify is selected from the Change type list.
• MultiValue: Select the check box if the corresponding
field is a multi-value field.
• Separator: Specify the value separator in the
corresponding multi-value field.
• Binary: Select the check box if the corresponding field
represents binary data.
• Base64: Select the check box if the corresponding
field should be base-64 encoded. The base-64
encoded data in the LDIF file is represented by the ::
symbol.
This table is available only when Add, Modify, or Default is
selected from the Change type list.


Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.

Append Select this check box to add the new rows at the end of the
file.

Advanced settings

Enforce safe base 64 conversion Select this check box to enable the safe base-64 encoding.
For more detailed information about the safe base-64
encoding, see https://www.ietf.org/rfc/rfc2849.txt.

Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.

Custom the flush buffer size Select this check box to specify the number of lines to
write before emptying the buffer.

Row number Type in the number of lines to write before emptying the
buffer.
This field is available only when the Custom the flush
buffer size check box is selected.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select this check box if you do not want to generate empty
files.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used to write an LDIF file with data
passed on from an input component using a Row > Main
connection.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Writing data from a database table into an LDIF file


This scenario describes a Job that loads data into a database table, and then extracts the data from
the table and writes it into a new LDIF file.


Adding and linking components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tMysqlOutput
component, a tMysqlInput component, and a tFileOutputLDIF component.
2. Link tFixedFlowInput to tMysqlOutput using a Row > Main connection.
3. Link tMysqlInput to tFileOutputLDIF using a Row > Main connection.
4. Link tFixedFlowInput to tMysqlInput using a Trigger > On Subjob Ok connection.

Configuring the components


Loading data into a database table

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: dn, id_owners, registration, and make, all of String type.


3. Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
4. In the Mode area, select Use Inline Content (delimited file), and then in the Content field
displayed, enter the following input data:

24;24;5382 KC 94;Volkswagen
32;32;9591 0E 79;Honda
35;35;3129 VH 61;Volkswagen
5. Double-click tMysqlOutput to open its Basic settings view.

6. Fill in the Host, Port, Database, Username, and Password fields with your MySQL database
connection details.
7. In the Table field, enter the name of the table into which the data will be written. In this example,
it is ldifdata.
8. Select Drop table if exists and create from the Action on table drop-down list.


Extracting data from the database table and writing it into an LDIF file

Procedure
1. Double-click tMysqlInput to open its Basic settings view.

2. Fill in the Host, Port, Database, Username, and Password fields with your MySQL database
connection details.
3. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: dn, id_owners, registration, and make, all of String type.
4. In the Table Name field, enter the name of the table from which the data will be read. In this
example, it is ldifdata.
5. Click the Guess Query button to fill in the Query field with the auto-generated query.
6. Double-click tFileOutputLDIF to open its Basic settings view.

7. In the File Name field, browse to or enter the path to the LDIF file to be generated. In this
example, it is E:/out.ldif.


8. Select the operation Add from the Change type list.


9. Click the Sync columns button to retrieve the schema from the preceding component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.

The LDIF file created contains the data from the database table and the change type for the
entries is set to add.
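As an illustration only (the exact attribute order in the generated file may differ), the entry
written for the first row of the sample data would look something like this in out.ldif:

dn: 24
changetype: add
id_owners: 24
registration: 5382 KC 94
make: Volkswagen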


tFileOutputMSDelimited
Creates a complex multi-structured delimited file, using data structures (schemas) coming from
several incoming Row flows.

tFileOutputMSDelimited Standard properties


These properties are used to configure tFileOutputMSDelimited running in the Standard Job
framework.
The Standard tFileOutputMSDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row Separator String (ex: "\n" on Unix) to distinguish rows.

Field Separator Character, string or regular expression to separate fields.

Use Multi Field Separators Select this check box to set a different field separator for
each of the schemas using the Field separator field in the
Schemas area.

Schemas The table gets automatically populated by schemas


coming from the various incoming rows connected to
tFileOutputMSDelimited. Fill out the dependency between
the various schemas:
Parent row: Type in the parent flow name (based on the
Row name transferring the data).
Parent key column: Type in the key column of the parent
row.
Key column: Type in the key column for the selected row.

Advanced settings

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

CSV options Select this check box to take into account all parameters
specific to CSV files, in particular Escape char and Text
enclosure parameters.


Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select this check box if you do not want to generate empty
files.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write a multi-schema delimited file


and separate fields using a field separator value.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileOutputMSPositional
Creates a complex multi-structured file, using data structures (schemas) coming from several
incoming Row flows.

tFileOutputMSPositional Standard properties


These properties are used to configure tFileOutputMSPositional running in the Standard Job
framework.
The Standard tFileOutputMSPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path to the file to be created and/or variable to
be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Schemas The table gets automatically populated by schemas


coming from the various incoming rows connected to
tFileOutputMSPositional. Fill out the dependency between
the various schemas:
Parent row: Type in the parent flow name (based on the
Row name transferring the data).
Parent key column: Type in the key column of the parent
row.
Key column: Type in the key column for the selected row.
Pattern: Type in the pattern that positions the fields for
each incoming row.
Padding char: Type in the padding character to be used.
Alignment: Select the relevant alignment parameter.

Advanced settings

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.


Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is a
Flow variable and it returns an integer.
NB_LINE_UNKOWN_HEADER_TYPES: the number of rows
with unknown header type. This is a Flow variable and it
returns an integer.
NB_LINE_PARSE_ERRORS: the number of rows with parse
errors. This is a Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write a multi-schema positional file


and separate fields using a position separator value.

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileOutputMSXML
Creates a complex multi-structured XML file, using data structures (schemas) coming from several
incoming Row flows.

tFileOutputMSXML Standard properties


These properties are used to configure tFileOutputMSXML running in the Standard Job framework.
The Standard tFileOutputMSXML component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Configure XML tree Opens the dedicated interface to help you set the XML
mapping. For details about the interface, see Defining the
MultiSchema XML tree on page 1143.

Advanced settings

Create directory only if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select this check box if you do not want to generate empty
files.

Trim the whitespace characters Select this check box to remove leading and trailing
whitespace from the columns.

Escape text Select this check box to escape special characters.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Defining the MultiSchema XML tree


Double-click on the tFileOutputMSXML component to open the dedicated interface or click on the
three-dot button on the Basic settings vertical tab of the Component tab.

To the left of the mapping interface, under Linker source, the drop-down list includes all the input
schemas that should be added to the multi-schema output XML file (only if more than one input flow
is connected to the tFileOutputMSXML component).
Under Schema List, all the columns retrieved from the selected input data flow are listed.


The right part of the interface holds all the XML structures you want to create in the output XML
file.
You can create the XML structures manually or import them. Then map the input schema columns
onto each element of the XML tree, for each of the input schemas selected under
Linker source.

Importing the XML tree


The easiest and most common way to fill out the XML tree panel, is to import a well-formed XML file.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Import XML tree.
4. Browse to the file to import and click OK.
• You can import an XML tree from files in XML, XSD and DTD formats.
• When importing an XML tree structure from an XSD file, you can choose an element as the
root of your XML tree.
The XML Tree column is hence automatically filled out with the correct elements.
5. If you need to add or remove an element or sub-elements, right-click the relevant element of the
tree to display the contextual menu.
6. Select Delete to remove the selection from the tree or select the relevant option among: Add sub-
element, Add attribute, Add namespace to enrich the tree.

Creating the XML tree manually


If you don't have any XML structure defined as yet, you can create it manually.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Add sub-element to create the first element of the structure.
4. If you need to add an attribute or a child element to any element or remove any element, right-
click the left of the corresponding element name to display the contextual menu.
5. Right-click to the left of the element name to display the contextual menu.
6. On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace
or Delete.

Mapping XML data from multiple schema sources


Once your XML tree is ready, select the first input schema that you want to map.
You can map each input column with the relevant XML tree element or sub-element to fill out the
Related Column.


Procedure
1. Click on one of the Schema column name.
2. Drag it onto the relevant sub-element to the right.
3. Release the mouse button to implement the actual mapping.

A light blue link displays that illustrates this mapping. If available, use the Auto-Map button,
located to the bottom left of the interface, to carry out this operation automatically.
4. If you need to disconnect any mapping on any element of the XML tree, select the element and
right-click to the left of the element name to display the contextual menu
5. Select Disconnect link.
The light blue link disappears.

Defining the node status


Defining the XML tree and mapping the data is not sufficient. You also need to define the loop
element for each of the selected sources and, if required, the group element.

Define a loop element


The loop element allows you to define the iterating object. Generally the Loop element is also the
row generator.

About this task


To define an element as loop element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Loop Element.

Results
The Node Status column shows the newly added status.
There can only be one loop element at a time.


Define a group element


The group element is optional; it represents a constant element on which the group-by operation can
be performed. A group element can be defined only if a loop element has been defined before.

About this task


When using a group element, the rows should be sorted so that they can be grouped by the selected
node.
To define an element as group element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Group Element.

Results
The Node Status column shows the newly added status, and any required group statuses are
defined automatically.
Click OK once the mapping is complete to validate the definition and continue the Job configuration
where needed.

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileOutputPositional
Writes a file row by row according to the length and the format of the fields or columns in a row.

tFileOutputPositional Standard properties


These properties are used to configure tFileOutputPositional running in the Standard Job framework.
The Standard tFileOutputPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Use Output Stream Select this check box to process the data flow of interest. Once
you have selected it, the Output Stream field displays and
you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid the inconvenience of typing it by hand, you can
select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field, provided that
this variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020. A minimal sketch of defining such a stream in tJava is
given after this table.

File Name Name or path to the file to be processed and/or the variable
to be used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.


Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Row separator The separator used to identify the end of a row.

Append Select this check box to add the new rows at the end of the
file.

Include header Select this check box to include the column header in the file.

Compress as zip file Select this check box to compress the output file in zip format.

Formats Customize the positional file data format and fill in the
columns in the Formats table.
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in between quotes the padding
characters used. A space by default.
Alignment: Select the appropriate alignment parameter.
Keep: If the data in the column or in the field are too long,
select the part you want to keep.
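The following is a minimal, hedged sketch of how such an output stream could be created in a tJava component placed before this one; the folder, file name and variable key (out_file) are assumptions for illustration, not values imposed by the component:

new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file",
    new java.io.FileOutputStream("C:/myFolder/out.txt", false));

In the Output Stream field, the stream could then be retrieved with, for example, (java.io.OutputStream)globalMap.get("out_file"), provided that the tJava component runs before this one.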

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Use byte length as the cardinality Select this check box to add support of double-byte
character to this component. JDK 1.6 is required for this
feature.


Create directory if not exists This check box is selected by default. It creates a directory
to hold the output file if it does not exist.

Custom the flush buffer size Select this check box to define the number of lines to write
before emptying the buffer.
Row Number: set the number of lines to write.

Output in row mode Writes in row mode.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Don't generate empty file Select this check box if you do not want to generate empty
files.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
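For example, assuming this component is labelled tFileOutputPositional_1, a tJava component placed after it (for instance via an OnSubjobOk connection) could print the NB_LINE variable as follows; this is only an illustrative sketch:

System.out.println(globalMap.get("tFileOutputPositional_1_NB_LINE"));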

Usage

Usage rule Use this component to write a file row by row according to
the defined field lengths and formats.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic


settings view. Once a dynamic parameter is defined, the


Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For a related scenario, see Reading data using a Regex and outputting the result to Positional file on
page 1089.
For scenario about the usage of Use Output Stream check box, see Utilizing Output Stream to save
filtered data to a local file on page 1120.


tFileOutputProperties
Writes a configuration file, of the type .ini or .properties, containing text data organized according to
the model key = value.

tFileOutputProperties Standard properties


These properties are used to configure tFileOutputProperties running in the Standard Job framework.
The Standard tFileOutputProperties component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
For this component, the schema is read-only. It is made of
two columns, Key and Value, corresponding to the parameter
name and the parameter value to be copied.

File format Select the file format from the list: either .properties or .ini.

  .properties: data in the configuration file is written line by line and structured in the following way: key = value.

  .ini: data in the configuration file is written line by line, structured in the following way: key = value, and grouped in sections.
Section Name: enter the section name on which the
iteration is based.

File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.
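As a purely illustrative example (the keys and values below are invented), two schema rows with Key set to host and port and Value set to localhost and 3306 would produce content similar to the following, depending on the selected file format:

In .properties format:
host = localhost
port = 3306

In .ini format, with Section Name set to "connection":
[connection]
host = localhost
port = 3306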

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write files where data is organized
according to the structure key = value.

Related scenarios
For a related scenario, see Reading and matching the keys and the values of different .properties files
and outputting the results in a glossary on page 1080 of tFileInputProperties on page 1079.


tFileOutputRaw
Provides data coming from another component, in the form of a single column of output data.

tFileOutputRaw Standard properties


These properties are used to configure tFileOutputRaw running in the Standard Job framework.
The Standard tFileOutputRaw component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Filename The name of and path to the output file to be processed,


which you can enter manually between double quotes or
browse and select by clicking the [...] button.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Encoding If the output is a string, select the encoding type from the
list or select Custom and define it manually.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
To catch the FileNotFoundException, you also need to
select this check box.


Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use the tFileOutputRaw component to receive data coming


from a data source that provides its data in a single column.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).


tFileOutputXML
Writes an XML file with separated data values according to a defined schema.

tFileOutputXML Standard properties


These properties are used to configure tFileOutputXML running in the Standard Job framework.
The Standard tFileOutputXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

File Name Name or path to the output file and/or the variable to be
used.
Related topic: see Defining variables from the Component
view section in Talend Studio User Guide

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Incoming record is a document Select this check box if the data from the preceding
component is in XML format.
When this check box is selected, a Column list appears
allowing you to select a Document type column of the
schema that holds the data, and the Row tag field disappears.
When this check box is selected, in the Advanced settings
view, only the check boxes Create directory if not exists,
Don't generate empty file, Trim data, tStatCatcher Statistics
and the list Encoding are available.

Row tag Specify the tag that will wrap data and structure per row.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the input component.
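As an indicative example only, with a two-column schema (id, name), Row tag set to row and the default root tag root (see Root tags in the Advanced settings), the output written by tFileOutputXML would resemble the following; the values shown are invented:

<root>
  <row>
    <id>1</id>
    <name>andy</name>
  </row>
</root>

If the As attribute check box is selected for a column in the Output format table, that column is written as an attribute of the parent element instead of a child element.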

Advanced settings

Split output in several files If the output is big, you can split the output into several
files, each containing the specified number of rows.
Rows in each output file: Specify the number of rows in each
output file.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output XML files if required.

Root tags Specify one or more root tags to wrap the whole output file
structure and data. The default root tag is root.

Output format Define the output format.


• Column: The columns retrieved from the input schema.
• As attribute: select check box for the column(s) you
want to use as attribute(s) of the parent element in the
XML output.

Note:
If the same column is selected in both the Output format
table as an attribute and in the Use dynamic grouping
setting as the criterion for dynamic grouping, only the
dynamic group setting will take effect for that column.

Use schema column name: By default, this check box is


selected for all columns so that the column labels from the
input schema are used as data wrapping tags. If you want
to use a different tag than from the input schema for any
column, clear this check box for that column and specify a
tag label between quotation marks in the Label field.

Use dynamic grouping Select this check box if you want to dynamically group the
output columns. Click the plus button to add one or more
grouping criteria in the Group by table.
Column: Select a column you want to use as a wrapping
element for the grouped output rows.
Attribute label: Enter an attribute label for the group
wrapping element, between quotation marks.

Custom the flush buffer size Select this check box to define the number of rows to buffer
before the data is written into the target file and the buffer
is emptied.
Row Number: Specify the number of rows to buffer.


Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Don't generate empty file Select the check box to avoid the generation of an empty file.

Trim data Select this check box to remove the spaces at the beginning
and at the end of the text, and merge multiple consecutive
spaces into one within the text.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write an XML file with data passed
on from other components using a Row link.

Related scenarios
For related scenarios using tFileOutputXML, see Reading a Positional file and saving filtered results to
XML on page 1075 and Using a SOAP message from an XML file to get country name information and
saving the information to an XML file on page 3454.


tFileProperties
Creates a single row flow that displays the main properties of the processed file.

tFileProperties Standard properties


These properties are used to configure tFileProperties running in the Standard Job framework.
The Standard tFileProperties component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. It describes the
main properties of the specified file. You can click the [...]
button next to Edit schema to view the predefined schema
which contains the following fields:
• abs_path: the absolute path of the file.
• dirname: the directory of the file.
• basename: the name of the file.
• mode_string: the access mode of the file, r and w for
read and write permissions respectively.
• size: the file size in bytes.
• mtime: the timestamp indicating when the file was
last modified, in milliseconds that have elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
• mtime_string: the date and time the file was last
modified.

File Name or path to the file to be processed and/or the variable


to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Calculate MD5 Hash Select this check box to check the MD5 of the downloaded
file.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this


check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):


Row: Main; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Displaying the properties of a processed file


This Java scenario describes a very simple Job that displays the properties of the specified file.

Procedure
1. Drop a tFileProperties component and a tLogRow component from the Palette onto the design
workspace.
2. Right-click on tFileProperties and connect it to tLogRow using a Main Row link.

3. In the design workspace, select tFileProperties.


4. Click the Component tab to define the basic settings of tFileProperties.


5. Set Schema type to Built-In.


6. If desired, click the Edit schema button to see the read-only columns.
7. In the File field, enter the file path or browse to the file you want to display the properties for.
8. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977.
9. Press F6 to execute the Job.

Results
The properties of the defined file are displayed on the console.


tFileRowCount
Opens a file and reads it row by row in order to determine the number of rows inside.

tFileRowCount Standard properties


These properties are used to configure tFileRowCount running in the Standard Job framework.
The Standard tFileRowCount component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row separator String (for example, "\n" on Unix) used to distinguish rows
in the file to be read.

Ignore empty rows Select this check box to ignore the empty rows while the
component is counting the rows in the file.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables COUNT: the number of rows in a file. This is a Flow variable
and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tFileRowCount is a standalone component; it must be used
with an OnSubjobOk connection to tJava.

Connections Outgoing links (from this component to another):


Row: Main; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Main; Reject; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Writing a file to MySQL if the number of its records matches a reference value
In this scenario, tFileRowCount counts the number of records in a .txt file, which is compared against
a reference value through tJava. Once the two values match, the .txt file will be written to a MySQL
table.
The .txt file has two records:

1;andy
2;mike

Linking the components


Procedure
1. Drop tFileRowCount, tJava, tFileInputDelimited, and tMysqlOutput from the Palette onto the
design workspace.
2. Link tFileRowCount to tJava using an OnSubjobOk trigger.
3. Link tJava to tFileInputDelimited using a Run if trigger.
4. Link tFileInputDelimited to tMysqlOutput using a Row > Main connection.


Configuring the components


Procedure
1. Double-click tFileRowCount to open its Basic settings view.

2. In the File Name field, type in the full path of the .txt file. You can also click the [...] button to
browse for this file.
Select the Ignore empty rows check box.
3. Double-click tJava to open its Basic settings view.

In the Code box, enter the function to print out the number of rows in the file:

System.out.println(globalMap.get("tFileRowCount_1_COUNT"));

4. Click the if trigger connection to open its Basic settings view.


In the Condition box, enter the statement to judge if the number of rows is 2:

((Integer)globalMap.get("tFileRowCount_1_COUNT"))==2

This if trigger means that if the row count equals 2, the rows of the .txt file will be written to
MySQL.
5. Double-click tFileInputDelimited to open its Basic settings view.

In the File name/Stream field, type in the full path of the .txt file. You can also click the [...]
button to browse for this file.
6. Click the Edit schema button to open the schema editor.

7. Click the [+] button to add two columns, namely id and name, respectively of the integer and
string type.
8. Click the Yes button in the pop-up box to propagate the schema setup to the following
component.


9. Double-click tMysqlOutput to open its Basic settings view.

10. In the Host and Port fields, enter the connection details.
In the Database field, enter the database name.
In the Username and Password fields, enter the authentication details.
In the Table field, enter the table name, for instance "staff".
11. In the Action on table list, select Create table if not exists.
In the Action on data list, select Insert.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the Job has been executed successfully and the number of rows in the .txt file
has been printed out.
3. Go to the MySQL GUI and open the table staff.

As shown above, the table has been created with the two records inserted.


tFileTouch
Creates an empty file or, if the specified file already exists, updates its date of modification and of last
access while keeping the contents unchanged.

tFileTouch Standard properties


These properties are used to configure tFileTouch running in the Standard Job framework.
The Standard tFileTouch component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Path and name of the file to be created and/or the variable
to be used.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output file if it does not exist.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):


Row: Main.


Trigger: On Subjob Ok; On Subjob Error; Run if; On


Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileUnarchive
Decompresses an archive file for further processing, in one of the following formats: *.tar.gz , *.tgz,
*.tar, *.gz and *.zip.

tFileUnarchive Standard properties


These properties are used to configure tFileUnarchive running in the Standard Job framework.
The Standard tFileUnarchive component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Archive file File path to the archive.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Extraction directory Folder where the unzipped file(s) will be put.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Use archive file name as root directory Select this check box to create a folder named after the
archive, if it does not exist, under the specified directory and
extract the zipped file(s) to that folder.

Check the integrity before unzip Select this check box to run an integrity check before
unzipping the archive.

Extract file paths Select this check box to reproduce the file path structure
zipped in the archive.

Need a password Select this check box and provide the correct decrypt
method and password if the archive to be unzipped is
password protected. Note that the encrypted archive must
be one created by the tFileArchive component; otherwise
you will see error messages or get nothing extracted even if
no error message is displayed.
Decrypt method: select the decrypt method from the list,
either Java Decrypt or Zip4j Decrypt.
Enter the password: enter the decryption password.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.


Global Variables

Global Variables CURRENT_FILE: the current file name. This is a Flow


variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component


but it can also be used within a Job as a Start component
using an Iterate link.

Connections Outgoing links (from this component to another):


Row: Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Limitation
Warning:
Such files can be decompressed: *.tar.gz , *.tgz, *.tar, *.gz and
*.zip.

Related scenario
For tFileUnarchive related scenario, see tFileCompare on page 984.


tFilterColumns
Homogenizes schemas by ordering the columns, removing unwanted columns, or adding new
columns.

tFilterColumns Standard properties


These properties are used to configure tFilterColumns running in the Standard Job framework.
The Standard tFilterColumns component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this


check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Related Scenario
For more information regarding the tFilterColumns component in use, see Cleaning up and filtering a
CSV file on page 3027.


tFilterRow
Filters input rows by setting one or more conditions on the selected columns.

tFilterRow Standard properties


These properties are used to configure tFilterRow running in the Standard Job framework.
The Standard tFilterRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is built-in only.

Logical operator used to combine conditions Select a logical operator to combine simple conditions and
to combine the filter results of both modes if any advanced
conditions are defined.
And: returns the boolean value of true if all conditions are true;
otherwise false. For each two conditions combined using
a logical AND, the second condition is evaluated only if the
first condition is evaluated to be true.
Or: returns the boolean value of true if any condition is true;
otherwise false. For each two conditions combined using a
logical OR, the second condition is evaluated only if the first
condition is evaluated to be false.

Conditions Click the plus button to add as many simple conditions


as needed. Based on the logical operator selected, the
conditions are evaluated one after the other in sequential
order for each row. When evaluated, each condition returns
the boolean value of true or false.
Input column: Select the column of the schema the function
is to be operated on
Function: Select the function on the list
Operator: Select the operator to bind the input column with
the value
Value: Type in the filtered value, between quotes if needed.

Use advanced mode Select this check box when the operations you want to
perform cannot be carried out through the standard
functions offered, for example, different logical operations
in the same component. In the text field, type in the filter
expression (a Java boolean condition) as required.
If multiple advanced conditions are defined, use a logical
operator between two conditions:
&& (logical AND): returns the boolean value of true if both
conditions are true; otherwise false. The second condition is
evaluated only if the first condition is evaluated to be true.


|| (logical OR): returns the boolean value of true if either


condition is true; otherwise false. The second condition is
evaluated only if the first condition is evaluated to be false.
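For example, an advanced-mode condition that keeps only the rows whose Age column (assumed here for illustration) lies strictly between 10 and 80 could be written as follows, using the input_row prefix to access the columns of the incoming row:

input_row.Age > 10 && input_row.Age < 80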

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_OK: the number of rows matching the filter. This is
an After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Filtering a list of names using simple conditions


The following scenario shows a Job that uses simple conditions to filter a list of records. This scenario
will output two tables: the first will list all male persons with a last name shorter than nine characters
and aged between 10 and 80 years; the second will list all rejected records. An error message for each
rejected record will display in the same table to explain why such a record has been rejected.


Dropping and linking components


Procedure
1. Drop tFixedFlowInput, tFilterRow and tLogRow from the Palette onto the design workspace.
2. Connect the tFixedFlowInput to the tFilterRow, using a Row > Main link. Then, connect the
tFilterRow to the tLogRow, using a Row > Filter link.
3. Drop tLogRow from the Palette onto the design workspace and rename it as reject. Then, connect
the tFilterRow to the reject, using a Row > Reject link.
4. Label the components to better identify their roles in the Job.

Configuring the components


Procedure
1. Double-click tFixedFlowInput to display its Basic settings view and define its properties.
2. Click the [...] button next to Edit schema to define the schema for the input data. In this example,
the schema is made of the following four columns: LastName (type String), Gender (type String),
Age (type Integer) and City (type String).

When done, click OK to validate the schema setting and close the dialog box. A new dialog box
opens and asks you if you want to propagate the schema. Click Yes.


3. Set the row and field separators in the corresponding fields if needed. In this example, use the
default settings for both, namely the row separator is a carriage return and the field separator is a
semi-colon.
4. Select the Use Inline Content(delimited file) option in the Mode area and type in the input data in
the Content field.

The input data used in this example is shown below:

Van Buren;M;73;Chicago
Adams;M;40;Albany
Jefferson;F;66;New York
Adams;M;9;Albany
Jefferson;M;30;Chicago
Carter;F;26;Chicago
Harrison;M;40;New York
Roosevelt;F;15;Chicago
Monroe;M;8;Boston
Arthur;M;20;Albany
Pierce;M;18;New York
Quincy;F;83;Albany
McKinley;M;70;Boston
Coolidge;M;4;Chicago
Monroe;M;60;Chicago

5. Double-click tFilterRow to display its Basic settings view and define its properties.


6. In the Conditions table, add four conditions and fill in the filtering parameters.
• From the InputColumn list field of the first row, select LastName, from the Function list field,
select Length, from the Operator list field, select Lower than, and in the Value column, type in
9 to limit the length of last names to nine characters.
• From the InputColumn list field of the second row, select Gender, from the Operator list field,
select Equals, and in the Value column, type in M in double quotes to filter records of male
persons.

Warning:
In the Value field, you must type in your values between double quotes for all types of values, except for integer values, which do not need quotes.

• From the InputColumn list field of the third row, select Age, from the Operator list field, select
Greater than, and in the Value column, type in 10 to set the lower limit to 10 years.
• From the InputColumn list field of the fourth row, select Age, from the Operator list field, select
Lower than, and in the Value column, type in 80 to set the upper limit to 80 years.
7. To combine the conditions, select And so that only those records that meet all the defined
conditions are accepted.
8. In the Basic settings of tLogRow components, select Table (print values in cells of a table) in the
Mode area.
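For reference only, the four simple conditions defined above, combined with And, could also be expressed as a single advanced-mode condition similar to the following sketch:

input_row.LastName.length() < 9 && input_row.Gender.equals("M") && input_row.Age > 10 && input_row.Age < 80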

Executing the Job


Procedure
Save your Job and press F6 to execute it.

As shown above, the first table lists the records of male persons aged between 10 and 80 years,
whose last names are made up of less than nine characters, and the second table lists all the records
that do not match the filter conditions. Each rejected record has a corresponding error message that
explains the reason of rejection.


Filtering a list of names through different logical operations


Based on the previous scenario, this scenario further filters the input data so that only those records
of people from New York and Chicago are accepted. Without changing the filter settings defined in
the previous scenario, advanced conditions are added in this scenario to enable both logical AND and
logical OR operations in the same tFilterRow component.

Procedure
1. Double-click the tFilterRow component to show its Basic settings view.

2. Select the Use advanced mode check box, and type in the following expression in the text field:

input_row.City.equals("Chicago") || input_row.City.equals("New York")

This defines two conditions on the City column of the input data to filter records that contain the
cities of Chicago and New York, and uses a logical OR to combine the two conditions so that records satisfying either condition will be accepted.
3. Press Ctrl+S to save the Job and press F6 to execute it.


As shown above, the result list of the previous scenario has been further filtered, and only the
records containing the cities of New York and Chicago are accepted.


tFirebirdClose
Closes a transaction with a Firebird database.

tFirebirdClose Standard properties


These properties are used to configure tFirebirdClose running in the Standard Job framework.
The Standard tFirebirdClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tFirebirdConnection component in the list if more


than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Firebird


components, especially with tFirebirdConnection and
tFirebirdCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tFirebirdCommit
Commits a global transaction instead of doing so on every row or every batch, thus providing a gain in
performance.

tFirebirdCommit Standard properties


These properties are used to configure tFirebirdCommit running in the Standard Job framework.
The Standard tFirebirdCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tFirebirdConnection component in the list if more


than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tFirebirdCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other


tFirebird* components, especially with the tFirebirdConnection and tFirebirdRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an


environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tFirebirdCommit related scenario, see Inserting data in mother/daughter tables on page 2426


tFirebirdConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.

tFirebirdConnection Standard properties


These properties are used to configure tFirebirdConnection running in the Standard Job framework.
The Standard tFirebirdConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host name Database server IP address.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.


Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other


tFirebird* components, especially with the tFirebirdCommit
and tFirebirdRollback components.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For tFirebirdConnection related scenario, see tMysqlConnection on page 2425


tFirebirdInput
Executes a database query on a Firebird database with a strictly defined order, which must correspond
to the schema definition, and then passes on the field list to the next component via a Main row link.

tFirebirdInput Standard properties


These properties are used to configure tFirebirdInput running in the Standard Job framework.
The Standard tFirebirdInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of the DB server.

Database Name of the database


Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
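For example, with a hypothetical schema made of the columns id, name and city (in that order), the
query typed in the Query field should list the fields in the same order, such as:
"SELECT id, name, city FROM customers"
The table name customers is used here for illustration only.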

Advanced Settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.


QUERY: the query statement being processed. This is a Flow


variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Firebird
databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:


• Reading data from different MySQL databases using dynamically loaded connection parameters on
page 497.


tFirebirdOutput
Executes the action defined on the table in a Firebird database and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
tFirebirdOutput writes, updates, makes changes or suppresses entries in a database.

tFirebirdOutput Standard properties


These properties are used to configure tFirebirdOutput running in the Standard Job framework.
The Standard tFirebirdOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.


Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. It allows you to call SQL functions to
perform actions on columns that are not insert, update, or
delete actions, or actions that require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.


  Reference column: Type in a reference column that the
component can use to place or replace the new or altered
column.
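As an illustration only (the column names and the expression below are hypothetical), to add a
created_at column filled by the database after the existing id column, you could set Name to
"created_at", SQL expression to "CURRENT_TIMESTAMP", Position to After and Reference column to
"id".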

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.

Note:
Make sure the Nullable check box is selected for the corresponding columns in the schema.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use Batch check box
is selected.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a Firebird database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tFirebirdRollback
Cancels the transaction committed in the connected Firebird database.

tFirebirdRollback Standard properties


These properties are used to configure tFirebirdRollback running in the Standard Job framework.
The Standard tFirebirdRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tFirebirdConnection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other


tFirebird* components, especially with the tFirebirdConnection and tFirebirdCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic


settings and context variables, see Talend Studio User


Guide.

Related scenario
For tFirebirdRollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429.


tFirebirdRow
Executes the stated SQL query on the specified Firebird database.
Depending on the nature of the query and the database, tFirebirdRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements.
tFirebirdRow is the specific component for this database query. The row suffix means the component
implements a flow in the Job design although it does not provide output.

tFirebirdRow Standard properties


These properties are used to configure tFirebirdRow running in the Standard Job framework.
The Standard tFirebirdRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased
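As a sketch (the table and column names are hypothetical), a parameterized statement entered in the
Query field could be:
"DELETE FROM customers WHERE id = ?"
with one line in the Set PreparedStatement Parameter table, for example Parameter Index set to 1,
Parameter Type set to Int and Parameter Value set to 1001.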

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tFixedFlowInput
Generates a fixed flow from internal variables.

tFixedFlowInput Standard properties


These properties are used to configure tFixedFlowInput running in the Standard Job framework.
The Standard tFixedFlowInput component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository, hence can be reused in various
projects and job designs. Related topic: see Talend Studio
User Guide.

Mode From the three options, select the mode that you want to
use.
Use Single Table: Enter the data that you want to generate
in the relevant value field.
Use Inline Table: Add the row(s) that you want to generate.
Use Inline Content: Enter the data that you want to
generate, separated by the separators that you have already
defined in the Row and Field Separator fields.

Number of rows Enter the number of lines to be generated.


Values Between inverted commas, enter the values corresponding


to the columns you defined in the schema dialog box via
the Edit schema button.
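For instance, with a hypothetical two-column schema (firstName, city), entering "John" as the value
of firstName and "Paris" as the value of city, and setting Number of rows to 3, generates three
identical rows holding these two values.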

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a start or intermediate


component and thus requires an output component.

Related scenarios
For related scenarios, see:
• Buffering output data on the webapp server on page 421.
• Iterating on a DB table and listing its column names on page 2419.
• Filtering a list of names using simple conditions on page 1173.


tFlowMeter
Counts the number of rows processed in the defined flow, so this number can be caught by the
tFlowMeterCatcher component for logging purposes.

tFlowMeter Standard properties


These properties are used to configure tFlowMeter running in the Standard Job framework.
The Standard tFlowMeter component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Use input connection name as label Select this check box to reuse the name given to the input
main row flow as label in the logged data.

Mode Select the type of values for the data measured:
Absolute: the actual number of rows is logged.
Relative: a ratio (%) of the number of rows is logged. When
this option is selected, a Connections List is displayed to let
you select a reference connection.

Thresholds Adds a threshold to watch proportions in the volumes
measured. You can decide that the normal flow has to be
between the low and top ends of a row-number range, and if
the flow falls below the low end, there is a bottleneck.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Cannot be used as a start component as it requires an input


flow to operate.

If you need logs, statistics, or other measurements of your data flows, see the Talend Studio User
Guide.


Related scenario
For related scenario, see Catching flow metrics from a Job on page 1205


tFlowMeterCatcher
Operates as a log function triggered by the use of a tFlowMeter component in the Job.
Based on a defined schema, the tFlowMeterCatcher catches the processing volumetrics from the
tFlowMeter component and passes them on to the output component.

tFlowMeterCatcher Standard properties


These properties are used to configure tFlowMeterCatcher running in the Standard Job framework.
The Standard tFlowMeterCatcher component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the fields to be
processed and passed on to the next component. In this
particular case, the schema is read-only, as this component
gathers standard log information including:

  Moment: Processing time and date

  Pid: Process ID

  Father_pid: Process ID of the father Job if applicable. If not


applicable, Pid is duplicated.

  Root_pid: Process ID of the root Job if applicable. If not


applicable, pid of current Job is duplicated.

  System_pid: Process id generated by the system

  Project: Name of the project the Job belongs to.

  Job: Name of the current Job

  Job_repository_id: ID generated by the application.

  Job_version: Version number of the current Job

  Context: Name of the current context

  Origin: Name of the component if any

  Label: Label of the row connection preceding the


tFlowMeter component in the Job, and that will be analyzed
for volumetrics.

  Count: Actual number of rows being processed

  Reference: Number of rows passing the reference link.

  Thresholds: Only used when the relative mode is selected in


the tFlowMeter component.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is the start component of a secondary Job


which triggers automatically at the end of the main Job.

Limitation The use of this component cannot be separated from


the use of the tFlowMeter. For more information, see
tFlowMeter on page 1202

Catching flow metrics from a Job


The following basic Job aims at catching the number of rows being passed in the processed flow. The
measures are taken twice: once after the input component, that is, before the filtering step, and once
right after the filtering step, that is, before the output component.

• Drop the following components from the Palette to the design workspace: tMysqlInput,
tFlowMeter (x2), tMap, tLogRow, tFlowMeterCatcher and tFileOutputDelimited.
• Link components using row main connections and click on the label to give consistent names
throughout the Job, such as US_States from the input component and filtered_states for the output
from the tMap component, for example.
• Link the tFlowMeterCatcher to the tFileOutputDelimited component using a row main link as well,
since data is passed.


• On the tMysqlInput Component view, configure the connection properties as Repository if the
table metadata are stored in the Repository. Otherwise, set the Type to Built-in and configure the
connection and schema details manually if they are built-in for this Job.

• The 50 States of the USA are recorded in the table states. In order for all 50 entries of the table to
be selected, the query to run on the MySQL database is as follows:
select * from states.
• Select the relevant encoding type on the Advanced settings vertical tab.
• Then select the following component which is a tFlowMeter and set its properties.

• Select the check box Use input connection name as label, in order to reuse the label you chose in
the log output file (tFileOutputDelimited).
• The mode is Absolute as there is no reference flow to meter against, also no Threshold is to be set
for this example.
• Then launch the tMap editor to set the filtering properties.
• For this use case, drag and drop the ID and State columns from the Input area of the tMap towards
the Output area. No variable is used in this example.


• On the Output flow area (labelled filtered_states in this example), click the arrow & plus button to
activate the expression filter field.
• Drag the State column from the Input area (row2) towards the expression filter field and type in
the rest of the expression in order to filter the state labels starting with the letter M. The final
expression looks like: row2.State.startsWith("M")
• Click OK to validate the setting.
• Then select the second tFlowMeter component and set its properties.

• Select the check box Use input connection name as label.


• Select Relative as Mode and in the Reference connections list, select US_States as reference to be
measured against.
• Once again, no threshold is used for this use case.
• No particular setting is required in the tLogRow.
• Nor is any particular setting required in the tFlowMeterCatcher, as this component's properties are
limited to a preset schema that includes typical log information.
• Finally, configure the log output component (tFileOutputDelimited).

• Select the Append check box in order to log all tFlowMeter measures.


• Then save your Job and press F6 to execute it.

The Run view shows the filtered state labels as defined in the Job.

In the delimited CSV file, the number of rows shown in the count column varies between tFlowMeter1
and tFlowMeter2, as the filtering has been carried out in between. The reference column also shows this
difference.


tFlowToIterate
Reads data line by line from the input flow and stores the data entries in iterative global variables.

tFlowToIterate Standard properties


These properties are used to configure tFlowToIterate running in the Standard Job framework.
The Standard tFlowToIterate component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Use the default (key, value) in global variables When selected, the system uses the default value of the
global variable in the current Job.

Customize key: Type in a name for the new global variable. Press Ctrl
+Space to access all available variables either global or
user-defined.

  value: Click in the cell to access a list of the columns


attached to the defined global variable.
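For example, a value registered under the key "Name_of_File" (as in the scenario below) can then be
retrieved in a component placed after the Iterate link with the Java expression
((String)globalMap.get("Name_of_File")).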

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
CURRENT_ITERATION: the sequence number of the current
iteration. This is a Flow variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule You cannot use this component as a start component.


tFlowToIterate requires an output component.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: Run if; On Component Ok; On Component Error.


Incoming links (from one component to this one):


Row: Main;

For further information regarding connections, see Talend


Studio User Guide.

Transforming data flow to a list


The following scenario describes a Job that reads a list of files from a defined input file, iterates on
each of the files and displays their content row by row on the Run console.

Setting up the Job


Procedure
1. Drop the following components from the Palette onto the design workspace: two tFileInputDeli
mited components, a tFlowToIterate, and a tLogRow.
2. Connect the first tFileInputDelimited to tFlowToIterate using a Row > Main link, tFlowToIterate
to the second tFileInputDelimited using an Iterate link, and the second tFileInputDelimited to
tLogRow using a Row > Main link.

Configuring the Components


Procedure
1. Double-click the first tFileInputDelimited to display its Basic settings view.
2. Click the [...] button next to the File Name field to select the path to the input file.

Note:
The File Name field is mandatory.


The input file used in this scenario is Customers.txt. It is a text file that contains a list of names
of three other simple text files: Name.txt, E-mail.txt and Address.txt. The first text file, Name.txt,
is made of one column holding customers' names. The second text file, E-mail.txt, is made of
one column holding customers' e-mail addresses. The third text file, Address.txt, is made of one
column holding customers' postal addresses.
Fill in all other fields as needed. For more information, see tFileInputDelimited Standard
properties on page 1015. In this scenario, the header and the footer are not set and there is no
limit for the number of processed rows.
3. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is
made of one column, FileName.

4. Double-click tFlowToIterate to display its Basic settings view.

Click the plus button to add new parameter lines and define your variables, and click in
the key cell to enter the variable name as desired. In this scenario, one variable is defined:
"Name_of_File".
Alternatively, you can select the Use the default (key, value) in global variables check box to use
the default in global variables.
5. Double-click the second tFileInputDelimited to display its Basic settings view.


In the File name field, enter the directory of the files to be read, and then press Ctrl+Space to
select the global variable "Name_of_File". In this scenario, the syntax is as follows:

"C:/scenario/flow_to_iterate/"+((String)globalMap.get("Name_of_File"))

Click Edit schema to define the schema column name. In this scenario, it is RowContent.
Fill in all other fields as needed. For more information, see tFileInputDelimited Standard
properties on page 1015.
6. In the design workspace, select the last component, tLogRow, and click the Component tab to
define its basic settings.

Define your settings as needed. For more information, see tLogRow Standard properties on page
1977.

Saving and executing the Job


Procedure
1. Save your Job by pressing Ctrl+S.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


Results
Customers' names, customers' e-mails, and customers' postal addresses appear on the console
preceded by the schema column name.


tForeach
Creates a loop on a list for an iterate link.

tForeach Standard properties


These properties are used to configure tForeach running in the Standard Job framework.
The Standard tForeach component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Values Use the [+] button to add rows to the Values table. Then
click on the fields to enter the list values to be iterated
upon, between double quotation marks.
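For instance (the values are hypothetical), entering "file1", "file2" and "file3" makes the outgoing
Iterate link fire three times; at each iteration the current value can be retrieved through the
CURRENT_VALUE global variable described below, for example with
globalMap.get("tForeach_1_CURRENT_VALUE").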

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
CURRENT_VALUE: the value currently iterated upon. This is
a Flow variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tForeach is an input component and requires an Iterate link


to connect it to another component.

Iterating on a list and retrieving the values


This scenario describes a two-component Job in which a list is created and iterated upon in a tForeach
component. The values are then retrieved in a tJava component.


Setting up the Job


Procedure
1. Drop a tForeach and a tJava component onto the design workspace.
2. Link tForeach to tJava using a Row > Iterate connection.

Results

Configuring the components


Procedure
1. Double-click tForeach to open its Basic settings view:

2. Click the [+] button to add as many rows to the Values list as required.
3. Click on the Value fields to enter the list values, between double quotation marks.
4. Double-click tJava to open its Basic settings view:

5. Enter the following Java code in the Code area:
System.out.println(globalMap.get("tForeach_1_CURRENT_VALUE")+"_out");


Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to execute the Job.

Results
The tJava run view displays the list values retrieved from tForeach, each one suffixed with _out:


tFTPClose
Closes an active FTP connection to release the occupied resources.

tFTPClose Standard properties


These properties are used to configure tFTPClose running in the Standard Job framework.
The Standard tFTPClose component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Component list Select the component that opens the connection you need
to close from the list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is more commonly used with other FTP
components, especially with the tFTPConnection compon
ent.

Related scenarios
• Listing and getting files/folders on an FTP directory on page 1230
• Putting files onto an FTP server on page 1246
• Renaming a file located on an FTP server on page 1253


tFTPConnection
Opens an FTP connection to transfer files in a single transaction.

tFTPConnection Standard properties


These properties are used to configure tFTPConnection running in the Standard Job framework.
The Standard tFTPConnection component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/


HTTPS proxy. If you need a proxy, set a SOCKS proxy in
the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's


version is greater than 3, the encoding should be UTF-8, or


else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type


from the list or selecting CUSTOM and entering the encoding
type manually.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is typically used as a single-component


subJob. It is used along with other FTP components.

Related scenarios
• Listing and getting files/folders on an FTP directory on page 1230
• Putting files onto an FTP server on page 1246
• Renaming a file located on an FTP server on page 1253


tFTPDelete
Deletes files or folders in a specified directory on an FTP server.

tFTPDelete Standard properties


These properties are used to configure tFTPDelete running in the Standard Job framework.
The Standard tFTPDelete component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The directory where the files/folders to be deleted are


located.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.


Warning: This option does not work with an HTTP/


HTTPS proxy. If you need a proxy, set a SOCKS proxy in
the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the
Files field as file filters. This is useful when the name of
the file to be processed contains special characters such as
parentheses.
For more information about Perl5 regular expression syntax,
see Perl5 Regular Expression Syntax.

Files The names of the files/folders or the paths to the files/folders to be deleted. You can specify multiple files/folders in a line by using wildcards or a regular expression.

Target Type Select the type of the target to be deleted, either File or
Directory.


Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.
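
For example, assuming the component instance is named tFTPDelete_1, these variables could be read in a tJava component executed after this component, for instance through an OnSubjobOk link (a sketch for illustration, not part of the component reference):

  // number of files/folders that tFTPDelete_1 processed
  System.out.println("Deleted: " + ((Integer) globalMap.get("tFTPDelete_1_NB_FILE")));
  // last error message, or null if no error occurred
  System.out.println("Error: " + ((String) globalMap.get("tFTPDelete_1_ERROR_MESSAGE")));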


Usage

Usage rule This component is typically used as a single-component subJob but can also be used as an output or end object.

Related scenario
No scenario is available for this component yet.


tFTPFileExist
Checks if a file or a directory exists on an FTP server.

tFTPFileExist Standard properties


These properties are used to configure tFTPFileExist running in the Standard Job framework.
The Standard tFTPFileExist component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The remote directory under which the file or the directory
will be checked.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Target Type Select the type of the target to be checked, either File or
Directory.

File Name The name of the file or the path to the file to be checked.

This property is available only when File is selected from the Target Type list.

Directory Name The name of the directory or the path to the directory to be
checked.
This property is available only when Directory is
selected from the Target Type list.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

EXISTS The result of whether a specified file/directory exists. This is a Flow variable and it returns a boolean.

FILENAME The name of the file/directory processed. This is an After variable and it returns a string.
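
For example, the EXISTS variable is typically evaluated in the condition of a Run if trigger so that the next subJob runs only when the target was found. Assuming the component instance is named tFTPFileExist_1, the condition could be written as follows (a sketch for illustration):

  ((Boolean) globalMap.get("tFTPFileExist_1_EXISTS"))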

Usage

Usage rule This component is typically used as a single-component subJob but can also be used with other components.

Related scenario
No scenario is available for this component yet.


tFTPFileList
Lists all files and folders directly under a specified directory based on a filemask pattern.

tFTPFileList Standard properties


These properties are used to configure tFTPFileList running in the Standard Job framework.
The Standard tFTPFileList component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The remote directory where the files and folders to be listed
are located.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

File detail Select this check box to list the details of each file/folder.
The informative details include the file/folder permissions,
the name of the author, the name of the group of users
that have read/write permissions, the file size, and the last
modification date.

Files The names of the files/folders to be listed. You can specify multiple files/folders in a line by using wildcards or a regular expression.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.


Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

CURRENT_FILE The current file name. This is a Flow variable and it returns
a string.

CURRENT_FILEPATH The current file path. This is a Flow variable and it returns a
string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.
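
For example, inside an Iterate loop driven by this component, a tJava component could print each listed file, assuming the component instance is named tFTPFileList_1 (a sketch for illustration):

  // name and full path of the file currently being iterated
  System.out.println(((String) globalMap.get("tFTPFileList_1_CURRENT_FILE"))
      + " -> " + ((String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")));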

Usage

Usage rule This component is typically used as a single-component subJob but can also be used with other components.

Listing and getting files/folders on an FTP directory


Here is an example of using Talend FTP components to iterate over and list all files and folders in an FTP server directory, and then get only the text files from that directory to a local directory.


Creating a Job for listing and getting files/folders on an FTP directory


Create a Job that connects to an FTP server, iterates over and lists all files and folders in the FTP root directory, gets only the text files from the root directory to a local directory, and finally closes the connection to the server.

Before you begin


Prerequisites: To replicate this scenario, an FTP server must be started and a couple of files/folders
must be put onto the root directory of the FTP server.


Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPFileList component, a
tIterateToFlow component, a tLogRow component, a tFTPGet component, and a tFTPClose
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFTPFileList component to the tIterateToFlow component using a Row > Iterate
connection.
3. Link the tIterateToFlow component to the tLogRow component using a Row > Main connection.
4. Link the tFTPConnection component to the tFTPFileList component using a Trigger > OnSubjobOk
connection.
5. Do the same to link the tFTPFileList component to the tFTPGet component, and the tFTPGet
component to the tFTPClose component.
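
Based on the links described above, the resulting Job can be sketched as follows:

  tFTPConnection --OnSubjobOk--> tFTPFileList --OnSubjobOk--> tFTPGet --OnSubjobOk--> tFTPClose
                                      |
                                  (Iterate)
                                      v
                                tIterateToFlow --Main--> tLogRow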

Opening a connection to the FTP server


Configure the tFTPConnection component to open a connection to the FTP server.

Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.

Listing all files/folders on the FTP root directory


Configure the tFTPFileList component, the tIterateToFlow component, and the tLogRow component
to iterate all files and folders on the FTP root directory and display the names and paths of these files
and folders on the console of Talend Studio.

Procedure
1. Double-click the tFTPFileList component to open its Basic settings view.

2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.


3. In the Remote directory field, specify the FTP server directory on which the files and folders will
be iterated. In this example, it is /, which means the root directory of the FTP server.
4. Clear the Move to the current directory check box.
5. Double-click the tIterateToFlow component to open its Basic settings view.

6. Click the [...] button next to Edit schema to open the schema dialog box.

7. Click the [+] button to add two String type columns, filename and filepath, that will hold the names and paths of the iterated files respectively. When done, click OK to close the dialog box.
8. In the Mapping table, set the values for the filename and filepath columns. In this example, use the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")) for filename and the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) for filepath.
Note that you can fill the values by pressing Ctrl + Space to access the global variables list and
then selecting tFTPFileList_1_CURRENT_FILE and tFTPFileList_1_CURRENT_FILEPATH from the list.
9. Double-click the tLogRow component to open its Basic settings view, and then select Table (print
values in cells of a table) in the Mode area for better readability of the result.


Getting files on the FTP server directory to a local directory


Configure the tFTPGet component to get only the text files from the FTP root directory to a local directory.

Procedure
1. Double-click the tFTPGet component to open its Basic settings view.

2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Local directory field, specify the local directory to which the files and folders will be
downloaded. In this example, it is D:/FtpDownloads.
4. In the Remote directory field, specify the FTP server directory under which the files and folders
will be downloaded. In this example, it is /, which means the root directory of the FTP server.
5. In the Files table, click the [+] button to add a line and in the Filemask column field, enter *.txt
between double quotation marks to get only the text files from the FTP directory to the local directory.

Closing the connection to the FTP server


Configure the tFTPClose component to close the connection to the FTP server.

Procedure
1. Double-click the tFTPClose component to open its Basic settings view.


2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.

Executing the Job to list and get files/folders on the FTP directory
After setting up the Job and configuring the components used in the Job for listing and getting files/
folders on the FTP directory, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the names and paths of the files and folders on the FTP server root directory are
displayed on the console, and only the text files are downloaded to the specified local directory.


tFTPFileProperties
Retrieves the properties of a specified file on an FTP server.

tFTPFileProperties Standard properties


These properties are used to configure tFTPFileProperties running in the Standard Job framework.
The Standard tFTPFileProperties component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. It describes the
main properties of the specified file. You can click the [...]
button next to Edit schema to view the predefined schema
which contains the following fields:
• abs_path: the absolute path of the file.
• dirname: the directory of the file.
• basename: the name of the file.
• size: the file size in bytes.
• mtime: the timestamp indicating when the file was last
modified, in milliseconds that have elapsed since the
Unix epoch (00:00:00 UTC, Jan 1, 1970).
• mtime_string: the date and time the file was last
modified.
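
For example, the mtime value can be turned into a readable date with standard Java in a downstream component; in the sketch below, row1.mtime is a hypothetical input column holding the mtime value:

  // convert milliseconds since the Unix epoch to a java.util.Date
  java.util.Date lastModified = new java.util.Date(row1.mtime);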

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.


Remote directory The path to the directory where the file is available.

File The name of the file or the path to the file whose properties
will be retrieved.

Transfer mode Select the transfer mode from the list, either ascii or binary.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.


Calculate MD5 Hash Select this check box to check the file's MD5.
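
If you want to cross-check the MD5 reported by the component against a local copy of the file, a plain Java snippet such as the following computes the same kind of digest (the local path is hypothetical and used only for illustration):

  try {
      byte[] content = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("D:/data/file.txt"));
      byte[] digest = java.security.MessageDigest.getInstance("MD5").digest(content);
      StringBuilder md5 = new StringBuilder();
      for (byte b : digest) {
          md5.append(String.format("%02x", b)); // two-digit lowercase hex per byte
      }
      System.out.println("Local MD5: " + md5);
  } catch (Exception e) {
      e.printStackTrace();
  }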

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component.

Related scenario
Displaying the properties of a processed file on page 1159


tFTPGet
Downloads files to a local directory from an FTP directory.

tFTPGet Standard properties


These properties are used to configure tFTPGet running in the Standard Job framework.
The Standard tFTPGet component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory The local directory in which downloaded files will be saved.

Remote directory The FTP directory from which files will be downloaded.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Transfer mode Select the transfer mode from the list, either ascii or binary.

Overwrite file Select the action to be performed when the file already
exists.
• never: Never overwrite the file.

• always: Always overwrite the file.
• size different: Overwrite the file when the file size is different.
• overwrite: Overwrite the existing file.
• resume: Resume downloading the file from the point of
interruption.
• append: Add data to the end of the file without
overwriting data.
overwrite, resume, and append are available when the SFTP
Support check box is selected.

Append Select this check box to append data at the end of the file in
order to avoid overwriting data.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the Filemask or Files field as file filters. This is useful when the name of the file to be processed contains special characters such as parentheses.
For more information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.

Files The names of the files or the paths to the files to be downloaded. You can specify multiple files in a line by using wildcards or a regular expression.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Print message Select this check box to display the list of files downloaded
on the console.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.

TRANSFER_MESSAGES The file transfer information. This is an After variable and it returns a string.

Usage

Usage rule This component is typically used as a single-component subJob but can also be used as an output or end object.

Related scenario
Listing and getting files/folders on an FTP directory on page 1230


tFTPPut
Uploads files from a local directory to an FTP directory.

tFTPPut Standard properties


These properties are used to configure tFTPPut running in the Standard Job framework.
The Standard tFTPPut component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory The local directory from which the files will be uploaded to
the FTP server.

Remote directory The FTP directory where the uploaded files will be placed.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Transfer mode Select the transfer mode from the list, either ascii or binary.

Overwrite file Select the action to be performed when the file already
exists.

• never: Never overwrite the file.
• always: Always overwrite the file.
• size different: Overwrite the file when the file size is different.
• overwrite: Overwrite the existing file.
• resume: Resume uploading the file from the point of interruption.
• append: Add data to the end of the file without
overwriting data.
overwrite, resume, and append are available when the SFTP
Support check box is selected.

Append Select this check box to append data at the end of the file in
order to avoid overwriting data.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.

This property is available only when the FTPS Support check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the Filemask or Files field as file filters. This is useful when the name of the file to be processed contains special characters such as parentheses.
For more information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.

Files Specify the files to be uploaded.
• Filemask: the file names or the paths to the files to be uploaded.
• New name: the name to give the file after the transfer.
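
For example, a hypothetical Files table could contain "*.csv" as the Filemask with the New name left empty to upload all CSV files under their original names, or "report.txt" as the Filemask with "report_backup.txt" as the New name to upload a single file under a different name.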

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.

This property is available only when the FTPS Support check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.

CURRENT_FILE_EXISTS The result of whether the current file exists. This is a Flow variable and it returns a boolean.

TRANSFER_MESSAGES The file transfer information. This is an After variable and it returns a string.

Usage

Usage rule This component is typically used as a single-component subJob but can also be used as an output component.

Putting files onto an FTP server


Here is an example of using Talend FTP components to put several files in a local directory onto an
FTP server.


Creating a Job for putting files onto an FTP server


Create a Job that connects to an FTP server, puts several local files onto the server, and finally closes the connection to the server.

Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPPut component, and a tFTPClose
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFTPConnection component to the tFTPPut component using a Trigger > OnSubjobOk
connection.
3. Link the tFTPPut component to the tFTPClose component using a Trigger > OnSubjobOk
connection.
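
Based on the links described above, the Job is a simple linear chain:

  tFTPConnection --OnSubjobOk--> tFTPPut --OnSubjobOk--> tFTPClose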

Opening a connection to the FTP server


Configure the tFTPConnection component to open a connection to the FTP server.

Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.
4. From the Connection Mode drop-down list, select the FTP connection mode you want to use,
Active in this example.

Putting files onto the FTP server


Configure the tFTPPut component to put several local files onto the FTP server root directory.

Procedure
1. Double-click the tFTPPut component to open its Basic settings view.


2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Local directory field, specify the local directory that contains the files to be put onto the
FTP server. In this example, it is D:/components.
4. In the Remote directory field, specify the FTP server directory onto which the files will be put. In
this example, it is /, which means the root directory of the FTP server.
5. Clear the Move to the current directory check box.
6. In the Files table, click the [+] button twice to add two lines, and in the two Filemask column
fields, enter *.txt and *.png respectively, which means only the text and png files in the specified
local directory will be put onto the FTP server root directory.

Closing the connection to the FTP server


Configure the tFTPClose component to close the connection to the FTP server.

Procedure
1. Double-click the tFTPClose component to open its Basic settings view.

2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.


Executing the Job to put files on the FTP server


After setting up the Job and configuring the components used in the Job for putting files onto the FTP
server, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.
2. Connect to the FTP server to verify the result.

As shown above, only the text and png files in the local directory are put onto the FTP server.


tFTPRename
Renames files in an FTP directory.

tFTPRename Standard properties


These properties are used to configure tFTPRename running in the Standard Job framework.
The Standard tFTPRename component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The path to the FTP directory where the files to be renamed
are available.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Overwrite file Select the action to be performed when the file already
exists.
• never: Never overwrite the file.
• always: Always overwrite the file.

• size different: Overwrite the file when the file size is different.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Files Specify the files to be renamed and their new names.
• Filemask: specify the file to be renamed by entering the filename or a filemask using wildcard characters or regular expressions.
• New name: enter the new name of the file.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.

Usage

Usage rule This component is generally used as a subJob with one component, but it can also be used as an output or end component.


Renaming a file located on an FTP server


Here is an example of using Talend FTP components to rename a file located on an FTP server.

Creating a Job for renaming a file on an FTP server


Create a Job that connects to an FTP server, renames a file on the server, and finally closes the connection to the server.

Before you begin


Prerequisites: To replicate this scenario, an FTP server must be started and a file must be put onto
the server. In this example, the file movies.json has been put into the folder movies under the root
directory of the FTP server.

Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPRename component, and a
tFTPClose component by typing their names in the design workspace or dropping them from the
Palette.
2. Link the tFTPConnection component to the tFTPRename component using a Trigger >
OnSubjobOk connection.
3. Link the tFTPRename component to the tFTPClose component using a Trigger > OnSubjobOk
connection.
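
As in the previous scenario, the Job is a linear chain:

  tFTPConnection --OnSubjobOk--> tFTPRename --OnSubjobOk--> tFTPClose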


Opening a connection to the FTP server


Configure the tFTPConnection component to open a connection to the FTP server.

Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.

Renaming the file on the FTP server


Configure the tFTPRename component to rename the file on the FTP server.

Procedure
1. Double-click the tFTPRename component to open its Basic settings view.

2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Remote directory field, enter the directory on the FTP server where the file to be renamed
exists. In this example, it is /movies.
4. Clear the Move to the current directory check box.
5. In the Files table, click the [+] button to add a line, and then enter the existing file name in the
Filemask column field and the new file name in the New name column field. In this example, they
are movies.json and action_movies.json respectively.

Closing the connection to the FTP server


Configure the tFTPClose component to close the connection to the FTP server.

Procedure
1. Double-click the tFTPClose component to open its Basic settings view.


2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.

Executing the Job to rename the file on the FTP server


After setting up the Job and configuring the components used in the Job for renaming the file on the
FTP server, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.
2. Connect to the FTP server to verify the result.

As shown above, the file on the FTP server has been renamed from movies.json to action_movies.json.


tFTPTruncate
Truncates files in an FTP directory.

tFTPTruncate Standard properties


These properties are used to configure tFTPTruncate running in the Standard Job framework.
The Standard tFTPTruncate component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The path to the FTP directory in which the files will be
truncated.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the Filemask or Files field as file filters. This is useful when the name of the file to be processed contains special characters such as parentheses.
For more information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.

Files The names of the files or the paths to the files to be truncated. You can specify multiple files in a line by using wildcards or a regular expression.

Connection mode Select the connection mode from the list, either Passive or
Active.


Encoding Specify the encoding type by selecting an encoding type from the list or by selecting CUSTOM and entering the encoding type manually.
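
For example, assuming the remote directory contains files named report(1).csv, report(2).csv, and so on (these file names are only an illustration), the following value in the Files field, entered as a Java string and therefore with doubled backslashes, matches all of them once the Use Perl5 Regex Expression as Filemask check box is selected:

    "report\\(\\d+\\)\\.csv"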

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value of 0 or any negative value is ignored; in this case, the default value of 60000 ms is used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.
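
These After variables can be read from a later subJob, for example in a tJava component linked with an OnSubjobOk trigger. A minimal sketch, assuming the component instance is named tFTPTruncate_1 (adjust the prefix to your actual instance name):

    // Read the After variables exposed by tFTPTruncate_1 (instance name assumed).
    String errorMessage = (String) globalMap.get("tFTPTruncate_1_ERROR_MESSAGE");
    Integer nbFile = (Integer) globalMap.get("tFTPTruncate_1_NB_FILE");
    System.out.println("Files truncated: " + nbFile);
    if (errorMessage != null) {
        System.out.println("Truncate error: " + errorMessage);
    }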

Usage

Usage rule This component is typically used as a single-component subJob but can also be used with other components.

Related scenario
No scenario is available for this component yet.


tFuzzyMatch
Compares a column from the main flow with a reference column from the lookup flow and outputs
the main flow data displaying the distance.

tFuzzyMatch Standard properties


These properties are used to configure tFuzzyMatch running in the Standard Job framework.
The Standard tFuzzyMatch component belongs to the Data Quality family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Two read-only columns, Value and Match, are added to the output schema automatically.

  Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: see Talend Studio User Guide.

Matching type Select the relevant matching algorithm among:
Levenshtein: Based on the edit distance theory. It calculates the number of insertions, deletions, or substitutions required for an entry to match the reference entry.
Metaphone: Based on a phonetic algorithm for indexing entries by their pronunciation. It first loads the phonetics of all entries of the lookup reference and then checks all entries of the main flow against the entries of the reference flow. It does not support Chinese characters.
Double Metaphone: An improved version of the Metaphone phonetic algorithm that produces more accurate results than the original algorithm. It can return both a primary and a secondary code for a string. This accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. It does not support Chinese characters.

Min distance (Levenshtein only) Set the minimum number of changes allowed to match the reference. If set to 0, only perfect matches are returned.

Max distance (Levenshtein only) Set the maximum number of changes allowed to match the reference.


Matching column Select the column of the main flow that needs to be checked against the reference (lookup) key column.

Unique matching Select this check box if you want to get the best match
possible, in case several matches are available.

Matching item separator If several matches are available, all of them are displayed unless the Unique matching check box is selected. Define the delimiter to be used between the matches.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it requires two input components and an output component.

Checking the Levenshtein distance of 0 in first names


This scenario describes a four-component Job that checks the edit distance between the First Name column of an input file and the data of a reference input file. The output of this Levenshtein check is displayed, along with the content of the main flow, in a table.


Setting up the Job


Procedure
1. Drag and drop the following components from the Palette to the design workspace: tFileInputDelimited (x2), tFuzzyMatch, tLogRow.
2. Link the first tFileInputDelimited component to the tFuzzyMatch component using a Row > Main
connection.
3. Link the second tFileInputDelimited component to the tFuzzyMatch using a Row > Main
connection (which appears as a Lookup row on the design workspace).
4. Link the tFuzzyMatch component to the standard output tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Define the first tFileInputDelimited component in its Basic settings view. Browse your system to the input file to be analyzed.
2. Define the schema of the component. In this example, the input schema has two columns,
firstname and gender.
3. Define the second tFileInputDelimited component the same way.

Warning:
Make sure the reference column is set as key column in the schema of the lookup flow.


4. Double-click the tFuzzyMatch component to open its Basic settings view, and check its schema.
The Schema should match the Main input flow schema in order for the main flow to be checked
against the reference.

Note that two columns, Value and Matching, are added to the output schema. These are standard
matching information and are read-only.
5. Select the method to be used to check the incoming data. In this scenario, Levenshtein is the
Matching type to be used.
6. Then set the distance. In this method, the distance is the number of character changes (insertions, deletions or substitutions) that need to be carried out for the entry to fully match the reference.

In this use case, we set both the minimum distance and the maximum distance to 0. This means
only the exact matches will be output.
7. Also, clear the Case sensitive check box.
8. Check that the matching column and look up column are correctly selected.
9. Leave the other parameters as default.

Executing the Job


Procedure
Save the Job and press F6 to execute the Job.


Results
As the edit distance has been set to 0 (min and max), the output shows the result of a regular join
between the main flow and the lookup (reference) flow, hence only full matches with Value of 0 are
displayed.
A more obvious example is with a minimum distance of 1 and a maximum distance of 2; see Procedure on page 1263.
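
To make the notion of edit distance more concrete, the following standalone Java sketch computes the Levenshtein distance as it is commonly defined (the minimum number of single-character insertions, deletions, or substitutions); it only illustrates the metric and is not the component's internal code:

    // Classic dynamic-programming Levenshtein distance (illustration only).
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // i deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // j insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1; // substitution
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

For instance, levenshtein("Brad", "Brad") returns 0 and levenshtein("Brad", "Brady") returns 1, which is why the 0/0 setting above behaves like an exact join, while the 1/2 setting of the next scenario also returns near matches.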

Checking the Levenshtein distance of 1 or 2 in first names


This scenario is based on the scenario described above. Only the minimum and maximum distance
settings in the tFuzzyMatch component are modified, which will change the output displayed.

Procedure
1. In the Component view of the tFuzzyMatch component, change the minimum distance from 0 to 1. This immediately excludes the exact matches (which would show a distance of 0).
2. Also change the maximum distance to 2. The output will provide all matching entries showing a discrepancy of 2 characters at most.

No other changes are required.


3. Make sure the Matching item separator is defined, as several references might match the main flow entry.
4. Save the new Job and press F6 to run it.

As the edit distance has been set to 2, some entries of the main flow match more than one
reference entry.

Results
You can also use another method, Metaphone, to assess the distance between the main flow and the reference, as described in the next scenario.

Checking the Metaphonic distance in first name


This scenario is based on the scenario described above.

Procedure
1. Change the Matching type to Metaphone. There is neither a minimum nor a maximum distance to set, as the matching method is based on the discrepancies with the phonetics of the reference.

2. Save the Job and press F6. The phonetics value is displayed along with the possible matches.


tGoogleDataprocManage
Creates or deletes a Dataproc cluster in the Global region on Google Cloud Platform.

tGoogleDataprocManage Standard properties


These properties are used to configure tGoogleDataprocManage running in the Standard Job
framework.
The Standard tGoogleDataprocManage component belongs to the Cloud family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Project identifier Enter the ID of your Google Cloud Platform project.
If you are not certain about your project ID, check it in the Manage Resources page of your Google Cloud Platform services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Provide Google Credentials in file Leave this check box clear when you launch your Job from a given machine on which the Google Cloud SDK has been installed and authorized to use your user account credentials to access Google Cloud Platform. In this situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or visit
Google Cloud Platform Auth Guide.

Action Select the action you want tGoogleDataprocManage to perform on your cluster:
• Start to create a cluster
• Stop to destroy a cluster

Version Select the version of the image to be used to create a Dataproc cluster.

Region From this drop-down list, select the Google Cloud region to
be used.

Zone Select the geographic zone in which the computing resources are used and your data is stored and processed.
The available zones vary depending on the region you have selected from the Region drop-down list.
In Google Cloud terms, a zone is an isolated location within a region, another geographical term employed by Google Cloud.

Instance configuration Enter the parameters to determine how many masters and
workers to be used by the Dataproc cluster to be created
and the performance of these masters and workers.


Advanced settings

Wait for cluster ready Select this check box to keep this component running until the cluster is completely set up.
When you clear this check box, this component stops running immediately after sending the creation command.

Master disk size Enter a number without quotation marks to determine the size of the disk of each master instance.

Master local SSD Enter a number without quotation marks to determine the number of local solid-state drive (SSD)
storage devices to be added to each master instance.
According to Google, these local SSDs are suitable only for temporary storage such as caches,
processing space or low value data. It is recommended to store important data to durable storage
options of Google. For further information about the Google storage options, see Durable storage
options.

Worker disk size Enter a number without quotation marks to determine the size of the disk of each worker instance.

Worker local SSD Enter a number without quotation marks to determine the number of local solid-state drive (SSD)
storage devices to be added to each worker instance.
According to Google, these local SSDs are suitable only for temporary storage such as caches,
processing space or low value data. It is recommended to store important data to durable storage
options of Google. For further information about the Google storage options, see Durable storage
options.

Network or Subnetwork Select either check box to use a Google Compute Engine network or subnetwork for the cluster to be created, in order to enable intra-cluster communications.
As Google does not allow a network and a subnetwork to be used concurrently, selecting one check box hides the other check box.
For further information about Google Dataproc cluster network configuration, see Dataproc Network.

Initialization action In this table, select the initialization actions that are available in the shared bucket on Google Cloud
Storage to run on all the nodes in your Dataproc cluster immediately after this cluster is set up.
If you need to use custom initialization scripts, upload them to this shared Google bucket so that
tGoogleDataprocManage can read them.
• In the Executable file column, enter the Google Cloud Storage URI to these scripts to be used,
for example gs://dataproc-initialization-actions/MyScript
• In the Executable timeout column, enter the amount of time within double quotation marks
to determine the duration of the execution. If the executable is not completed at the end of
this timeout, an explanatory error message is returned. The value is a string with up to nine
fractional digits, for example, "3.5s" for 3.5 seconds.
For further information about this shared bucket and the initialization actions, see Initialization
actions.

tStatCatcher Statistics Select this check box to collect log data at the component level.

Usage

Usage rule This component is used standalone in a subJob.


tGoogleDriveConnection
Opens a Google Drive connection that can be reused by other Google Drive components.

tGoogleDriveConnection Standard properties


These properties are used to configure tGoogleDriveConnection running in the Standard Job
framework.
The Standard tGoogleDriveConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Application Name The application name required by Google Drive to get access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.

Access Token The access token generated through Google Developers OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.


Client ID and Client Secret The client ID and client secret.
These two properties are available only when Installed Application (Id & Secret) is selected from the OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other configuration related to the Installed Application authentication changes, you need to delete this credential file manually before running your Job again.

This property is available only when Installed Application (Id & Secret) or Installed Application (JSON) is selected from the OAuth Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is more commonly used with other Google
Drive components. In a Job design, it is usually used to
open a Google Drive connection that can be reused by other
Google Drive components.

OAuth methods for accessing Google Drive


Talend provides the following four OAuth methods to access Google Drive using Google Drive
components and metadata wizard.
• Installed Application (Id & Secret)
• Installed Application (JSON)
• Service Account
• Access Token (deprecated)

How to access Google Drive using client ID and secret


To use client ID and client secret to access Google Drive, you need to first generate the client ID and
client secret by completing the following steps using Google Chrome.

Before you begin


A Google account has already been signed up for using Google Drive.

Procedure
1. Go to Google API Console and select an existing project or create a new one. In this example, we
create a new project TalendProject.

2. Go to the Library page and in the right panel, find Google Drive API and enable the Google Drive
API that allows you to access resources from Google Drive.


3. Go to the Credentials page, click OAuth consent screen in the right panel and set a product name
in the Product name shown to users field. In this example, it is TalendProduct. When done,
click Save.


4. Click Create credentials > OAuth client ID, and in the Create client ID page, create a new client ID
TalendApplication with Application type set to Other.


5. Click Create. You will be shown your client ID and client secret that can be used by Google Drive
components and metadata wizard to access Google Drive using the OAuth method Installed
Application (Id & Secret).

How to access Google Drive using a client secret JSON file


To use a client secret JSON file to access Google Drive, you need to first download the client secret
JSON file from Google API Console by completing the following steps using Google Chrome.


Before you begin


The client ID and client secret have been created in Google API Console. For more information, see
How to access Google Drive using client ID and secret on page 1270.

Procedure
1. Go to Google API Console.
2. Go to the Credentials page.
3. Click the Download JSON button to download the client secret JSON file and securely store it in a
local folder. This JSON file can then be used by Google Drive components and metadata wizard to
access Google Drive via the OAuth method Installed Application (JSON).

How to access Google Drive using a service account JSON file


To use a service account JSON file to access Google Drive, you need to first create a service account in
Google API Console, then download the service account JSON file by completing the following steps
using Google Chrome.

Before you begin


1. A Google account has already been signed up for using Google Drive.
2. In Google API Console, your project has been created, the Google Drive API has been enabled, and the product name has been set. For more information about how to complete this configuration, see How to access Google Drive using client ID and secret on page 1270.

Procedure
1. Go to Google API Console.
2. Open the Service accounts page. If prompted, select your project.


3. Click CREATE SERVICE ACCOUNT.


4. In the Create service account window, type a name for the service account, select Furnish a new
private key and then the key type JSON.


5. Click Create. In the pop-up window, choose a folder and click Save to store your service account
JSON file securely. This JSON file can then be used by Google Drive components and metadata
wizard to access Google Drive via the OAuth method Service Account.
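
The Google Drive components perform this authentication for you once the Service Account JSON path is set in their Basic settings. Purely as an illustration of what the Service Account method corresponds to in the Google Drive Java client library (the library version, class names, and the file path below are assumptions, not part of the components), a connection could be opened like this:

    import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
    import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
    import com.google.api.client.json.jackson2.JacksonFactory;
    import com.google.api.services.drive.Drive;
    import com.google.api.services.drive.DriveScopes;
    import java.io.FileInputStream;
    import java.util.Collections;

    public class DriveServiceAccountSketch {
        public static void main(String[] args) throws Exception {
            // Authenticate with the service account JSON file downloaded above.
            GoogleCredential credential = GoogleCredential
                    .fromStream(new FileInputStream("/path/to/service_account.json"))
                    .createScoped(Collections.singleton(DriveScopes.DRIVE));
            // The application name plays the same role as the Application Name property.
            Drive drive = new Drive.Builder(GoogleNetHttpTransport.newTrustedTransport(),
                    JacksonFactory.getDefaultInstance(), credential)
                    .setApplicationName("TalendApplication")
                    .build();
            // List a few files to verify that the service account can see the Drive.
            drive.files().list().setPageSize(10).execute().getFiles()
                    .forEach(f -> System.out.println(f.getName() + " (" + f.getId() + ")"));
        }
    }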

How to access Google Drive using an access token (deprecated)


To use an access token to access Google Drive, you need to first generate the access token by
completing the following steps using Google Developers OAuth Playground.

Before you begin


1. A Google account has already been signed up for using Google Drive.
2. The client ID and client secret have been created in Google API Console. For more information,
see How to access Google Drive using client ID and secret on page 1270.

Procedure
1. Go to Google Developers OAuth Playground.
2. Click OAuth 2.0 Configuration, select the Use your own OAuth credentials check box, and enter the OAuth client ID and client secret you have already created in the OAuth Client ID and OAuth Client secret fields respectively.


3. In OAuth 2.0 Playground Step 1, select the scope https://www.googleapis.com/auth/drive under Drive API v3 for the Google Drive API and click Authorize APIs, then click Allow to generate the authorization code.


4. In OAuth 2.0 Playground Step 2, click Exchange authorization code for tokens to generate the
OAuth access token.

The OAuth access token is displayed in the right panel, as shown in the figure below. It can be used by Google Drive components and metadata wizard to access Google Drive via the OAuth method Access Token.


Note that the access token expires every 3600 seconds. You can click Refresh access token in OAuth 2.0 Playground Step 2 to refresh it.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDriveCopy
Creates a copy of a file/folder in Google Drive.

tGoogleDriveCopy Standard properties


These properties are used to configure tGoogleDriveCopy running in the Standard Job framework.
The Standard tGoogleDriveCopy component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Copy Mode Select the type of the item to be copied.


• File: Select this option when you need to copy a file.
• Folder: Select this option when you need to copy a
folder.

Source The name or ID of the source file/folder to be copied.

Source Access Mode Select the method by which the source file/folder is
specified, either by Name or by Id.

Destination Folder Name The name or ID of the destination folder in which the copy
of the source file/folder will be saved.

Destination Access Mode Select the method by which the destination folder is
specified, either by Name or by Id.

Rename (set new title) Select this check box to rename the copy of the file/folder
in the destination folder. In the Destination Name field
displayed, enter the name for the file/folder after being
copied to the destination folder.


Remove Source File Select this check box to remove the source file after it is
copied to the destination folder.
This property is available only when File is selected from
the Copy Mode drop-down list.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• sourceId: The ID of the source file/folder.
• destinationId: The ID of the destination file/folder.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

SOURCE_ID The ID of the source file/folder. This is an After variable and


it returns a string.

DESTINATION_ID The ID of the destination file/folder. This is an After variable


and it returns a string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDriveCreate
Creates a new folder in Google Drive.

tGoogleDriveCreate Standard properties


These properties are used to configure tGoogleDriveCreate running in the Standard Job framework.
The Standard tGoogleDriveCreate component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Parent Folder The name or ID of the parent folder in which a new folder
will be created.

Access Method Select the method by which the parent folder is specified,
either by Name or by Id.

New Folder Name The name of the new folder to be created.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• parentFolderId: the ID of the parent folder.
• newFolderId: the ID of the new folder.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.


Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

PARENT_FOLDER_ID The ID of the parent folder. This is an After variable and it


returns a string.

NEW_FOLDER_ID The ID of the new folder. This is an After variable and it


returns a string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDriveDelete
Deletes a file/folder in Google Drive.

tGoogleDriveDelete Standard properties


These properties are used to configure tGoogleDriveDelete running in the Standard Job framework.
The Standard tGoogleDriveDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

File/Folder The name or ID of the file/folder to be deleted.

Delete Mode Select the method by which the file/folder to be deleted is


specified, either by Name or by Id.

Use Trash Select this check box to move the file/folder to be deleted
to the trash.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema with only one field named fileId which describes
the ID of the file/folder.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.


Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

FILE_ID The ID of the file/folder. This is an After variable and it


returns a string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
No scenario is available for this component yet.


tGoogleDriveGet
Gets a file's content and downloads the file to a local directory.

tGoogleDriveGet Standard properties


These properties are used to configure tGoogleDriveGet running in the Standard Job framework.
The Standard tGoogleDriveGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

File The name or ID of the file to be downloaded.

Access Method Select the method by which the file to be downloaded is


specified, either by Name or by Id.

Save as File Select this check box to save the file to a local directory.
In the Save to field displayed, browse to or enter the path
where you want to save the file to be downloaded.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema with only one field named content which describes
the content of the file to be downloaded.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.


Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Export Google Doc as Select the type for the Google Doc to be exported.

Export Google Draw as Select the type for the Google Draw to be exported.

Export Google Presentation as Select the type for the Google Presentation to be exported.

Export Google Spreadsheet as Select the type for the Google Spreadsheet to be exported.

Add extension Select this check box to add an extension to the exported file.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

FILE_ID The ID of the file. This is an After variable and it returns a


string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
No scenario is available for this component yet.


tGoogleDriveList
Lists all files, or folders, or both files and folders in a specified Google Drive folder.

tGoogleDriveList Standard properties


These properties are used to configure tGoogleDriveList running in the Standard Job framework.
The Standard tGoogleDriveList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Folder Name The name or ID of the folder in which the files/folders will
be listed.

Access Method Select the method by which the folder is specified, either by
Name or by Id.

FileList Type Select the type of data you want to list.


• Files: Only files.
• Directories: Only folders.
• Both: Both files and folders.

Include SubDirectories Select this check box to also list the files/folders in the subdirectories.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• id: The ID of the file/folder.
• name: The name of the file/folder.


• mimeType: The MIME type of the file/folder.
• modifiedTime: The last modification date of the file/folder.
• size: The file size in bytes.
• kind: The kind of the resource.
• trashed: Whether the file has been trashed.
• parents: The ID of the parent folder.
• webViewLink: A link for opening the file in a Google editor or viewer in a browser.
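
Downstream components can use these fields directly. As a minimal illustration (the row name row1 and the downstream component are assumptions), a tMap or tJava expression could single out folders by their MIME type:

    // Google Drive folders are reported with this MIME type.
    boolean isFolder = "application/vnd.google-apps.folder".equals(row1.mimeType);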

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Include trashed files Select this check box to also take into account files and
folders that have been removed from the specified path.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDrivePut
Uploads data from a data flow or a local file to Google Drive.

tGoogleDrivePut Standard properties


These properties are used to configure tGoogleDrivePut running in the Standard Job framework.
The Standard tGoogleDrivePut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

File Name The name for the file after being uploaded.

Destination Folder The name or ID of the folder in which uploaded data will be
stored.

Access Method Select the method by which the destination folder is specified, either by Name or by Id.

Replace if Existing Select this check box to overwrite any existing file with the
newly uploaded one.

Upload Mode Select one of the following upload modes from the drop-
down list:
• Upload Incoming content as File: Select this option
to upload data from an input flow of the preceding
component.
• Upload Local File: Select this option to upload data
from a local file. In the File field displayed, specify the
path to the file to be uploaded.
• Expose As OutputStream: Select this option to expose
output stream of this component, which can be used
by other components to write the file content. For example, you can use the Use Output Stream feature of the tFileOutputDelimited component to feed a given tGoogleDrivePut's exposed output stream. For more information, see tFileOutputDelimited on page 1113.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• content: The content of the uploaded data.
• parentFolderId: The ID of the parent folder.
• fileId: The ID of the file.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed Application (Id & Secret) or Installed Application (JSON) is selected from the OAuth Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

PARENT_FOLDER_ID The ID of the parent folder. This is an After variable and it returns a string.

FILE_ID The ID of the file. This is an After variable and it returns a string.
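
For illustration, the sketch below shows how a downstream tJava component could read these After variables from the globalMap once the upload has finished. It assumes the component's unique name is tGoogleDrivePut_1 (adapt the key prefix to the name shown in your own Job); it is a minimal example, not generated code.

// tJava sketch, assuming the upload component is named tGoogleDrivePut_1.
// After variables are exposed in the globalMap as "<uniqueName>_<VARIABLE>".
String fileId = (String) globalMap.get("tGoogleDrivePut_1_FILE_ID");
String parentFolderId = (String) globalMap.get("tGoogleDrivePut_1_PARENT_FOLDER_ID");
String errorMessage = (String) globalMap.get("tGoogleDrivePut_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("Upload failed: " + errorMessage);
} else {
    System.out.println("Uploaded file " + fileId + " into folder " + parentFolderId);
}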

Usage

Usage rule This component can be used as a standalone component to upload a local file to Google Drive or an end component to upload data from an input flow of the preceding component to Google Drive.

Managing files with Google Drive


This scenario describes a Job that uploads two files to an empty folder Talend in the root directory
of Google Drive, then creates a new folder Talend Backup in the root directory and copies one of the two files to the new folder Talend Backup, and finally lists and displays all files and folders in
the root directory of Google Drive on the console.

Creating a Job for managing files with Google Drive


Procedure
1. Create a new Job and add a tGoogleDriveConnection component, two tGoogleDrivePut
components, a tFileInputRaw component, a tGoogleDriveCreate component, a tGoogleDriveCopy
component, a tGoogleDriveList component, and five tLogRow components to the Job.


2. Link the first tGoogleDrivePut component to the first tLogRow component using a Row > Main
connection.
3. Do the same to link the tFileInputRaw component to the second tGoogleDrivePut component,
the second tGoogleDrivePut component to the second tLogRow component, the tGoogleDriveCr
eate component to the third tLogRow component, the tGoogleDriveCopy component to the fourth
tLogRow component, the tGoogleDriveList component to the fifth tLogRow component.
4. Link the tGoogleDriveConnection component to the first tGoogleDrivePut component using a
Trigger > On Subjob Ok connection.
5. Do the same to link the first tGoogleDrivePut component to the tFileInputRaw component,
the tFileInputRaw component to the tGoogleDriveCreate component, the tGoogleDriveCreate
component to the tGoogleDriveCopy component, and the tGoogleDriveCopy component to the
tGoogleDriveList component.

Opening a connection to Google Drive


Configure the tGoogleDriveConnection component to connect to Google Drive using a client secret
JSON file.


Before you begin


• The client secret JSON file has been downloaded into a local folder through Google API Console.
For more information, see How to access Google Drive using a client secret JSON file on page
1273.
• An empty folder Talend has been created in the root directory of Google Drive.

Procedure
1. Double-click the tGoogleDriveConnection component to open its Basic settings view in the
Component tab.

2. In the Application Name field, enter the application name required by Google Drive to get access
to its API. In this example, it is TalendProject.
3. Select Installed Application (JSON) from the OAuth Method drop-down list.
4. In the Client Secret JSON field, specify the path to the client secret JSON file you have generated,
D:/client_secret.json in this example.

Uploading files to Google Drive


Procedure
1. Double-click the first tGoogleDrivePut component to open its Basic settings view in the
Component tab.


2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. Select by Name from the Access Method drop-down list and in the Destination Folder field, enter
the name of the folder in which the file will be uploaded, Talend in this example.

Note: When accessing a Google Drive resource by its name, if the name matches more than one
resource, an error will be thrown because the resource cannot be identified precisely. In this
case, you can specify the Google Drive resource using a pseudo path hierarchy, like /Talend/
Documentation. This example specifies a folder named Documentation under the folder
Talend under the Google Drive root folder.

4. In the File Name field, enter the name for the file after being uploaded. In this example, it is
Talend Customers.csv.
5. Select Upload Local File from the Upload Mode drop-down list and in the File field, browse
to or enter the path to the file to be uploaded. In this example, it is D:/Downloads/Talend
Customers.csv.
6. Double-click the tFileInputRaw component and on its Basic settings view, select Read the
file as a bytes array in the Mode area and specify the path to the file whose content will
be uploaded in the Filename field, D:/Downloads/Talend Products.txt in this example.
7. Double-click the second tGoogleDrivePut component to open its Basic settings view in the
Component tab.

8. Repeat step 2 on page 1301 to step 3 on page 1301 to configure this component.
9. In the File Name field, enter the name for the file after being uploaded. In this example, it is
Talend Products.txt.
10. Select Upload Incoming content as File from the Upload Mode drop-down list.


Creating a new folder in Google Drive


Procedure
1. Double-click tGoogleDriveCreate to open its Basic settings view in the Component tab.

2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. In the Parent Folder field, enter the name of the folder in which a new folder will be created. In
this example, it is root.
4. In the New Folder Name field, enter the name of the folder to be created. In this example, it is
Talend Backup.
5. Double-click the third tLogRow component to open its Basic settings view in the Component tab.
6. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.

Copying a file to the newly created folder


Procedure
1. Double-click the tGoogleDriveCopy component to open its Basic settings view in the Component
tab.

2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. Select File from the Copy Mode drop-down list.
4. In the Source field, enter the name of the file to be copied. In this example, it is Talend
Customers.csv.
5. In the Destination Folder Name field, enter the name of the folder to which the file will be copied.
In this example, it is Talend Backup.
6. Select the Rename (set new title) check box and in the Destination Name field, enter a new
name for the file after being copied to the destination folder. In this example, it is Talend
Customers v1.0.csv.


7. Double-click the fourth tLogRow component to open its Basic settings view in the Component tab.
8. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.

Listing files and folders in Google Drive


Procedure
1. Double-click the tGoogleDriveList component to open its Basic settings view in the Component
tab.

2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. In the Folder Name field, enter the name of the folder in which the files/folders will be listed. In
this example, it is the root directory of Google Drive and you can use the alias root to refer to it.
4. Select Both from the FileList Type drop-down list to list both files and folders in the root
directory.
5. Select the Include SubDirectories check box to list also the files/folders in the subdirectories.
6. Double-click the fifth tLogRow component to open its Basic settings view in the Component tab.

7. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


As shown above, two files Talend Products.txt and Talend Customers.csv were
uploaded to the folder Talend, then a new folder Talend Backup was created in the root
folder and the file Talend Customers.csv was copied to the new folder and renamed to
Talend Customers v1.0.csv, and finally all files and folders in the root directory are listed
on the console.


tGPGDecrypt
Calls the gpg -d command to decrypt a GnuPG-encrypted file and saves the decrypted file in the
specified directory.

tGPGDecrypt Standard properties


These properties are used to configure tGPGDecrypt running in the Standard Job framework.
The Standard tGPGDecrypt component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Input encrypted file File path to the encrypted file.

Output file File path to the output decrypted file.

GPG binary path File path to the GPG command.

Passphrase Enter the passphrase used in encrypting the specified input file.
To enter the passphrase, click the [...] button next to the
passphrase field, and then in the pop-up dialog box enter
the passphrase between double quotes and click OK to save
the settings.

No TTY Terminal Select this check box to specify that no TTY terminal is
used by adding the --no-tty option to the decryption
command.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables FILE: the name of the output file. This is a Flow variable and
it returns a string.
FILEPATH: the path of the output file. This is a Flow variable
and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.
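
To make the wrapped command concrete, here is a rough Java sketch (for example in a tJava) of the kind of gpg call this component issues. It is a hedged approximation only, not the component's generated code; the binary path, file paths and passphrase are placeholders, and recent GnuPG versions may additionally require --pinentry-mode loopback for a non-interactive passphrase.

// Hedged sketch of the underlying call; paths and passphrase are placeholders.
ProcessBuilder pb = new ProcessBuilder(
        "gpg",                               // GPG binary path
        "--batch", "--yes", "--no-tty",      // non-interactive; --no-tty matches the No TTY Terminal option
        "--passphrase", "myPassphrase",      // Passphrase
        "-o", "D:/data/decrypted.csv",       // Output file
        "-d", "D:/data/encrypted.csv.gpg");  // Input encrypted file
pb.redirectErrorStream(true);
try {
    Process p = pb.start();
    int exitCode = p.waitFor();              // 0 means gpg decrypted the file successfully
    System.out.println("gpg exit code: " + exitCode);
} catch (Exception e) {
    e.printStackTrace();
}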

Decrypting a GnuPG-encrypted file and displaying its content


The following scenario describes a three-component Job that decrypts a GnuPG-encrypted file and
displays the content of the decrypted file on the Run console.

Dragging and linking the components


Procedure
1. Drop a tGPGDecrypt component, a tFileInputDelimited component, and a tLogRow component
from the Palette to the design workspace.
2. Connect the tGPGDecrypt component to the tFileInputDelimited component using a Trigger >
OnSubjobOk link, and connect the tFileInputDelimited component to the tLogRow component
using a Row > Main link.

Configuring the components


Procedure
1. Double-click the tGPGDecrypt to open its Component view and set its properties:


2. In the Input encrypted file field, browse to the file to be decrypted.


3. In the Output decrypted file field, enter the path to the decrypted file.

Warning:
If the file path contains accented characters, you will get an error message when running the
Job.

4. In the GPG binary path field, browse to the GPG command file.
5. In the Passphrase field, enter the passphrase used when encrypting the input file.
6. Double-click the tFileInputDelimited component to open its Component view and set its
properties:

7. In the File name/Stream field, define the path to the decrypted file, which is the output path you
have defined in the tGPGDecrypt component.
8. In the Header, Footer and Limit fields, define respectively the number of rows to be skipped in the
beginning of the file, at the end of the file and the number of rows to be processed.
9. Use a Built-In schema. This means that it is available for this Job only.
10. Click Edit schema and edit the schema for the component. Click twice the [+] button to add two
columns that you will call idState and labelState.
11. Click OK to validate your changes and close the editor.


12. Double-click the tLogRow component and set its properties:

13. Use a Built-In schema for this scenario.


14. In the Mode area, define the console display mode according to your preference. In this scenario,
select Table (print values in cells of a table).

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run from the Run tab to run it.


Results
The specified file is decrypted and the defined number of rows of the decrypted file are printed on the
Run console.


tGreenplumBulkExec
Improves performance when loading data in a Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
statement used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec component, detailed in a separate section. The advantage of using a two step process is
that it makes it possible to transform data before it is loaded in the database.
tGreenplumBulkExec performs an Insert action on the data.
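
As a rough illustration of the two-step idea (and not of the code this component actually generates), the JDBC sketch below asks the server to load a previously generated delimited file in a single statement, using PostgreSQL-style COPY, which Greenplum supports. The host, credentials, table and file path are placeholders, and, as noted for the Filename property, the file must already sit on the database server.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GreenplumBulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // Greenplum is reachable through the PostgreSQL JDBC driver; connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-host:5432/mydb", "gpadmin", "secret");
             Statement stmt = conn.createStatement()) {
            // Second step of the pattern: load the file produced by the first step in one server-side statement.
            stmt.execute("COPY public.customers FROM '/data/customers.csv' DELIMITER ';' CSV");
        }
    }
}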

tGreenplumBulkExec Standard properties


These properties are used to configure tGreenplumBulkExec running in the Standard Job framework.
The Standard tGreenplumBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Filename Name of the file to be loaded.

Warning:
This file is located on the machine specified by the URI
in the Host field so it should be on the same machine as
the database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).


Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Action on data Select the operation you want to perform: Bulk insert or Bulk update. The details required differ according to the action chosen.

Copy the OID for each row Retrieve the ID item for each row.

Contains a header line with the names of each column in the file Specify that the table contains a header.

File type Select the file type to process.

Null string String displayed to indicate that the value is null.

Fields terminated by Character, string or regular expression to separate fields.

Escape char Character of the row to be escaped.

Text enclosure Character used to enclose text.

Force not null for columns Define the columns nullability


Force not null: Select the check box next to the column you
want to define as not null.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is generally used with a tGreenplumOutputBulk component. Used together they offer gains in performance while feeding a Greenplum database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed


independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For more information about tGreenplumBulkExec, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.
• Truncating and inserting file data into an Oracle database on page 2681.


tGreenplumClose
Closes a connection to the Greenplum database.

tGreenplumClose Standard properties


These properties are used to configure tGreenplumClose running in the Standard Job framework.
The Standard tGreenplumClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tGreenplumConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Greenplum components, especially with tGreenplumConnection and tGreenplumCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tGreenplumCommit
Commits a global transaction in one go instead of repeating the operation for every row or every batch,
and thus provides a gain in performance.
tGreenplumCommit validates the data processed through the Job into the connected DB. This
component uses a unique connection.

tGreenplumCommit Standard properties


These properties are used to configure tGreenplumCommit running in the Standard Job framework.
The Standard tGreenplumCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tGreenplumConnection component in the list if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tGreenplumCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tGreenplum* components, especially with the tGreenplumConnection and tGreenplumRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tGreenplumCommit related scenarios, see:
• Mapping data using a simple implicit join on page 686.
• Inserting data in mother/daughter tables on page 2426.


tGreenplumConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tGreenplumConnection opens a connection to the database for a current transaction.

tGreenplumConnection Standard properties


These properties are used to configure tGreenplumConnection running in the Standard Job framework.
The Standard tGreenplumConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the tRunJob component. Using a shared connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.
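
The Auto Commit option described above follows standard JDBC transaction semantics. The fragment below is only a sketch of that behavior (not Talend-generated code) and assumes an open java.sql.Connection named conn:

// Auto Commit cleared: nothing is visible until an explicit commit,
// which is the role played by the corresponding commit component.
conn.setAutoCommit(false);
try (java.sql.Statement stmt = conn.createStatement()) {
    stmt.executeUpdate("INSERT INTO public.demo VALUES (1, 'a')");
    stmt.executeUpdate("INSERT INTO public.demo VALUES (2, 'b')");
    conn.commit();      // both rows become visible together
} catch (Exception e) {
    conn.rollback();    // the role of the corresponding rollback component
    throw e;
}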

Usage

Usage rule This component is more commonly used with other tGreenplum* components, especially with the tGreenplumCommit and tGreenplumRollback components.

Related scenarios
For tGreenplumConnection related scenarios, see:
• Mapping data using a simple implicit join on page 686.
• tMysqlConnection on page 2425.


tGreenplumGPLoad
Bulk loads data into a Greenplum table either from an existing data file, an input flow, or directly from
a data flow in streaming mode through a named-pipe.
tGreenplumGPLoad inserts data into a Greenplum database table using Greenplum's gpload utility.

tGreenplumGPLoad Standard properties


These properties are used to configure tGreenplumGPLoad running in the Standard Job framework.
The Standard tGreenplumGPLoad component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host Database server IP address.

Port Listening port number of the DB server.

Database Name of the Greenplum database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table into which the data is to be inserted.

Action on table On the table defined, you can perform one of the following
operations before loading the data:
None: No operation is carried out.
Clear table: The table content is deleted before the data is
loaded.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop and create table: The table is removed and created
again.
Drop table if exists and create: The table is removed if it
already exists and created again.


Truncate table: The table content is deleted. You do not


have the possibility to rollback the operation.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries.
Merge: Updates or adds data to the table.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Merge operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to set
as primary key(s). To define the Update/Merge options,
select in the Match Column column the check boxes
corresponding to the column names that you want to use as
a base for the Update and Merge operations, and select in
the Update Column column the check boxes corresponding
to the column names that you want to update. To define
the Update condition, type in the condition that will be
used to update the data.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Data file Full path to the data file to be used. If this component is
used in standalone mode, this is the name of an existing
data file to be loaded into the database. If this component
is connected with an input flow, this is the name of the file
to be generated and written with the incoming data to later
be used with gpload to load into the database. This field is
hidden when the Use named-pipe check box is selected.


Use named-pipe Select this check box to use a named-pipe. This option is
only applicable when the component is connected with an
input flow. When this check box is selected, no data file is
generated and the data is transferred to gpload through a
named-pipe. This option greatly improves performance in
both Linux and Windows.

Note:
This component on named-pipe mode uses a JNI
interface to create and write to a named-pipe on any
Windows platform. Therefore the path to the associated
JNI DLL must be configured inside the java library path.
The component comes with two DLLs for both 32 and
64 bit operating systems that are automatically provided
in the Studio with the component.

Named-pipe name Specify a name for the named-pipe to be used. Ensure that
the name entered is valid.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use existing control file (YAML formatted) Select this check box to provide a control file to be used
with the gpload utility instead of specifying all the options
explicitly in the component. When this check box is
selected, Data file and the other gpload related options no
longer apply. Refer to Greenplum's gpload manual for de
tails on creating a control file.

Control file Enter the path to the control file to be used, between
double quotation marks, or click [...] and browse to the
control file. This option is passed on to the gpload utility
via the -f argument.

CSV mode Select this check box to include CSV specific parameters
such as Escape char and Text enclosure.

Field separator Character, string, or regular expression used to separate fields.

Warning:
This is gpload's delim argument. The default value is |. To
improve performance, use the default value.

Escape char Character of the row to be escaped.

Text enclosure Character used to enclose text.

Header (skips the first row of data file) Select this check box to skip the first row of the data file.

Additional options Set the gpload arguments in the corresponding table. Click
[+] as many times as required to add arguments to the
table. Click the Parameter field and choose among the arguments from the list. Then click the corresponding Value field and enter a value between quotation marks.

  LOCAL_HOSTNAME: The host name or IP address of the local machine on which gpload is running. If this machine
is configured with multiple network interface cards (NICs),
you can specify the host name or IP of each individual NIC
to allow network traffic to use all NICs simultaneously. By
default, the local machine's primary host name or IP is used.

  PORT (gpfdist port): The specific port number that the gpfdist file distribution program should use. You can also
specify a PORT_RANGE to select an available port from
the specified range. If both PORT and PORT_RANGE are
defined, then PORT takes precedence. If neither PORT or
PORT_RANGE is defined, an available port between 8000
and 9000 is selected by default. If multiple host names are
declared in LOCAL_HOSTNAME, this port number is used for
all hosts. This configuration is desired if you want to use all
NICs to load the same file or set of files in a given directory
location.

  PORT_RANGE: Can be used instead of PORT (gpfdist port) to specify a range of port numbers from which gpload can
choose an available port for this instance of the gpfdist file
distribution program.

  NULL_AS: The string that represents a null value. The default is \N (backslash-N) in TEXT mode, and an empty
value with no quotation marks in CSV mode. Any source
data item that matches this string will be considered a null
value.

  FORCE_NOT_NULL: In CSV mode, processes each specified column as though it were quoted and hence not a NULL
value. For the default null string in CSV mode (nothing
between two delimiters), this causes missing values to be
evaluated as zero-length strings.

  ERROR_LIMIT (2 or higher): Enables single row error isolation mode for this load operation. When enabled and the error limit count is not reached on any Greenplum segment instance during input processing, all good rows will be loaded and input rows that have format errors will be
discarded or logged to the table specified in ERROR_TABLE
if available. When the error limit is reached, input rows
that have format errors will cause the load operation to
abort. Note that single row error isolation only applies to
data rows with format errors, for example, extra or missing
attributes, attributes of a wrong data type, or invalid client
encoding sequences. Constraint errors, such as primary
key violations, will still cause the load operation to abort
if encountered. When this option is not enabled, the load
operation will abort on the first error encountered.

ERROR_TABLE: When ERROR_LIMIT is declared, specifies an error table where rows with formatting errors will be logged
when running in single row error isolation mode. You can
then examine this error table to see error rows that were
not loaded (if any).


Log file Browse to or enter the access path to the log file in your directory.

Encoding Define the encoding type manually in the field.

Specify gpload path Select this check box to specify the full path to the gpload
executable. You must check this option if the gpload path is
not specified in the PATH environment variable.

Full path to gpload executable Full path to the gpload executable on the machine in
use. It is advisable to specify the gpload path in the PATH
environment variable instead of selecting this option.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
GPLOAD_OUTPUT: the output information when the gpload utility is executed. This is an After variable and it
returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
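
For example, a tJava placed after the load (linked with an OnSubjobOk trigger) could log the gpload report and the row count. The sketch below assumes the component's unique name is tGreenplumGPLoad_1; adjust the key prefix to match your Job.

// tJava sketch, assuming the load component is named tGreenplumGPLoad_1.
Integer rows = (Integer) globalMap.get("tGreenplumGPLoad_1_NB_LINE");
String report = (String) globalMap.get("tGreenplumGPLoad_1_GPLOAD_OUTPUT");
System.out.println("Rows processed: " + rows);
System.out.println("gpload report:");
System.out.println(report);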

Usage

Usage rule This component is mainly used when no particular transformation is required on the data to be loaded on to the database.
This component can be used as a standalone or an output
component.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).


Related scenario
For a related use case, see Inserting data in bulk in MySQL database on page 2489.


tGreenplumInput
Reads a database and extracts fields based on a query.
tGreenplumInput executes a DB query with a strictly defined order which must correspond to the
schema definition and then it passes on the field list to the next component via a Main row link.

tGreenplumInput Standard properties


These properties are used to configure tGreenplumInput running in the Standard Job framework.
The Standard tGreenplumInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.


  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition (see the example at the end of these Basic settings).

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Guess schema Click the Guess schema button to retrieve the table schema.
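
For example, for a hypothetical two-column schema (id of type Integer, name of type String), the Query field could contain a statement such as the one below, typed as a Java string; the table and column names are placeholders and the columns must follow the same order as the schema.

// Query field content (a Java string); columns appear in the same order as the schema: id, name
"SELECT id, name FROM public.customers ORDER BY id"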

Advanced settings

Use cursor When selected, helps to decide the row set to work with at a
time and thus optimize performance.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Greenplum databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see:
• Mapping data using a simple implicit join on page 686.
See also related topic: Reading data from different MySQL databases using dynamically loaded
connection parameters on page 497.


tGreenplumOutput
Executes the action defined on the table and/or on the data of a table, according to the input flow
from the previous component.
tGreenplumOutput writes, updates, modifies or deletes the data in a database.

tGreenplumOutput Standard properties


These properties are used to configure tGreenplumOutput running in the Standard Job framework.
The Standard tGreenplumOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column , select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database connection (that is, the component selected from the Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Commit every Enter the number of rows to be completed before committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, which are not
insert, nor update or delete actions, or action that require
particular preprocessing.

  Name: Type in the name of the schema column to be altered or inserted as a new column.

  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the action to be performed on the reference column.

  Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each batch.
This field appears only when the Use batch mode check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.


NB_LINE_INSERTED: the number of rows inserted. This is an


After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
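
As an illustration (not taken from this guide), a downstream tJava component can read these
variables from the globalMap once this component has finished. The instance name
tGreenplumOutput_1 below is an assumption and must match the label of the component in your
own Job:

// Hypothetical tJava code; the component instance name is assumed.
Integer inserted = (Integer) globalMap.get("tGreenplumOutput_1_NB_LINE_INSERTED");
Integer updated = (Integer) globalMap.get("tGreenplumOutput_1_NB_LINE_UPDATED");
System.out.println("Inserted: " + inserted + ", updated: " + updated);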

Usage

Usage rule This component covers all possible SQL queries for
Greenplum databases. It allows you to carry out actions on
a table or on the data of a table in a Greenplum database.
It enables you to create a reject flow, with a Row > Rejects
link filtering the data in error. For a usage example, see
Retrieving data in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related scenarios, see:


• Mapping data using a simple implicit join on page 686.


• Inserting a column and altering data using tMysqlOutput on page 2466.


tGreenplumOutputBulk
Prepares the file to be used as parameter in the INSERT query to feed the Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec
component, detailed in a separate section. The advantage of using a two-step process is that it makes
it possible to transform data before it is loaded into the database.
tGreenplumOutputBulk writes a file with columns based on the defined delimiter and the Greenplum standards.

tGreenplumOutputBulk Standard properties


These properties are used to configure tGreenplumOutputBulk running in the Standard Job framework.
The Standard tGreenplumOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file to be generated.

Warning:
This file is generated on the local machine or a shared
folder on the LAN.

Append Select this check box to add the new rows at the end of the
records.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Row separator String (ex: "\n" on Unix) to distinguish rows.

Field separator Character, string or regular expression to separate fields.

Include header Select this check box to include the column header.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component is to be used along with the
tGreenplumBulkExec component. Used together they offer
gains in performance while feeding a Greenplum database.

Component family Databases/Greenplum

Related scenarios
For use cases in relation with tGreenplumOutputBulk, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tGreenplumOutputBulkExec
Provides performance gains during Insert operations to a Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec
component.
tGreenplumOutputBulkExec executes the action on the data provided.

tGreenplumOutputBulkExec Standard properties


These properties are used to configure tGreenplumOutputBulkExec running in the Standard Job
framework.
The Standard tGreenplumOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host Database server IP address.


Currently, only localhost, 127.0.0.1 or the exact IP address of
the local machine is allowed for proper functioning. In other
words, the database server must be installed on the same
machine where the Studio is installed or where the Job using
tGreenplumOutputBulkExec is deployed.

Port Listening port number of DB server.

Database name Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.


Table Name of the table to be written.


Note that only one table can be written at a time and that
the table must exist for the insert operation to succeed.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted. You have the
possibility to roll back the operation.

File Name Name of the file to be generated and loaded.

Warning:
This file is generated on the machine specified by
the URI in the Host field so it should be on the same
machine as the database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon


completion and choose this schema metadata again in


the Repository Content window.

Advanced settings

Action on data Select the operation you want to perform: Bulk insert or
Bulk update. The information to provide differs according to
the action chosen.

Copy the OID for each row Retrieve the ID item for each row.

Contains a header line with the names of each column in the file Specify that the table contains a header.

File type Select the file type to process.

Null string String displayed to indicate that the value is null.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Fields terminated by Character, string or regular expression to separate fields.

Escape char Character of the row to be escaped.

Text enclosure Character used to enclose text.

Force not null for columns Define the columns' nullability.


Force not null: Select the check box next to the column you
want to define as not null.

tStatCatcherStatistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is mainly used when no particular
transformation is required on the data to be loaded into the
database.

Limitation The database server must be installed on the same machine


where the Studio is installed or where the Job using
tGreenplumOutputBulkExec is deployed, so that the
component functions properly.

Related scenarios
For use cases in relation with tGreenplumOutputBulkExec, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tGreenplumRollback
Avoids committing part of a transaction involuntarily.
tGreenplumRollback cancels the transaction committed in the connected DB.

tGreenplumRollback Standard properties


These properties are used to configure tGreenplumRollback running in the Standard Job framework.
The Standard tGreenplumRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tGreenplumConnection component in the list if
more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with


other tGreenplum* components, especially with the
tGreenplumConnection and tGreenplumCommit
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different


MySQL databases using dynamically loaded connection


parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tGreenplumRollback related scenario, see Rollback from inserting data in mother/daughter tables
on page 2429.


tGreenplumRow
Acts on the actual DB structure or on the data (although without handling data), depending on the
nature of the query and the database.
The SQLBuilder tool helps you easily write your SQL statements.
tGreenplumRow is the specific component for this database query. It executes the stated SQL query
on the specified database. The row suffix means the component implements a flow in the Job design
although it doesn't provide output.

tGreenplumRow Standard properties


These properties are used to configure tGreenplumRow running in the Standard Job framework.
The Standard tGreenplumRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Name of the table to be read.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Query Enter your DB query, paying particular attention to properly
sequencing the fields in order to match the schema
definition.
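
For instance, assuming a hypothetical employee table with id and name columns, the Query field
could contain a statement such as the following, entered as a double-quoted string as is usual in the
Studio:

"UPDATE employee SET name = 'unknown' WHERE id = 0"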


Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
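
For example, with a hypothetical employee table, the Query field could contain
"UPDATE employee SET name = ? WHERE id = ?" and the Set PreparedStatement Parameter table
could be filled in as follows (illustrative values):

  Parameter Index: 1   Parameter Type: String   Parameter Value: "unknown"
  Parameter Index: 2   Parameter Type: Int      Parameter Value: 42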

Commit every Number of rows to be completed before committing
batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and, above all, better
performance at execution.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related scenarios, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tGreenplumSCD
Addresses Slowly Changing Dimension needs by regularly reading a source of data and logging the
changes into a dedicated SCD table.
tGreenplumSCD reflects and tracks changes in a dedicated Greenplum SCD table.

tGreenplumSCD Standard properties


These properties are used to configure tGreenplumSCD running in the Standard Job framework.
The Standard tGreenplumSCD component belongs to the Business Intelligence and the Databases
families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Connection type Select the relevant driver on the list.

Host Database server IP address.


Port Listening port number of DB server.

Database Name of the database.

Schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to maximize system performance.

Source keys include Null Select this check box to allow the source key columns to
have Null values.

Warning:
Special attention should be paid to the uniqueness of the
source key(s) value when this option is selected.


Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step during
processing entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as Output component. It requires an


Input component and Row main link as input.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.


The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenario
For related scenarios, see tMysqlSCD on page 2508.


tGroovy
tGroovy broadens the functionality of the Talend Job, using the Groovy language, which is a simplified
Java syntax.
tGroovy allows you to enter customized code which you can integrate into the Talend program. The
code is run only once.

tGroovy Standard properties


These properties are used to configure tGroovy running in the Standard Job framework.
The Standard tGroovy component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Groovy Script Enter the Groovy code you want to run.

Variables This table has two columns.


Name: Name of the variable called in the code.
Value: Value associated with the variable.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used alone or as a subJob along


with one other component.

Limitation Knowledge of the Groovy language is required.


Related scenarios
• For a scenario using the Groovy code, see Calling a file which contains Groovy code on page
1355.
• For a functional example, see Printing out a variable content on page 1823.


tGroovyFile
Broadens the functionality of Talend Jobs using the Groovy language which is a simplified Java
syntax.
tGroovyFile allows you to call an existing Groovy script.

tGroovyFile Standard properties


These properties are used to configure tGroovyFile running in the Standard Job framework.
The Standard tGroovyFile component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Groovy File Name and path of the file containing the Groovy code.

Variables This table contains two columns.


Name: Name of the variable called in the code.
Value: Value associated with this variable.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used alone or as a subJob along


with another component.

Limitation Knowledge of the Groovy language is required.


Calling a file which contains Groovy code


This scenario uses tGroovyFile, on its own. The Job calls a file containing Groovy code in order to
display the file information in the Console.

Setting up the Job


Open the Custom_Code folder in the Palette and drop a tGroovyFile component onto the workspace.

Configuring the tGroovyFile component


Procedure
1. Double-click the component to display the Component view.

2. In the Groovy File field, enter the path to the file containing the Groovy code, or browse to the
file in your directory. In this example, it is D:/Input/Ageducapitaine.txt, and the file contains the
following Groovy code:

println("The captain is " + age + " years old")

3. In the Variables table, add a line by clicking the [+] button.


4. In the Name column, enter "age", and then in the Value column, enter 50.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The Console displays the message defined in the input file, with the value of the variable
inserted.
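
With the values used in this example, the Console output is:

The captain is 50 years old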


tGSBucketCreate
Creates a new bucket which you can use to organize data and control access to data in Google Cloud
Storage.

tGSBucketCreate Standard properties


These properties are used to configure tGSBucketCreate running in the Standard Job framework.
The Standard tGSBucketCreate component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Specify the name of the bucket which you want to create.
Note that the bucket name must be unique across the
Google Cloud Storage system.
For more information about the bucket naming convention,
see https://developers.google.com/storage/docs/bucketnaming.

Special configure Select this check box to provide the additional configuration
for the bucket to be created.

Project ID Specify the project ID to which the new bucket belongs.

Location Select from the list the location where the new bucket
will be created. Currently, Europe and US are available. By
default, the bucket location is in the US.
Note that once a bucket is created in a specific location, it
cannot be moved to another location.


Acl Select from the list the desired access control list (ACL) for
the new bucket.
Depending on the ACL on the bucket, the access requests
from users may be allowed or rejected. If you do not specify
a predefined ACL for the new bucket, the predefined
project-private ACL applies.
For more information about ACL, see https://developers.google.com/storage/docs/accesscontrol?hl=en.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with the


tGSBucketList component to check if a new bucket is created
successfully.

Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.


tGSBucketDelete
Deletes an empty bucket in Google Cloud Storage so as to release occupied resources.
Note that bucket deletion cannot be undone, so you need to back up any data that you want to keep
before the deletion.

tGSBucketDelete Standard properties


These properties are used to configure tGSBucketDelete running in the Standard Job framework.
The Standard tGSBucketDelete component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Specify the name of the bucket that you want to delete.
Make sure that the bucket to be deleted is empty.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable


and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with the


tGSBucketList component to check if the specified bucket is
deleted successfully.

Related scenarios
No scenario is available for the Standard version of this component yet.


tGSBucketExist
Checks the existence of a bucket in Google Cloud Storage so as to make further operations.

tGSBucketExist Standard properties


These properties are used to configure tGSBucketExist running in the Standard Job framework.
The Standard tGSBucketExist component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Specify the name of the bucket for which you want to
perform a check to confirm it exists in Google Cloud Storage.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables BUCKET_EXIST: the existence of a specified bucket. This is a


Flow variable and it returns a boolean.
BUCKET_NAME: the name of a specified bucket. This is a
Flow variable and it returns a string.


ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.


tGSBucketList
Retrieves a list of buckets from all projects or one specific project in Google Cloud Storage.
tGSBucketList iterates on all buckets within all projects or one specific project in Google Cloud
Storage.

tGSBucketList Standard properties


These properties are used to configure tGSBucketList running in the Standard Job framework.
The Standard tGSBucketList component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Specify project ID Select this check box and in the Project ID field specify a
project ID from which you want to retrieve a list of buckets.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CURRENT_BUCKET_NAME: the current bucket name. This is


a Flow variable and it returns a string.
NB_BUCKET: the number of buckets. This is an After variable
and it returns an integer.


ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule The tGSBucketList component can be used as a standalone


component or as a start component of a process.

Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.


tGSClose
Closes an active connection to Google Cloud Storage in order to release the occupied resources.

tGSClose Standard properties


These properties are used to configure tGSClose running in the Standard Job framework.
The Standard tGSClose component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Component List Select the tGSConnection component in the list if more than
one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Google Cloud
Storage components, particularly tGSConnection.

Related scenario
For a scenario in which tGSClose is used, see Managing files with Google Cloud Storage on page
1378.


tGSConnection
Provides the authentication information for making requests to the Google Cloud Storage system and
enables the reuse of the connection it creates to Google Cloud Storage.

tGSConnection Standard properties


These properties are used to configure tGSConnection running in the Standard Job framework.
The Standard tGSConnection component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is generally used with other Google Cloud
Storage components, particularly tGSClose.

Related scenario
For a scenario in which tGSConnection is used, see Managing files with Google Cloud Storage on page
1378.


tGSCopy
Copies or moves objects within a bucket or between buckets in Google Cloud Storage.
tGSCopy streamlines processes by automating the copy tasks.

tGSCopy Standard properties


These properties are used to configure tGSCopy running in the Standard Job framework.
The Standard tGSCopy component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Source bucket name Specify the name of the bucket from which you want to
copy or move objects.

Source object key Specify the key of the object to be copied.

Source is folder Select this check box if the source object is a folder.

Target bucket name Specify the name of the bucket to which you want to copy
or move objects.

Target folder Specify the target folder to which the objects will be copied
or moved.

Action Select the action that you want to perform on objects from
the list.
• Copy: copies objects from the source bucket or folder
to the target bucket or folder.


• Move: moves objects from the source bucket or folder


to the target bucket or folder.

Rename Select this check box and in the New name field enter a new
name for the object to be copied or moved.
The Rename check box will not be available if you select
the Source is folder check box.
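
For example (hypothetical names), to move an object stored under the key data/input.csv from a
bucket named my-source-bucket into the archive folder of my-target-bucket, you could set Source
bucket name to "my-source-bucket", Source object key to "data/input.csv", Target bucket name to
"my-target-bucket", Target folder to "archive" and Action to Move.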

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables SOURCE_BUCKET: the source bucket name. This is an After


variable and it returns a string.
SOURCE_OBJECTKEY: the key of a source object. This is an
After variable and it returns a string.
DESTINATION_BUCKETNAME: the destination bucket name.
This is an After variable and it returns a string.
DESTINATION_FOLDER: the destination folder. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Related scenario
For a scenario in which tGSCopy is used, see Managing files with Google Cloud Storage on page 1378.


tGSDelete
Deletes the objects which match the specified criteria in Google Cloud Storage so as to release the
occupied resources.

tGSDelete Standard properties


These properties are used to configure tGSDelete running in the Standard Job framework.
The Standard tGSDelete component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Key prefix Specify the prefix to delete only objects whose keys begin
with the specified prefix.

Delimiter Specify the delimiter in order to delete only those objects


with key names up to the delimiter.

Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to delete objects.

Delete object from bucket list Select this check box and complete the Bucket table to
delete objects in the specified buckets.
• Bucket name: type in the name of the bucket from
which you want to delete objects.
• Key prefix: type in the prefix to delete objects whose
keys begin with the specified prefix in the specified
bucket.
• Delimiter: type in the delimiter to delete those objects
with key names up to the delimiter in the specified
bucket.


If you select the Delete object from bucket list check box,
the Key prefix and Delimiter fields as well as the Specify
project ID check box will not be available.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with the tGSList
component to check if the objects which match the
specified criteria are deleted successfully.

Related scenario
For a scenario in which tGSDelete is used, see Managing files with Google Cloud Storage on page
1378.


tGSGet
Retrieves objects which match the specified criteria from Google Cloud Storage and outputs them to a
local directory.

tGSGet Standard properties


These properties are used to configure tGSGet running in the Standard Job framework.
The Standard tGSGet component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Key prefix Specify the prefix to download only objects whose keys begin with the specified prefix.

Delimiter Specify the delimiter in order to download only those objects with key names up to the delimiter.

Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to obtain objects.

Use keys Select this check box and complete the Keys table to define
the criteria for objects to be downloaded from Google Cloud
Storage.
• Bucket name: type in the name of the bucket from
which you want to download objects.
• Key: type in the key of the object to be downloaded.
• New name: type in a new name for the object to be
downloaded.
If you select the Use keys check box, the Key prefix and
Delimiter fields as well as the Specify project ID check box
and the Get files from bucket list check box will not be
available.

Get files from bucket list Select this check box and complete the Bucket table to
define the criteria for objects to be downloaded from
Google Cloud Storage.
• Bucket name: type in the name of the bucket from
which you want to download objects.
• Key prefix: type in the prefix to download objects
whose keys start with the specified prefix from the
specified bucket.
• Delimiter: specify the delimiter to download those
objects with key names up to the delimiter from the
specified bucket.
If you select the Get files from bucket list check box, the
Key prefix and Delimiter fields as well as the Specify project
ID check box and the Use keys check box will not be
available.

Output directory Specify the directory where you want to store the
downloaded objects.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used together with other Google
Cloud Storage components, particularly tGSPut.


Related scenarios
No scenario is available for the Standard version of this component yet.


tGSList
Retrieves a list of objects from Google Cloud Storage one by one.
tGSList iterates on a list of objects which match the specified criteria in Google Cloud Storage.

tGSList Standard properties


These properties are used to configure tGSList running in the Standard Job framework.
The Standard tGSList component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Key prefix Specify the key prefix so that only the objects whose keys
begin with the specified string will be listed.

Delimiter Specify the delimiter in order to list only those objects with
key names up to the delimiter.

Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to retrieve a list of objects.

List objects in bucket list Select this check box and complete the Bucket table to
retrieve objects in the specified buckets.
• Bucket name: type in the name of the bucket from
which you want to retrieve objects.
• Key prefix: type in the prefix to list only objects whose
keys begin with the specified string in the specified
bucket.
• Delimiter: type in the delimiter to list only those
objects with key names up to the delimiter.


If you select the List objects in bucket list check box, the
Key prefix and Delimiter fields as well as the Specify project
ID check box will not be available.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CURRENT_BUCKET: the current bucket name. This is a Flow variable and it returns a string.
CURRENT_KEY: the current key. This is a Flow variable and
it returns a string.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule The tGSList component can be used as a standalone component or as a start component of a process.

Related scenario
For a scenario in which tGSList is used, see Managing files with Google Cloud Storage on page 1378.


tGSPut
Uploads files from a local directory to Google Cloud Storage so that you can manage them with
Google Cloud Storage.

tGSPut Standard properties


These properties are used to configure tGSPut running in the Standard Job framework.
The Standard tGSPut component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Type in the name of the bucket into which you want to
upload files.

Local directory Type in the full path of or browse to the local directory
where the files to be uploaded are located.

Google Storage directory Type in the Google Storage directory to which you want to
upload files.

Use files list Select this check box and complete the Files table.
• Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
• New name: enter a new name for the file after being
uploaded.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with other components, particularly the tGSGet component.

Managing files with Google Cloud Storage


The scenario describes a Job which uploads files from the local directory to a bucket in Google Cloud
Storage, then performs copy, move and delete operations on those files, and finally lists and displays
the files in relevant buckets on the console.


Prerequisites: You have purchased a Google Cloud Storage account and created three buckets under
the same Google Storage directory. In this example, the buckets created are bighouse, bed_room, and
study_room.

Dropping and linking the components


About this task
To design the Job, proceed as follows:

Procedure
1. Drop the following components from the Palette to the design workspace: one tGSConnection
component, one tGSPut component, two tGSCopy components, one tGSDelete component, one
tGSList component, one tIterateToFlow component, one tLogRow component and one tGSClose
component.
2. Connect tGSConnection to tGSPut using a Trigger > On Subjob Ok link.
3. Connect tGSPut to the first tGSCopy using a Trigger > On Subjob Ok link.
4. Do the same to connect the first tGSCopy to the second tGSCopy, connect the second tGSCopy to
tGSDelete, connect tGSDelete to tGSList, and connect tGSList to tGSClose.
5. Connect tGSList to tIterateToFlow using a Row > Iterate link.
6. Connect tIterateToFlow to tLogRow using a Row > Main link.

Configuring the components


Opening a connection to Google Cloud Storage

Procedure
1. Double-click the tGSConnection component to open its Basic settings view in the Component tab.

2. Navigate to the Google APIs Console in your web browser to access the Google project hosting
the Cloud Storage services you need to use.
3. Click Google Cloud Storage > Interoperable Access to open its view, and copy the access key and
secret key.
4. In the Component view of the Studio, paste the access key and secret key to the corresponding
fields respectively.

Uploading files to Google Cloud Storage

Procedure
1. Double-click the tGSPut component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Bucket name field, enter the name of the bucket into which you want to upload files. In this
example, bighouse.
4. In the Local directory field, browse to the directory from which the files will be uploaded, D:/Input/
House in this example.


The files under this directory are shown below:

5. Leave other settings as they are.

Copying all files from one bucket to another bucket

Procedure
1. Double-click the first tGSCopy component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to copy files,
bighouse in this example.
4. Select the Source is a folder check box. All files from the bucket bighouse will be copied.
5. In the Target bucket name field, enter the name of the bucket into which you want to copy files,
bed_room in this example.
6. Select Copy from the Action list.

Moving a file from one bucket to another bucket and renaming it

Procedure
1. Double-click the second tGSCopy component to open its Basic settings view in the Component
tab.


2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to move files,
bighouse in this example.
4. In the Source object key field, enter the key of the object to be moved, computer_01.txt in this
example.
5. In the Target bucket name field, enter the name of the bucket into which you want to move files,
study_room in this example.
6. Select Move from the Action list. The specified source file computer_01.txt will be moved from the
bucket bighouse to study_room.
7. Select the Rename check box. In the New name field, enter a new name for the moved file. In this
example, the new name is laptop.txt.
8. Leave other settings as they are.

Deleting a file in one bucket

Procedure
1. Double-click the tGSDelete component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.


3. Select the Delete object from bucket list check box. Fill in the Bucket table with the file
information that you want to delete.
In this example, the file computer_03.csv will be deleted from the bucket bed_room whose files are
copied from the bucket bighouse.

Listing all files in the three buckets

Procedure
1. Double-click the tGSList component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. Select the List objects in bucket list check box. In the Bucket table, enter the name of the three
buckets in the Bucket name column, bighouse, study_room, and bed_room.
4. Double-click the tIterateToFlow component to open its Basic settings view in the Component tab.

5. Click Edit schema to define the data to pass on to tLogRow.


In this example, add two columns bucketName and key, and set their types to Object.


6. The Mapping table will be populated with the defined columns automatically.
In the Value column, enter globalMap.get("tGSList_2_CURRENT_BUCKET") for the bucketName
column and globalMap.get("tGSList_2_CURRENT_KEY") for the key column. You can also press Ctrl +
Space and then choose the appropriate variable (see the sketch after this procedure).
7. Double-click the tLogRow component to open its Basic settings view in the Component tab.
8. Select Table (print values in cells of a table) for a better view of the results.
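For reference, the following minimal sketch shows the two Value expressions entered in step 6. Because the bucketName and key columns were defined with the Object type, the globalMap.get(...) calls can be used as is; the commented cast is only an assumption you would need if you typed the columns as String instead.

globalMap.get("tGSList_2_CURRENT_BUCKET")   // Value expression for the bucketName column
globalMap.get("tGSList_2_CURRENT_KEY")      // Value expression for the key column
// (String) globalMap.get("tGSList_2_CURRENT_BUCKET")   // cast needed only for a String-typed column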

Closing the connection to Google Cloud Storage

Procedure
1. Double-click the tGSClose component to open its Basic settings view in the Component tab.
2. Select the connection you want to close from the Component List.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


The files in the three buckets are displayed. As expected, at first, the files from the bucket
bighouse are copied to the bucket bed_room, then the file computer_01.txt from the bucket
bighouse is moved to the bucket study_room and renamed to be laptop.txt, finally the file
computer_03.csv is deleted from the bucket bed_room.


tHashInput
Reads from the cache memory data loaded by tHashOutput to offer high-speed data feed, facilitating
transactions involving a large amount of data.
The components of the Technical family are normally hidden from the Palette by default. For more
information about how to show them on the Palette, see Talend Studio User Guide.

tHashInput Standard properties


These properties are used to configure tHashInput running in the Standard Job framework.
The Standard tHashInput component belongs to the Technical family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this component only. Related topic: see the Talend Studio User Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: see the Talend Studio User Guide.

Link with a tHashOutput Select this check box to connect to a tHashOutput component. It is always selected by default.

Component list Drop-down list of available tHashOutput components.

Clear cache after reading Select this check box to clear the cache after reading the
data loaded by a certain tHashOutput component. This way,
the following tHashInput components, if any, will not be
able to read the cached data loaded by that tHashOutput
component.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used along with tHashOutput. It reads from the cache memory data loaded by tHashOutput. Together, these twin components offer high-speed data access to facilitate transactions involving a massive amount of data.

Reading data from the cache memory for high-speed data access
The following Job reads from the cache memory a huge amount of data loaded by two tHashOutput
components and passes it to a tFileOutputDelimited. The goal of this scenario is to show the speed
at which mass data is read and written. In practice, data feed generated in this way can be used as
lookup table input for some use cases where a big amount of data needs to be referenced.

Dropping and linking the components


Procedure
1. Drag and drop the following components from the Palette to the workspace: tFixedFlowInput (X2),
tHashOutput (X2), tHashInput and tFileOutputDelimited.
2. Connect the first tFixedFlowInput to the first tHashOutput using a Row > Main link.
3. Connect the second tFixedFlowInput to the second tHashOutput using a Row > Main link.
4. Connect the first subJob (from tFixedFlowInput_1) to the second subJob (to tFixedFlowInput_2)
using an OnSubjobOk link.
5. Connect tHashInput to tFileOutputDelimited using a Row > Main link.


6. Connect the second subJob to the last subJob using an OnSubjobOk link.

Configuring the components


Configuring data inputs and hash cache

Procedure
1. Double-click the first tFixedFlowInput component to display its Basic settings view.

2. Select Built-In from the Schema drop-down list.

Note:
You can select Repository from the Schema drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored in the Repository. For more information
about Metadata, see the Talend Studio User Guide.


3. Click Edit schema to define the data structure of the input flow. In this case, the input has two
columns: ID and ID_Insurance, and then click OK to close the dialog box.

4. Fill in the Number of rows field to specify the entries to output, e.g. 50000.
5. Select the Use Single Table check box. In the Values table and in the Value column, assign values
to the columns, e.g. 1 for ID and 3 for ID_Insurance.
6. Perform the same operations for the second tFixedFlowInput component, with the only difference
in the values. That is, 2 for ID and 4 for ID_Insurance in this case.
7. Double-click the first tHashOutput to display its Basic settings view.

8. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. Select Keep all from the Keys management drop-down list and
keep the Append check box selected.
9. Perform the same operations for the second tHashOutput component, and select the Link with a
tHashOutput check box.

Configuring data retrieval from hash cache and data output

Procedure
1. Double-click tHashInput to display its Basic settings view.


2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
which is the same as that of tHashOutput.
3. Select tHashOutput_1 from the Component list drop down list.
4. Double-click tFileOutputDelimited to display its Basic settings view.

5. Select Built-In from the Property Type drop-down list. In the File Name field, enter the full path
and name of the file, e.g. "E:/Allr70207V5.0/Talend-All-r70207-V5.0.0NB/workspace/out.csv".
6. Select the Include Header check box and click Sync columns to retrieve the schema from the
previous component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6, or click Run on the Run tab to execute the Job.

Results

You can see that the large volume of entries is written and read very rapidly.


Clearing the memory before loading data to it in case an iterator exists in the same subJob
In this scenario, the usage of the Append option of tHashOutput is demonstrated as it helps remove
repetitive or unwanted data in case an iterator exists in the same subJob as tHashOutput.
To build the Job, do the following:

Dropping and linking the components


Procedure
1. Drag and drop the following components from the Palette to the workspace: tLoop,
tFixedFlowInput, tHashOutput, tHashInput and tLogRow.
2. Connect tLoop to tFixedFlowInput using a Row > Iterate link.
3. Connect tFixedFlowInput to tHashOutput using a Row > Main link.
4. Connect tHashInput to tLogRow using a Row > Main link.
5. Connect tLoop to tHashInput using an OnSubjobOk link.

Configuring the components


Configuring data input and hash cache

Procedure
1. Double-click the tLoop component to display its Basic settings view.

2. Select For as the loop type. Type in 1, 2 and 1 in the From, To and Step fields respectively. Keep the
Values are increasing check box selected.
3. Double-click the tFixedFlowInput component to display its Basic settings view.


4. Select Built-In from the Schema drop-down list.

Note:
You can select Repository from the Schema drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored in the Repository. For more information
about Metadata, see the Talend Studio User Guide.

5. Click Edit schema to define the data structure of the input flow. In this case, the input has one
column: Name.

6. Click OK to close the dialog box.


7. Fill in the Number of rows field to specify the entries to output, for example 1.
8. Select the Use Single Table check box. In the Values table, assign a value to the Name field, e.g.
Marx.
9. Double-click tHashOutput to display its Basic settings view.


10. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. Select Keep all from the Keys management drop-down list and
deselect the Append check box.

Configuring data retrieval from hash cache and data output

Procedure
1. Double-click tHashInput to display its Basic settings view.

2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
which is the same as that of tHashOutput.
3. Select tHashOutput_2 from the Component list drop-down list.
4. Double-click tLogRow to display its Basic settings view.

5. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. In the Mode area, select Table (print values in cells of a table).

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6, or click Run on the Run tab to execute the Job.
You can find that only one row was output although two rows were generated by tFixedFlowInput.


tHashOutput
Loads data to the cache memory to offer high-speed access, facilitating transactions involving a large
amount of data.
Note that loading data can consume a lot of memory, because each stored record carries an overhead. The number of input entries also impacts memory usage.
The components of the Technical family are normally hidden from the Palette by default. For more
information about how to show them on the Palette, see Talend Studio User Guide.

tHashOutput Standard properties


These properties are used to configure tHashOutput running in the Standard Job framework.
The Standard tHashOutput component belongs to the Technical family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this component only. Related topic: see the Talend Studio User Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: see the Talend Studio User Guide.

Link with a tHashOutput Select this check box to connect to a tHashOutput component.


Note:
If multiple tHashOutput components are linked in this
way, the data loaded to the cache by all of them can be
read by a tHashInput component that is linked with any
of them.

Component list Drop-down list of available tHashOutput components.

Data write model Drop-down list of available data write modes.

Keys management Drop-down list of available keys management modes.


• Keep all: writes all the data received to the cache
memory.
• Keep first: writes only the first record to the cache
memory if multiple records received have the same key
value.

Append Selected by default, this option is designed to append data to the memory in case an iterator exists in the same subJob.
If it is unchecked, tHashOutput will clear the memory before
loading data to it.

Note:
If Link with a tHashOutput is selected, this check box will
be hidden but is always enabled.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component writes data to the cache memory and is closely related to tHashInput. Together, these twin components offer high-speed data access to facilitate transactions involving a massive amount of data.

Related scenarios
For related scenarios, see:
• Reading data from the cache memory for high-speed data access on page 1387.
• Clearing the memory before loading data to it in case an iterator exists in the same subJob on
page 1391.


tHBaseClose
Closes an HBase connection you have established in your Job.
tHBaseClose closes an active connection to an HBase database.

tHBaseClose Standard properties


These properties are used to configure tHBaseClose running in the Standard Job framework.
The Standard tHBaseClose component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Component list Select the tHBaseConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with HBase components, especially with tHBaseConnection.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.


• Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.
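For the -Djava.library.path argument described above, a minimal example of the VM argument, reusing the MAPR_INSTALL and VERSION placeholders from this section (the actual path depends on where your MapR client is installed), is:

-Djava.library.path="MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native"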

Related scenario
For a scenario in which tHBaseClose is used, see Exchanging customer data with HBase on page
1411.


tHBaseConnection
Establishes an HBase connection to be reused by other HBase components in your Job.
tHBaseConnection opens a connection to an HBase database.

tHBaseConnection Standard properties


These properties are used to configure tHBaseConnection running in the Standard Job framework.
The Standard tHBaseConnection component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


- Built-in : no property data stored centrally.
- Repository : select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your

connection accordingly. However, because of the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
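For reference only, the Zookeeper quorum, client port and optional znode parent entered in these fields correspond to the standard HBase client configuration keys shown in the sketch below. The host names, port and znode value are assumptions, and the sketch illustrates the underlying HBase API rather than the code generated by the component.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseZookeeperSketch {
    // Illustrative values only; replace them with your own cluster settings.
    public static Configuration build() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zkhost1,zkhost2,zkhost3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // Corresponds to the Set Zookeeper znode parent check box.
        conf.set("zookeeper.znode.parent", "/hbase");
        return conf;
    }
}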

Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.


If you want to use certain parameters such as the Kerberos parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>talend.kerberos.authentication</name>
        <value>kinit</value>
        <description>Set the Kerberos authentication method to use. Valid values are: kinit or keytab.</description>
    </property>
    <property>
        <name>talend.kerberos.keytab.principal</name>
        <value>user@EXAMPLE.COM</value>
        <description>Set the keytab's principal name.</description>
    </property>
    <property>
        <name>talend.kerberos.keytab.path</name>
        <value>/kdc/user.keytab</value>
        <description>Set the keytab's path.</description>
    </property>
    <property>
        <name>talend.encryption</name>
        <value>none</value>
        <description>Set the encryption method to use. Valid values are: none or ssl.</description>
    </property>
    <property>
        <name>talend.ssl.trustStore.path</name>
        <value>ssl</value>
        <description>Set SSL trust store path.</description>
    </property>
    <property>
        <name>talend.ssl.trustStore.password</name>
        <value>ssl</value>
        <description>Set SSL trust store password.</description>
    </property>
</configuration>


The parameters read from these configuration files override the default ones used by the Studio. When a parameter
does not exist in these configuration files, the default one is
used.

Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Advanced settings

Properties If you need to use custom configuration for your HBase, complete this table with the property or properties to be
customized. Then at runtime, the customized property or
properties will override those corresponding ones defined
earlier for your HBase.
For example, you need to define the value of the
dfs.replication property as 1 for the HBase configuration.
Then you need to add one row to this table using the plus
button and type in the name and the value of this property
in this row.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other HBase components, particularly tHBaseClose.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For a scenario in which tHBaseConnection is used, see Exchanging customer data with HBase on page
1411.


tHBaseInput
Reads data from a given HBase database and extracts columns of selection.
HBase is a distributed, column-oriented database that hosts very large, sparsely populated tables on
clusters.
tHBaseInput extracts columns corresponding to schema definition. Then it passes these columns to
the next component via a Main row link.

HBase filters
The following list presents the HBase filters available in Talend Studio and the parameters required by each filter.

• Single Column Value Filter (parameters: Filter column, Filter family, Filter operation, Filter value, Filter comparator type): compares the values of a given column against the value defined for the Filter value parameter. If the filtering condition is met, all columns of the row will be returned.
• Family filter (parameters: Filter family, Filter operation, Filter comparator type): returns the columns of the family that meets the filtering condition.
• Qualifier filter (parameters: Filter column, Filter operation, Filter comparator type): returns the columns whose column qualifiers match the filtering condition.
• Column prefix filter (parameters: Filter column, Filter family): returns all columns of which the qualifiers have the prefix defined for the Filter column parameter.
• Multiple column prefix filter (parameters: Filter column with multiple prefixes separated by commas, for example id,id_1,id_2; Filter family): works the same way as a Column prefix filter does but allows specifying multiple prefixes.
• Column range filter (parameters: Filter column with the two ends of the range separated by a comma; Filter family): allows intra-row scanning and returns all matching columns of a scanned row.
• Row filter (parameters: Filter operation, Filter value, Filter comparator type): filters on row keys and returns all rows that match the filtering condition.
• Value filter (parameters: Filter operation, Filter value, Filter comparator type): returns only columns that have a specific value.


The use of the listed HBase filters explained above is subject to revisions made by Apache in its
Apache HBase project; therefore, in order to fully understand how to use these HBase filters, we
recommend reading Apache's HBase documentation.
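For readers who want to relate these filter types to the underlying HBase Java API, the sketch below shows roughly what a Single Column Value Filter wrapped in a FilterList corresponds to; MUST_PASS_ALL matches the And logical operation described in the component properties. This is an illustration of the HBase API only, not the code generated by the component, and the family, qualifier and value names are assumptions.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterSketch {
    // Builds a Scan that keeps only rows whose "info:age" column is greater than 30
    // (family, qualifier and value are illustrative assumptions).
    public static Scan buildScan() {
        SingleColumnValueFilter ageFilter = new SingleColumnValueFilter(
                Bytes.toBytes("info"), Bytes.toBytes("age"),
                CompareOp.GREATER, Bytes.toBytes("30"));
        ageFilter.setFilterIfMissing(true); // skip rows that do not have the column at all

        // MUST_PASS_ALL corresponds to the And logical operation; MUST_PASS_ONE to Or.
        FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        filters.addFilter(ageFilter);

        Scan scan = new Scan();
        scan.setFilter(filters);
        return scan;
    }
}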

tHBaseInput Standard properties


These properties are used to configure tHBaseInput running in the Standard Job framework.
The Standard tHBaseInput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view.
For more information about setting up and storing database connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.

• If this cluster is a MapR cluster of the version 5.0.0 or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Set table Namespace mappings Enter the string to be used to construct the mapping
between an Apache HBase table and a MapR table.
For the valid syntax you can use, see http://doc.mapr.com/
display/MapR40x/Mapping+Table+Namespace+Between
+Apache+HBase+Tables+and+MapR+Tables.


Table name Type in the name of the table from which you need to
extract columns.

Define a row selection Select this check box and then in the Start row and the
End row fields, enter the corresponding row keys to specify
the range of the rows you want the current component to
extract.
Unlike the filters you can set using Is by filter, which
require all records to be loaded before filtering out the ones
to be used, this feature allows you to directly select only the
rows to be used.
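
The row selection corresponds to a range scan in the native HBase client API. The sketch below only illustrates that behavior; the table name customer and the row keys are placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowSelectionSketch {
    public static void main(String[] args) throws Exception {
        try (Connection connection =
                ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("customer"))) {
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("1"));  // Start row (inclusive)
            scan.setStopRow(Bytes.toBytes("5"));   // End row (exclusive)
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}

Only the rows whose keys fall within the range are read, which is what distinguishes this option from the Is by filter feature.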

Mapping Complete this table to map the columns of the table to be


used with the schema columns you have defined for the
data flow to be processed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Properties If you need to use custom configuration for your database,


complete this table with the property or properties to be
customized. Then at runtime, the customized property or
properties will override the corresponding ones used by the
Studio.
For example, you need to define the value of the
dfs.replication property as 1 for the database configuration.
Then you need to add one row to this table using the plus
button and type in the name and the value of this property
in this row.

Note:
This table is not available when you are using an
existing connection by selecting the Using an existing
connection check box in the Basic settings view.
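
As a hedged illustration of how such an override behaves in the underlying configuration API (the dfs.replication value mirrors the example above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CustomPropertySketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create(); // loads default values
        conf.set("dfs.replication", "1");                 // customized value wins
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
    }
}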

Is by filter Select this check box to use filters to perform fine-grained


data selection from your database, such as selection of keys,
or values, based on regular expressions.
Once you select it, the Filter table used to define filtering
conditions becomes available.
This feature leverages the filters provided by HBase and is
subject to the constraints explained in the Apache HBase
documentation. Therefore, advanced knowledge of HBase is
required to make full use of these filters.

Logical operation Select the operator you need to use to define the logical
relation between filters. The available operators are:


• And: every defined filtering condition must be
satisfied. It represents the relationship
FilterList.Operator.MUST_PASS_ALL.
• Or: at least one of the defined filtering conditions
must be satisfied. It represents the relationship
FilterList.Operator.MUST_PASS_ONE.

Filter Click the button under this table to add as many rows as
required, each row representing a filter. The parameters you
may need to set for a filter are:
• Filter type: the drop-down list presents pre-existing
filter types that are already defined by HBase. Select
the type of the filter you need to use.
• Filter column: enter the column qualifier on which you
need to apply the active filter. This parameter becomes
mandatory depending on the type of the filter and
of the comparator you are using. For example, it is
not used by the Row Filter type but is required by the
Single Column Value Filter type.
• Filter family: enter the column family on which you
need to apply the active filter. This parameter becomes
mandatory depending on the type of the filter and
of the comparator you are using. For example, it is
not used by the Row Filter type but is required by the
Single Column Value Filter type.
• Filter operation: select from the drop-down list the
operation to be used for the active filter.
• Filter Value: enter the value on which you want to use
the operator selected from the Filter operation drop-
down list.
• Filter comparator type: select the type of the
comparator to be combined with the filter you are
using.
Depending on the Filter type you are using, some or all of
the parameters become mandatory. For further information,
see HBase filters on page 1405.
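
For illustration, the sketch below builds equivalent filters with the native HBase API: a Row Filter (which uses no column or family) and a Single Column Value Filter, combined with the And (MUST_PASS_ALL) operator. The family, column and value names are placeholders, and an HBase 1.x client is assumed.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterSketch {
    public static Scan buildScan() {
        FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        // Row Filter type: no Filter column/family, a regex comparator on row keys
        filters.addFilter(new RowFilter(
                CompareOp.EQUAL, new RegexStringComparator("^1.*")));
        // Single Column Value Filter type: family2:name must equal "Christophe"
        filters.addFilter(new SingleColumnValueFilter(
                Bytes.toBytes("family2"), Bytes.toBytes("name"),
                CompareOp.EQUAL, Bytes.toBytes("Christophe")));
        Scan scan = new Scan();
        scan.setFilter(filters);
        return scan;
    }
}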

Retrieve timestamps Select this check box to load the timestamps of an HBase
column into the data flow.
• Retrieve from an HBase column: select the HBase
column which is tracked for changes in order to
retrieve its corresponding timestamps.
• Write to a schema column: select the column you
have defined in the schema to store the retrieved
timestamps.
The type of this column must be Long.
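
For illustration, the hedged sketch below shows where such a timestamp comes from in the native HBase client; the row key, family and qualifier are placeholders.

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampSketch {
    public static long readTimestamp(Table table) throws Exception {
        Result result = table.get(new Get(Bytes.toBytes("1")));
        // Latest cell of the tracked HBase column (may be null if absent)
        Cell cell = result.getColumnLatestCell(
                Bytes.toBytes("family2"), Bytes.toBytes("name"));
        // This long value is what ends up in the mapped Long schema column
        return cell.getTimestamp();
    }
}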

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
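
As an illustration only, these variables are typically read in a tJava component linked to this component with an OnSubjobOk trigger; the component name tHBaseInput_1 used below is an assumption.

// Snippet for a tJava component placed after the subJob
Integer nbLine = (Integer) globalMap.get("tHBaseInput_1_NB_LINE");
String errorMessage = (String) globalMap.get("tHBaseInput_1_ERROR_MESSAGE");
System.out.println("Rows read: " + nbLine);
if (errorMessage != null) {
    System.out.println("Last error: " + errorMessage);
}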

Usage

Usage rule This component is a start component of a Job and always


needs an output link.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Exchanging customer data with HBase


This scenario applies only to Talend products with Big Data.
In this scenario, a six-component Job is used to exchange customer data with a given HBase.


The six components are:


• tHBaseConnection: creates a connection to your HBase database.
• tFixedFlowInput: creates the data to be written into your HBase. In the real use case, this
component could be replaced by the other input components like tFileInputDelimited.
• tHBaseOutput: writes the data it receives from the preceding component into your HBase.
• tHBaseInput: extracts the columns of interest from your HBase.
• tLogRow: presents the execution result.
• tHBaseClose: closes the transaction.
To replicate this scenario, proceed as the following sections illustrate.

Note:
Before starting the replication, your HBase and Zookeeper services must be correctly
installed and configured. This scenario explains only how to use the Talend solution to exchange
data with a given HBase.

Dropping and linking the components


About this task
To do this, proceed as follows:

Procedure
1. Drop tHBaseConnection, tFixedFlowInput, tHBaseOutput, tHBaseInput, tLogRow and tHBaseClose
from Palette onto the Design workspace.
2. Right-click tHBaseConnection to open its contextual menu and select the Trigger > On Subjob Ok
link from this menu to connect this component to tFixedFlowInput.


3. Do the same to create the OnSubjobOk link from tFixedFlowInput to tHBaseInput and then to
tHBaseClose.
4. Right-click tFixedFlowInput and select the Row > Main link to connect this component to
tHBaseOutput.
5. Do the same to create the Main link from tHBaseInput to tLogRow.

Results
The components to be used in this scenario are all placed and linked. You then need to configure
them successively.

Configuring the connection


About this task
To configure the connection to your Zookeeper service and thus to the HBase of interest, proceed as
follows:

Procedure
1. On the Design workspace of your Studio, double-click the tHBaseConnection component to open
its Component view.

2. Select Hortonworks Data Platform 1.0 from the HBase version list.
3. In the Zookeeper quorum field, type in the name or the URL of the Zookeeper service you are
using. In this example, the name of the service in use is hbase.
4. In the Zookeeper client port field, type in the number of the client listening port. In this example, it is
2181.
5. If the Zookeeper znode parent location has been defined in the Hadoop cluster you are
connecting to, you need to select the Set zookeeper znode parent check box and enter the value
of this property in the field that is displayed.

Configuring the process of writing data into the HBase


About this task
To do this, proceed as follows:


Procedure
1. On the Design workspace, double-click the tFixedFlowInput component to open its Component
view.

2. In this view, click the three-dot button next to Edit schema to open the schema editor.

3. Click the plus button three times to add three rows and in the Column column, rename the three
rows respectively as: id, name and age.
4. In the Type column, click each of these rows and from the drop-down list, select the data type of
every row. In this scenario, they are Integer for id and age, String for name.
5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
6. In the Mode area, select the Use Inline Content (delimited file) to display the fields for editing.


7. In the Content field, type in the delimited data to be written into the HBase, separated with the
semicolon ";". In this example, they are:

1;Albert;23
2;Alexandre;24
3;Alfred-Hubert;22
4;Andre;40
5;Didier;28
6;Anthony;35
7;Artus;32
8;Catherine;34
9;Charles;21
10;Christophe;36
11;Christian;67
12;Danniel;54
13;Elisabeth;58
14;Emile;32
15;Gregory;30

8. Double-click tHBaseOutput to open its Component view.

Note: If this component does not have the same schema as the preceding component, a
warning icon appears. In this case, click the Sync columns button to retrieve the schema from
the preceding component; once done, the warning icon disappears.

9. Select the Use an existing connection check box and then select the connection you have
configured earlier. In this example, it is tHBaseConnection_1.
10. In the Table name field, type in the name of the table to be created in the HBase. In this example,
it is customer.
11. In the Action on table field, select the action of interest from the drop-down list. In this scenario,
select Drop table if exists and create. This way, if a table named customer exists already in the
HBase, it will be disabled and deleted before creating this current table.
12. Click the Advanced settings tab to open the corresponding view.


13. In the Family parameters table, add two rows by clicking the plus button, rename them as family1
and family2 respectively and then leave the other columns empty. These two column families will
be created in the HBase using the default family performance options.

Note: The Family parameters table is available only when the action you have selected in the
Action on table field is to create a table in HBase. For further information about this Family
parameters table, see tHBaseOutput on page 1419.

14. In the Families table of the Basic settings view, enter the family names in the Family name
column, each corresponding to the column this family contains. In this example, the id and the
age columns belong to family1 and the name column to family2.

Note: These column families should already exist in the HBase to be connected to; if not, you
need to define them in the Family parameters table of the Advanced settings view for creating
them at runtime.
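
For illustration, the hedged sketch below, written with the native HBase client rather than with the component, shows the cells that the first sample row (1;Albert;23) becomes once id and age are mapped to family1 and name to family2; the string byte encoding used here is an assumption and may differ from the component's own encoding.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomerRowSketch {
    public static void writeFirstRow(Connection connection) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("customer"))) {
            Put put = new Put(Bytes.toBytes("1"));                   // row key
            put.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("id"),
                    Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("age"),
                    Bytes.toBytes("23"));
            put.addColumn(Bytes.toBytes("family2"), Bytes.toBytes("name"),
                    Bytes.toBytes("Albert"));
            table.put(put);
        }
    }
}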

Configuring the process of extracting data from the HBase


About this task
To do this, perform the following operations:

Procedure
1. Double-click tHBaseInput to open its Component view.


2. Select the Use an existing connection check box and then select the connection you have
configured earlier. In this example, it is tHBaseConnection_1.
3. Click the three-dot button next to Edit schema to open the schema editor.

4. Click the plus button three times to add three rows and rename them as id, name and age
respectively in the Column column. This means that you extract these three columns from the
HBase.
5. Select the types for each of the three columns. In this example, Integer for id and age, String for
name.
6. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
7. In the Table name field, type in the table from which you extract the columns of interest. In this
scenario, the table is customer.
8. In the Mapping table, the Column column has already been filled in automatically since the schema
was defined, so simply enter the name of every family in the Column family column, each
corresponding to the column it contains.
9. Double-click tHBaseClose to open its Component view.


10. In the Component List field, select the connection you need to close. In this example, this
connection is tHBaseConnection_1.

Executing the Job


To execute this Job, press F6.
Once done, the Run view is opened automatically, where you can check the execution result.

The columns of interest are extracted and you can process them according to your needs.
Log in to your HBase database to check the customer table this Job has created.


tHBaseOutput
Writes columns of data into a given HBase database.
tHBaseOutput receives data from its preceding component, creates a table in a given HBase database
and writes the received data into this HBase table.

tHBaseOutput Standard properties


These properties are used to configure tHBaseOutput running in the Standard Job framework.
The Standard tHBaseOutput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add


other required jar files which the base distribution does


not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.


Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are


not enclosed within quotation marks. If they are, you must


remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Set table Namespace mappings Enter the string to be used to construct the mapping
between an Apache HBase table and a MapR table.
For the valid syntax you can use, see http://doc.mapr.com/
display/MapR40x/Mapping+Table+Namespace+Between
+Apache+HBase+Tables+and+MapR+Tables.

Table name Type in the name of the HBase table you need to create.

Action on table Select the action you need to take for creating an HBase
table.

Custom Row Key Select this check box to use the customized row keys. Once
selected, the corresponding field appears. Then type in the
user-defined row key to index the rows of the HBase table
being created.
For example, you can type in "France"+Numer
ic.sequence("s1",1,1) to produce the row key series:
France1, France2, France3 and so on.
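
As a minimal sketch of the idea behind that expression (the AtomicLong below stands in for the Talend Numeric.sequence routine; it is not the routine itself):

import java.util.concurrent.atomic.AtomicLong;

public class CustomRowKeySketch {
    private static final AtomicLong SEQUENCE = new AtomicLong(0);

    public static String nextRowKey() {
        return "France" + SEQUENCE.incrementAndGet();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(nextRowKey()); // France1, France2, France3
        }
    }
}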

Families Complete this table to map the columns of the table to be


used with the schema columns you have defined for the
data flow to be processed.
The Column column of this table is automatically filled
once you have defined the schema; in the Family name
column, enter the column families you want to create
or use to group the columns in the Column column. For
further information about a column family, see Apache
documentation at Column families.

Custom timestamp column Select a Long column from your schema to provide
timestamps for the HBase columns to be created or updated
by tHBaseOutput.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Use batch mode Select this check box to activate the batch mode for data
processing.

Batch size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.
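
For illustration, the hedged sketch below shows what batching looks like with the native HBase client: writes are buffered and flushed per batch instead of being sent row by row. The batch size of 1000 and the table name are placeholders.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public class BatchWriteSketch {
    public static void writeInBatches(Connection connection, List<Put> puts)
            throws Exception {
        int batchSize = 1000;
        try (Table table = connection.getTable(TableName.valueOf("customer"))) {
            List<Put> buffer = new ArrayList<>();
            for (Put put : puts) {
                buffer.add(put);
                if (buffer.size() >= batchSize) { // flush one full batch
                    table.put(buffer);
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {              // flush the remaining rows
                table.put(buffer);
            }
        }
    }
}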

Properties If you need to use custom configuration for your database,


complete this table with the property or properties to be
customized. Then at runtime, the customized property or
properties will override the corresponding ones used by the
Studio.


For example, you need to define the value of the


dfs.replication property as 1 for the database configuration.
Then you need to add one row to this table using the plus
button and type in the name and the value of this property
in this row.

Note:
This table is not available when you are using an
existing connection by selecting the Using an existing
connection check box in the Basic settings view.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Family parameters Type in the names and, if need be, the custom
performance options of the column families to be created.
These options are all attributes defined by the HBase data
model, so for further explanation about these options, see
Apache's HBase documentation.

Note: The parameter Compression type allows you to


select the format for output data compression.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is normally an end component of a Job and


always needs an input link.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.


According to MapR's documentation, the library or


libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For related scenario to the Standard version of tHBaseOutput, see Exchanging customer data with
HBase on page 1411.


tHCatalogInput
Reads data from an HCatalog managed Hive database and sends data to the component that follows.
The tHCatalogInput component reads data from the specified HCatalog managed database and sends
data in the data flow to the console or to a specified local file by connecting this component to a
proper component.

tHCatalogInput Standard properties


These properties are used to configure tHCatalogInput running in the Standard Job framework.
The Standard tHCatalogInput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:


• If available in this Distribution drop-down list, the


Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.


Templeton hostname Fill this field with the URL of the Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the Templeton Webservice.
By default, this value is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
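
For illustration, the hedged sketch below calls one WebHCat (Templeton) endpoint over plain HTTP; the host name, the default port 50111 and the user name are placeholders.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHCatSketch {
    public static void main(String[] args) throws Exception {
        // Lists the databases visible through WebHCat for the given user
        URL url = new URL("http://templeton-host:50111/templeton/v1/"
                + "ddl/database?user.name=hdfs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // JSON list of database names
            }
        } finally {
            conn.disconnect();
        }
    }
}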

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the


principal to be used is guest; in this situation, ensure that


user1 has the right to read the keytab file to be used.

Database The database in which the HCatalog managed tables are


placed. By default, this database is the Hive one named
default.

Table Fill this field to operate on one or multiple tables in the


specified database.

Partition Fill this field to specify one or more partitions for the
partition operation on a specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.

Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.

Username Fill this field with the username for the Hive database
authentication.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Row separator The separator used to identify the end of a row.

Field separator Enter a character, a string, or a regular expression to separate


fields for the transferred data.

Custom encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the


documentation you want. For demonstration purposes, the


links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Fill this field with the path to which log files are stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

Error Output Folder Fill this field with the path to which error log files are
stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is commonly used as the starting


component in a Job.
HCatalog is built on top of the Hive metastore to provide
a read and write interface for Pig and MapReduce, so that the
latter systems can use the metadata of Hive to easily read
and write data in HDFS.


For further information, see Apache documentation about


HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation When Use kerberos authentication is selected, the


component cannot work with IBM JVM.

Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.


tHCatalogLoad
Reads data directly from HDFS and writes this data into an established HCatalog managed table.

tHCatalogLoad Standard properties


These properties are used to configure tHCatalogLoad running in the Standard Job framework.
The Standard tHCatalogLoad component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the


configuration zip corresponding to your distribution


from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Templeton hostname Fill this field with the URL of the Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the Templeton Webservice.
By default, this value is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.


• If this cluster is a MapR cluster of the version 5.0.0


or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Database Enter the name of the database you need to write data in.
This database must already exist.

Table Enter the name of the table you need to write data in. This
table must already exist.

Partition Fill this field to specify one or more partitions for the
partition operation on the specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.

Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.

Username Fill this field with the username for the DB authentication.

File location Enter the absolute path pointing to the HDFS location from
which data is read.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Fill this field with the path to which log files are stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

Error Output Folder Fill this field with the path to which error log files are
stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used in a single-component subJob.


HCatalog is built on top of the Hive metastore to provide
a read and write interface for Pig and MapReduce, so that the
latter systems can use the metadata of Hive to easily read
and write data in HDFS.
For further information, see Apache documentation about
HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.


According to MapR's documentation, the library or


libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation When Use kerberos authentication is selected, the


component cannot work with IBM JVM.

Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.


tHCatalogOperation
Prepares the HCatalog managed database/table/partition to be processed.
tHCatalogOperation manages the data stored in an HCatalog managed Hive database/table/partition.

tHCatalogOperation Standard properties


These properties are used to configure tHCatalogOperation running in the Standard Job framework.
The Standard tHCatalogOperation component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of


the ongoing evolution of the different Hadoop-


related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Templeton hostname Fill this field with the URL of the Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the Templeton Webservice.
By default, the value for this field is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
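For illustration, a minimal Java sketch that checks whether a Templeton (WebHCat) service answers on the hostname and port entered in these two fields; the IP address is hypothetical and 50111 is the default port mentioned above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHCatStatusCheck {
    public static void main(String[] args) throws Exception {
        // WebHCat exposes a status endpoint under /templeton/v1/status.
        URL url = new URL("http://192.168.0.131:50111/templeton/v1/status");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        System.out.println("HTTP " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // typically {"status":"ok","version":"v1"}
            }
        }
        conn.disconnect();
    }
}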

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Operation on Select an object from the list for the DB operation as
follows:
Database: The HCatalog managed database in HDFS.
Table: The HCatalog managed table in HDFS.
Partition: The partition specified by the user.

Operation Select an action from the list for the DB operation. For
further information about the DB operation in HDFS, see
https://cwiki.apache.org/Hive/.

Create the table only it doesn't exist already Select this check box to avoid creating a duplicate table when
you create a table.

Note:
This check box is enabled only when you have selected
Table from the Operation on list.

Database Fill this field with the name of the database in which the
HCatalog managed tables are placed.

Table Fill this field to operate on one or multiple tables in a
database or on a specified HDFS location.

Note:
This field is enabled only when you have selected Table
from the Operation on list. For further information about
the operation on Table, see https://cwiki.apache.org/Hiv
e/.

Partition Fill this field to specify one or more partitions for the
partition operation on a specified table, for example
"match_age=26". When you specify multiple partitions,
separate them with commas and enclose the whole partition
string in double quotation marks.
If you are reading a non-partitioned table, leave this field
empty.

Note:
This field is enabled only when you select Partition from
the Operation on list. For further information about the
operation on Partition, see https://cwiki.apache.org/Hiv
e/.

Username Fill this field with the username for the DB authentication.

Database location Fill this field with the location of the database file in HDFS.

Note:
This field is enabled only when you select Database from
the Operation on list.

Database description The description for the database to be created.

Note:
This field is enabled only when you select Database from
the Operation on list.

Create an external table Select this check box to create an external table in an alternative
path defined in the Set HDFS location field in the Advanced
settings view. For further information about creating
external table, see https://cwiki.apache.org/Hive/.

Note:
This check box is enabled only when you select Table
from the Operation on list and Create/Drop and create/
Drop if exist and create from the Operation list.

Format Select a file format from the list to specify the format of the
external table you want to create:
TEXTFILE: Plain text files.
RCFILE: Record Columnar files. For further information
about RCFILE, see https://cwiki.apache.org/confluence/
display/Hive/RCFile.

Note:
RCFILE is only available starting with Hive 0.6.0. This
list is enabled only when you select Table from the
Operation on list and Create/Drop and create/Drop if
exist and create from the Operation list.

Set partitions Select this check box to set the partition schema by clicking
the Edit schema button to the right of the Set partitions check box.
The partition schema is either built-in or remote in the
Repository.

Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list.
You must follow the rules of using partition schema in
HCatalog managed tables. For more information about
the rules in using partition schema, see https://cwiki.
apache.org/confluence/display/Hive/HCatalog.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Set the user group to use Select this check box to specify the user group.

Note:
This check box is enabled only when you select
Drop/Drop if exist/Drop and create/Drop if exist and
create from the Operation list. By default, the value for
this field is root. For more information about the user
group in the server, contact your system administrator.

Option Select a clause when you drop a database.

Note:
This list is enabled only when you select Database from
the Operation on list and Drop/Drop if exist/Drop and
create/Drop if exist and create from the Operation list.
For more information about Drop operation on database,
see https://cwiki.apache.org/Hive/.

Set the permissions to use Select this check box to specify the permissions needed by
the operation you select from the Operation list.

Note:
This check box is enabled only when you select
Drop/Drop if exist/Drop and create/Drop if exist and
create from the Operation list. By default, the value for
this field is rwxrw-r-x. For more information on user
permissions, contact your system administrator.

Set File location Enter the directory in which partitioned data is stored.

Note:
This check box is enabled only when you select
Partition from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list.
For further information about storing partitioned data in
HDFS, see https://cwiki.apache.org/Hive/.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Comment Fill this field with the comment for the table you want to
create.

Note:
This field is enabled only when you select Table from
the Operation on list and Create/Drop and create/Drop
if exist and create from the Operation list in the Basic
settings view.

Set HDFS location Select this check box to specify an HDFS location to which
the table you want to create is saved. Deselect it to save the
table you want to create in the warehouse directory defined
in the key hive.metastore.warehouse.dir in Hive configuration
file hive-site.xml.

Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list
in the Basic settings view. For further information about
saving data in HDFS, see https://cwiki.apache.org/Hive/.

Set row format(terminated by) Select this check box to use and define the row formats
when you want to create a table:
Field: Select this check box to use Field as the row format.
The default value for this field is "\u0001". You can also
specify a customized char in this field.
Collection Item: Select this check box to use Collection
Item as the row format. The default value for this field is
"\u0002". You can also specify a customized char in this
field.
Map Key: Select this check box to use Map Key as the row
format. The default value for this field is "\u0003". You can
also specify a customized char in this field.
Line: Select this check box to use Line as the row format.
The default value for this field is "\n". You can also specify a
customized char in this field.

Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list
in the Basic settings view. For further information about
row formats in the HCatalog managed table, see https://
cwiki.apache.org/Hive/.
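For illustration only, a hypothetical configuration that produces a semicolon-delimited table with one record per line keeps the default Collection Item and Map Key values and sets:

Field: ";"
Line: "\n"

The values are entered as Java string literals, so non-printable delimiters are written with escape sequences such as "\u0001" or "\t".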

Properties Click [+] to add one or more lines to define table properties.
The table properties allow you to tag the table definition
with your own metadata key/value pairs. Make sure that the
values in both the Key row and the Value row are enclosed in
double quotation marks.

Note:
This table is enabled only when you select
Database/Table from the Operation on list and
Create/Drop and create/Drop if exist and create from
the Operation list in the Basic settings view. For further
information about table properties, see https://cwiki.
apache.org/Hive/.

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Browse to, or enter the directory where the log files are
stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.

Error Output Folder Browse to, or enter the directory where the error log files
are stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
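For illustration, a sketch of how such a variable is typically read in Java code, for example in a tJava component triggered after this one; the component name tHCatalogOperation_1 is an assumption and depends on the label of the component in your Job:

// Hypothetical tJava code reading the After variable of tHCatalogOperation_1.
// globalMap is the map in which Talend Jobs expose component variables.
String errorMessage = (String) globalMap.get("tHCatalogOperation_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("HCatalog operation failed: " + errorMessage);
}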

Usage

Usage rule This component is commonly used in a single-component
subJob.
HCatalog is built on top of the Hive metastore to provide a
read and write interface for Pig and MapReduce, so that these
systems can use the metadata of Hive to easily read
and write data in HDFS.
For further information, see Apache documentation about
HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-
VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation When Use kerberos authentication is selected, the
component cannot work with IBM JVM.

Managing HCatalog tables on Hortonworks Data Platform


This scenario applies only to Talend products with Big Data.
This scenario describes a six-component Job that includes the common operations for the HCatalog
table management on Hortonworks Data Platform. Sub-sections in this scenario cover DB operations
including:
• Creating a table to the database in HDFS;
• Writing data to the HCatalog managed table;
• Writing data to the partitioned table using tHCatalogLoad;
• Reading data from the HCatalog managed table;
• Outputting the data read from the table in HDFS.

Note:
Knowledge of Hive Data Definition Language and HCatalog Data Definition Language is required.
For further information about Hive Data Definition Language, see https://cwiki.apache.org/
confluence/display/Hive/LanguageManual+DDL. For further information about HCatalog Data
Definition Language, see https://cwiki.apache.org/confluence/display/HCATALOG/Design
+Document+-+Java+APIs+for+HCatalog+DDL+Commands.

Setting up the Job


Procedure
1. Drop the following components from the Palette to the design workspace: tHCatalogOperation,
tHCatalogLoad, tHCatalogInput, tHCatalogOutput, tFixedFlowInput, and tFileOutputDelimited.

2. Right-click tHCatalogOperation to connect it to the tFixedFlowInput component using a
Trigger > OnSubjobOk connection.


3. Right-click tFixedFlowInput to connect it to tHCatalogOutput using a Row > Main connection.


4. Right-click tFixedFlowInput to connect it to tHCatalogLoad using a Trigger > OnSubjobOk
connection.
5. Right-click tHCatalogLoad to connect it to the tHCatalogInput component using a Trigger >
OnSubjobOk connection.
6. Right-click tHCatalogInput to connect it to tFileOutputDelimited using a Row > Main connection.

Creating a table in HDFS


Procedure
1. Double-click tHCatalogOperation to open its Basic settings view.

2. Click Edit schema to define the schema for the table to be created.


3. Click [+] to add at least one column to the schema and click OK when you finish setting the
schema. In this scenario, the columns added to the schema are: name, country and age.
4. Fill the Templeton hostname field with the URL of the Templeton webservice you are using. In this
scenario, fill this field with "192.168.0.131".
5. Fill the Templeton port field with the port for the Templeton hostname. By default, the value for this
field is "50111".
6. Select Table from the Operation on list and Drop if exist and create from the Operation list to
create a table in HDFS.
7. Fill the Database field with an existing database name in HDFS. In this scenario, the database
name is "talend".
8. Fill the Table field with the name of the table to be created. In this scenario, the table name is
"Customer".
9. Fill the Username field with the username for the DB authentication.
10. Select the Set the user group to use check box to specify the user group. The default user group is
"root"; specify the value for this field according to your actual setup.
11. Select the Set the permissions to use check box to specify the user permission. The default value
for this field is "rwxrwxr-x".
12. Select the Set partitions check box to enable the partition schema.
13. Click the Edit schema button next to the Set partitions check box to define the partition schema.
14. Click [+] to add one column to the schema and click OK when you finish setting the schema. In
this scenario, the column added to the partition schema is: match_age.

Writing data to the existing table


Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click Edit schema to define the same schema as the one you defined in tHCatalogOperation.
3. Fill the Number of rows field with the integer 8.


4. Select Use Inline Table in the Mode area.


5. Click [+] to add new lines in the inline table.
6. Double-click tHCatalogOutput to open its Basic settings view.

7. Click Sync columns to retrieve the schema defined in the preceding component.
8. Fill the NameNode URI field with the URI to the NameNode. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.
9. Fill the File name field with the HDFS location of the file you write data to. In this scenario, the
file location is "/user/hdp/Customer/Customer.csv".
10. Select Overwrite from the Action list.
11. Fill the Templeton hostname field with the URL of the Templeton webservice you are using. In this
scenario, fill this field with "192.168.0.131".
12. Fill the Templeton port field with the port for the Templeton hostname. By default, the value for this
field is "50111".
13. Fill the Database field, the Table field, and the Username field with the same values you specified in
tHCatalogOperation.
14. Fill the Partition field with "match_age=27".
15. Fill the File location field with the HDFS location to which the table will be saved. In this
example, use "hdfs://192.168.0.131:8020/user/hdp/Customer".

Writing data to the partitioned table using tHCatalogLoad


Procedure
1. Double-click tHCatalogLoad to open its Basic settings view.


2. Fill the Partition field with "match_age=26".


3. Do the rest of the settings in the same way as configuring tHCatalogOperation.

Reading data from the table in HDFS


Procedure
1. Double-click tHCatalogInput to open its Basic settings view.

2. Click Edit schema to define the schema of the table to be read from the database.


3. Click [+] to add at least one column to the schema. In this scenario, the columns added to the
schema are age and name.
4. Fill the Partition field with "match_age=26".
5. Do the rest of the settings in the same way as configuring tHCatalogOperation.

Outputting the data read from the table in HDFS to the console
Procedure
1. Double-click tLogRow to open its Basic settings view.

2. Click Sync columns to retrieve the schema defined in the preceding component.
3. Select Table from the Mode area.

Job execution
Press CTRL+S to save your Job and F6 to execute it.


The data of the restricted table read from HDFS is displayed on the console.
Type http://talend-hdp:50075/browseDirectory.jsp?dir=/user/hdp/Customer&namenodeInfoPort=50070
into the address bar of your browser to view the table you created:


Click the Customer.csv link to view the content of the table you created.


tHCatalogOutput
Receives data from its incoming flow and writes this data into an HCatalog managed table.

tHCatalogOutput Standard properties


These properties are used to configure tHCatalogOutput running in the Standard Job framework.
The Standard tHCatalogOutput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).


Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

File name Browse to, or enter the location of the file which you write
data to. This file is created automatically if it does not exist.

Action Select a DB operation in HDFS:


Create: Creates a file with data using the file name defined
in the File Name field.
Overwrite: Overwrites the data in the file specified in the
File Name field.
Append: Inserts the data into the file specified in the File
Name field. The specified file is created automatically if it
does not exist.

Templeton hostname Fill this field with the URL of Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the URL of the Templeton Webservice.
By default, this value is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Database Fill this field to specify an existing database in HDFS.

Table Fill this field to specify an existing table in HDFS.

Partition Fill this field to specify one or more partitions for the
partition operation on the specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.

Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.

Username Fill this field with the username for the DB authentication.

File location Fill this field with the path where the source data file is
stored.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

Row separator The separator used to identify the end of a row.

Field separator Enter a character, string, or regular expression to separate
fields in the transferred data.

Custom encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
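For instance, to lower the block replication factor for the files this Job writes, you could add a row with "dfs.replication" as the key and "1" as the value; this property pair is given as an indication only, and both entries must be entered as double-quoted strings.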

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Browse to, or enter the directory where the log files are
stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.

Error Output Folder Browse to, or enter the directory where the error log files
are stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is commonly used together with an input
component.
HCatalog is built on top of the Hive metastore to provide a
read and write interface for Pig and MapReduce, so that these
systems can use the metadata of Hive to easily read
and write data in HDFS.
For further information, see Apache documentation about
HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-
VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.


tHDFSCompare
Compares two files in HDFS and, based on the read-only schema, generates a row flow that presents
the comparison information.
tHDFSCompare helps to control the quality of the data processed.

tHDFSCompare Standard properties


These properties are used to configure tHDFSCompare running in the Standard Job framework.
The Standard tHDFSCompare component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
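As an indication, this token endpoint usually follows the pattern https://login.microsoftonline.com/<tenant-ID>/oauth2/token, where <tenant-ID> is a placeholder for your Azure Active Directory tenant; always copy the exact value displayed in your own Azure portal.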

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.

Comparison mode Select the mode to be applied on the comparison.

File to compare Browse, or enter the path to the file in HDFS you need to
check for quality control.

Reference file Browse, or enter the path to the file in HDFS the comparison
is based on.

If differences detected, display and If no differences Type in a message to be displayed in the Run console based
detected, display on the result of the comparison.

Print to console Select this check box to display the message in the Run
console.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DIFFERENCE: the result of the comparison. This is a Flow
variable and it returns a boolean.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
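For illustration, a sketch of how the DIFFERENCE variable can be read to branch a Job, for example in the condition of a Run if trigger or in a tJava component; the component name tHDFSCompare_1 is an assumption, and the meaning of the returned boolean should be checked against the behavior of your own Job:

// Hypothetical Run if condition based on the comparison result:
((Boolean) globalMap.get("tHDFSCompare_1_DIFFERENCE"))

// Equivalent read in a tJava component:
Boolean comparisonResult = (Boolean) globalMap.get("tHDFSCompare_1_DIFFERENCE");
System.out.println("tHDFSCompare DIFFERENCE = " + comparisonResult);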

Usage

Usage rule tHDFSCompare can be a standalone component or send the
information it generates to the following component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
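As an illustration of this mechanism, assume the Job contains two connection components named tHDFSConnection_1 and tHDFSConnection_2 (hypothetical labels). You could define a String context variable, for example hdfs_connection, enter the expression below in the Code field, and supply the name of the connection component to use when launching the Job:

// Expression entered in the Code column of the Dynamic settings table:
context.hdfs_connection

// Possible values passed for the context variable at execution time:
//   hdfs_connection = "tHDFSConnection_1"   (for instance, a test cluster)
//   hdfs_connection = "tHDFSConnection_2"   (for instance, a production cluster)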


Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-
VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHDFSConnection
Connects to a given HDFS so that the other Hadoop components can reuse the connection it creates
to communicate with this HDFS.
tHDFSConnection provides connection to the Hadoop distributed file system (HDFS) of interest at
runtime.

tHDFSConnection Standard properties


These properties are used to configure tHDFSConnection running in the Standard Job framework.
The Standard tHDFSConnection component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.
If you want to use certain parameters such as the Kerberos
parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>talend.kerberos.authentication</name>
    <value>kinit</value>
    <description>Set the Kerberos authentication method to use. Valid values are: kinit or keytab.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.principal</name>
    <value>[email protected]</value>
    <description>Set the keytab's principal name.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.path</name>
    <value>/kdc/user.keytab</value>
    <description>Set the keytab's path.</description>
  </property>
  <property>
    <name>talend.encryption</name>
    <value>none</value>
    <description>Set the encryption method to use. Valid values are: none or ssl.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.path</name>
    <value>ssl</value>
    <description>Set SSL trust store path.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.password</name>
    <value>ssl</value>
    <description>Set SSL trust store password.</description>
  </property>
</configuration>
The parameters read from these configuration files override
the default ones used by the Studio. When a parameter
does not exist in these configuration files, the default one is
used.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file


itself in the Keytab field. This keytab file must be stored in


the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
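For illustration only, assuming the principal guest from the example above and a keytab stored at a
hypothetical path on the machine that runs the Job, the two fields could be filled in as follows (both
values are entered as quoted strings, like the other fields of the Studio):

Principal: "guest"
Keytab: "/home/user1/guest.keytab"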

User name User authentication name of HDFS.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
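For example, to lower the block replication factor on a test cluster, you could add a row such as the
following to the Hadoop properties table; dfs.replication is a standard HDFS property, and the value
shown here is only an illustration (both cells are entered as quoted strings):

Property: "dfs.replication"    Value: "1"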

Use datanode hostname Select the Use datanode hostname check box to allow the
Job to access datanodes via their hostnames. This actually
sets the dfs.client.use.datanode.hostname property to true.

Setup HDFS encryption configurations If the HDFS transparent encryption has been enabled
in your cluster, select the Setup HDFS encryption
configurations check box and in the HDFS encryption key
provider field that is displayed, enter the location of the
KMS proxy.
For further information about the HDFS transparent
encryption and its KMS proxy, see Transparent Encryption in
HDFS.
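As an illustration, the KMS proxy location generally follows the Hadoop KMS URI syntax. Assuming a
hypothetical KMS server named kmshost listening on the default port 16000, the HDFS encryption key
provider field could contain:

"kms://http@kmshost:16000/kms"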


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
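For example, the ERROR_MESSAGE variable can be read in a tJava component placed after this component.
The sketch below assumes that the connection component instance is named tHDFSConnection_1; adapt the
name to your Job:

// Read the ERROR_MESSAGE After variable of tHDFSConnection_1 from the globalMap;
// it stays null when the connection succeeded.
String errorMessage = (String) globalMap.get("tHDFSConnection_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("HDFS connection failed: " + errorMessage);
}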

Usage

Usage rule This component is generally used with other Hadoop


components.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.


Limitations JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHDFSCopy
Copies a source file or folder into a target directory in HDFS and removes this source if required.

tHDFSCopy Standard properties


These properties are used to configure tHDFSCopy running in the Standard Job framework.
The Standard tHDFSCopy component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.


In Talend Exchange, members of Talend community


have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.


Ensure that the application to be used has appropriate


permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,


the user name of the machine hosting the Studio will be


used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Source file or directory Browse to, or enter the path pointing to the data to be used
in the file system.

Target location Browse to, or enter the directory in HDFS to which you need
to copy the data.

Rename To rename the file or folder copied to the target location,


select this check box to display the New name field, then,
enter the new name.

Copy merge Select this check box to merge the part files generated at
the end of a MapReduce computation.
Once you select it, you need to enter the name of the final
merged file in the Merge name field.

Remove source Select this check box to remove the source file or folder
once this source is copied to the target location.

Override target file (This option does not override the directory) Select this check box to override
the file already existing in the target location. This option does not override the folder.
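As a purely illustrative configuration (all paths and names below are hypothetical), moving a file into
an archive directory and renaming it could use values such as:

Source file or directory: "/user/talend/in/data.csv"
Target location: "/user/talend/archive"
Rename: selected, with New name: "data_archived.csv"
Remove source: selected, so that the source file is deleted once it has been copied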

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables DESTINATION_FILEPATH: the destination file path. This is


an After variable and it returns a string.
SOURCE_FILEPATH: the source file path. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
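For example, to log the transfer once the copy is done, a tJava component connected with an OnSubjobOk
link could read both After variables. The sketch below assumes the component instance is named
tHDFSCopy_1; adapt the name to your Job:

// Hypothetical instance name tHDFSCopy_1; both variables are After variables.
String source = (String) globalMap.get("tHDFSCopy_1_SOURCE_FILEPATH");
String target = (String) globalMap.get("tHDFSCopy_1_DESTINATION_FILEPATH");
System.out.println("Copied " + source + " to " + target);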

Usage

Usage rule tHDFSCopy is a standalone component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
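As a sketch, assuming the Job contains two connection components named tHDFSConnection_1 and
tHDFSConnection_2 and a context variable named hdfs_connection, you could type
context.hdfs_connection in the Code field and set the variable to "tHDFSConnection_1" in your test
context and to "tHDFSConnection_2" in your production context; the component then uses the connection
whose name matches the value at run time.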

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenario
Related topic, see Procedure on page 990
Related topic, see Iterating on a HDFS directory on page 1523


tHDFSDelete
Deletes a file located on a given Hadoop distributed file system (HDFS).

tHDFSDelete Standard properties


These properties are used to configure tHDFSDelete running in the Standard Job framework.
The Standard tHDFSDelete component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.


In Talend Exchange, members of Talend community


have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.


Ensure that the application to be used has appropriate


permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name User authentication name of HDFS.


Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File or Directory Path Browse to, or enter the path to the file or folder to be
deleted on HDFS.

Advanced settings

Hadoop properties If you need to use custom configuration for the Hadoop of
interest, complete this table with the property or properties
to be customized. Then at runtime, the customized property
or properties will override those corresponding ones
defined earlier for the same Hadoop.
For further information about the properties required by
Hadoop, see the Hadoop documentation.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables DELETE_PATH: the path to the deleted file or folder. This is
an After variable and it returns a string.
CURRENT_STATUS: the execution result of the component.
This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used to compose a single-component Job


or subJob.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .


The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHDFSExist
Checks whether a file exists in a specific directory in HDFS.

tHDFSExist Standard properties


These properties are used to configure tHDFSExist running in the Standard Job framework.
The Standard tHDFSExist component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file


should contain the libraries of the different Hadoop


elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of


the application that the current Job you are developing


uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

File name or relative path Enter the name of the file whose existence you want to check. If need be,
browse to the file or enter the path to the file, relative to the directory you entered in HDFS
directory.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable


and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSExist is a standalone component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the


Run/Debug view in the Preferences dialog box in the


Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Checking the existence of a file in HDFS


This scenario applies only to Talend products with Big Data.
In this scenario, the two-component Job checks whether a specific file exists in HDFS and returns a
message to indicate the result of the verification.
In real-world practice, you can take further action to process the checked file according to the
verification result, using the other HDFS components provided with the Studio.

Launch the Hadoop distribution in which you want to check the existence of a particular file. Then,
proceed as follows:

Linking the components


Procedure
1. In the Integration perspective of Talend Studio , create an empty Job, named hdfsexist_file for
example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tHDFSExist and tMsgBox onto the workspace.
3. Connect them using the Trigger > Run if link.

Configuring the connection to HDFS


Procedure
1. Double-click tHDFSExist to open its Component view.


2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In real-world practice, you may use tHDFSConnection to create a connection and reuse it from
the current component. For further information, see tHDFSConnection on page 1466.
4. In the HDFS Directory field, browse to, or enter the path to the folder where the file to be checked
is. In this example, browse to /user/ychen/data/hdfs/out/dest.
5. In the File name or relative path field, enter the name of the file whose existence you want to
check. For example, output.csv.

Defining the message to be returned


Procedure
1. Double-click tMsgBox to open its Component view.

2. In the Title field, enter the title to be used for the pop-up message box to be created.
3. In the Buttons list, select OK. This defines the button to be displayed on the message box.
4. In the Icon list, select Icon information.
5. In the Message field, enter the message you want to display once the file check is done. In this
example, enter "This file does not exist!".


Defining the condition


Procedure
1. Click the If link to open the Basic settings view, where you are able to define the condition for
checking the existence of this file.

2. In the Condition box, press Ctrl+Space to access the variable list and select the global variable
EXISTS. Type an exclamation mark before the variable to negate the meaning of the variable.
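Assuming the tHDFSExist component of this Job is named tHDFSExist_1, the resulting condition reads as
follows; it evaluates to true, and therefore triggers tMsgBox, only when the file does not exist:

!((Boolean)globalMap.get("tHDFSExist_1_EXISTS"))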

Executing the Job


Procedure
Press F6 to execute this Job.

Results
Once done, a message box pops up to indicate that this file called output.csv does not exist in the
directory you defined earlier.

If you browse to the specified directory in the HDFS where the file's existence was checked, you can
see that this file does not exist.


tHDFSGet
Copies files from the Hadoop distributed file system (HDFS), pastes them in a user-defined directory
and, if need be, renames them.
tHDFSGet connects to Hadoop distributed file system, helping to obtain large-scale files with
optimized performance.

tHDFSGet Standard properties


These properties are used to configure tHDFSGet running in the Standard Job framework.
The Standard tHDFSGet component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add


other required jar files which the base distribution does


not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:


• In the Client ID and the Client key fields, enter,


respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the


principal to be used is guest; in this situation, ensure that


user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

Local directory Browse to, or enter the local directory to store the files
obtained from HDFS.

Overwrite file Options to overwrite or not the existing file with the new
one.

Append Select this check box to add the new rows at the end of the
records.

Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from HDFS.
Regular expression is available.
- New name: give a new name to the obtained file.
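For instance, to retrieve every CSV file from the HDFS directory while keeping the original names, the
Files table could contain a single row such as the following; the mask is illustrative (regular
expressions are also accepted, as noted above), and leaving New name empty is assumed here to keep the
original file names:

File mask: "*.csv"    New name: ""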

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the


documentation you want. For demonstration purposes, the


links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
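For illustration only, a minimal sketch of two customized rows for this table, assuming
you want DataNode access by hostname and a replication factor of 1 (the property names
are standard HDFS properties; the values shown are assumptions, not Studio defaults):

    Property                                 Value
    "dfs.client.use.datanode.hostname"       "true"
    "dfs.replication"                        "1"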

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After


variable and it returns an integer.
CURRENT_STATUS: the execution result of the component.
This is a Flow variable and it returns a string.
TRANSFER_MESSAGES: file transferred information. This is
an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
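As a minimal sketch, assuming this component is labeled tHDFSGet_1, the After variables
listed above can be read in a Java expression of a downstream component once the subJob
has completed; the label tHDFSGet_1 is an assumption and must match your own Job:

    // Number of files transferred by tHDFSGet_1 (After variable, Integer)
    ((Integer)globalMap.get("tHDFSGet_1_NB_FILE"))
    // Transfer details reported by tHDFSGet_1 (After variable, String)
    ((String)globalMap.get("tHDFSGet_1_TRANSFER_MESSAGES"))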

Usage

Usage rule This component combines HDFS connection and data
extraction, and is thus used as a single-component subJob to
move data from HDFS to a user-defined local directory.
Unlike the tHDFSInput and the tHDFSOutput components, it
runs standalone and does not generate an input or output
flow for the other components.
It is often connected to the rest of the Job using an
OnSubjobOk or OnComponentOk link, depending on the context.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the


Component List box in the Basic settings view becomes


unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
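As an illustration only, assume the Job contains two connection components,
tHDFSConnection_1 and tHDFSConnection_2, and a String context variable named
hdfsConnection whose value is the name of the connection to use
("tHDFSConnection_1" in one context, "tHDFSConnection_2" in another); the Code
field of the dynamic parameter would then simply read:

    context.hdfsConnection

All the names in this sketch are assumptions; adapt them to the components and
contexts defined in your own Job.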

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
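For example, the -Djava.library.path argument mentioned above could be set as
follows (a sketch only; the actual path is an assumption and depends on where
the MapR client is installed and on its version):

    -Djava.library.path="C:\opt\mapr\hadoop\hadoop-2.7.0\lib\native"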
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Computing data with Hadoop distributed file system


This scenario applies only to Talend products with Big Data.
The following scenario describes a simple Job that creates a file in a defined directory, puts it into
HDFS, gets it back out of HDFS into another local directory, and reads it at the end of the Job.

Setting up the Job


Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tFileOutputDelimited, tHDFSPut, tHDFSGet, tFileInputDelimited and tLogRow.
2. Connect tFixedFlowInput to tFileOutputDelimited using a Row > Main connection.


3. Connect tFileInputDelimited to tLogRow using a Row > Main connection.


4. Connect tFixedFlowInput to tHDFSPut using an OnSubjobOk connection.
5. Connect tHDFSPut to tHDFSGet using an OnSubjobOk connection.
6. Connect tHDFSGet to tFileInputDelimited using an OnSubjobOk connection.

Configuring the input component


Procedure
1. Double-click tFixedFlowInput to define the component in its Basic settings view.
2. Set the Schema to Built-In and click the three-dot [...] button next to Edit Schema to describe the
data structure you want to create from internal variables. In this scenario, the schema contains
one column: content.


3. Click the plus button to add the parameter line.


4. Click OK to close the dialog box and accept to propagate the changes when prompted by the
studio.
5. In Basic settings, define the corresponding value in the Mode area using the Use Single Table
option. In this scenario, the value is "Hello world!".

Configuring the tFileOutputDelimited component


Procedure
1. Double-click tFileOutputDelimited to define the component in its Basic settings view.

2. Click the [...] button next to the File Name field and browse to the output file you want to write
data in, in.txt in this example.

Loading the data from the local file


Procedure
1. Double-click tHDFSPut to define the component in its Basic settings view.


2. Select, for example, Apache 0.20.2 from the Hadoop version list.
3. In the NameNode URI, the Username and the Group fields, enter the connection parameters to
the HDFS. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
4. Next to the Local directory field, click the three-dot [...] button to browse to the folder with
the file to be loaded into the HDFS. In this scenario, the directory has been specified while
configuring tFileOutputDelimited: C:/hadoopfiles/putFile/.
5. In the HDFS directory field, type in the intended location in HDFS to store the file to be loaded. In
this example, it is /testFile.
6. Click the Overwrite file field to expand the drop-down list.
7. From the menu, select always.
8. In the Files area, click the plus button to add a row in which you define the file to be loaded.
9. In the File mask column, replace the default newLine value with *.txt, keeping the quotation marks,
and leave the New name column as it is. This allows you to select all the .txt files in the specified
directory without changing their names. In this example, the file is in.txt.

Getting the data from the HDFS


Procedure
1. Double-click tHDFSGet to define the component in its Basic settings view.


2. Select, for example, Apache 0.20.2 from the Hadoop version list.
3. In the NameNode URI, the Username, the Group fields, enter the connection parameters to the
HDFS. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
4. In the HDFS directory field, type in the location storing the loaded file in HDFS. In this example,
it is /testFile.
5. Next to the Local directory field, click the three-dot [...] button to browse to the folder intended to
store the files that are extracted out of the HDFS. In this scenario, the directory is: C:/hadoopfiles/
getFile/.
6. Click the Overwrite file field to expand the drop-down list.
7. From the menu, select always.
8. In the Files area, click the plus button to add a row in which you define the file to be extracted.
9. In the File mask column, replace the default newLine value with *.txt, keeping the quotation marks,
and leave the New name column as it is. This allows you to extract all the .txt files from the specified
directory in the HDFS without changing their names. In this example, the file is in.txt.

Reading data from the HDFS and saving the data locally
Procedure
1. Double-click tFileInputDelimited to define the component in its Basic settings view.


2. Set property type to Built-In.


3. Next to the File Name/Stream field, click the three-dot button to browse to the file you have
obtained from the HDFS. In this scenario, the directory is C:/hadoopfiles/getFile/in.txt.
4. Set Schema to Built-In and click Edit schema to define the data to pass on to the tLogRow
component.

5. Click the plus button to add a new column.


6. Click OK to close the dialog box and accept to propagate the changes when prompted by the
studio.

Executing the Job


Save the Job and press F6 to execute it.
The in.txt file is created and loaded into the HDFS.


The file is also extracted from the HDFS by tHDFSGet and is read by tFileInputDelimited.


tHDFSInput
Extracts the data in an HDFS file for other components to process it.
tHDFSInput reads a file located on a given Hadoop distributed file system (HDFS) and puts the data of
interest from this file into a Talend schema. Then it passes the data to the component that follows.

tHDFSInput Standard properties


These properties are used to configure tHDFSInput running in the Standard Job framework.
The Standard tHDFSInput component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.


Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.


Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
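For illustration only, a sketch of these ADLS Gen1 parameters with placeholder
values (none of the values below are real; replace them with the ID, secret and
tenant of your own Azure application):

    Client ID:      "0a1b2c3d-4e5f-6789-abcd-ef0123456789"
    Client key:     "<client-secret-generated-on-the-Azure-portal>"
    Token endpoint: "https://login.microsoftonline.com/<tenant-id>/oauth2/token"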

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the


Kerberos principal name for the NameNode in the field


displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File Name Browse to, or enter the path pointing to the data to be used
in the file system.
If the path you set points to a folder, this component will
read all of the files stored in that folder. Furthermore, if
sub-folders exist in that folder and you need to read files in
the sub-folders, select the Include sub-directories if path is
directory check box in the Advanced settings view.

Type Select the type of the file to be processed. The type of the
file may be:
• Text file.
• Sequence file: a Hadoop sequence file consists of
binary key/value pairs and is suitable for the Map/
Reduce framework. For further information, see http://
wiki.apache.org/hadoop/SequenceFile.


Once you select the Sequence file format, the Key


column list and the Value column list appear to allow
you to select the keys and the values of that Sequence
file to be processed.

Row separator The separator used to identify the end of a row.


This field is not available for a Sequence file.

Field separator Enter character, string or regular expression to separate


fields for the transferred data.
This field is not available for a Sequence file.

Header Enter the number of header rows to ignore in the transferred
data. For example, enter 0 if the data has no header, or 1 if
the header occupies the first row.
This field is not available for a Sequence file.

Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.

Compression Select the Uncompress the data check box to uncompress the
input data.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.
This option is not available for a Sequence file.

Advanced settings

Include sub-directories if path is directory Select this check box to read not only the folder you have
specified in the File name field but also the sub-folders in
that folder.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are


using or see Apache's Hadoop documentation on http://


hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to collect log data at the component
level. Note that this check box is not available in the Map/
Reduce version of the component.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component needs an output link.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Using HDFS components to work with Azure Data Lake Storage (ADLS)
This scenario describes how to use the HDFS components to read data from and write data to Azure
Data Lake Storage.
This scenario applies only to Talend products with Big Data.
• tFixedFlowInput: it provides sample data to the Job.
• tHDFSOutput: it writes sample data to Azure Data Lake Store.
• tHDFSInput: it reads sample data from Azure Data Lake Store.
• tLogRow: it displays the output of the Job on the console of the Run view of the Job.

Grant your application access to your ADLS Gen2


Before you begin
An Azure subscription is required.

Procedure
1. Create your Azure Data Lake Storage Gen2 account if you do not have it yet.


• For more details, see Create an Azure Data Lake Storage Gen2 account from the Azure
documentation.
2. Create an Azure Active Directory application on your Azure portal. For more details about how to
do this, see the "Create an Azure Active Directory application" section in Azure documentation:
Use portal to create an Azure Active Directory application.
3. Obtain the application ID, object ID and the client secret of the application to be used from the
portal.
a) On the list of the registered applications, click the application you created and registered in
the previous step to display its information blade.
b) Click Overview to open its blade, and from the top section of the blade, copy the Object ID and
the application ID displayed as Application (client) ID. Keep them somewhere safe for later
use.
c) Click Certificates & secrets to open its blade and then create the authentication key (client
secret) to be used on this blade in the Client secrets section.
4. Back to the Overview blade of the application to be used, click Endpoints on the top of this blade,
copy the value of OAuth 2.0 token endpoint (v1) from the endpoint list that appears and keep it
somewhere safe for later use.
5. Set the read and write permissions to the ADLS Gen2 filesystem to be used for the service
principal of your application.
It is very likely that the administrator of your Azure system has included your account and your
applications in the group that has access to a given ADLS Gen2 storage account and a given ADLS
Gen2 filesystem. In this case, ask your administrator to ensure that you have the proper access and
then ignore this step.
a) Start your Microsoft Azure Storage Explorer and find your ADLS Gen2 storage account on the
Storage Accounts list.
If you have not installed Microsoft Azure Storage Explorer, you can download it from the
Microsoft Azure official site.
b) Expand this account and the Blob Containers node under it; then click the ADLS Gen2
hierarchical filesystem to be used under this node.


Example

The filesystem in this image is for demonstration purposes only. Create the filesystem to be
used under the Blob Containers node in your Microsoft Azure Storage Explorer, if you do not
have one yet.
c) On the blade that is opened, click Manage Access to open its wizard.
d) At the bottom of this wizard, add the object ID of your application to the Add user or group
field and click Add.
e) Select the object ID just added from the Users and groups list and select all the permissions for
Access and Default.
f) Click Save to validate these changes and close this wizard.

Creating an HDFS Job in the Studio


Procedure
1. On the Integration perspective, drop the following components from the Palette onto the design
workspace: tFixedFlowInput, tHDFSOutput, tHDFSInput and tLogRow.
2. Connect tFixedFlowInput to tHDFSOutput using a Row > Main link.
3. Do the same to connect tHDFSInput to tLogRow.
4. Connect tFixedFlowInput to tHDFSInput using a Trigger > OnSubjobOk link.


Results

Configuring the HDFS components to work with Azure Data Lake Storage
Procedure
1. Double-click tFixedFlowInput to open its Component view to provide sample data to the Job.
The sample data to be used contains only one row with two columns: id and name.
2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the [+] button to add the two columns and rename them to id and name.
4. Click OK to close the schema editor and validate the schema.
5. In the Mode area, select Use single table.
The id and the name columns automatically appear in the Value table and you can enter the
values you want within double quotation marks in the Value column for the two schema values.
6. Double-click tHDFSOutput to open its Component view.


Example

7. In the Version area, select Hortonworks or Cloudera depending on the distribution you are using.
In the Standard framework, only these two distributions with ADLS are supported by the HDFS
components.
8. From the Scheme drop-down list, select ADLS. The ADLS related parameters appear in the
Component view.
9. In the URI field, enter the URI of the NameNode service. The location of this service is actually the
address of your Data Lake Store.
For example, if your Data Lake Storage name is data_lake_store_name, the NameNode URI
to be used is adl://data_lake_store_name.azuredatalakestore.net.
10. In the Client ID and the Client key fields, enter, respectively, the authentication ID and the
authentication key generated upon the registration of the application that the current Job you are
developing uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate permissions to access Azure Data Lake.
You can check this on the Required permissions view of this application on Azure. For further
information, see Azure documentation Assign the Azure AD application to the Azure Data Lake
Storage account file or folder.
This application must be the one to which you assigned permissions to access your Azure Data
Lake Storage in the previous step.
11. In the Token endpoint field, copy-paste the OAuth 2.0 token endpoint that you can obtain from
the Endpoints list accessible on the App registrations page on your Azure portal.


12. In the File name field, enter the directory to be used to store the sample data on Azure Data Lake
Storage.
13. From the Action drop-down list, select Create if the directory to be used does not exist yet on
Azure Data Lake Storage; if this folder already exists, select Overwrite.
14. Do the same configuration for tHDFSInput.
15. If you run your Job on Windows, follow this procedure to add the winutils.exe program to your Job
(see the note after this procedure).
16. Press F6 to run your Job.
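Note on step 15: as a minimal sketch of a typical Windows setup (the folder C:/hadoop is an
assumption; use the folder where you actually place the program), put winutils.exe under
C:/hadoop/bin and pass the corresponding system property as a JVM argument of the Job, for
example in the Advanced settings tab of the Run view:

    -Dhadoop.home.dir=C:/hadoop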


tHDFSList
tHDFSList retrieves a list of files or folders based on a filemask pattern and iterates over each of them.

tHDFSList Standard properties


These properties are used to configure tHDFSList running in the Standard Job framework.
The Standard tHDFSList component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.


In Talend Exchange, members of Talend community


have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.


Ensure that the application to be used has appropriate


permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,


the user name of the machine hosting the Studio will be


used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS Directory Browse to, or enter the path pointing to the data to be used
in the file system.

FileList Type Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.

Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Case Sensitive Set the case mode from the list to either create or not
create case sensitive filter on filenames.

Use Glob Expressions as Filemask This check box is selected by default. It filters the results
using glob expressions. Clear this check box to use regular
expressions as filemasks instead.

Files Click the plus button to add as many filter lines as needed:
Filemask: in the added filter lines, type in a filename or a
filemask using special characters or regular expressions.
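For example (assumed values): "*.txt" matches all text files when Use Glob
Expressions as Filemask is selected, whereas "log_[0-9]{4}\\.csv" could be used
as a regular expression when that check box is cleared.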

Order by The folders are listed first, then the files. You can choose
how to order the folders and files:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical
order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent
to most recent.

Note:
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then file
name takes precedence. If ordering by modified date,
in the event of identical dates then file name takes
precedence.

Order action Select a sort order by clicking one of the following radio
buttons:
ASC: ascending order;
DESC: descending order;

Advanced settings

Use Exclude Filemask Select this check box to enable the Exclude Filemask field,
which excludes files from the results based on file type:


Exclude Filemask: Fill in the field with file types to be


excluded from the Filemasks in the Basic settings view.

Note:
File types in this field should be quoted with double
quotation marks and separated by commas.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables CURRENT_FILE: the current file name. This is a Flow


variable and it returns a string.
CURRENT_FILEDIRECTORY: the current file directory. This is
a Flow variable and it returns a string.
CURRENT_FILEEXTENSION: the extension of the current file.
This is a Flow variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
NB_FILE: the number of files iterated upon so far. This is a
Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSList provides a list of files or folders from a defined


HDFS directory on which it iterates.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Iterating on an HDFS directory


This scenario applies only to Talend products with Big Data.
This scenario uses a two-component Job to iterate over a specified directory in HDFS and retrieve the
selected files into a local directory.

Preparing the data to be used


Procedure
Create the files to be iterated on in the HDFS you want to use. In this scenario, two files are created in
the directory: /user/ychen/data/hdfs/out.


You can design a Job in the Studio to create the two files. For further information, see tHDFSPut on
page 1548 or tHDFSOutput on page 1528.

Linking the components


Procedure
1. In the Integration perspective of Talend Studio , create an empty Job, named HDFSList for
example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tHDFSList and tHDFSGet onto the workspace.
3. Connect them using the Row > Iterate link.

Configuring the iteration


Procedure
1. Double-click tHDFSList to open its Component view.


2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may use tHDFSConnection to create a connection and reuse it from
the current component. For further information, see tHDFSConnection on page 1466.
4. In the HDFS Directory field, enter the path to the folder where the files to be iterated on are. In
this example, as presented earlier, the directory is /user/ychen/data/hdfs/out/.
5. In the FileList Type field, select File.
6. In the Files table, click the plus button to add one row and enter * between the quotation marks to
iterate over all existing files.

Selecting the files


Procedure
1. Double-click tHDFSGet to open its Component view.


2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may have used tHDFSConnection to create a connection; then you
can reuse it from the current component. For further information, see tHDFSConnection on page
1466.
4. In the HDFS directory field, enter the path to the folder holding the files to be retrieved.
To do this with the auto-completion list, place the mouse pointer in this field, then, press Ctrl
+Space to display the list and select the tHDFSList_1_CURRENT_FILEDIRECTORY variable to reuse
the directory you have defined in tHDFSList. In this variable, tHDFSList_1 is the label of the
component. If you label it differently, select the variable accordingly.
Once selecting this variable, the directory reads, for example, ((String)globalMap.get("tHDF
SList_1_CURRENT_FILEDIRECTORY")) in this field.
For further information about how to label a component, see the Talend Studio User Guide.
5. In the Local directory field, enter the path, or browse to the folder you want to place the selected
files in. This folder will be created if it does not exist. In this example, it is C:/hdfsFiles.
6. In the Overwrite file field, select always.
7. In the Files table, click the plus button to add one row and enter * between the quotation marks in
the Filemask column in order to get all existing files.

Executing the Job


Procedure
Press F6 to execute this Job.


Results
Once done, you can check the files created in the local directory.


tHDFSOutput
Writes data flows it receives into a given Hadoop distributed file system (HDFS).

tHDFSOutput Standard properties


These properties are used to configure tHDFSOutput running in the Standard Job framework.
The Standard tHDFSOutput component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component


you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.


Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
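For illustration only, the ADLS Gen1 fields described above
are typically filled with double-quoted strings of the
following form; the placeholders in angle brackets stand for
values from your own Azure application and tenant:

Client ID: "<application-id>"
Client key: "<application-secret>"
Token endpoint: "https://login.microsoftonline.com/<tenant-id>/oauth2/token"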

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
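As an illustration (the host and realm below are
placeholders), the NameNode principal usually follows the
service/host@REALM pattern and is entered as a double-quoted
string, for example:

"nn/masternode.example.com@EXAMPLE.COM"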

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
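For illustration, reusing the hypothetical names from the
example above, the two fields could be filled as follows,
with the keytab path pointing to a file that the user
running the Job is allowed to read:

Principal: "guest@EXAMPLE.COM"
Keytab: "/home/user1/keytabs/guest.keytab"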

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File Name Browse to, or enter the location of the file which you write
data to. This file is created automatically if it does not exist.

Type Select the type of the file to be processed. The type of the
file may be:
• Text file.
• Sequence file: a Hadoop sequence file consists of
binary key/value pairs and is suitable for the Map/
Reduce framework. For further information, see http://
wiki.apache.org/hadoop/SequenceFile.
Once you select the Sequence file format, the Key
column list and the Value column list appear to allow
you to select the keys and the values of that Sequence
file to be processed.

Action Select an operation in HDFS:


Create: Creates a file with data using the file name defined
in the File Name field.
Overwrite: Overwrites the data in the file specified in the
File Name field.
Append: Inserts the data into the file specified in the File
Name field. The specified file is created automatically if it
does not exist.

Row separator The separator used to identify the end of a row.


This field is not available for a Sequence file.

Field separator Enter the character, string, or regular expression used to
separate fields in the transferred data.
This field is not available for a Sequence file.
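For example (illustrative values), you can enter ";" for
semicolon-separated fields or "\t" for tab-separated fields;
the value is typed as a double-quoted string, so escape
sequences such as \t are allowed.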

Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.

Compression Select the Compress the data check box to compress the
output data.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.
Note that when the type of the file to be written is
Sequence File, the compression algorithm is embedded
within the container files (the part- files) of this sequence
file. These files can be read by a Talend component
such as tHDFSInput within MapReduce Jobs and other
applications that understand the sequence file format.
Alternatively, when the type is Text File, the output files
can be accessed with standard compression utilities that
understand the bzip2 or gzip container files.

Include header Select this check box to output the header of the data.
This option is not available for a Sequence file.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:

• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
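As an illustrative customization (not a required setting),
you could lower the replication factor of the files written
by this component by adding the following row to the Hadoop
properties table, with both columns typed as double-quoted
strings:

Property: "dfs.replication"        Value: "1"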

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component needs an input component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
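As a sketch of how this can be set up (the variable and
component names below are hypothetical), you could define a
context variable such as hdfsConn, enter context.hdfsConn in
the Code field of the Dynamic settings table, and give the
variable the name of the connection component to be used in
each execution context, for example:

hdfsConn = "tHDFSConnection_1"    (value in a development context)
hdfsConn = "tHDFSConnection_2"    (value in a production context)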

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.

• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR (an illustrative value for this argument is
given below).
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.
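For illustration (the installation path below is a
placeholder that depends on your MapR client version and
operating system), the -Djava.library.path argument
described above typically looks like the following on Linux;
on Windows, point it to the directory containing
MapRClient.dll:

-Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native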

Limitations JRE 1.6+ is required.

Related scenario
• Related topic, see Writing data in a delimited file on page 1116.
• Related topic, see Computing data with Hadoop distributed file system on page 1498.


tHDFSOutputRaw
Transfers data of different formats such as hierarchical data in the form of a single column into a
given HDFS file system.
tHDFSOutputRaw receives a single-column input flow and writes the data into HDFS.

tHDFSOutputRaw Standard properties


These properties are used to configure tHDFSOutputRaw running in the Standard Job framework.
The Standard tHDFSOutputRaw component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files that you can download from this Hadoop
configuration list and use directly in your
connection. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
in this list; in that case, it is recommended to use the
Import from existing version option to take an existing
distribution as a base and add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be


• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.

This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Use Datanode hostname Select the Use datanode hostname check box to allow the
Job to access datanodes via their hostnames. This actually
sets the dfs.client.use.datanode.hostname property to true.
When connecting to an S3N filesystem, you must select this
check box.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File Name Browse to, or enter the location of the file which you write
data to. This file is created automatically if it does not exist.

Action Select an operation in HDFS:


Create: Creates a file with data using the file name defined
in the File Name field.
Overwrite: Overwrites the data in the file specified in the
File Name field.
Append: Inserts the data into the file specified in the File
Name field. The specified file is created automatically if it
does not exist.

Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.

Compression Select the Compress the data check box to compress the
output data.

Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.
Note that when the type of the file to be written is
Sequence File, the compression algorithm is embedded
within the container files (the part- files) of this sequence
file. These files can be read by a Talend component
such as tHDFSInput within MapReduce Jobs and other
applications that understand the sequence file format.
Alternatively, when the type is Text File, the output files
can be accessed with standard compression utilities that
understand the bzip2 or gzip container files.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component needs an input component that provides
the data of a single column. This column must be labeled
content and its type must be Object.
For example, you can:
• use tConvertType to convert a column from String to
Object, or
• use tJavaRow to add the data to be processed into the
globalMap object so that this data becomes available
as a global variable for the other components such
as tFixedFlowInput to construct this required single
column (see the sketch below).
For further information about tConvertType, see
tConvertType on page 504.
For further information about tJavaRow, see tJavaRow on
page 1845.
For further information about tFixedFlowInput, see
tFixedFlowInput on page 1200.
For further information about how to use a global variable,
see the section describing how to use contexts and
variables in Talend Studio User Guide.
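As a minimal sketch of the tJavaRow approach mentioned in
the list above (the column name payload and the key
rawContent are hypothetical; input_row, output_row and
globalMap are provided by the Java code that the Studio
generates around a tJavaRow), you could write:

// tJavaRow: keep the incoming payload available as a global variable.
globalMap.put("rawContent", input_row.payload);
output_row.payload = input_row.payload;

A following tFixedFlowInput whose schema contains a single
Object column named content could then use an expression
such as globalMap.get("rawContent") to build the flow sent
to tHDFSOutputRaw.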

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic

settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related Scenario
Once you have properly configured the connection to HDFS for this component, this component works
exactly the same way as tFileOutputRaw.
For further information about tFileOutputRaw, see tFileOutputRaw on page 1153.


tHDFSProperties
Creates a single row flow that displays the properties of a file processed in HDFS.

tHDFSProperties Standard properties


These properties are used to configure tHDFSProperties running in the Standard Job framework.
The Standard tHDFSProperties component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.

In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files that you can download from this Hadoop
configuration list and use directly in your
connection. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
in this list; in that case, it is recommended to use the
Import from existing version option to take an existing
distribution as a base and add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.

Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,

the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.

File Browse to, or enter the path pointing to the data to be used
in the file system.

Get file checksum Select this check box to generate and output the MD5
information of the file processed.
Note that this is an HDFS only checksum and not a true
MD5 hash that can be compared with the MD5 value
obtained, for example, from tFileInputProperties. For further
information about this component, see tFileInputProperties
on page 1079.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSProperties can be a standalone component or send the
information it generates to its following component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.

Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenario
Related topic, see Procedure on page 1159
Related topic, see Iterating on a HDFS directory on page 1523


tHDFSPut
Connects to Hadoop distributed file system to load large-scale files into it with optimized
performance.
tHDFSPut copies files from a user-defined directory, pastes them into a given Hadoop distributed
file system (HDFS) and, if need be, renames these files.

tHDFSPut Standard properties


These properties are used to configure tHDFSPut running in the Standard Job framework.
The Standard tHDFSPut component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files that you can download from this Hadoop
configuration list and use directly in your
connection. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
in this list; in that case, it is recommended to use the
Import from existing version option to take an existing
distribution as a base and add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:

• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the

principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Local directory The local directory where the files to be loaded into HDFS
are stored.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

Overwrite file Select whether or not to overwrite the existing file with the
new one.

Use Perl5 Regex Expression as Filemask Select this check box if you want to use Perl5 regular
expressions in the Files field as file filters. This is useful
when the name of the file to be used contains special
characters such as parentheses.
For information about Perl5 regular expression syntax, see
Perl5 Regular Expression Syntax.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from the
local directory. Regular expression is available.
- New name: give a new name to the loaded file.
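For illustration (hypothetical file names), with Use Perl5
Regex Expression as Filemask selected, the Files table could
be filled as follows; the values are typed as double-quoted
Java strings, so backslashes in the regular expression are
doubled:

File mask: "daily_report_\\d{8}\\.csv"        New name: "daily_report.csv"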

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After
variable and it returns an integer.
TRANSFER_MESSAGES: file transferred information. This is
an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component combines HDFS connection and data
loading, and is thus usually used as a single-component
subJob to move data from a user-defined local directory to
HDFS.
Different from the tHDFSInput and the tHDFSOutput
components, it runs standalone and does not generate input
or output flow for the other components.
It is often connected to the Job using OnSubjobOk or
OnComponentOk link, depending on the context.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.

For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Related scenario
For a related scenario, see Computing data with Hadoop distributed file system on page 1498.


tHDFSRename
Renames the selected files or specified directory on HDFS.

tHDFSRename Standard properties


These properties are used to configure tHDFSRename running in the Standard Job framework.
The Standard tHDFSRename component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file


should contain the libraries of the different Hadoop


elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of


the application that the current Job you are developing


uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

Overwrite file Select an option to specify whether to overwrite the existing
file with the new one.

Files Click the [+] button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using wildcard
characters (*) or regular expressions.
New name: name to give to the HDFS file after the transfer.

Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
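
As a purely illustrative example (not taken from the original property table), if you want HDFS clients to address datanodes by their hostnames, you could add the property dfs.client.use.datanode.hostname with the value true to this table; this property is documented in the hdfs-default.xml reference mentioned above.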


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After


variable and it returns an integer.
CURRENT_STATUS: the execution result of the component.
This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
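
For illustration only, these variables can also be used in a Run if trigger condition (a Java boolean expression). The sketch below is an assumption-based example: it supposes the component is labeled tHDFSRename_1 and that the Die on error check box is cleared, so ERROR_MESSAGE may hold a value after execution.

// Hypothetical Run if condition: follow this trigger only when
// tHDFSRename_1 reported an error message.
globalMap.get("tHDFSRename_1_ERROR_MESSAGE") != null
    && !"".equals(globalMap.get("tHDFSRename_1_ERROR_MESSAGE"))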

Usage

Usage rule This component is used to compose a single-component Job


or subJob.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for


Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenario
For a related scenario, see Computing data with Hadoop distributed file system on page 1498.


tHDFSRowCount
Reads a file in HDFS row by row in order to determine the number of rows this file contains.
tHDFSRowCount counts the number of rows in a file in HDFS. If the file to be processed is a Hadoop
sequence file type or a large dataset, it is recommended to use a tAggregateRow to count the records.

tHDFSRowCount Standard properties


These properties are used to configure tHDFSRowCount running in the Standard Job framework.
The Standard tHDFSRowCount component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property Type Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.


2. Select Import from zip to import the configuration zip


for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the


authentication key generated upon the registration of


the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File name Browse to, or enter the path pointing to the data to be used
in the file system.

Row separator The separator used to identify the end of a row.

Ignore empty rows Select this check box to skip the empty rows.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Compression Select the Uncompress the data check box to uncompress
the input data.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables COUNT: the number of rows in a file. This is a Flow variable
and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSRowCount is a standalone component; it must be
used with an OnSubjobOk connection to tJava in order to
return the row count.
The valid code for tJava to get this count could be:

System.out.print(((Integer)globalMap.get("tHDFSRowCount_1_COUNT")));

In this example, tHDFSRowCount_1 is the label of this


component in a Job, so it may vary among different use
cases; COUNT is the global variable of tHDFSRowCount,
representing the integer flow of the row count.
For further information about how to label a component or
how to use a global variable in a Job, see the Talend Studio
User Guide.
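
As a complementary sketch (not part of the original guide), the count can also be retrieved defensively in tJava before being reused; tHDFSRowCount_1 is again an assumed label.

// Minimal tJava sketch: read the row count and only use it when present.
Integer rowCount = (Integer) globalMap.get("tHDFSRowCount_1_COUNT");
if (rowCount != null) {
    System.out.println("The file contains " + rowCount + " rows.");
}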

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.


For examples on using dynamic parameters, see Reading


data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHiveClose
Closes connection to a Hive database.
tHiveClose closes an active connection to a database.

tHiveClose Standard properties


These properties are used to configure tHiveClose running in the Standard Job framework.
The Standard tHiveClose component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products.

Basic settings

Component list If there is more than one connection used in the Job, select
tHiveConnection from the list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with other Hive


components, especially with tHiveConnection, which
opens a connection for the transaction that is underway.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an


environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHiveConnection
Establishes a Hive connection to be reused by other Hive components in your Job.
tHiveConnection opens a connection to a Hive database.

tHiveConnection Standard properties


These properties are used to configure tHiveConnection running in the Standard Job framework.
The Standard tHiveConnection component belongs to the Big Data, the Databases and the ELT
families.
The component in this framework is available in all Talend products.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.


Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket Because a Talend Job needs its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
them at execution time.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a machine on which the Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information


of the Microsoft HD Insight cluster to be used. For
example, the address could be your_hdinsight
_cluster_name.azurehdinsight.net and the
authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.


In the Username field, enter the name of the Azure


Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as HortonWorks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that supports
concurrent connections from multiple clients better than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.
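
For orientation only (this is not a field of the component), a HiveServer2 standalone connection defined with these Host, Port, Database, Username and Password values generally corresponds to a Hive JDBC URL of the following form; the host, port and database below are hypothetical, with 10000 being the usual HiveServer2 default port.

// Illustrative Hive JDBC URL shape for a HiveServer2 (Hive 2) connection.
String hiveUrl = "jdbc:hive2://hiveserver.example.com:10000/mydb";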

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.


The values of the following parameters can be found in


the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:


• Hortonworks Data Platform 2.0 +


• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to an S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.


For further information about the Hadoop Map/Reduce


framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if


you have sufficient Hadoop experience to handle any


issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.
If you want to use certain parameters such as the Kerberos
parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>talend.kerberos.authentication</name>
    <value>kinit</value>
    <description>Set the Kerberos authentication method to use. Valid values are: kinit or keytab.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.principal</name>
    <value>user@EXAMPLE.COM</value>
    <description>Set the keytab's principal name.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.path</name>
    <value>/kdc/user.keytab</value>
    <description>Set the keytab's path.</description>
  </property>
  <property>
    <name>talend.encryption</name>
    <value>none</value>
    <description>Set the encryption method to use. Valid values are: none or ssl.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.path</name>
    <value>ssl</value>
    <description>Set SSL trust store path.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.password</name>
    <value>ssl</value>
    <description>Set SSL trust store password.</description>
  </property>
</configuration>
The parameters read from these configuration files override
the default ones used by the Studio. When a parameter
does not exist in these configuration files, the default one is
used.
Note that this option is available only in Hive Standalone
mode with Hive 2.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to


the relevant Tez libraries via the Advanced settings view of


this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, by default,
registered along with your Hive installation.

Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.


• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce You can tune the map and reduce computations by selecting
memory mb the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.


Path separator in server Leave the default value of the Path separator in server field
as it is, unless the separator used by your Hadoop
distribution's host machine for its PATH variable is not a
colon (:). In that situation, you must change this value to
the separator you are using on that host.

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is
an After variable and it returns a string. This variable functions only if the Die on error check box is
cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Hive components, particularly tHiveClose.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the
following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Connecting to a custom Hadoop distribution


As explained in the properties table, when you select the Custom option from the Distribution drop-
down list, you are connecting to a Hadoop distribution different from any of the Hadoop distributions
provided on that Distribution list in the Studio.

After selecting this Custom option, click the button to display the Import custom definition dialog
box and proceed as follows:

Procedure
1. Depending on your situation, select Import from existing version or Import from zip to configure
the custom Hadoop distribution to be connected to.
• If you have the zip file of the custom Hadoop distribution you need to connect to, select
Import from zip. Talend community provides this kind of zip files that you can download
from http://www.talendforge.org/exchange/index.php.
• Otherwise, select Import from existing version to import an officially supported Hadoop
distribution as base so as to customize it by following the wizard.

Note that the check boxes in the wizard allow you to select the Hadoop element(s) you need to
import. Not all of the check boxes are displayed in every wizard; which ones appear depends on the context in
which you are creating the connection. For example, if you are creating this connection for a Hive
component, only the Hive check box appears.
2. Whether you have selected Import from existing version or Import from zip, verify that each check
box next to the Hadoop element you need to import has been selected.
3. Click OK and then in the pop-up warning, click Yes to accept overwriting any custom setup of jar
files previously implemented.


Once done, the Custom Hadoop version definition dialog box becomes active.

This dialog box lists the Hadoop elements and their jar files you are importing.
4. If you have selected Import from zip, click OK to validate the imported configuration.
If you have selected Import from existing version as base, you may still need to add more jar
files to customize that version. Then from the tab of the Hadoop element you need to customize,
for example, the HDFS/HCatalog tab, click the [+] button to open the Select libraries dialog box.
5. Select the External libraries option to open its view.
6. Browse to and select any jar file you need to import.
7. Click OK to validate the changes and to close the Select libraries dialog box.
Once done, the selected jar file appears on the list in the tab of the Hadoop element being
configured.
Note that if you need to share the custom Hadoop setup with another Studio, you can
export this custom connection from the Custom Hadoop version definition window using the button.
8. In the Custom Hadoop version definition dialog box, click OK to validate the customized
configuration. This brings you back to the Distribution list in the Basic settings view of the
component.

Results
Now that the configuration of the custom Hadoop version has been set up and you are back to the
Distribution list, you are able to continue to enter other parameters required by the connection.
If the custom Hadoop version you need to connect to contains YARN and you want to use it, select the
Use YARN check box next to the Distribution list.
A video is available in the following link to demonstrate, by taking HDFS as example, how to set up
the connection to a custom Hadoop cluster, also referred to as an unsupported Hadoop distribution:
How to add an unsupported Hadoop distribution to the Studio.


Creating a partitioned Hive table


This scenario illustrates how to use tHiveConnection, tHiveCreateTable and tHiveLoad to create a
partitioned Hive table and write data in it.
Note that tHiveCreateTable and tHiveLoad are available only when you are using one of the Talend
solutions with Big Data.

The sample data to be used in this scenario is employee information of a company, reading as follows:

1;Lyndon;Fillmore;21-05-2008;US
2;Ronald;McKinley;15-08-2008
3;Ulysses;Roosevelt;05-10-2008
4;Harry;Harrison;23-11-2007
5;Lyndon;Garfield;19-07-2007
6;James;Quincy;15-07-2008
7;Chester;Jackson;26-02-2008
8;Dwight;McKinley;16-07-2008
9;Jimmy;Johnson;23-12-2007
10;Herbert;Fillmore;03-04-2008

The information contains some employees' names and the dates when they were registered in an HR
system. Since these employees work for the US subsidiary of the company, you will create a US
partition for this sample data.
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to
access the Hive database to be used.
Note that if you are using the Windows operating system, you have to create a tmp folder at the root
of the disk where the Studio is installed.
Then proceed as follows:

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job from the Job Designs node in
the Repository tree view.


For further information about how to create a Job, see the chapter describing how to design a
Job in Talend Studio User Guide.
2. Drop tHiveConnection, tHiveCreateTable and tHiveLoad onto the workspace.
3. Connect them using the Trigger > On Subjob OK link.

Configuring the connection to Hive


About this task
Configuring tHiveConnection

Procedure
1. Double-click tHiveConnection to open its Component view.

2. From the Property type list, select Built-in. If you have created the connection to be used in
Repository, then select Repository, click the button to open the Repository content dialog
box and select that connection. This way, the Studio will reuse that set of connection information
for this Job.
For further information about how to create a Hadoop connection in Repository, see the chapter
describing the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide .
3. In the Version area, select the Hadoop distribution to be used and its version. If you cannot find
from the list the distribution corresponding to yours, select Custom so as to connect to a Hadoop
distribution not officially supported in the Studio.
For a step-by-step example about how to use this Custom option, see Connecting to a custom
Hadoop distribution on page 1579.
4. In the Connection area, enter the connection parameters to the Hive database to be used.


5. In the Name node field, enter the location of the master node, the NameNode, of the distribution
to be used. For example, hdfs://talend-hdp-all:8020. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.
6. In the Job tracker field, enter the location of the JobTracker of your distribution. For example,
talend-hdp-all:50300.
Note that the word Job in the term JobTracker designates the MR (MapReduce) jobs
described in Apache's documentation on http://hadoop.apache.org/.


Creating the Hive table


Defining the schema

Procedure
1. Double-click tHiveCreateTable to open its Component view.

2. Select the Use an existing connection check box and from Component list, select the connection
configured in the tHiveConnection component you are using for this Job.
3. Click the button next to Edit schema to open the schema editor.
4. Click the button four times to add four rows and in the Column column, rename them to Id,
FirstName, LastName and Reg_date, respectively.


Note that you cannot use the Hive reserved keywords to name the columns, such as location or
date.
5. In the Type column, select the type of the data in each column. In this scenario, Id is of the Integer
type, Reg_date is of the Date type and the others are of the String type.
6. In the DB type column, select the Hive type of each column corresponding to the data type you
have defined. For example, Id is of the INT type and Reg_date is of the TIMESTAMP type.
7. In the Data pattern column, define the pattern corresponding to that of the raw data. In this
example, use the default one.
8. Click OK to validate these changes.

Defining the table settings

Procedure
1. In the Table name field, enter the name of the Hive table to be created. In this scenario, it is
employees.
2. From the Action on table list, select Create table if not exists.
3. From the Format list, select the data format for which this Hive table is created. In this
scenario, it is TEXTFILE.
4. Select the Set partitions check box to add the US partition as explained at the beginning of this
scenario. To define this partition, click the button next to Edit schema that appears.
5. Leave the Set file location check box clear to use the default path for the Hive table.
6. Select the Set Delimited row format check box to display the available options of row format.
7. Select the Field check box and enter a semicolon (;) as the field separator in the field that appears.
8. Select the Line check box and leave the default value as the line separator.
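
With these settings, tHiveCreateTable issues a CREATE TABLE statement roughly equivalent to the following
HiveQL; this is a simplified sketch and the exact statement generated by the Studio may differ:

CREATE TABLE IF NOT EXISTS employees (
  Id INT,
  FirstName STRING,
  LastName STRING,
  Reg_date TIMESTAMP
)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ';'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;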

Writing data to the table


About this task
Configuring tHiveLoad


Procedure
1. Double-click tHiveLoad to open its Component view.

2. Select the Use an existing connection check box and from Component list, select the connection
configured in the tHiveConnection component you are using for this Job.
3. From the Load action field, select LOAD to write data from the file holding the sample data that is
presented at the beginning of this scenario.
4. In the File path field, enter the directory where the sample data is stored. In this example, the
data is stored in the HDFS system to be used.
In the real-world practice, you can use tHDFSOutput to write data into the HDFS system and you
need to ensure that the Hive application has the appropriate rights and permissions to read or
even move the data.
For further information about tHDFSOutput, see tHDFSOutput on page 1528.
For further information about the related rights and permissions, see the documentation or
contact the administrator of the Hadoop cluster to be used.
Note that if you need to read data from a local file system other than the HDFS system, ensure that the
data to be read is stored in the local file system of the machine in which the Job is run and then
select the Local check box in this Basic settings view. For example, when the connection mode to
Hive is Standalone, the Job is run in the machine where the Hive application is installed and thus
the data should be stored in that machine.
5. In the Table name field, enter the name of the target table you need to load data in. In this
scenario, it is employees.
6. From the Action on file list, select APPEND.
7. Select the Set partitions check box and in the field that appears, enter the partition you need to
add data to. In this scenario, this partition is country='US'.
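
With these settings, tHiveLoad runs a statement roughly equivalent to the following HiveQL; this is a
sketch only, the HDFS path is a hypothetical example and the exact statement generated by the Studio may
differ:

LOAD DATA INPATH '/user/talend/employees.csv' INTO TABLE employees PARTITION (country='US');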


Executing the Job


Then you can press F6 to run this Job.
Once done, the Run view is opened automatically, where you can check the execution process.
You can as well verify the results in the web console of the Hadoop distribution used.


If you need to obtain more details about the Job, it is recommended to use the web console of the
Jobtracker provided by the Hadoop distribution you are using.
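
You can also check the loaded partition from a Hive client, for example with a query such as the
following, using the table and partition of this scenario:

SELECT * FROM employees WHERE country='US';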

Creating a JDBC Connection to Azure HDInsight Hive


This scenario illustrates how to use tHiveConnection, tHiveInput and tHiveClose to create a JDBC
Connection to HDInsight Hive.


Prerequisites
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to
access the Hive database to be used.

Configuring a DataBase Connection to Hive


About this task
This example uses version 3.6 of Azure HDInsight.

Procedure
1. In the Repository view, expand the Metadata node.
2. Right-click Db Connections, and then click Create Connection.
3. Give a name to your connection.

4. Click Next.
5. Set up the connection configuration as shown in the following table:


DB Type Select Hive.

Hadoop Cluster Select None.

Distribution Select Hortonworks.
HDInsight leverages the Hortonworks distribution on the backend. This allows you to use Hortonworks
libraries to connect to HDInsight.


Version Select Hortonworks Data Platform V2.6.0.3-8 [Built in].

Hive Model Select Standalone.

Login, Password and Server Fill in these fields as required.

Port Enter 443.
You will be able to communicate through the proxy port since the HDInsight cluster sits behind a proxy by
default.

Database Leave the default value.

Additional JDBC Setting Enter transportMode=http;ssl=true;httpPath=/hive2, where:
• transportMode=http sets the transport mode to HTTP instead of the default Hive JDBC transport mode.
• ssl=true enables SSL.
• httpPath=/hive2 sets the HTTP endpoint.

6. Click Test Connection to ensure the Talend Studio connects successfully to the cluster.
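
Putting these settings together, the Hive JDBC URL built for this connection should look roughly like the
following; the cluster name is a placeholder and the exact URL may differ in your environment:

jdbc:hive2://your_hdinsight_cluster_name.azurehdinsight.net:443/default;transportMode=http;ssl=true;httpPath=/hive2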

Building the Job


Procedure
1. From the Repository view of the Talend Studio, right-click Job Designs, and then click Create
Standard Job.
2. Give a name to your Job.
3. Click Finish.


4. Add a tPreJob component to your workspace.


5. Add a tHiveConnection component to your workspace.
6. Double-click the tHiveConnection component and choose Repository as the Property Type and the
Database Connection created above.

7. Right-click the tPreJob component.


8. Select Trigger > On Component Ok and connect the tPreJob to the tHiveConnection.


9. Add a tHiveInput component to your workspace.


10. Select it and check the box Use an existing connection, then select the tHiveConnection
component in the Component List drop-down menu.
11. In the Query field, input show tables to run a query displaying the available tables in the
database.

12. Add a tLogRow component to your workspace.


13. Right-click the tHiveInput component and select Row > Main.
14. Click the tLogRow component to connect both components. They will display the information
from the query above.
15. From the Component tab of the tLogRow, select Table (print values in cells of a table).

16. Add a tPostJob component to your workspace.


17. Add a tHiveClose component to your workspace.


18. Connect the tPostJob component to the tHiveClose component using an On Component Ok
connection to close the connection that was opened.

19. From the Run tab, click Run to run the Job and verify that the connection to Hive on
HDInsight succeeds and that the table data can be read.


tHiveCreateTable
Creates Hive tables that fit a wide range of Hive data formats.
A proper Hive data format such as RC or ORC allows you to obtain better performance when processing
data with Hive.
tHiveCreateTable connects to the Hive database to be used and creates a Hive table that is dedicated
to data of the format you specify.

tHiveCreateTable Standard properties


These properties are used to configure tHiveCreateTable running in the Standard Job framework.
The Standard tHiveCreateTable component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.


Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine in which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information of the Microsoft HD Insight
cluster to be used. For example, the address could be your_hdinsight_cluster_name.azurehdinsight.net and
the authentication information is your Azure account name: ychen. The Studio uses this service to submit
the Job to the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.


In the Username field, enter the name of the Azure Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used such as HortonWorks Data
Platform V1.2.0 (Bimota) supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.

The values of the following parameters can be found in the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:


• Hortonworks Data Platform 2.0 +


• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.

For further information about the Hadoop Map/Reduce framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are
completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
requires specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insightcluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.

2. Select Import from zip to import the configuration zip for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in
various projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Table Name Name of the table to be created.

Action on table Select the action to be carried out for creating a table.

Format Select the data format to which the table to be created is dedicated.
The available data formats vary depending on the version of
the Hadoop distribution you are using.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Inputformat class and Outputformat class These fields appear only when you have selected
INPUTFORMAT and OUTPUTFORMAT from the Format list.
These fields allow you to enter the name of the jar files to
be used for the data formats not available in the Format list.

Storage class Enter the name of the storage handler to be used for
creating a non-native table (Hive table stored and managed
in other systems than Hive, for example, Cassandra or
MongoDB).
This field is available only when you have selected
STORAGE from the Format list.
For further information about a storage handler, see https://
cwiki.apache.org/confluence/display/Hive/StorageHandlers.
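
For illustration, the storage handler entered here is the class name that appears in the STORED BY clause
of the generated CREATE TABLE statement, for example as in the following sketch, which assumes the standard
HBase storage handler shipped with Hive and a hypothetical table:

CREATE TABLE hbase_backed_table (key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:value');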

Set partitions Select this check box to add partition columns to the table
to be created. Once selecting it, you need to define the
schema of the partition columns you need to add.


Set file location If you want to create a Hive table in a directory other
than the default one, select this check box and enter the
directory in HDFS you want to use to hold the table content.
This is typically useful when you need to create an external Hive table by selecting the Create an external
table check box in the Advanced settings tab.

Use S3 endpoint The Use S3 endpoint check box is displayed when you
have selected the Set file location check box to create an
external Hive table.
Once this Use S3 endpoint check box is selected, you need
to enter the following parameters in the fields that appear:
• S3 bucket: enter the name of the bucket in which you
need to create the table.
• Bucket name: enter the name of the bucket in which
you want to store the dependencies of your Job. This
bucket must already exist on S3.
• Temporary resource folder: enter the directory in
which you want to store the dependencies of your Job.
For example, enter temp_resources to write the
dependencies in the /temp_resources folder in the
bucket.
If this folder already exists at runtime, its contents are
overwritten by the upcoming dependencies; otherwise,
this folder is automatically created.
• Access key and Secret key: enter the authentication
information required to connect to the Amazon S3
bucket to be used.
To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog box
enter the password between double quotes and click
OK to save the settings.
Note that the format of the S3 file is S3N (S3 Native
Filesystem).
Since a Hive table created in S3 is actually an external table, this Use S3 endpoint check box must be used
with the Create an external table check box selected.

Advanced settings

Like table Select this check box and enter the name of the Hive table
you want to copy. This allows you to copy the definition of
an existing table without copying its data.
For further information about the Like parameter, see
Apache's information about Hive's Data Definition
Language.
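
The Like table option corresponds to the LIKE clause of HiveQL, for example, with hypothetical table names:

CREATE TABLE employees_copy LIKE employees;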

Create an external table Select this check box to make the table to be created an
external Hive table. This kind of Hive table leaves the raw
data where it is if the data is in HDFS.
An external table is usually the better choice for accessing
shared data existing in a file system.
For further information about an external Hive table, see
Apache's documentation about Hive.


Table comment Enter the description you want to use for the table to be created.

As select Select this check box and enter the As select statement for creating a Hive table that is based
on a Select statement.

Set clustered_by or skewed_by statement Enter the Clustered by statement to cluster the data of
a table or a partition into buckets, or/and enter the Skewed
by statement to allow Hive to extract the heavily skewed
data and put it into separate files. This is typically used for
obtaining better performance during queries.
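
For illustration, the As select and the Clustered by or Skewed by options described above correspond to
HiveQL clauses such as the following; these are sketches with hypothetical table and column names:

CREATE TABLE us_employees AS SELECT * FROM employees WHERE country='US';

CREATE TABLE employees_bucketed (Id INT, LastName STRING)
CLUSTERED BY (Id) INTO 4 BUCKETS;

CREATE TABLE employees_skewed (Id INT, LastName STRING)
SKEWED BY (LastName) ON ('McKinley', 'Fillmore');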

SerDe properties If you are using the SerDe row format, you can add any
custom SerDe properties to override the default ones used
by the Hadoop engine of the Studio.

Table properties Add any custom Hive table properties you want to override
the default ones used by the Hadoop engine of the Studio.

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.

• If you need to use Tez to run your Hive Job, add hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce computations by
selecting the Set memory check box to set proper memory allocations for the computations to be performed by
the Hadoop system.
In that situation, you need to enter the values you need in the Mapred job map memory mb and the Mapred job
reduce memory mb fields, respectively. By default, the values are both 1000, which are normally appropriate
for running the computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.

Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone.

If the Studio used to connect to a Hive database is operated on Windows, you must manually create a folder
called tmp in the root of the disk where this Studio is installed.

Row format Set Delimited row format

Set SerDe row format

Die on error

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure
but in different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the
following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.

For further information about how to install a Hadoop distribution, see the manuals corresponding to the
Hadoop distribution you are using.

Related scenario
For a related scenario, see Creating a partitioned Hive table on page 1582.


tHiveInput
Extracts data from Hive and sends the data to the component that follows.
tHiveInput is the dedicated component to the Hive database (the Hive data warehouse system). It can
execute a given HiveQL query in order to extract the data from Hive.
When ACID is enabled on the Hive side, a Spark Job cannot delete or update a table and unless data is
compacted, this Job cannot correctly read aggregated data from a Hive table, either. This is a known
limitation described in the Spark bug tracking system: https://issues.apache.org/jira/browse/SPARK-15348.

tHiveInput Standard properties


These properties are used to configure tHiveInput running in the Standard Job framework.
The Standard tHiveInput component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Access Key and Secret Key Enter the authentication information obtained from
Google for tHiveInput to read temporary data from
Google Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the
project from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine in which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information of the Microsoft HD Insight
cluster to be used. For example, the address could be your_hdinsight_cluster_name.azurehdinsight.net and
the authentication information is your Azure account name: ychen. The Studio uses this service to submit
the Job to the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.

• The Password is defined when creating your HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as HortonWorks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it
graphically using SQLBuilder

  Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Guess schema Click this button to retrieve the schema from the table.

This query uses Parquet objects When available, select this check box to indicate that the
table to be handled uses the PARQUET format and thus
make the component call the required jar file.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Query Enter your DB query, paying particular attention to
properly sequencing the fields so that they match the schema
definition. A sample query is shown after the note below.
For further information about the Hive query language, see
https://cwiki.apache.org/confluence/display/Hive/Languag
eManual.

Note: Compressed data in the form of Gzip or Bzip2 can
be processed through the query statements. For details,
see https://cwiki.apache.org/confluence/display/Hive/
CompressedStorage.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up
data transfer. When reading a compressed file, the Studio
needs to uncompress it before being able to feed it to
the input flow.
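
As an illustration only, with a hypothetical customers table whose schema columns are id,
name and city, the query entered in this field could read as follows; in the Studio, the
statement is typically typed between double quotes as a string:

    select id, name, city
    from customers
    where city = 'Paris'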

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

1617
tHiveInput

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined
columns.

Note:
Clear the Trim all the String/Char columns check box to
enable Trim column in this field.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks (see the example below).
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
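
As a sketch only: adding the property hive.execution.engine with the value Tez to this table
has the same effect as issuing the following statement in a Hive session; the value is usually
written in lowercase, so check the accepted values for your Hive version:

    -- switch the Hive execution engine from MapReduce to Tez
    set hive.execution.engine=tez;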

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce
computations by selecting the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.

Path separator in server Leave the default value of the Path separator in server as
it is, unless the separator used by your Hadoop
distribution's host machine for its PATH variable is not a
colon (:). In that situation, you must change this value to
the one you are using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries
and covers all possible Hive QL queries.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

HBase Configuration Store by HBase

Note:
Available only when the Use an existing connection
check box is clear

Zookeeper quorum

Zookeeper client port

Define the jars to register for HBase

Register jar for HBase

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For a scenario about how an input component is used in a Job, see Writing columns from a MySQL
database to an output file using tMysqlInput on page 2440.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker,
when configuring this component since the component needs to connect to a Hadoop distribution.

tHiveLoad
Writes data of different formats into a given Hive table or exports data from a Hive table to a
directory.
tHiveLoad connects to a given Hive database and copies or moves data into an existing Hive table or
a directory you specify.
The tHiveLoad component first prepares the lines to be written to Hive before eventually writing
them to Hive. This approach is more efficient with regard to Hive than the line-by-line approach
typically employed by an output component. For this reason, tHiveOutput does not exist in a Job
designed in the Standard framework.

tHiveLoad Standard properties


These properties are used to configure tHiveLoad running in the Standard Job framework.
The Standard tHiveLoad component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.

If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
the directory does not exist, it is created on the fly, but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine on which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information
of the Microsoft HD Insight cluster to be used. For
example, the address could be your_hdinsight
_cluster_name.azurehdinsight.net and the
authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as HortonWorks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Load action Select the action you need to carry out for writing data into the
specified destination.
• When you select LOAD, you are moving or copying data
from a directory you specify.
• When you select INSERT, you are moving or copying
data based on queries.
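
As a rough sketch of the HiveQL that each action corresponds to (the table, directory and
query below are hypothetical examples, not values generated by the component):

    -- LOAD: move or copy the files of a directory into a Hive table
    LOAD DATA INPATH '/user/talend/input/customers' INTO TABLE customers;

    -- INSERT: move or copy data selected by a query
    INSERT INTO TABLE customers_eu
    SELECT * FROM customers WHERE region = 'EU';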

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to perform the INSERT
action.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

Target type This drop-down list appears only when you have selected
INSERT from the Load action list.
Select from this list the type of the location you need to
write data in.
• If you select Table as destination, you can still choose
to append data to or overwrite the contents in the
specified table.
• If you select Directory as destination, you are
overwriting the contents in the specified directory

Table name Enter the name of the Hive table you need to write data in.
Note that with the INSERT action, this field is available only
when you have selected Table from the Target type list.

File path Enter the directory you need to read data from or write data
in, depending on the action you have selected from the
Load action list.
• If you have selected LOAD: this is the path to the data
you want to copy or move into the specified Hive table.
• If you have selected INSERT: this is the directory to
which you want to export data from a Hive table. With
this action, the File path field is available only when
you have selected Directory from the Target type list.

The target table uses the Parquet format If the table in which you need to write data is a PARQUET
table, select this check box.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
Then from the Compression list that appears, select the
compression mode you need to use to handle the PARQUET
file. The default mode is Uncompressed.

Action on file Select the action to be carried out for writing data.
This list is available only when the target is a Hive
table; if the target is a directory, the action to be used is
automatically OVERWRITE.

Query This field appears when you have selected INSERT from the
Load action list.
Enter the appropriate query for selecting the data to be
exported to the specified Hive table or directory.

Local Select this check box to use the Hive LOCAL statement for
accessing a local directory. Note that this local directory is
actually in the machine in which the Job is run. Therefore,
when the connection mode to Hive is Standalone, the Job is
run in the machine where the Hive application is installed
and thus this local directory is in that machine.
This statement is used along with the directory you have
defined in the File path field. Therefore, this Local check
box is available only when the File path field is available.
• If you are using the LOAD action, tHiveLoad copies the
local data to the target table.
• If you are using the INSERT action, tHiveLoad copies
data to a local directory.
• If you leave this Local check box clear, the directory
defined in the File path field is assumed to be in the
HDFS system to be used and data will be moved to the
target location.
For further information about this LOCAL statement, see
Apache's documentation about Hive's Language.
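
For illustration only, with hypothetical table and path names, the LOCAL statement changes
the generated HiveQL roughly as follows:

    -- LOAD action with Local selected: read from the file system of the machine running the Job
    LOAD DATA LOCAL INPATH '/tmp/customers.csv' INTO TABLE customers;

    -- INSERT action with Local selected: export the query result to a local directory
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/export/customers'
    SELECT * FROM customers;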

Set partitions Select this check box to use the Hive Partition clause in
loading or inserting data in a Hive table. You need to enter
the partition keys and their values to be used in the field
that appears.
For example, enter country='US', state='CA'. This makes a
partition clause reading Partition (country='US',
state='CA'), that is to say, a US and CA partition.
Also, it is recommended to select the Create partition if not
exist check box that appears to ensure that you will not
create a duplicate partition.
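
As a sketch, with the example keys above and a hypothetical partitioned table and input
directory, the clause produced for a LOAD action would read roughly as follows:

    LOAD DATA INPATH '/user/talend/input/us_ca'
    INTO TABLE customers
    PARTITION (country='US', state='CA');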

Die on error Select this check box to kill the Job when an error occurs.

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce
computations by selecting the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.

Path separator in server Leave the default value of the Path separator in server as
it is, unless the separator used by your Hadoop
distribution's host machine for its PATH variable is not a
colon (:). In that situation, you must change this value to
the one you are using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone and supports writing a
wide range of data formats such as RC, ORC or AVRO.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For a related scenario, see Creating a partitioned Hive table on page 1582

tHiveRow
Acts on the actual DB structure or on the data without handling data itself, depending on the nature
of the query and the database.
tHiveRow executes the HiveQL query stated in the specified database. The row suffix means the
component implements a flow in the Job design although it does not provide output.
The SQLBuilder tool helps you write your HiveQL statements easily.
This component can also perform queries in a HBase database once the Store by HBase check box is
available and you have selected this check box.
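
For illustration only, a typical statement executed by tHiveRow is a DDL or maintenance
query such as the following; the table name and columns are hypothetical:

    CREATE TABLE IF NOT EXISTS customers (
      id INT,
      name STRING,
      city STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ';';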

tHiveRow Standard properties


These properties are used to configure tHiveRow running in the Standard Job framework.
The Standard tHiveRow component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket Since a Talend Job expects its dependent jar files to be available at execution, specify the Google Storage directory to which these jar files are transferred so that your Job can access them at execution time.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.
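For example, such a staging directory could look like the following (a hypothetical bucket and folder, not values from your project; note the trailing slash):
gs://my-talend-bucket/dataproc/jars/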

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job from a machine in which the Google Cloud SDK has been installed and authorized to use your user account credentials to access Google Cloud Platform. In this situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information


of the Microsoft HD Insight cluster to be used. For example, the address could be your_hdinsight_cluster_name.azurehdinsight.net and the authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.


In the Hostname field, enter the Primary Blob Service


Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that supports concurrent connections from multiple clients better than HiveServer (Hive 1).
For further information about HiveServer2, see https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave


both the Force MapR ticket authentication check box


and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the service principal of the Hive Metastore.
2. HiveServer2 local user principal uses the value of hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of hive.server2.authentication.kerberos.keytab.
4. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the JDBC connection string to the Hive Metastore.
5. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the name of the driver for the JDBC connection.
6. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as the Password parameter, is the user credential for connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.option.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.
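As an illustration only, the first and third of these parameters typically appear in hive-site.xml as properties of the following form (the principal and keytab values below are placeholders, not values from your cluster):
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>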

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog


box enter the password between double quotes and


click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use.datanode.hostname property to true. When connecting to an S3N filesystem, you must select this check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you


have chosen a machine called masternode as the


NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
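For illustration only, assuming the common default ports of a Hadoop 2.x cluster (check the fs.defaultFS and dfs.namenode.http-address settings of your own cluster), the location could be:
hdfs://masternode:8020
webhdfs://masternode:50070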

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add


other required jar files which the base distribution does


not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend .
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez. Some examples are


presented there to show how Tez can be used to gain


performance over MapReduce.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

This query uses Parquet objects When available, select this check box to indicate that the
table to be handled uses the PARQUET format and thus make the component call the required jar file.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.


This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Query Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.
For further information about the Hive query language, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual.

Note: Compressed data in the form of Gzip or Bzip2 can


be processed through the query statements. For details,
see https://cwiki.apache.org/confluence/display/Hive/
CompressedStorage.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up
data transfer. When reading a compressed file, the Studio
needs to uncompress it before being able to feed it to
the input flow.
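For example, assuming a hypothetical Hive table named employee whose schema defines the columns id, name and salary in that order, the query could be:
select id, name, salary from employee where salary > 50000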

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.
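For illustration, once this access is configured, a Hive QL statement such as the following (a hypothetical table and column mapping that relies on the standard Hive/HBase storage handler) could be executed through tHiveRow to create an HBase-backed table:
CREATE TABLE hbase_customer(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "customer");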

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, which is by default registered along with your Hive installation.

Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).


Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.


• Apache also provides a page to list the Hive-related


properties: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information about Hive-dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
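For example, the corresponding row of the Hive properties table could look as follows (note that Hive itself generally expects the engine name in lowercase):
Properties: "hive.execution.engine"
Value: "tez"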

Mapred job map memory mb and Mapred job reduce You can tune the map and reduce computations by selecting
memory mb the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are both 1000, which is normally appropriate for running the computations.

Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Hive QL queries.
tHiveRow can capture the Application_ID values and write
them in the Job logs once you have activated Log4j and
set the Log4j output level to Info for your Job involving
tHiveRow.
• For further information about how to define the Log4j
output level at an individual Job level, search for
customizing log4j output level at runtime on Talend
Help Center (https://help.talend.com).
• For further information about how to configure Log4j
at the Studio level so as to apply the configuration to
all Jobs, search for configuring Log4j on Talend Help
Center (https://help.talend.com).
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Connecting to a security-enabled MapR


When designing a Job, set up the authentication configuration in the component you are using
depending on how your MapR cluster is secured.
MapR supports the two following methods of authenticating a user and generating a MapR security
ticket for this user: a username/password pair and Kerberos.
For further information about the MapR security mechanism, see MapR security architecture.
For a scenario about how to secure a MapR cluster, see Getting started with MapR security.
The different security scenarios you may face with your MapR cluster:
• When your MapR cluster is secured with Kerberos only, you only need to set up the typical
Hadoop Kerberos configuration for your Job in the Studio.
• When your MapR cluster is secured with both the Kerberos mechanism and the MapR ticket
security mechanism, you need to accordingly set up the configuration for both of them in your Job
in the Studio.
For details about how to configure the MapR ticket security mechanism in the Studio, see Setting
up the MapR ticket authentication on page 1646.
• When your MapR cluster is secured with the MapR ticket security mechanism only, proceed
as explained in Setting up the MapR ticket authentication on page 1646 to set up the MapR
authentication configuration for your Job in the Studio.
For an example of how to configure Kerberos authentication for a Talend Job, see How to use
Kerberos in Talend Studio with Big Data.
Although this example uses Cloudera for demonstration, the operations it describes are generic and
thus applicable to MapR as well.

Setting up the MapR ticket authentication


Before you begin
• The MapR distribution you are using is from version 4.0.1 onwards and you have selected it as the
cluster to connect to in the component to be configured.
• The MapR cluster has been properly installed and is running.


• Ensure that you have installed the MapR client in the machine where the Studio is, and added the
MapR client library to the PATH variable of that machine. According to MapR's documentation,
the library or libraries of a MapR client corresponding to each OS version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no
MapRClient in java.library.path.
• This section explains only the authentication parameters to be used to connect to MapR. You still
need to define the other parameters required by your Job.
For further information, see the documentation about each component you are using.

About this task


In a Standard Job, you need to set up this configuration in the Basic settings tab of a Hadoop-related
component to be used by your Job.
In the tab, you need to proceed as follows:

Procedure
1. Select the Force MapR ticket authentication check box to display the related parameters to be
defined.
2. In the Username field, enter the username to be authenticated and in the Password field, specify
the password used by this user.
To enter the password, click the [...] button next to the password field, and then in the pop-up
dialog box enter the password between double quotes and click OK to save the settings.
A MapR security ticket is generated for this user by MapR and stored in the machine where the Job
you are configuring is executed.
3. If the Group field is available in this tab, you need to enter the name of the group to which the
user to be authenticated belongs.
4. In the Cluster name field, enter the name of the MapR cluster you want to use this username to
connect to.
This cluster name can be found in the mapr-clusters.conf file located in /opt/mapr/conf of the
cluster.
5. In the Ticket duration field, enter the length of time (in seconds) during which the ticket is valid.

Setting the environment variable for a custom MapR ticket location (optional)

If the default MapR ticket location, /tmp/maprticket_<uid>, has been changed, set
MAPR_TICKETFILE_LOCATION environment variable accordingly in the machine in which your Job is
executed.
As MapR does not provide any API to specify a MapR ticket, setting the environment variable is the
only way to use a custom MapR ticket location in your Job. For further information about this issue,
see this post from the MapR forum.
This procedure is necessary only when you are storing the MapR tickets in a custom location. If you
use the default MapR ticket location, skip this procedure.


Setting the environment variable for a custom MapR ticket location on Mac (optional)

About this task


This procedure is relevant only when you are storing the MapR tickets in a custom location and you
are using Mac to run your Studio.

Procedure
1. In the machine in which your Job is executed, add these lines to ~/.bashrc:

Example

export MAPR_TICKETFILE_LOCATION=/Users/$USER/maprticket_$UID
launchctl setenv MAPR_TICKETFILE_LOCATION /Users/$USER/maprticket_$UID

2. Shut down your Studio if it is open. Then, each time you boot your Mac workstation, open a terminal session before starting the Studio.
Setting the environment variable for a custom MapR ticket location on other operating systems
(optional)

About this task


This procedure is relevant only when you are storing the MapR tickets in a custom location and you
are not using Mac to run your Studio. If you use the default MapR ticket location, skip this procedure.

Procedure
1. In the machine in which your Job is executed, run the following command in a command-line terminal to set the MAPR_TICKETFILE_LOCATION variable in memory.

Example

set MAPR_TICKETFILE_LOCATION=<your_custom_location>

2. Shut down your Studio if it is open and use the same terminal to restart your Studio.
If you use a Talend JobServer to run your Job, use the same terminal to restart this JobServer.
This way, your Job retrieves this custom location from memory.

Using a custom MapR security configuration in the mapr.login.conf file (optional)

If the default security configuration of your MapR cluster has been changed, you need to configure the
Job to be executed to take this custom security configuration into account.
MapR specifies its security configuration in the mapr.login.conf file located in /opt/mapr/conf of the
cluster. For further information about this configuration file and the Java service it uses behind, see
mapr.login.conf and JAAS.
If no change has been made in the mapr.login.conf file, skip this procedure.

About this task


To configure your Job, you need to define the related parameters in the Basic settings tab and the
Advanced settings tab of the Component view of the component you want your Job to use to connect
to MapR.
Proceed as follows to do the configuration:


Procedure
1. Verify what has been changed about this mapr.login.conf file.
You should be able to obtain the related information from the administrator or the developer of
your MapR cluster.
2. If the location of the MapR configuration files has been changed to somewhere else in the
cluster, that is to say, the MapR Home directory has been changed, select the Set the MapR Home
directory check box and enter the new Home directory. Otherwise, leave this check box clear and
the default Home directory is used.
3. If the login module to be used in the mapr.login.conf file has been changed, select the Specify the
Hadoop login configuration check box and enter the module to be called from the mapr.login.conf
file. Otherwise, leave this check box clear and the default login module is used.
For example, enter kerberos to call the hadoop_kerberos module or hybrid to call the hadoop_hybrid
module.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker,
when configuring this component since the component needs to connect to a Hadoop distribution.


tHSQLDbInput
Executes a DB query with a strictly defined order which must correspond to the schema definition and
then it passes on the field list to the next component via a Main row link.
tHSQLDbInput reads a database and extracts fields based on a query.

tHSQLDbInput Standard properties


These properties are used to configure tHSQLDbInput running in the Standard Job framework.
The Standard tHSQLDbInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Running Mode Select from the list the server mode corresponding to your DB setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, HSQLDb In Memory.

Use TLS/SSL sockets Select this check box to enable the secured mode if required.

Host Database server IP address.

Port Listening port number of DB server.

Database Alias Alias name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

DB path Specify the directory to the database you want to connect


to. This field is available only to the HSQLDb In Process
Persistent running mode.


Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view

Db name Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. When the running mode is
HSQLDb In Process Persistent , you can set the connection
property ifexists=true to allow connection to an existing
database only and avoid creating a new database.
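For example, entering the following value restricts the connection to databases that already exist; HSQLDB appends such properties to the connection URL after a semicolon, so the resulting URL is of the form jdbc:hsqldb:file:/path/to/mydb;ifexists=true (the path shown here is only an illustration):
ifexists=true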

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.


tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: Indicates the number of lines processed. This is an


After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is a Flow
variable and it returns a string.
For further information about variables, see Talend Studio
User Guide.

Note:
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.

Usage

Usage rule This component covers all possible SQL queries for
HSQLDb databases.

Connections Outgoing links (from this component to another):


Row: Main; Iterate
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

Incoming links (from one component to this one):


Row: Iterate;
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:


tHSQLDbOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tHSQLDbOutput writes, updates, makes changes or suppresses entries in a database.

tHSQLDbOutput Standard properties


These properties are used to configure tHSQLDbOutput running in the Standard Job framework.
The Standard tHSQLDbOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Running Mode Select from the list the server mode corresponding to your DB setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, HSQLDb In Memory.

Use TLS/SSL sockets Select this check box to enable the secured mode if required.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

DB path Specify the directory to the database you want to connect


to. This field is available only to the HSQLDb In Process
Persistent running mode.


Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view

Db name Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next


component. When you create a Spark Job, avoid the reserved


word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. When the running mode is
HSQLDb In Process Persistent , you can set the connection
property ifexists=true to allow connection to an existing
database only and avoid creating a new database.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, provided these are neither insert, update nor delete actions, nor actions that require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.
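As a hypothetical illustration (the column names below are examples only), a FULLNAME column computed from two existing columns could be added as follows:
Name: FULLNAME
SQL expression: FIRSTNAME || ' ' || LASTNAME
Position: After
Reference column: LASTNAME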

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

Usage

Usage rule This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of


a table in an HSQLDb database. It also allows you to create a


reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Connections Outgoing links (from this component to another):


Row: Main; Reject
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

Incoming links (from one component to this one):


Row: Main;
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see
• Inserting a column and altering data using tMysqlOutput on page 2466.


tHSQLDbRow
Acts on the actual DB structure or on the data (although without handling data), depending on the
nature of the query and the database.
The SQLBuilder tool helps you write your SQL statements easily.
tHSQLDbRow is the specific component for this database query. It executes the SQL query stated in the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.

tHSQLDbRow Standard properties


These properties are used to configure tHSQLDbRow running in the Standard Job framework.
The Standard tHSQLDbRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Running Mode Select from the list the server mode corresponding to your DB setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, HSQLDb In Memory.

Use TLS/SSL sockets Select this check box to enable the secured mode if required.

Host Database server IP address

Port Listening port number of DB server.

Database Alias Alias name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

DB path Specify the directory to the database you want to connect


to. This field is available only to the HSQLDb In Process
Persistent running mode.


Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view

Database Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. When the running mode is
HSQLDb In Process Persistent , you can set the connection
property ifexists=true to allow connection to an existing
database only and avoid creating a new database.
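
As an illustration, the connection opened in HSQLDb In Process Persistent mode with this parameter
roughly corresponds to the following sketch. The DB path /data/hsql/mydb and the default SA user
are assumptions, the HSQLDB JDBC driver is assumed to be on the classpath, and this is not the
component's generated code:

import java.sql.Connection;
import java.sql.DriverManager;

public class HsqlDbIfExistsSketch {
    public static void main(String[] args) throws Exception {
        // ";ifexists=true" refuses to create the database and only connects
        // to an existing one, as described above.
        String url = "jdbc:hsqldb:file:/data/hsql/mydb;ifexists=true";
        try (Connection conn = DriverManager.getConnection(url, "SA", "")) {
            System.out.println("Connected to the existing database.");
        }
    }
}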

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: Indicates the query to be processed. This is a Flow


variable and it returns a string.
For further information about variables, see Talend Studio
User Guide.

Note:
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Connections Outgoing links (from this component to another):


Row: Main; Reject; Iterate
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

Incoming links (from one component to this one):


Row: Main; Iterate
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tHttpRequest
Sends an HTTP request to the server and outputs the response information locally.
tHttpRequest sends an HTTP request to the server end and gets the corresponding response
information from the server end.

tHttpRequest Standard properties


These properties are used to configure tHttpRequest running in the Standard Job framework.
The Standard tHttpRequest component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Job designs. Related topic: see Talend Studio
User Guide.

Sync columns Click this button to retrieve the schema from the preceding
component.

URI Type in the Uniform Resource Identifier (URI) that identifies


the data resource on the server. A URI is similar to a URL,
but more general.

Method Select an HTTP method to define the action to be


performed:
Post: Sends data (for example HTML form data) to the server
end.


Get: Retrieves data from the server end.

Post parameters from file Browse to, or enter the path to the file that is used to
provide parameters (request body) to the POST method.

Write response content to file Select this check box to save the HTTP response to a local
file. You can either type in the file path in the input field or
click the three-dot button to browse to the file path.

Create directory if not exists Select this check box to create the directory defined in the
Write response content to file field if it does not exist.
This check box appears only when the Write response
content to file check box is selected and is cleared by
default.

Headers Type in the name-value pair(s) for HTTP headers to define


the parameters of the requested HTTP operation.
Key: Fill in the name of the header field of an HTTP header.
Value: Fill in the content of the header field of an HTTP
header.
For more information about definition of HTTP headers,
please refer to:
en.wikipedia.org/wiki/List_of_HTTP_headers.

Need authentication Select this check box to fill in a user name and a password
in the corresponding fields if authentication is needed:
user: Fill in the user name for the authentication.
password: Fill in the password for the authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

Set timeout Select this check box to specify the connect and read
timeout values in the following two fields:
• Connect timeout(s): Enter the connect timeout value in
seconds. An exception will occur if the timeout expires
before the connection can be established. The value of
0 indicates an infinite time out. By default, the connect
timeout value is 30.
• Read timeout(s): Enter the read timeout value in
seconds. An exception will occur if the timeout expires
before there is data available for read. By default, the
read timeout value is 0, which indicates an infinite
time out.
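
As an illustration of these two settings, a plain-Java sketch using java.net.HttpURLConnection (with
a hypothetical URL; this is not the code the Studio generates) would set them as follows, converted
to milliseconds:

import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutSketch {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL("http://example.com/resource").openConnection();
        conn.setConnectTimeout(30_000); // Connect timeout(s) = 30 -> 30 000 ms
        conn.setReadTimeout(0);         // Read timeout(s) = 0 -> wait indefinitely
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}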

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level and at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
CONNECTED: the result of whether a connection to the
server was established. This is an After variable and it returns a
boolean.
RESPONSE_CODE: the response code returned by the remote
HTTP server. This is an After variable and it returns an
integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used to send HTTP requests to
a server and save the response information. This component
can be used as a standalone component.

Sending an HTTP request to the server and saving the
response information to a local file
This scenario describes a two-component Job that uses the GET method to retrieve information from
the server end and writes the response to a local file as well as to the console.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create a Job from the Job Designs node in the
Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop the following components from the Palette onto the design workspace: tHttpRequest and
tLogRow.

3. Connect the tHttpRequest component to the tLogRow component using a Row > Main connection.


Configuring the GET request


Procedure
1. Double-click the tHttpRequest component to open its Basic settings view and define the
component properties.

2. Fill in the URI field with "http://192.168.0.63:8081/testHttpRequest/build.xml". Note that this URI is
for demonstration purposes only and it is not a live address.
3. From the Method list, select GET.
4. Select the Write response content to file check box and fill in the input field on the right with the
file path by manual entry, D:/test.txt for this use case.
5. Select the Need authentication check box and fill in the user and password, both tomcat in this
use case.
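
Conceptually, this configuration amounts to the following plain-Java sketch, reusing the
demonstration URI, file path and credentials above. It is an illustration of the GET request with
basic authentication, not the code the Studio generates:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class GetRequestSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://192.168.0.63:8081/testHttpRequest/build.xml");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // Need authentication: user and password, both "tomcat" in this use case.
        String token = Base64.getEncoder()
                .encodeToString("tomcat:tomcat".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        // Write response content to file: D:/test.txt.
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get("D:/test.txt"), StandardCopyOption.REPLACE_EXISTING);
        }
        conn.disconnect();
    }
}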

Executing the Job


About this task
Then you can run this Job.
The tLogRow component is used to present the execution result of the Job.

Procedure
1. If you want to configure how the result is presented by tLogRow, double-click the component to
open its Component view and in the Mode area, select the Table (print values in cells of a table)
check box.
2. Press F6 to run this Job.

Results
Once done, the response information from the server is saved and displayed.


Sending a POST request from a local JSON file


In this scenario, a four-component Job is used to read a parameter from a given JSON file and send it in
a POST request to a website.

The JSON file to be used reads as follows:

{"echo":
[
{
"data":"e=hello"
}
]
}

From that file, tFileInputJSON reads the e parameter and its value hello and tHttpRequest sends
the pair to http://echo.itcuties.com/, a URL provided for demonstration by an online programming
community, www.itcuties.com.
Note that the e parameter is required by http://echo.itcuties.com/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job, named httpRequestPostDemo
for example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tFileInputJSON, tFileOutputDelimited, tHttpRequest and tLogRow onto the workspace.
3. Connect tFileInputJSON to tHttpRequest using the Trigger > On Subjob Ok link.
4. Connect the other components using the Row > Main link.


Reading the JSON file


Procedure
1. Double-click tFileInputJSON to open its Component view.

2. Select JsonPath without loop from the Read By drop-down list.


3. Click the [...] button next to Edit schema to open the schema editor.

4. Click the [+] button to add one row and name it, for example, data.
5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
6. In the Filename field, browse, or enter the path to the source JSON file in which the parameter to
be sent is stored.
7. In the Mapping table, the data column you defined in the previous step in the component schema
has been automatically added. In the JSONPath query column of this table, enter the JSON path,
in double quotation marks, to extract the parameter to be sent. In this scenario, the path is
echo[0].data.
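
To make the extraction concrete, the JSONPath echo[0].data applied to the sample file yields the
string e=hello. The following minimal sketch performs the same extraction using the Jackson library
(assumed to be available; the file path is hypothetical, and this is not the component's own JsonPath
engine):

import java.io.File;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonExtractionSketch {
    public static void main(String[] args) throws Exception {
        JsonNode root = new ObjectMapper().readTree(new File("C:/tmp/echo.json"));
        // Equivalent of the JSONPath query echo[0].data
        String data = root.get("echo").get(0).get("data").asText();
        System.out.println(data); // prints: e=hello
    }
}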

Writing the parameter to a flat file


Procedure
1. Double-click tFileOutputDelimited to open its Component view.


2. In the File name field, browse, or enter the path to the flat file in which you want to write the
extracted parameter. This file will be created if it does not exist. In this example, it is C:/tmp/
postParamsFile.txt.

Posting the parameter


Procedure
1. Double-click tHttpRequest to open its Component view.

2. In the URI field, enter the server address to which the parameter is to be sent. In this scenario, it is
http://echo.itcuties.com/.
3. From the Method list, select POST.
4. In the Post parameters from file field, browse, or enter the path to the flat file that contains the
parameter to be used. As defined earlier with the tFileOutputDelimited component, this path is C:/
tmp/postParamsFile.txt.
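
Conceptually, the POST step performed here can be sketched in plain Java as follows. The request
body is the content of the flat file produced above ("e=hello" in this scenario); this is an
illustration, not the code the Studio generates:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PostRequestSketch {
    public static void main(String[] args) throws Exception {
        // Post parameters from file: the body read from the flat file ("e=hello").
        byte[] body = Files.readAllBytes(Paths.get("C:/tmp/postParamsFile.txt"));
        HttpURLConnection conn =
            (HttpURLConnection) new URL("http://echo.itcuties.com/").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}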

Executing the Job


Press F6 to run this Job.
The tLogRow component is used to present the execution result of the Job.
Once done, the Run view is opened automatically, where you can check the execution result.


You can see that the site receiving the parameter returns its answer.


tImpalaClose
Closes connection to an Impala database.
tImpalaClose closes an active connection to a given Impala database.

tImpalaClose Standard properties


These properties are used to configure tImpalaClose running in the Standard Job framework.
The Standard tImpalaClose component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Component list If there is more than one connection used in the Job, select
the relevant tImpalaConnection component from the list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with the other Impala
components, especially with tImpalaConnection as
tImpalaConnection allows you to open a connection for the
transaction which is underway.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
No scenario is available for the Standard version of this component yet.


tImpalaConnection
Establishes an Impala connection to be reused by other Impala components in your Job.
tImpalaConnection opens a connection to an Impala database.

tImpalaConnection Standard properties


These properties are used to configure tImpalaConnection running in the Standard Job framework.
The Standard tImpalaConnection component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your


connection accordingly. However, because of


the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port DB server listening port.

Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.


Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Impala


components, particularly tImpalaClose.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.


Related scenario
This component is used in the similar way as a tHiveConnection component is. For further
information, see Creating a partitioned Hive table on page 1582.


tImpalaCreateTable
Creates Impala tables that fit a wide range of Impala data formats.
tImpalaCreateTable connects to the Impala database to be used and creates an Impala table that is
dedicated to data of the format you specify.

tImpalaCreateTable Standard properties


These properties are used to configure tImpalaCreateTable running in the Standard Job framework.
The Standard tImpalaCreateTable component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for


configuring the connection manually on Talend Help


Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.


Username and Password DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Table Name Name of the table to be created.

Action on table Select the action to be carried out for creating a table.


Format Select the data format to which the table to be created is


dedicated.
The available data formats vary depending on the version of
the Hadoop distribution you are using.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Set partitions Select this check box to add partition columns to the table
to be created. Once you select it, you need to define the
schema of the partition columns you need to add.

Set file location If you want to create an Impala table in a directory other
than the default one, select this check box and enter the
directory in HDFS you want to use to hold the table content.
This is typically useful when you need to create an external
Impala table by selecting the Create an external table
check box in the Advanced settings tab.

Use S3 endpoint The Use S3 endpoint check box is displayed when you
have selected the Set file location check box to create an
external Impala table.
Once this Use S3 endpoint check box is selected, you need
to enter the following parameters in the fields that appear:
• S3 bucket: enter the name of the bucket in which you
need to create the table.
• Bucket name: enter the name of the bucket in which
you want to store the dependencies of your Job. This
bucket must already exist on S3.
• Temporary resource folder: enter the directory in
which you want to store the dependencies of your Job.
For example, enter temp_resources to write the
dependencies in the /temp_resources folder in the
bucket.
If this folder already exists at runtime, its contents are
overwritten by the upcoming dependencies; otherwise,
this folder is automatically created.
• Access key and Secret key: enter the authentication
information required to connect to the Amazon S3
bucket to be used.
To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog box
enter the password between double quotes and click
OK to save the settings.


Note that the format of the S3 file is S3N (S3 Native


Filesystem).
Since an Impala table created in S3 is actually an external
table, this Use S3 endpoint check box must be used with the
Create an external table check box selected.

Advanced settings

Like table Select this check box and enter the name of the Impala
table you want to copy. This allows you to copy the
definition of an existing table without copying its data.
For further information about the Like parameter, see
Cloudera's information about Impala's Data Definition
Language.

Create an external table Select this check box to make the table to be created an
external Impala table. This kind of Impala table leaves the
raw data where it is if the data is in HDFS.
An external table is usually the better choice for accessing
shared data existing in a file system.
For further information about an external Impala table, see
Cloudera's documentation about Impala.

Table comment Enter the description you want to use for the table to be
created.

As select Select this check box and enter the As select statement
for creating an Impala table that is based on a Select
statement.
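
To make the Like table, Create an external table and As select options concrete, the statements they
roughly correspond to are sketched below. The table names, columns and HDFS location are
hypothetical, and the exact DDL the component issues may differ:

public class ImpalaCreateTableSketch {
    public static void main(String[] args) {
        // Create an external table + Set file location:
        String external =
            "CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (ip STRING, hits INT) "
          + "LOCATION '/user/talend/web_logs'";
        // Like table: copy only the definition of an existing table, not its data:
        String like = "CREATE TABLE web_logs_copy LIKE web_logs";
        // As select: create and populate the table from a query:
        String asSelect =
            "CREATE TABLE top_ips AS SELECT ip, SUM(hits) AS total FROM web_logs GROUP BY ip";
        for (String ddl : new String[] { external, like, asSelect }) {
            System.out.println(ddl + ";");
        }
    }
}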

Table properties Add any custom Impala table properties you want to
override the default ones used by the Hadoop engine of the
Studio.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component works standalone.

Row format Set Delimited row format.

Die on error Select this check box to stop the execution of the Job when an
error occurs.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.


Related scenario
This component is used in the similar way as a tHiveCreateTable component is. For further
information, see Creating a partitioned Hive table on page 1582.


tImpalaInput
Executes the select queries to extract the corresponding data and sends the data to the component
that follows.
tImpalaInput is the component dedicated to the Impala database (the Impala data warehouse system).
It executes the given Impala SQL query in order to extract the data of interest from Impala. It provides
the SQLBuilder tool to help you write your Impala SQL statements easily.

tImpalaInput Standard properties


These properties are used to configure tImpalaInput running in the Standard Job framework.
The Standard tImpalaInput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster


and the Windows Azure Storage service of that


cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder


  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Guess schema Click this button to retrieve the schema from the table.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
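
For example, with a schema defined as ip (String), then hits (Integer), and a hypothetical web_logs
table set in the Table Name field, the query could look like the string in the following minimal
sketch (an illustration, not code generated by the Studio or by Guess Query):

public class ImpalaInputQuerySketch {
    public static void main(String[] args) {
        // Columns are listed in the same order as the schema definition.
        String query = "SELECT ip, hits FROM web_logs WHERE hits > 100";
        System.out.println(query);
    }
}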

Advanced settings

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Note:
Clear the Trim all the String/Char columns check box to
enable Trim column in this field.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Impala SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For a scenario about how an input component is used in a Job, see Writing columns from a MySQL
database to an output file using tMysqlInput on page 2440.


tImpalaLoad
Writes data of different formats into a given Impala table or exports data from an Impala table to a
directory.
tImpalaLoad connects to a given Impala database and copies or moves data into an existing Impala
table or a directory you specify.

tImpalaLoad Standard properties


These properties are used to configure tImpalaLoad running in the Standard Job framework.
The Standard tImpalaLoad component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that


cluster in the areas that are displayed. For detailed


explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Load action Select the action you need to carry out for writing data into the
specified destination.
• When you select LOAD, you are moving or copying data
from a directory you specify.
• When you select INSERT, you are moving or copying
data based on queries.
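
As an illustration, the two actions roughly correspond to the statements sketched below. The table
names and HDFS path are hypothetical, and the exact statements the component issues may differ:

public class ImpalaLoadActionSketch {
    public static void main(String[] args) {
        // LOAD: move or copy the files of a directory into the table's storage.
        String load =
            "LOAD DATA INPATH '/user/talend/staging/logs' INTO TABLE web_logs";
        // INSERT: move or copy data selected by a query (APPEND keeps existing
        // rows, OVERWRITE replaces them).
        String insertAppend = "INSERT INTO web_logs SELECT * FROM web_logs_raw";
        String insertOverwrite = "INSERT OVERWRITE web_logs SELECT * FROM web_logs_raw";
        System.out.println(load + ";");
        System.out.println(insertAppend + ";");
        System.out.println(insertOverwrite + ";");
    }
}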

Target type This drop-down list appears only when you have selected
INSERT from the Load action list.
Select from this list the type of the location you need to
write data in.
• If you select Table as destination, you can still choose
to append data to or overwrite the contents in the
specified table. This is the only option in the current
release.

Action Select whether you want to OVERWRITE the old data


already existing in the destination or only APPEND the new
data to the existing one.

Table name Enter the name of the Impala table you need to write data in.
Note that with the INSERT action, this field is available only
when you have selected Table from the Target type list.

File path Enter the directory you need to read data from.

Query This field appears when you have selected INSERT from the
Load action list.
Enter the appropriate query for selecting the data to be
exported to the specified Impala table or directory.
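For example, with INSERT selected as the load action, a hypothetical selection query entered in this field (as a Java string) could be:
"SELECT id, name, country, state FROM staging_sales"
The table and column names here are assumptions; the columns returned by the query must match the structure of the destination table or directory.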

Set partitions Select this check box to use the Impala Partition clause
in loading or inserting data in an Impala table. You need to
enter the partition keys and their values to be used in the
field that appears.
For example, enter country='US', state='CA'. This makes a partition clause reading Partition (country='US', state='CA'), that is to say, a US and CA partition.
Also, it is recommended to select the Create partition if
not exist check box that appears to ensure that you will not
create a duplicate partition.
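As a sketch of the kind of Impala statement these settings correspond to (the directory, table name and partition values below are hypothetical, for illustration only):
LOAD DATA INPATH '/user/talend/sales_us_ca' INTO TABLE sales PARTITION (country='US', state='CA');
With the INSERT load action, a similar PARTITION (country='US', state='CA') clause applies to the generated INSERT statement.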

Die on error Select this check box to kill the Job when an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic settings and context variables, see Talend Studio User Guide.
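For example, assuming the Job defines a context variable named impalaConnection whose value at runtime is the name of the connection component to use (such as tImpalaConnection_1 or tImpalaConnection_2), you could enter context.impalaConnection in the Code field so that the actual connection is resolved only when the Job runs. The variable and component names in this example are assumptions.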

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client on the machine where the Studio is, and added the MapR client library to the PATH variable of that machine. According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no MapRClient in java.library.path.
• Set the -Djava.library.path argument, for example, in the Job Run VM arguments area of the Run/Debug view in the Preferences dialog box in the Window menu (a sketch of such an argument follows this list). This argument provides the Studio with the path to the native library of that MapR client. This allows the subscription-based users to make full use of the Data viewer to view locally in the Studio the data stored in MapR.
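A minimal sketch of such a VM argument, reusing the placeholder path from the first bullet above (replace MAPR_INSTALL and VERSION with the values of your own MapR client installation):
-Djava.library.path="MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native"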
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
This component is used in the similar way as a tHiveLoad component is. For further information, see
Creating a partitioned Hive table on page 1582.


tImpalaOutput
Executes the action defined on the data contained in the table, based on the flow incoming from the
preceding component in the Job.
tImpalaOutput connects to an Impala database (the Impala data warehouse system) and writes data in
an Impala table.

tImpalaOutput Standard properties


These properties are used to configure tImpalaOutput running in the Standard Job framework.
The Standard tImpalaOutput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster and the Windows Azure Storage service of that cluster in the areas that are displayed. For detailed explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table you need to write data in.

Action Select whether you want to OVERWRITE the old data


already existing in the destination or only APPEND the new
data to the existing one.


Extended insert Select this check box to combine multiple rows of data
into one single INSERT action. This can speed up the insert
operation.
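As an illustration, with this check box selected several incoming rows can be grouped into one multi-row statement such as the following sketch (table, columns and values are hypothetical):
INSERT INTO employee (id, name) VALUES (1, 'Ford'), (2, 'Arthur'), (3, 'Trillian');
instead of one single-row INSERT statement per incoming row.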

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Impala SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client on the machine where the Studio is, and added the MapR client library to the PATH variable of that machine. According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no MapRClient in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides the Studio with the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For a scenario about how an output component is used in a Job, see Inserting a column and altering
data using tMysqlOutput on page 2466.


tImpalaRow
Acts on the actual DB structure or on the data (although without handling data).
The SQLBuilder tool helps you write your Impala SQL statements easily. tImpalaRow is the dedicated
component for this database. It executes the Impala SQL query stated in the specified database. The
Row suffix means the component implements a flow in the Job design although it does not provide
output.

tImpalaRow Standard properties


These properties are used to configure tImpalaRow running in the Standard Job framework.
The Standard tImpalaRow component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster and the Windows Azure Storage service of that cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder


  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Query Enter your DB query paying particularly attention to


properly sequence the fields in order to match the schema
definition.
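For example, for a schema made of the two columns id and name, a hypothetical query entered in this field (as a Java string) could be:
"SELECT id, name FROM employee"
where employee is an assumed table name; the selected columns must appear in the same order as the columns of the schema.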

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Impala SQL queries.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client on the machine where the Studio is, and added the MapR client library to the PATH variable of that machine. According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no MapRClient in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides the Studio with the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tInfiniteLoop
Executes a task or a Job automatically, based on a loop.
tInfiniteLoop runs an infinite loop on a task.

tInfiniteLoop Standard properties


These properties are used to configure tInfiniteLoop running in the Standard Job framework.
The Standard tInfiniteLoop component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Wait at each iteration (in milliseconds) Enter the time delay between iterations.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
CURRENT_ITERATION: the sequence number of the current
iteration. This is a Flow variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
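For example, the CURRENT_ITERATION variable can be read in the Java expression of a downstream component through the globalMap, as in the following sketch (the instance name tInfiniteLoop_1 is an assumption; adapt it to the actual name of your component):
((Integer)globalMap.get("tInfiniteLoop_1_CURRENT_ITERATION"))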

Usage

Usage rule tInfiniteLoop is an input component and requires an Iterate link to connect it to the following component.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate;
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Related scenario
For an example of the kind of scenario in which tInfiniteLoop might be used, see Procedure on page
1980, regarding the tLoop component.


tInformixBulkExec
Executes Insert operations in Informix databases.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.

tInformixBulkExec Standard properties


These properties are used to configure tInformixBulkExec running in the Standard Job framework.
The Standard tInformixBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Execution Platform Select the operating system you are using.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port DB server listening port.

Database Name of the database.

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Informix Directory Informix installation directory, e.g. "C:\Program Files\IBM\IBM Informix Dynamic Server\11.50\".

Data file Name of the file to be loaded.

Action on data On the data of the table defined, you can perform the
following operations:
Insert: Add new data to the table. If duplicates are found,
the job stops.
Update: Update the existing table data.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Delete the entry data which corresponds to the
input flow.

Warning:
You must specify at least one key upon which the Update
and Delete operations are to be based. It is possible to
define the columns which should be used as the key from
the schema, from both the Basic Settings and the Advanced
Settings, to optimise these operations.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Field terminated by Character, string or regular expression which separates the


fields.
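For example, ";" for a semicolon-delimited bulk file or "\t" for tab-separated fields; these separators are assumptions and must match the delimiter actually used in the data file.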

Set DBMONEY Select this check box to define the decimal separator in the
Decimal separator field.

Set DBDATE Select the date format that you want to apply.

Rows Before Commit Enter the number of rows to be processed before the
commit.

Bad Rows Before Abort Enter the number of rows in error at which point the Job
should stop.

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Output Where the output should go.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers database query flexibility and covers all possible Informix queries which may be required.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned

in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tInformixBulkExec is deployed, so that the component
functions properly.
This component requires installation of its related jar files.

Related scenario
For a scenario in which tInformixBulkExec might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Truncating and inserting file data into an Oracle database on page 2681.


tInformixClose
Closes connection to Informix databases.
tInformixClose closes an active connection to a database.

tInformixClose Standard properties


These properties are used to configure tInformixClose running in the Standard Job framework.
The Standard tInformixClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list If there is more than one connection used in the Job, select
tInformixConnection from the list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component is generally used as an input component. It


requires an output component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used
along with tInformixConnection as the latter allows you to open a connection for the transaction
which is underway.
To see a scenario in which tInformixClose might be used, see tMysqlConnection on page 2425.


tInformixCommit
Makes a global commit just once instead of committing every row or batch of rows separately.
This component improves performance and is closely related to tInformixConnection and
tInformixRollback. They are generally used to execute transactions together.
tInformixCommit validates data processed in a job from a connected database.

tInformixCommit Standard properties


These properties are used to configure tInformixCommit running in the Standard Job framework.
The Standard tInformixCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list If there is more than one connection in the Job, select
tInformixConnection from the list.

Close connection This check box is selected by default. It means that the
database connection will be closed once the commit has
been made. Clear the check box to continue using the
connection once the component has completed its task.

Warning:
If you are using a Row > Main type connection to link
tInformixCommit to your Job, your data will be committed
row by row. If this is the case, do not select this check box,
otherwise the connection will be closed before the commit
of your first row is finalized.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component is generally used along with Informix


components, particularly tInformixConnection and
tInformixRollback.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database

connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related Scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used
along with tInformixConnection as the latter allows you to open a connection for the transaction
which is underway.
To see a scenario in which tInformixCommit might be used, see Inserting data in mother/daughter
tables on page 2426.


tInformixConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tInformixConnection is closely related to tInformixCommit and tInformixRollback. They are generally
used along with tInformixConnection, with tInformixConnection opening the connection for the
transaction.

tInformixConnection Standard properties


These properties are used to configure tInformixConnection running in the Standard Job framework.
The Standard tInformixConnection component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host Database server IP address.

Port DB server listening port.

Database Name of the database.

Schema Name of the schema

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.


Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Use Transaction Clear this check box when the database is configured in
NO_LOG mode. If the check box is selected, you can choose
whether to activate the Auto Commit option.

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Usage

Usage rule This component is generally used with other Informix


components, particularly tInformixCommit and
tInformixRollback.

Database Family Databases/Informix

Limitation This component requires installation of its related jar files.

Related scenario
For a scenario in which the tInformixConnection, might be used, see Inserting data in mother/
daughter tables on page 2426.


tInformixInput
Reads a database and extracts fields based on a query.
tInformixInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.

tInformixInput Standard properties


These properties are used to configure tInformixInput running in the Standard Job framework.
The Standard tInformixInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

DB server Name of the database server

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next

component. When you create a Spark Job, avoid the reserved word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query paying particularly attention to
properly sequence the fields in order to match the schema
definition.
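For example, for a schema containing the columns id, name and salary, a hypothetical query (as a Java string) could be:
"SELECT id, name, salary FROM employees ORDER BY id"
The table and column names are assumptions; the selected columns and their order must match the schema definition.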

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Informix databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned

in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
For related topics, see:
See also scenario for tContextLoad: Reading data from different MySQL databases using dynamically
loaded connection parameters on page 497.


tInformixOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tInformixOutput writes, updates, makes changes or suppresses entries in a database.

tInformixOutput Standard properties


These properties are used to configure tInformixOutput running in the Standard Job framework.
The Standard tInformixOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

DB server Name of the database server

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Truncate table: Truncate the table.

Warning:
A commit operation will be carried out after the table is truncated.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries

Insert or update: Insert a new record. If the record with the given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-free
rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.
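
As an illustration only, the Informix JDBC driver accepts properties such as DELIMIDENT or IFX_LOCK_MODE_WAIT; check the driver documentation for the exact options supported by your version. The field could then contain something like:

  "DELIMIDENT=Y;IFX_LOCK_MODE_WAIT=30"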

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and, above all, better
performance at executions.
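
The following standalone JDBC sketch is not the code the component generates; it only illustrates the effect of Commit every, assuming a hypothetical my_table table and placeholder connection details:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.util.Arrays;
  import java.util.List;

  public class CommitEverySketch {
      public static void main(String[] args) throws Exception {
          int commitEvery = 10000;  // value entered in the Commit every field
          List<String> rows = Arrays.asList("row1", "row2", "row3");
          // Placeholder URL and credentials; adapt them to your own Informix instance.
          try (Connection conn = DriverManager.getConnection(
                  "jdbc:informix-sqli://host:1526/stores_demo:INFORMIXSERVER=ol_informix1210",
                  "informix", "password");
               PreparedStatement ps = conn.prepareStatement(
                  "INSERT INTO my_table (col1) VALUES (?)")) {
              conn.setAutoCommit(false);
              int count = 0;
              for (String value : rows) {
                  ps.setString(1, value);
                  ps.executeUpdate();
                  if (++count % commitEvery == 0) {
                      conn.commit();  // one commit per batch of rows, not one per row
                  }
              }
              conn.commit();  // commit any remaining rows
          }
      }
  }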

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns other than insert,
update, or delete actions, or actions that require particular
preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a reference column that the
component can use to place or replace the new or altered
column.
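
For example (the column name and SQL function below are purely illustrative), the table could be filled in as follows to populate an extra last_update column with the Informix CURRENT function, placed after the customer_id column:

  Name:              last_update
  SQL expression:    "CURRENT"
  Position:          After
  Reference column:  customer_id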

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Batch Size Specify the number of records to be processed in each


batch.


This field appears only when the Use Batch check box
is selected.

Optimize the batch insertion Ensure the check box is selected to optimize the insertion
of batches of data.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
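
For instance, in a tJava component linked to tInformixOutput through an OnSubjobOk trigger, you could print two of these variables; the component name tInformixOutput_1 below is an example and must match the component in your own Job:

  // Read the After variables from the globalMap once the output component has finished.
  System.out.println("Rows inserted: " + ((Integer) globalMap.get("tInformixOutput_1_NB_LINE_INSERTED")));
  System.out.println("Rows updated: " + ((Integer) globalMap.get("tInformixOutput_1_NB_LINE_UPDATED")));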

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Informix database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
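
As a hypothetical example, with two connection components tInformixConnection_1 and tInformixConnection_2 in the Job, you could define a context variable informixConn and reference it in the Code field, then choose the connection at execution time:

  Code field of the Dynamic settings table: context.informixConn

  // Passed on the command line when launching the exported Job:
  // --context_param informixConn=tInformixConnection_2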

Limitation This component requires installation of its related jar files.

Related scenarios
For tInformixOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tInformixOutputBulk
Prepares the file to be used as a parameter in the INSERT query used to feed Informix databases.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.
Writes a file composed of columns, based on a defined delimiter and on Informix standards.

tInformixOutputBulk Standard properties


These properties are used to configure tInformixOutputBulk running in the Standard Job framework.
The Standard tInformixOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file to be generated.

Append Select this check box to append new rows to the end of the
file.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are


not enclosed within quotation marks. If they are, you must


remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Row separator String (ex: "\n" on Unix) to distinguish rows.

Field separator Character, string or regular expression used to separate


fields

Set DBMONEY Select this box if you want to define the decimal separator
in the corresponding field.

Set DBDATE Select the date format that you want to apply.

Create directory if not exists This check box is selected automatically. The option allows
you to create a folder for the output file if it doesn't already
exist.

Custom the flush buffer size Select this box in order to customize the memory size used
to store the data temporarily. In the Row number field enter
the number of rows at which point the memory should be
freed.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to collect log data at the component
level.
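
As an illustration only (the separators, date format, and values are made up and depend on your own settings), with a Field separator of "|", a Row separator of "\n", and a DBDATE format such as Y4MD-, a generated row for a schema (id, name, order_date, amount) might look like:

  101|Smith|2020-02-23|149.90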

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used along with
tInformixBulkExec. Together, they improve performance levels when
adding data to an Informix database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For scenarios in which tInformixOutputBulk might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tInformixOutputBulkExec
Carries out Insert operations in Informix databases using the data provided.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.

tInformixOutputBulkExec Standard properties


These properties are used to configure tInformixOutputBulkExec running in the Standard Job
framework.
The Standard tInformixOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-in or Repository.

  Built-in: No properties stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Execution platform Select the operating system you are using.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port DB server listening port.

Database Name of the database.

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Table Name of the table to be written. Note that only one table
can be written at a time and the table must already exist for
the insert operation to be authorised.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:


• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Informix Directory Informix installation directory, e.g. "C:\Program Files\IBM\IBM Informix Dynamic Server\11.50\".

Data file Name of the file to be generated and loaded.

Append Select this check box to add rows to the end of the file.

Action on data Select the operation you want to perform:
Bulk insert
Bulk update
The details requested will differ according to the action chosen.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Fields terminated by Character, string or regular expression used to separate the


fields

Set DBMONEY Select this check box to define the decimal separator used
in the corresponding field.

Set DBDATE Select the date format you want to apply.

Rows Before Commit Enter the number of rows to be processed before the
commit.

Bad Rows Before Abort Enter the number of rows in error at which point the Job
should stop.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if required.

Custom the flush buffer size Select this box in order to customize the memory size used
to store the data temporarily. In the Row number field enter
the number of rows at which point the memory should be
freed.


Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Output Where the output should go.

Usage

Usage rule This component is generally used when no particular


transformation is required on the data to be inserted in the
database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tInformixOutputBulkExec is deployed, so that the
component functions properly.

Related scenario
For scenarios in which tInformixOutputBulkExec might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tInformixRollback
Prevents involuntary transaction commits by canceling transactions in connected databases.
tInformixRollback is closely related to tInformixCommit and tInformixConnection. They are generally
used together to execute transactions.

tInformixRollback Standard properties


These properties are used to configure tInformixRollback running in the Standard Job framework.
The Standard tInformixRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInformixConnection component from the list if


you plan to add more than one connection to the Job.

Close Connection Clear this check box if you want to continue to use the
connection once the component has completed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component must be used with other Informix


components, particularly tInformixConnection and
tInformixCommit.

Component family Databases/Informix

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.


For examples on using dynamic parameters, see Reading


data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related Scenario
For a scenario in which tInformixRollback might be used, see Rollback from inserting data in mother/
daughter tables on page 2429.


tInformixRow
Acts on the actual DB structure or on the data (although without handling data) thanks to the
SQLBuilder, which helps you easily write your SQL statements.
tInformixRow executes the SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it doesn't provide output.

tInformixRow Standard properties


These properties are used to configure tInformixRow running in the Standard Job framework.
The Standard tInformixRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.


Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
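
For example, assuming a hypothetical customer table and a schema defined with the columns customer_id, first_name, and last_name in that order, the Query field could contain:

  "SELECT customer_id, first_name, last_name FROM customer ORDER BY customer_id"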

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.


Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
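
As a sketch (the table, columns, and values below are hypothetical), the Query field could hold a parameterized statement and the Set PreparedStatement Parameter table the corresponding values:

  Query: "UPDATE invoice SET amount = ? WHERE invoice_id = ?"

  Parameter Index | Parameter Type | Parameter Value
  1               | Double         | 149.90
  2               | Int            | 1001

The exact labels offered in the Parameter Type list may differ slightly depending on your Studio version.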

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tInformixSCD
Tracks and shows changes which have been made to Informix SCD dedicated tables.
tInformixSCD addresses Slowly Changing Dimension transformation needs, by regularly reading a data
source and listing the modifications in an SCD dedicated table.

tInformixSCD Standard properties


These properties are used to configure tInformixSCD running in the Standard Job framework.
The Standard tInformixSCD component belongs to the Business Intelligence and the Databases
families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port DB server listening port.


Database Name of the database.

Schema Name of the schema.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Table Name of the table to be created

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to improve system performance.

Source keys include Null Select this check box to allow the source key columns to
have Null values.

Warning:
Special attention should be paid to the uniqueness of the
source key(s) values when this option is selected.


Use Transaction Select this check box when the database is configured in
NO_LOG mode.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step of the process by
which data is written in the database.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an output component. Consequently, it


requires an input component and a connection of the Row >
Main type.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenario
For a scenario in which tInformixSCD might be used, see tMysqlSCD on page 2508.


tInformixSP
Centralises and calls multiple and complex queries in a database.
tInformixSP calls procedures stored in a database.

tInformixSP Standard properties


These properties are used to configure tInformixSP running in the Standard Job framework.
The Standard tInformixSP component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No properties stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.


Schema Name of the schema.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

SP Name Enter the exact name of the stored procedure (SP).

Is Function / Return result in Select this check box if only one value must be returned.
From the list, select the schema column upon which the
value to be obtained is based.

Parameters Click the Plus button and select the various Schema
Columns that will be required by the procedures. Note
that the SP schema can hold more columns than there are
parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter is to be returned as a value, likely
after modification through the procedure (function).
RECORDSET: Input parameter is to be returned as a set of
values, rather than a single value.


Note:
Check Inserting data in mother/daughter tables on page
2426, if you want to analyze a set of records from a
database table or DB query and return single records.
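
As a hypothetical example, for a stored procedure get_customer_name that takes a customer identifier and returns the corresponding name, the component could be configured as follows:

  SP Name: "get_customer_name"

  Parameters:
    Schema Column  | Type
    customer_id    | IN
    customer_name  | OUT

The OUT value is written to the customer_name column of the output schema.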

Use Transaction Clear this check box if the database is configured in the
NO_LOG mode.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to collect log data at a component
level.

Usage

Usage rule This is an intermediary component. It can also be used as an


entry component. In this case, only the entry parameters are
authorized.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation The stored procedure syntax must correspond to that of the


database.
This component requires installation of its related jar files.

Related scenarios
For related scenarios, see:


• Retrieving personal information using a stored procedure on page 2404.


• Using tMysqlSP to find a State Label using a stored procedure on page 2528.
• Checking number format using a stored procedure on page 2735.
• Executing a stored procedure using tMDMSP on page 2180.
Also, see Inserting data in mother/daughter tables on page 2426 if you want to analyse a set of
records in a table or SQL query.


tIngresBulkExec
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.

tIngresBulkExec Standard properties


These properties are used to configure tIngresBulkExec running in the Standard Job framework.
The Standard tIngresBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Table Name of the table to be filled.

VNode Name of the virtual node.

Database Name of the database.

Action on table Actions that can be taken on the table defined:


None: No operation made to the table.
Truncate: Delete all the rows in the table and release the
file space back to the operating system.

File name Name of the file to be loaded.

Warning:
This file should be located on the same machine as the
database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next


component. When you create a Spark Job, avoid the reserved


word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Delete Working Files After Use Select this check box to delete the files that are created
during the execution.

Advanced settings

Field Separator Character, string or regular expression to separate fields.

Row Separator String (ex: "\n" on Unix) to separate rows.

Null Indicator Value of the null indicator.

Session User User of the defined session (the connection to the database).

Rollback Enable or disable rollback.

On Error Policy of error handling:


Continue: Continue the execution.
Terminate: Terminate the execution.

Reject Row File Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.

Error Count Number of errors to trigger the termination of the execution.


Available when Terminate is selected from the On Error list.

Allocation Number of pages initially allocated to the table or index.

Extend Number of pages by which a table or index grows.

Fill Factor Specify the percentage (from 1 to 100) of each primary data
page that must be filled with rows, under ideal conditions.
For example, if you specify a fillfactor of 40, the DBMS
Server fills 40% of each of the primary data pages in the
restructured table with rows.

Min Pages/Max Pages Specify the minimum/maximum number of primary pages a


hash table must have. The Min. pages and Max. pages must
be at least 1.

Leaf Fill A bulk copy from can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf
page that must be filled with rows during the copy. This
clause can be used only on tables with a B-tree storage
structure.

Non Leaf Fill A bulk copy from can specify a nonleaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree non-
leaf index page that must be filled with rows during the
copy. This clause can be used only on tables with a B-tree
storage structure.

Row Estimate Specify the estimated number of rows to be copied from a


file to a table during a bulk copy operation.

Trailing WhiteSpace Selected by default, this check box is designed to trim the
trailing white spaces and applies only to such data types as
VARCHAR, NVARCHAR and TEXT.

Encoding List of the encoding schemes.

Output Where to output the error message:


to console: Message output to the console.
to global variable: Message output to the global variable.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE_DATA: the number of rows read. This is an After


variable and it returns an integer.
NB_LINE_BAD: the number of rows rejected. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Deployed along with tIngresOutputBulk, tIngresBulkExec
feeds the given data in bulk to the Ingres database for
performance gain.

Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tIngresBulkExec is deployed, so that the component
functions properly.
Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Loading data to a table in the Ingres DBMS on page 1772


tIngresClose
Closes the transaction committed in the connected Ingres database.

tIngresClose Standard properties


These properties are used to configure tIngresClose running in the Standard Job framework.
The Standard tIngresClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tIngresConnection component in the list if more
than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Ingres


components, especially with tIngresConnection and
tIngresCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tIngresCommit
Commits a global transaction in one go, using a unique connection, instead of committing on every row
or every batch, and thus provides a gain in performance.
tIngresCommit validates the data processed through the Job into the connected database.

tIngresCommit Standard properties


These properties are used to configure tIngresCommit running in the Standard Job framework.
The Standard tIngresCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tIngresConnection component in the list if more
than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tIngresCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresConnection and
tIngresRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tIngresCommit related scenario, see Inserting data in mother/daughter tables on page 2426.


tIngresConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tIngresConnection opens a connection to the database for a current transaction.

tIngresConnection Standard properties


These properties are used to configure tIngresConnection running in the Standard Job framework.
The Standard tIngresConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Server Database server IP address.

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresCommit and
tIngresRollback components.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For tIngresConnection related scenario, see Loading data to a table in the Ingres DBMS on page 1772.


tIngresInput
Reads an Ingres database and extracts fields based on a query.
tIngresInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.

tIngresInput Standard properties


These properties are used to configure tIngresInput running in the Standard Job framework.
The Standard tIngresInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Server Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
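For illustration only, suppose the schema defines the columns id, name and age in that order for a hypothetical employee table. The Query field holds a Java string, so it could contain a statement such as the following, with the SELECT columns listed in the same order as the schema (context.department is a made-up context variable):

   "SELECT id, name, age FROM employee"

   "SELECT id, name, age FROM employee WHERE dept = '" + context.department + "'"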


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
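As an illustration of how these variables are read in practice (the instance name tIngresInput_1 is only an example and depends on your Job), the generated Java code exposes them through the globalMap object, so an expression in a downstream component could look like:

   // number of rows processed by the input component (After variable)
   Integer nbLine = (Integer) globalMap.get("tIngresInput_1_NB_LINE");
   // query statement being processed (Flow variable)
   String query = (String) globalMap.get("tIngresInput_1_QUERY");

Pressing Ctrl + Space in an expression field inserts this kind of globalMap expression for you.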

Usage

Usage rule This component covers all possible SQL queries for Ingres
databases.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:


See also the scenario for tContextLoad: Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497.


tIngresOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tIngresOutput writes, updates, makes changes or suppresses entries in a database.

tIngresOutput Standard properties


These properties are used to configure tIngresOutput running in the Standard Job framework.
The Standard tIngresOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.

This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not insert,
update, or delete actions, or actions that require particular
preprocessing.

  Name: Type in the name of the schema column to be
altered or inserted as a new column.

  SQL expression: Type in the SQL statement to be executed
in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After, according to the
action to be performed on the reference column.

  Reference column: Type in the reference column that the
component can use to place or replace the new or altered
column.
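As a purely hypothetical sketch of one Additional Columns row (the column names and the SQL expression are made up, and the functions available depend on your Ingres version), you could derive an extra column from an existing one as follows:

   Name:             name_upper
   SQL expression:   "uppercase(name)"
   Position:         After
   Reference column: name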

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.


NB_LINE_DELETED: the number of rows deleted. This is an


After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Ingres database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tIngresOutputBulk
Prepares the file whose data is inserted in bulk to the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to
feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
tIngresOutputBulk prepares a file with the schema defined and the data coming from the preceding
component.

tIngresOutputBulk Standard properties


These properties are used to configure tIngresOutputBulk running in the Standard Job framework.
The Standard tIngresOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file to be generated.

Warning:
This file is generated on the local machine or a shared
folder on the LAN.

Append the File Select this check box to add the new rows at the end of the
file.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.


When the schema to be reused has default values that are


integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Field Separator Character, string or regular expression to separate fields.

Row Separator String (for example, "\n" on Unix) to separate rows.

Include Header Select this check box to include the column header in the
file.

Encoding List of encoding schemes.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule Deployed along with tIngresBulkExec, tIngresOutputBulk


is intended to save the incoming data to a file, whose
data is then inserted in bulk to an Ingres database by
tIngresBulkExec for performance gain.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Loading data to a table in the Ingres DBMS on page 1772,


tIngresOutputBulkExec
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to
feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
tIngresOutputBulkExec prepares an output file and uses it to feed a table in the Ingres DBMS.

tIngresOutputBulkExec Standard properties


These properties are used to configure tIngresOutputBulkExec running in the Standard Job framework.
The Standard tIngresOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Table Name of the table to be filled.

VNode Name of the virtual node.


The database server must be installed on the same machine
where the Studio is installed or where the Job using
tIngresOutputBulkExec is deployed.

Database Name of the database.

Action on table Actions that can be taken on the table defined:


None: No operation made to the table.
Truncate: Delete all the rows in the table and release the
file space back to the operating system.

File name Name of the file to be generated and loaded.


Warning:
This file is generated on the machine specified by the
VNode field so it should be on the same machine as the
database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Delete Working Files After Use Select this check box to delete the files that are created
during the execution.

Advanced settings

Field Separator Character, string or regular expression to separate fields.

Row Separator String (for example, "\n" on Unix) to separate rows.

On Error Policy of error handling:


Continue: Continue the execution.
Terminate: Terminate the execution.

Reject Row File Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.


Error Count Number of errors to trigger the termination of the
execution.
Available when Terminate is selected from the On Error list.

Rollback Enable or disable rollback.

Null Indicator Value of the null indicator.

Session User User of the defined session (the connection to the
database).

Allocation Number of pages initially allocated to the table or index.

Extend Number of pages by which a table or index grows.

Fill Factor Specify the percentage (from 1 to 100) of each primary data
page that must be filled with rows, under ideal conditions.
For example, if you specify a fillfactor of 40, the DBMS
Server fills 40% of each of the primary data pages in the
restructured table with rows.

Min Pages/Max Pages Specify the minimum/maximum number of primary pages a


hash table must have. The Min. pages and Max. pages must
be at least 1.

Leaf Fill A bulk copy from operation can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf
page that must be filled with rows during the copy. This
clause can be used only on tables with a B-tree storage
structure.

Non Leaf Fill A bulk copy from operation can specify a nonleaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree non-
leaf index page that must be filled with rows during the
copy. This clause can be used only on tables with a B-tree
storage structure.

Row Estimate Specify the estimated number of rows to be copied from a


file to a table during a bulk copy operation.

Trailing WhiteSpace Selected by default, this check box is designed to trim the
trailing white spaces and applies only to such data types as
VARCHAR, NVARCHAR and TEXT.

Output Where to output the error message:


to console: Message output to the console.
to global variable: Message output to the global variable.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule Usually deployed along with tIngresConnection or


tIngresRow, tIngresOutputBulkExec prepares an output
file and feeds its data in bulk to the Ingres DBMS for
performance gain.


Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tIngresOutputBulkExec is deployed, so that the
component functions properly.
Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Loading data to a table in the Ingres DBMS


In this scenario, a tIngresOutputBulkExec component is deployed to prepare an output file with the
employee data from a .csv file and then use that output file to feed a table in an Ingres database.

Dragging and dropping components


Procedure
1. Drop tIngresConnection, tFileInputDelimited and tIngresOutputBulkExec from the Palette onto
the workspace.
2. Rename tIngresOutputBulkExec as save_a_copy_and_load_to_DB.
3. Link tIngresConnection to tFileInputDelimited using an OnSubjobOk trigger.
4. Link tFileInputDelimited to tIngresOutputBulkExec using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tIngresConnection to open its Basic settings view in the Component tab.

2. In the Server field, enter the address of the server where the Ingres DBMS resides, for example
"localhost".


Keep the default settings of the Port field.


3. In the Database field, enter the name of the Ingres database, for example "research".
4. In the Username and Password fields, enter the authentication credentials.
A context variable is used for the password here. For more information on context variables, see
Talend Studio User Guide.
5. Double-click tFileInputDelimited to open its Basic settings view in the Component tab.

6. Select the source file by clicking the [...] button next to the File name/Stream field.
7. Click the [...] button next to the Edit schema field to open the schema editor.

8. Click the [+] button to add four columns, for example name, age, job and dept, with the data type
as string, Integer, string and string respectively.
Click OK to close the schema editor.
Click Yes on the pop-up window that asks whether to propagate the changes to the subsequent
component.
Leave other default settings unchanged.
9. Double-click tIngresOutputBulkExec to open its Basic settings view in the Component tab.


10. In the Table field, enter the name of the table for data insertion.
11. In the VNode and Database fields, enter the names of the VNode and the database.
12. In the File Name field, enter the full path of the file that will hold the data of the source file.
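For instance, to match the execution result described below, the fields could be filled in as follows (adapt these values to your own environment):

   Table:     "employee"
   VNode:     "talendbj"
   Database:  "research"
   File Name: "C:/Users/talend/Desktop/employee_research.csv"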

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the employee data is written to the table employee in the database research on
the node talendbj. Meanwhile, the output file employee_research.csv has been generated at
C:/Users/talend/Desktop.

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tIngresRollback
Avoids committing part of a transaction involuntarily by canceling the transaction committed in the
connected database.

tIngresRollback Standard properties


These properties are used to configure tIngresRollback running in the Standard Job framework.
The Standard tIngresRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tIngresConnection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresConnection and
tIngresCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tIngresRollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429.


tIngresRow
Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder
tool to write your SQL statements easily.
tIngresRow executes the SQL query stated onto the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide output.

tIngresRow Standard properties


These properties are used to configure tIngresRow running in the Standard Job framework.
The Standard tIngresRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.


Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced Settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.


Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
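As an illustration only (the table, column and value names below are invented), the Query field could contain a parameterized statement, and the Set PreparedStatement Parameter table would then provide one row per "?" placeholder:

   "UPDATE employee SET dept = ? WHERE name = ?"

   Parameter Index: 1   Parameter Type: String   Parameter Value: "R&D"
   Parameter Index: 2   Parameter Type: String   Parameter Value: row1.name

Here row1 stands for the incoming flow connected to tIngresRow in this hypothetical Job.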

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also


find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tIngresSCD
Reflects and tracks changes in a dedicated Ingres SCD table.
tIngresSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and
logging the changes into a dedicated SCD table.

tIngresSCD Standard properties


These properties are used to configure tIngresSCD running in the Standard Job framework.
The Standard tIngresSCD component belongs to the Business Intelligence and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The fields to follow are pre-filled in using fetched
data.

Server Database server IP address.

Port Listening port number of DB server.


Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to maximize system performance.

Source keys include Null Select this check box to allow the source key columns to
have Null values.

Warning:
Special attention should be paid to the uniqueness of the
source key(s) values when this option is selected.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step during
processing entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as an output component. It requires an
input component and a Row > Main link as input.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenario
For related scenarios, see tMysqlSCD on page 2508.


tInterbaseClose
Closes the transaction committed in the connected Interbase database.

tInterbaseClose Standard properties


These properties are used to configure tInterbaseClose running in the Standard Job framework.
The Standard tInterbaseClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInterbaseConnection component in the list if
more than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Interbase


components, especially with tInterbaseConnection and
tInterbaseCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tInterbaseCommit
Commits a global transaction in one go instead of committing on every row or every batch, and thus
provides a gain in performance.
tInterbaseCommit validates the data processed through the Job into the connected DB.

tInterbaseCommit Standard properties


These properties are used to configure tInterbaseCommit running in the Standard Job framework.
The Standard tInterbaseCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInterbaseConnection component in the list if
more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tInterbaseCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other
tInterbase* components, especially with the
tInterbaseConnection and tInterbaseRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tInterbaseCommit related scenario, see Inserting data in mother/daughter tables on page 2426.


tInterbaseConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tInterbaseConnection opens a connection to the database for a current transaction.

tInterbaseConnection Standard properties


These properties are used to configure tInterbaseConnection running in the Standard Job framework.
The Standard tInterbaseConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved. 

Host name Database server IP address.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.


Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
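To make the difference concrete, the following plain JDBC sketch (this is not code generated by
Talend Studio; the connection URL, table name and credentials are placeholders) contrasts auto-
commit with an explicit transaction that is committed or rolled back as a whole, which is the role
played by tInterbaseCommit and tInterbaseRollback:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CommitSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection parameters; adapt them to your Interbase server and JDBC driver.
        Connection conn = DriverManager.getConnection(
                "jdbc:interbase://localhost:3050/C:/data/sales.gdb", "SYSDBA", "masterkey");
        // conn.setAutoCommit(true) would commit every statement immediately (Auto Commit selected).
        conn.setAutoCommit(false); // Auto Commit cleared: statements are grouped into one transaction.
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("INSERT INTO ORDERS (ID, LABEL) VALUES (1, 'first')");
            stmt.executeUpdate("INSERT INTO ORDERS (ID, LABEL) VALUES (2, 'second')");
            conn.commit();   // what tInterbaseCommit does once all statements have run
        } catch (Exception e) {
            conn.rollback(); // what tInterbaseRollback does when something goes wrong
            throw e;
        } finally {
            conn.close();
        }
    }
}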

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other
tInterbase* components, especially with the
tInterbaseCommit and tInterbaseRollback components.

Limitation This component requires installation of its related jar files.

Related scenarios
For tInterbaseConnection related scenarios, see tMysqlConnection on page 2425.


tInterbaseInput
Reads an Interbase database and extracts fields based on a query.
tInterbaseInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.

tInterbaseInput Standard properties


These properties are used to configure tInterbaseInput running in the Standard Job framework.
The Standard tInterbaseInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
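For example, if the schema defines three columns id, name and email in that order (the table name
employees below is hypothetical), the query should select the fields in the same order:

SELECT id, name, email FROM employees

Selecting the columns in a different order, or selecting a different number of columns, would no
longer match the schema definition.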


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for
Interbase databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
See the related topic in tContextLoad: Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497.


tInterbaseOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tInterbaseOutput writes, updates, modifies, or deletes entries in a database.

tInterbaseOutput Standard properties


These properties are used to configure tInterbaseOutput running in the Standard Job framework.
The Standard tInterbaseOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Clear data in table Wipes out data from the selected table before action.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update, or delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be
altered or inserted as a new column.

  SQL expression: Type in the SQL statement to be executed
in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After, according to the
action to be performed on the reference column.

  Reference column: Type in a reference column that the
tDBOutput can use to place or replace the new or altered
column.
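As a purely illustrative example (the column names are hypothetical), to add an upper-case copy of
a NAME column right after it, the table could be filled as follows:

Name: NAME_UPPER
SQL expression: "UPPER(NAME)"
Position: After
Reference column: NAME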

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each
batch.
This field appears only when the Use Batch check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in an Interbase database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.


Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tInterbaseRollback
Prevents part of a transaction from being committed involuntarily, by canceling the transaction in the
connected Interbase database.

tInterbaseRollback Standard properties


These properties are used to configure tInterbaseRollback running in the Standard Job framework.
The Standard tInterbaseRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInterbaseConnection component in the list if
more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other
tInterbase* components, especially with the tInterbaseConnection
and tInterbaseCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tInterbaseRollback related scenario, see Rollback from inserting data in mother/daughter tables
on page 2429.


tInterbaseRow
Acts on the actual database structure or on the data (although without handling data), using the
SQLBuilder tool to easily write your SQL statements.
tInterbaseRow executes the SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide output.

tInterbaseRow Standard properties


These properties are used to configure tInterbaseRow running in the Standard Job framework.
The Standard tInterbaseRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Database Name of the database


Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.


Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times, as performance levels are
increased.
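For instance (the table and column names are hypothetical), to run the same delete statement with
a changing identifier, the Query field could contain "DELETE FROM orders WHERE order_id = ?" and
the Set PreparedStatement Parameter table could be filled as follows:

Parameter Index: 1
Parameter Type: Int
Parameter Value: 1001

The parameter value could equally come from a context variable, so the statement is prepared once
and executed with different values.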

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
For related scenarios, see:
• Combining two flows for selective output on page 2503
• For tDBSQLRow related scenario: see Procedure on page 622
• For tMySQLRow related scenario: see Removing and regenerating a MySQL table index on page
2497.


tIntervalMatch
Returns a value based on a Join relation.
tIntervalMatch receives a main flow and aggregates it based on a join with a lookup flow. Then it matches
a specified value to a range of values and returns related information.

tIntervalMatch Standard properties


These properties are used to configure tIntervalMatch running in the Standard Job framework.
The Standard tIntervalMatch component belongs to the Data Quality family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and job
flowcharts. Related topic: see Talend Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Search Column Select the main flow column containing the values to be
matched with a range of values.

Column (LOOKUP) Select the lookup flow column containing the values to be
returned when the Join is ok.

Lookup Column (min) / Include the bound (min) Select the column containing the minimum value of the
range. Select the check box to include the minimum value
of the range in the match.


Lookup Column (max) / Include the bound (max) Select the column containing the maximum value of the
range. Select the check box to include the maximum value
of the range in the match.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data and therefore requires
input and output components, hence it is defined as an intermediary step.

Identifying server locations based on their IP addresses


This scenario describes a four-component Job that checks the server IP addresses listed in the main
input file against a list of IP ranges given in a lookup file to identify the hosting country for each
server.


Setting up the Job


About this task
The Job requires two tFileInputDelimited components, a tIntervalMatch component and a tLogRow
component.

Procedure
1. Drop the components onto the design workspace.
2. Connect the components using Row > Main connection.
Note that the connection from the second tFileInputDelimited component to the tIntervalMatch
component will appear as a Lookup connection.

Configuring the components


Procedure
1. Double-click the first tFileInputDelimited component to open its Basic settings view.

2. Browse to the file to be used as the main input, which provides a list of servers and their IP
addresses:

Server;IP
Server1;057.010.010.010
Server2;001.010.010.100
Server3;057.030.030.030
Server4;053.010.010.100

3. Click the [...] button next to Edit schema to open the Schema dialog box and define the input
schema. According to the input file structure, the schema is made of two columns, respectively
Server and IP, both of type String. Then click OK to close the dialog box.


4. Define the number of header rows to be skipped, and keep the other settings as they are.
5. Define the properties of the second tFileInputDelimited component similarly.

The file to be used as the input to the lookup flow in this example lists some IP address ranges
and the corresponding countries:

StartIP;EndIP;Country
001.000.000.000;001.255.255.255;USA
002.006.190.056;002.006.190.063;UK
011.000.000.000;011.255.255.255;USA
057.000.000.000;057.255.255.255;France
012.063.178.060;012.063.178.063;Canada
053.000.000.000;053.255.255.255;Germany

Accordingly, the schema of the lookup flow should have the following structure:


6. Double-click the tIntervalMatch component to open its Basic settings view.

7. From the Search Column list, select the main flow column containing the values to be matched
with the range values. In this example, we want to match the servers' IP addresses with the range
values from the lookup flow.
8. From the Column (LOOKUP) list, select the lookup column that holds the values to be returned. In
this example, we want to get the names of countries where the servers are hosted.
9. Set the min and max lookup columns corresponding to the range bounds defined in the lookup
schema, StartIP and EndIP respectively in this example.

Executing the Job


Procedure
Press Ctrl+S to save your Job and press F6 to run it.
The name of the country where each server is hosted is displayed next to the IP address.
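Based on the sample input and lookup files shown above, the console output should resemble the
following (the exact layout depends on the tLogRow mode selected):

Server1|057.010.010.010|France
Server2|001.010.010.100|USA
Server3|057.030.030.030|France
Server4|053.010.010.100|Germany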


tIterateToFlow
Transforms non-processable data into a processable flow. tIterateToFlow transforms a list into a data
flow that can be processed.

tIterateToFlow Standard properties


These properties are used to configure tIterateToFlow running in the Standard Job framework.
The Standard tIterateToFlow component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component. The schema is either Built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Mapping Column: Enter a name for the column to be created.
Value: Press Ctrl+Space to access all of the available
variables, be they global or user-defined.

Advanced Settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Connections Outgoing links (from this component to another):


Row: Main.
Trigger: Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate;

For further information regarding connections, see Talend


Studio User Guide.

Transforming a list of files as data flow


The following scenario describes a Job that iterates on a list of files, picks up the filename and current
date, and transforms this into a flow that gets displayed on the console.

• Drop the following components: tFileList, tIterateToFlow and tLogRow from the Palette to the
design workspace.
• Connect the tFileList to the tIterateToFlow using an Iterate link and connect the tIterateToFlow to the
tLogRow using a Row > Main connection.
• In the tFileList Component view, set the directory where the list of files is stored.


• In this example, the files are three simple .txt files held in one directory: Countries.
• There is no need to take case into account here, hence clear the Case sensitive check box.
• Leave the Include Subdirectories check box unchecked.
• Then select the tIterateToFlow component and click Edit Schema to set the new schema.

• Add two new columns: Filename of String type and Date of Date type. Make sure you define the
correct pattern in Java.
• Click OK to validate.
• Notice that the newly created schema shows on the Mapping table.

• In each cell of the Value field, press Ctrl+Space bar to access the list of global and user-specific
variables.
• For the Filename column, use the global variable tFileList_1_CURRENT_FILEPATH. It
retrieves the current filepath in order to catch the name of each file the Job iterates on.
• For the Date column, use the Talend routine TalendDate.getCurrentDate() (in Java). The
exact expressions to type in the Value cells are shown after this list.
• Then on the tLogRow component view, select the Print values in cells of a table check box.
• Save your Job and press F6 to execute it.
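The expressions typed in the Value cells would look like the following, using the standard Talend
syntax for global variables and routines (tFileList_1 is the name of the tFileList component in this
Job):

Filename: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))
Date: TalendDate.getCurrentDate()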


The filepath displays on the Filename column and the current date displays on the Date column.


tJasperOutput
Creates a report in rich formats using Jaspersoft's iReport.
This component is closely related to Jaspersoft's report designer -- iReport. It reads and processes
data from an input flow to create a report against a .jrxml report template defined via iReport.
tJasperOutput reads and processes data from an input flow to create a report against a .jrxml report
template defined via iReport.

tJasperOutput Standard properties


These properties are used to configure tJasperOutput running in the Standard Job framework.
The Standard tJasperOutput component belongs to the Business Intelligence family.
The component in this framework is available in all Talend products.

Basic settings

Jrxml file Report template file created via iReport.

Temp path Path of temporary files.

Destination path Path of the final report file.

File name/Stream Name of the final report.

Report type File type of the final report.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see the Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.

iReport Edit the command to provide the path of iReport's
execution file, e.g. replacing __IREPORT_PATH__\ with
E:\Program Files\Jaspersoft\iReport-4.1.1\bin\, or giving the
full path of the execution file such as "E:\Program Files\Jaspersoft\iReport-4.1.1\bin\iReport.exe".

Launch Click to run iReport.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Specify Locale Select this check box to choose a locale from the Report
Locale list.

Note:
The first line of the Report Locale list is empty. You can
click it to customize a locale.

Encoding Select an encoding mode from this list. You can select
Custom from the list to enter an encoding method in the
field that appears.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is closely related to Jaspersoft's report


designer -- iReport. It reads and processes data from
an input flow to create a report against a .jrxml report
template defined via iReport.


Generating a report against a .jrxml template


The following Job reads data from a .csv file and creates a .pdf report based on an existing .jrxml
report template. Note that the template file should be created via Jaspersoft's iReport based on a file
that shares the same schema as the source .csv file of this Job.

Setting up the Job


Procedure
1. Drag and drop the following components from the Palette to the workspace: tFileInputDelimited
and tJasperOutput.
2. Connect tFileInputDelimited and tJasperOutput using a Row link.

Configuring the input component


Procedure
1. Double-click the tFileInputDelimited component to display its Basic settings view.

2. Select Built-In from the Property Type drop-down list.

Note:
You can select Repository from the Property Type drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored locally in the Repository. For more
information about Metadata, see the Talend Studio User Guide.

3. Fill in the File name/Stream field to give the path and name of the source file, e.g. "C:/Documents
and Settings/Andy ZHANG/nom.csv".
4. Keep the default settings for the Row Separator and Field Separator fields. You can also change
them as needed.


5. Set 1 in the Header field and 0 in the Footer field. Leave the Limit field empty. You can also
change them as needed.
6. Select Built-In from the Schema drop-down list and click Edit schema to define the data structure
of the input file. In this case, the input file has 2 columns: Nom and Prenom.

Configuring the output component


Procedure
1. Double-click tJasperOutput to display its Basic settings view.

2. Enter the full path of the report template file created via Jaspersoft's iReport in the Jrxml file
field. You can click the three-dot button to browse.

Note:
The schema of the file, which is used to create a .jrxml template file via iReport, should be the
same as that of the source file that is used to create the report.

3. Enter the path for the temporary files generated during the job execution in the Temp path field.
You can click the three-dot button to browse.
4. Enter the path for the final report file generated during the job execution in the Destination path
field. You can click the three-dot button to browse.
5. Enter the name for the final report file generated during the job execution in the File name/
Stream field.
6. Select the format for the final report file generated during the job execution in the Report type
field.
7. Click Sync columns to retrieve the schema from the previous component.


8. Enter the path of the execution file of Jaspersoft's iReport in the iReport field, e.g. replacing
__IREPORT_PATH__\ with E:\Program Files\Jaspersoft\iReport-4.1.1\bin\. You can click the Launch
button to run iReport.

Note:
This step is not mandatory. Yet, this helps you conveniently access the iReport software for
relevant operations, e.g. creating a report template, etc.

Job execution
Procedure
1. Press CTRL+S to save your Job.
2. Press F6 to execute it.
You can find the file out.pdf in the folder specified in the Destination path field.


tJasperOutputExec
Creates a report in rich formats using Jaspersoft's iReport and offers a performance gain as it functions
as a combination of an input component and a tJasperOutput component.
This component is closely related to Jaspersoft's report designer -- iReport. It reads and processes
data from a source file to create a report against a .jrxml report template defined via iReport.
tJasperOutputExec is used as a combination of an input component and a tJasperOutput component.
The advantage of using two separate components is that data can be transformed before being used
to generate a report and the input sources can be various and rich.
Reads and processes data from a source file to create a report against a .jrxml report template
defined via iReport.

tJasperOutputExec Standard properties


These properties are used to configure tJasperOutputExec running in the Standard Job framework.
The Standard tJasperOutputExec component belongs to the Business Intelligence family.
The component in this framework is available in all Talend products.

Basic settings

Jrxml file Report template file created via iReport.

Source file Name of the source file.

Record delimiter Delimiter of the records.

Destination path Path of the final report file.

Use Default Output Name Select this check box to use the default name for the report
generated, which takes the source file's name.

Output Name Name of the final report.

Note:
This field does not appear if the Use Default Output
Name box has been selected.

Report type File type of the final report.

iReport Edit the command to provide the path of iReport's
execution file, e.g. replacing __IREPORT_PATH__\ with
E:\Program Files\Jaspersoft\iReport-4.1.1\bin\, or giving the
full path of the execution file such as "E:\Program Files\Jaspersoft\iReport-4.1.1\bin\iReport.exe".

Launch Click to run iReport.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Specify Locale Select this check box to choose a locale from the Report
Locale list.

Note:
The first line of the Report Locale list is empty. You can
click it to customize a locale.

Encoding Select an encoding mode from this list. You can select
Custom from the list to enter an encoding method in the
field that appears.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is closely related to Jaspersoft's report


designer -- iReport. It reads and processes data from
a source file to create a report against a .jrxml report
template defined via iReport.

Related Scenario
For related scenarios, see Generating a report against a .jrxml template on page 1817.


tJava
Extends the functionalities of a Talend Job using custom Java commands.
tJava enables you to enter personalized code in order to integrate it into a Talend program. This code
is executed only once.

tJava Standard properties


These properties are used to configure tJava running in the Standard Job framework.
The Standard tJava component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Code Type in the Java code you want to execute according to the
task you need to perform. For further information about Java
functions syntax specific to Talend , see Talend Studio
Help Contents (Help > Developer Guide > API Reference).
For a complete Java reference, check http://docs.oracle.com/
javaee/6/api/

Note: If your custom Java code references
org.talend.transform.runtime.api.ExecutionStatus, change it to
org.talend.transform.runtime.common.MapExecutionStatus.
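As a minimal illustration of what can be typed into the Code field (the context variable name
message is hypothetical and must be defined in the Job for the second line to compile):

// Print a fixed message and the value of a context variable to the Run console.
System.out.println("tJava is running");
System.out.println("message = " + context.message);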

Advanced settings

Import Enter the Java code to import, if necessary, external libraries


used in the Code field of the Basic settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component is generally used as a one-component


subJob.

Limitation You should know the Java language.

Printing out a variable content


The following scenario is a simple demo of the extended application of the tJava component. The
Job aims at printing out the number of lines being processed, using a Java command and the global
variable provided in Talend Studio.

Setting up the Job


Procedure
1. Select and drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tFileOutputExcel, tJava.
2. Connect the tFileInputDelimited to the tFileOutputExcel using a Row > Main connection. The
content from a delimited .txt file will be passed on through the connection to an .xls file
without further transformation.
3. Then connect the tFileInputDelimited component to the tJava component using a Trigger > On
Subjob Ok link. This link sets a sequence ordering tJava to be executed at the end of the main
process.

Configuring the input component


Procedure
1. Set the Basic settings of the tFileInputDelimited component.


2. Define the path to the input file in the File name field.
The input file used in this example is a simple text file made of two columns: Names and their
respective Emails.
3. Click the Edit Schema button, and set the two-column schema. Then click OK to close the dialog
box.

4. When prompted, click OK to accept the propagation, so that the tFileOutputExcel component gets
automatically set with the input schema.

Configuring the output component


Set the output file to receive the input content without changes. If the file does not exist already, it
will get created.


In this example, the Sheet name is Email and the Include Header box is selected.

Configuring the tJava component


Procedure
1. Then select the tJava component to set the Java command to execute.

2. In the Code area, type in the following command:

// Build a message with the number of lines read by tFileInputDelimited_1,
// then print it to the Run console.
String var = "Nb of line processed: ";
var = var + globalMap.get("tFileInputDelimited_1_NB_LINE");
System.out.println(var);

In this use case, we use the NB_LINE variable. To access the global variable list, press Ctrl + Space
on your keyboard and select the relevant global parameter.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to execute it.


Results

The content gets passed on to the Excel file defined, and the number of lines processed is displayed
on the Run console.


tJavaDBInput
Reads a database and extracts fields based on a query.
tJavaDBInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.

tJavaDBInput Standard properties


These properties are used to configure tJavaDBInput running in the Standard Job framework.
The Standard tJavaDBInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Framework Select your Java database framework from the list.

Database Name of the database

DB root path Browse to your database root.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
properly sequencing the fields so that they match the schema
definition.

Advanced settings

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
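
As an illustration only, the sketch below shows how these variables could be read in a tJava component executed after the subJob (through a Trigger > On Subjob Ok link); the component name tJavaDBInput_1 is an assumption that depends on your own Job:

// NB_LINE is an After variable (Integer); QUERY is a Flow variable (String).
Integer nbLine = (Integer) globalMap.get("tJavaDBInput_1_NB_LINE");
String query = (String) globalMap.get("tJavaDBInput_1_QUERY");
System.out.println("Rows read: " + nbLine + " using query: " + query);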

Usage

Usage rule This component covers all possible SQL database queries.


Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
See also the related topic in tContextLoad: Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497.


tJavaDBOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tJavaDBOutput writes, updates, makes changes or suppresses entries in a database.

tJavaDBOutput Standard properties


These properties are used to configure tJavaDBOutput running in the Standard Job framework.
The Standard tJavaDBOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Framework Select your Java database framework from the list.

Database Name of the database

DB root path Browse to your database root.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.


Drop table if exists and create: The table is removed if it


already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.


• Update repository connection: choose this option


to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are neither
insert, update, nor delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.


NB_LINE_REJECTED: the number of rows rejected. This is an


After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
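
For illustration only, a tJava component placed after the subJob could report these counters as in the sketch below; the component name tJavaDBOutput_1 is an assumption, and the ERROR_MESSAGE value is only meaningful when the Die on error check box is cleared:

// All counters below are After variables returning Integer values.
Integer inserted = (Integer) globalMap.get("tJavaDBOutput_1_NB_LINE_INSERTED");
Integer rejected = (Integer) globalMap.get("tJavaDBOutput_1_NB_LINE_REJECTED");
String error = (String) globalMap.get("tJavaDBOutput_1_ERROR_MESSAGE");
System.out.println(inserted + " row(s) inserted, " + rejected + " row(s) rejected");
if (error != null) System.out.println("Last error: " + error);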

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in a Java database. It also allows you to create a reject
flow using a Row > Rejects link to filter data in error. For an
example of tMysqlOutput in use, see Retrieving data in error
with a Reject link on page 2474.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tJavaDBRow
Acts on the actual database structure or on the data (although without handling data) using the
SQLBuilder tool to easily write your SQL statements.
tJavaDBRow executes the SQL query stated onto the specified database. The Row suffix means the
component implements a flow in the job design although it doesn't provide output.

tJavaDBRow Standard properties


These properties are used to configure tJavaDBRow running in the Standard Job framework.
The Standard tJavaDBRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Framework Select your Java database framework from the list.

Database Name of the database

DB root path Browse to your database root.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.


• Change to built-in property: choose this option to


change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequencing the fields so that they match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tJavaFlex
Provides a Java code editor that lets you enter personalized code in order to integrate it in a Talend
program.
tJavaFlex enables you to add Java code to the Start/Main/End code sections of this component itself.
With tJavaFlex, you can enter the three Java code parts (start, main and end) that constitute a kind of
component dedicated to performing a desired operation.

tJavaFlex Standard properties


These properties are used to configure tJavaFlex running in the Standard Job framework.
The Standard tJavaFlex component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Sync columns to retrieve the schema from the previous
component in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Data Auto Propagate Select this check box to automatically propagate the data to
the component that follows.

Start code Enter the Java code that will be called during the
initialization phase.

Main code Enter the Java code to be applied for each line in the data
flow.

End code Enter the Java code that will be called during the closing
phase.

Advanced settings

Import Enter the Java code that helps to import, if necessary,


external libraries used in the Main code box of the Basic
settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule You can use this component as a start, intermediate


or output component. You can as well use it as a one-
component subJob.

Limitation You should know the Java language.

Generating data flow


This scenario describes a two-component Job that generates a three-line data flow describing
different personal titles (Miss, Mrs, and Mr) and displays them on the console.


Setting up the Job


Procedure
1. Drop tJavaFlex and tLogRow from the Palette onto the design workspace.
2. Connect the components together using a Row > Main link.

Configuring the tJavaFlex component


Procedure
1. Double-click tJavaFlex to display its Basic settings view and define its properties.

2. Click the three-dot button next to Edit schema to open the corresponding dialog box where you
can define the data structure to pass to the component that follows.

3. Click the [+] button to add two columns: key and value and then set their types to Integer and
String respectively.
4. Click OK to validate your changes and close the dialog box.


5. In the Basic settings view of tJavaFlex, select the Data Auto Propagate check box to automatically
propagate data to the component that follows.
In this example, we do not want to do any transformation on the retrieved data.
6. In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of tJavaFlex by displaying the START
message and sets up the loop and the variables to be used afterwards in the Java code:

System.out.println("## START\n#");
String [] valueArray = {"Miss", "Mrs", "Mr"};

for (int i=0;i<valueArray.length;i++) {

7. In the Main code field, enter the code you want to apply on each of the data rows.
In this example, we want to display each key with its value:

row1.key = i;
row1.value = valueArray[i];

Warning:
In the Main code field, "row1" corresponds to the name of the link that comes out of tJavaFlex. If you
rename this link, you have to modify the code of this field accordingly.

8. In the End code field, enter the code that will be executed in the closing phase.
In this example, the brace (curly bracket) closes the loop and the code indicates the end of the
execution of tJavaFlex by displaying the END message:

}
System.out.println("#\n## END");

9. If needed, double-click tLogRow and in its Basic settings view, click the [...] button next to Edit
schema to make sure that the schema has been correctly propagated.


Saving and executing the Job


Procedure
1. Save your Job by pressing Ctrl+S.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.

The three personal titles are displayed on the console along with their corresponding keys.

Processing rows of data with tJavaFlex


This scenario describes a two-component Job that generates random data and then collects that data
and does some transformation on it line by line using Java code through the tJavaFlex component.

Setting up the Job


Procedure
1. Drop tRowGenerator and tJavaFlex from the Palette onto the design workspace.
2. Connect the components together using a Row Main link.

Configuring the input component


Procedure
1. Double-click tRowGenerator to display its Basic settings view and the RowGenerator Editor dialog
box where you can define the component properties.


2. Click the plus button to add four columns: number, txt, date and flag.
3. Define the schema and set the parameters to the four columns according to the above capture.
4. In the Functions column, select the three-dot function [...] for each of the defined columns.
5. In the Parameters column, enter 10 different parameters for each of the defined columns.
These 10 parameters correspond to the data that will be randomly generated when executing
tRowGenerator.
6. Click OK to validate your changes and close the editor.

Configuring the tJavaFlex component


Procedure
1. Double-click tJavaFlex to display its Basic settings view and define the component's properties.

2. Click Sync columns to retrieve the schema from the preceding component.
3. In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of the tJavaFlex component by displaying the
START message and defining the variable to be used afterwards in the Java code:

System.out.println("## START\n#");
int i = 0;

4. In the Main code field, enter the code to be applied on each line of data.


In this example, we want to show the index of each line starting from 0, then the random number
and the random text transformed to upper case, and finally the random date set in the editor of
tRowGenerator. Then, we create a condition to show whether the flag is true or false, and we increment
the line index:

System.out.print(" row" + i + ":");


System.out.print("# number:" + row1.number);
System.out.print (" | txt:" + row1.txt.toUpperCase());
System.out.print(" | date:" + row1.date);
if(row1.flag) System.out.println(" | flag: true");
else System.out.println(" | flag: false");

i++;

Warning:
In the Main code field, "row1" corresponds to the name of the link that connects to tJavaFlex. If you
rename this link, you have to modify the code.

5. In the End code field, enter the code that will be executed in the closing phase.
In this example, the code indicates the end of the execution of tJavaFlex by displaying the END
message:

System.out.println("#\n## END");

Saving and executing the Job


Procedure
1. Save your Job by pressing Ctrl+S.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


The console displays the randomly generated data that was modified by the java command set
through tJavaFlex.


tJavaRow
Provides a code editor that lets you enter the Java code to be applied to each row of the flow.
tJavaRow allows you to enter customized code which you can integrate in a Talend program.

tJavaRow Standard properties


These properties are used to configure tJavaRow running in the Standard Job framework.
The Standard tJavaRow component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

Generate code Click this button to automatically generate the code in


the Code field to map the columns of the input schema
with those of the output schema. This generation does not
change anything in your schema.


The principle of this mapping is to relate the columns


that have the same column name. Then you can adapt the
generated code depending on the actual mapping you need.
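
For example, with input and output schemas that both contain the columns City and Population, the generated code typically looks like the following sketch (the column names are illustrative only):

// Each output column is fed from the input column bearing the same name.
output_row.City = input_row.City;
output_row.Population = input_row.Population;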

Code Enter the Java code to be applied to each line of the data
flow.

Advanced settings

Import Enter the Java code to import, if necessary, external libraries


used in the Code field of the Basic settings view.

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
To enter a global variable (for example COUNT of
tFileRowCount) in the Code box, you need to type in the
entire piece of code manually, that is to say ((Integer)globalMap.get("tFileRowCount_COUNT")).

Usage

Usage rule This component is used as an intermediary between two


other components. It must be linked to both an input and an
output component.

Function tJavaRow allows you to enter customized code which you


can integrate in a Talend program. With tJavaRow, you
can enter the Java code to be applied to each row of the
flow.

Purpose tJavaRow allows you to broaden the functionality of Talend


Jobs, using the Java language.

Limitation Knowledge of Java language is necessary.


Transforming data line by line using tJavaRow


In this scenario, the information of a few cities read from an input delimited file is transformed using
Java code through the tJavaRow component and printed on the console.

Setting up the Job


Procedure
1. Drop a tFileInputDelimited component and a tJavaRow component from the Palette onto the
design workspace, and label them to better identify their roles in the Job.
2. Connect the two components using a Row > Main connection.

Configuring the components


Procedure
1. Double-click the tFileInputDelimited component to display its Basic settings view in the
Component tab.

2. In the File name/Stream field, type in the path to the input file in double quotation marks, or
browse to the path by clicking the [...] button, and define the first line of the file as the header.
In this example, the input file has the following content:

City;Population;LandArea;PopDensity
Beijing;10233000;1418;7620
Moscow;10452000;1081;9644
Seoul;10422000;605;17215
Tokyo;8731000;617;14151
New York;8310000;789;10452

3. Click the [...] button next to Edit schema to open the Schema dialog box, and define the data
structure of the input file. Then, click OK to validate the schema setting and close the dialog box.


4. Double-click the tJavaRow component to display its Basic settings view in the Component tab.

5. Click Sync columns to make sure that the schema is correctly retrieved from the preceding
component.
6. In the Code field, enter the code to be applied on each line of data based on the defined schema
columns.
In this example, we want to transform the city names to upper case, group digits of numbers
larger than 1000 using the thousands separator for ease of reading, and print the data on the
console:

System.out.print("\n" + input_row.City.toUpperCase() + ":");


System.out.print("\n - Population: "
+ FormatterUtils.format_Number(String.valueOf(input_row.Population), ',', '.') + "
people");
System.out.print("\n - Land area: "
+ FormatterUtils.format_Number(String.valueOf(input_row.LandArea), ',', '.')
+ " km2");
System.out.print("\n - Population density: "
+ FormatterUtils.format_Number(String.valueOf(input_row.PopDensity), ',', '.') + "
people/km2\n");

Note:
In the Code field, input_row refers to the link that connects to tJavaRow.


Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The city information is transformed by the Java code set through tJavaRow and displayed on the
console.


tJDBCClose
Closes an active JDBC connection to release the occupied resources.

tJDBCClose Standard properties


These properties are used to configure tJDBCClose running in the Standard Job framework.
The Standard tJDBCClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Connection Component Select the component that opens the connection you need
to close from the drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is to be used along with JDBC components,


especially with tJDBCConnection and tJDBCCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic


settings and context variables, see Talend Studio User


Guide.

Related scenarios
No scenario is available for the Standard version of this component yet.


tJDBCColumnList
Lists all column names of a given JDBC table.
tJDBCColumnList iterates on all columns of a given table through a defined JDBC connection.

tJDBCColumnList Standard properties


These properties are used to configure tJDBCColumnList running in the Standard Job framework.
The Standard tJDBCColumnList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Database Type Select the type of the database to be accessed.

Component list Select the tJDBCConnection component in the list if more
than one connection is planned for the current Job.

Table name Enter the name of the table.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables CURRENT_COLUMN: the name of the column currently


iterated upon. This is a Flow variable and it returns a string.
CURRENT_COLUMN_TYPE: the ID of the type of the column
currently iterated upon. This is a Flow variable and it returns
an integer.
CURRENT_COLUMN_TYPE_NAME: the name of the type of
the column currently iterated upon. This is a Flow variable
and it returns a string.
CURRENT_COLUMN_PRECISION: the precision of the column
currently iterated upon. This is a Flow variable, and it
returns an integer.
CURRENT_COLUMN_SCALE: the scale of the column
currently iterated upon. This is a Flow variable, and it
returns an integer.
NB_COLUMN: the number of columns iterated upon so far.
This is an After variable and it returns an integer.


ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
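
As an illustration only, a tJava component connected to tJDBCColumnList through a Row > Iterate link could print the Flow variables on each iteration, as in the sketch below; the component name tJDBCColumnList_1 is an assumption that depends on your Job:

// CURRENT_COLUMN and CURRENT_COLUMN_TYPE_NAME are Flow variables refreshed on each iteration.
String columnName = (String) globalMap.get("tJDBCColumnList_1_CURRENT_COLUMN");
String typeName = (String) globalMap.get("tJDBCColumnList_1_CURRENT_COLUMN_TYPE_NAME");
System.out.println(columnName + " (" + typeName + ")");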

Usage

Usage rule This component is to be used along with JDBC components,


especially with tJDBCConnection.

Related scenario
For tJDBCColumnList related scenario, see Iterating on a DB table and listing its column names on
page 2419.


tJDBCCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus
provides a gain in performance.
tJDBCCommit validates the data processed through the Job into the connected DB.

tJDBCCommit Standard properties


These properties are used to configure tJDBCCommit running in the Standard Job framework.
The Standard tJDBCCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Connection Component Select the component that opens the database connection
to be reused by this component.

Close Connection Select this check box to close the database connection once
the component has performed its task.
Clear this check box to continue to use the selected
connection once the component has performed its task.
If this component is linked to your Job via a Row > Main
connection, your data will be committed row by row. In this
case, do not select the Close connection check box or your
connection will be closed before the end of the first row
commit.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is more commonly used with other tJDBC*
components, especially with the tJDBCConnection and
tJDBCRollback components.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tJDBCCommit related scenario, see Inserting data in mother/daughter tables on page 2426.


tJDBCConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.

tJDBCConnection Standard properties


These properties are used to configure tJDBCConnection running in the Standard Job framework.
The Standard tJDBCConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do


this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.


To enter the password, click the [...] button next to the


password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
This check box is not available when the Specify a data
source alias check box is selected.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
This check box is not available when the Use or register a
shared DB Connection check box is selected.

Advanced settings

Use Auto-Commit Select this check box to activate the auto commit mode.

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is more commonly used with other


tJDBC* components, especially with the tJDBCCommit and
tJDBCRollback components.

Importing a database driver


To enable a JDBC component work with a specific database, you need to import the corresponding
data driver into the component.

Procedure
1. If the library to be imported isn't available on your machine, either download and install it using
the Modules view or download and store it in a local directory.
2. In the Drivers table, add one row to the table by clicking the [+] button.

3. Click the newly added row and click the [...] button to open the Module dialog box where you can
import the external library.


4. If you have installed the library using the Modules view:


• Select the Platform option and then select the library from the list.
• Select the Artifact repository (local m2/nexus) > Find by name or Artifact repository (local
m2/nexus) > Find by Maven URI option, then specify the full name or Maven URI of the library
module, and click the Detect the module install status button to validate its installation
status.
5. If you have stored the library file in a local directory:
a) Select the Artifact repository (local m2/nexus) option.
b) Select the Install a new module option, and click the [...] button to browse to library file.
c) If you need to customize the Maven URI of the library, select the Custom MVN URI check box,
specify the new URI, and then click the Detect the module install status button to validate its
installation status.

Note:
Changing the Maven URI for an external module will affect all the components and
metadata connections that use that module within the project.
When working on a remote project, your custom Maven URI settings will be automatically
synchronized to the Talend Artifact Repository and will be used when other users working
on the same project install the external module.

6. Click OK to confirm your changes.


The imported library file is listed in the Drivers table.


Note: You can replace or delete the imported library, or import new libraries if needed.

Related scenario
For tJDBCConnection related scenario, see tMysqlConnection on page 2425


tJDBCInput
Reads any database using a JDBC API connection and extracts fields based on a query.
tJDBCInput executes a database query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.

tJDBCInput Standard properties


These properties are used to configure tJDBCInput running in the Standard Job framework.
The Standard tJDBCInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Connection Component Select the component that opens the database connection
to be reused by this component.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do


this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the


cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name The name of the table from which data will be retrieved.

Query Type and Query Specify the database query statement, paying particular
attention to the proper sequence of the fields, which must
correspond to the schema definition.
• Built-In: Fill in the query statement in the Query field
manually or click the [...] button next to the Query
field to build the statement graphically using the
SQLBuilder.
• Repository: Select the relevant query stored in the
Repository by clicking the [...] button next to it and
in the pop-up Repository Content dialog box, select
the query to be used, and the Query field will be
automatically filled in.


Guess Query Click this button to generate query in the Query field based
on the defined table and schema.

Guess Schema Click this button to generate schema columns based on the
query defined in the Query field.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Advanced settings

Use cursor Select this check box to specify the number of rows you
want to work with at any given time. This option optimises
performance.

Trim all the String/Char columns Select this check box to remove leading whitespace and
trailing whitespace from all String/Char columns.

Check column to trim Select the check box for the corresponding column to remove
leading whitespace and trailing whitespace from it.
This property is not available when the Trim all the String/
Char columns check box is selected.

Enable Mapping File for Dynamic Select this check box to use the specified metadata
mapping file when reading data from a dynamic type
column. This check box is cleared by default.
With this check box selected, you can specify the metadata
mapping file to use by selecting a type of database from the
Mapping File drop-down list.
For more information about metadata mapping files, see the
section on type conversion of Talend Studio User Guide.

Use PreparedStatement Select this check box if you want to query the database
using a prepared statement. In the Set PreparedStatement
Parameters table displayed, specify the value for each
parameter represented by a question mark ? in the SQL
statement defined in the Query field.
• Parameter Index: the position of the parameter in the
SQL statement.
• Parameter Type: the data type of the parameter.
• Parameter Value: the value of the parameter.
For a related use case of this property, see Using
PreparedStatement objects to query data on page 2498.
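
As an illustration only (the table, column, parameter type and values below are assumptions), the Query field could contain a statement with a question-mark placeholder, completed by one row in the Set PreparedStatement Parameters table:

// Query field (entered as a Java string):
"SELECT id, name FROM employees WHERE salary > ?"
// Set PreparedStatement Parameters table, one row:
//   Parameter Index: 1 | Parameter Type: Int | Parameter Value: 50000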

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

QUERY The query statement being processed. This is a Flow


variable and it returns a string.

Usage

Usage rule This component covers all possible SQL queries for any
database using a JDBC connection.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see:
Related topic in tContextLoad: see Reading data from different MySQL databases using dynamically
loaded connection parameters on page 497.

tJDBCOutput
Executes the action defined on the data contained in the table, based on the flow incoming from the
preceding component in the Job.
tJDBCOutput writes, updates, modifies or deletes entries in any database that is accessible through a
JDBC API.

tJDBCOutput Standard properties


These properties are used to configure tJDBCOutput running in the Standard Job framework.
The Standard tJDBCOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Click this icon to open a database connection wizard and
store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Connection Component Select the component that opens the database connection
to be reused by this component.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
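To show how the JDBC URL, Driver Class and credentials fit together outside the Studio, here is a
minimal, hedged Java sketch reusing the Redshift values quoted above. The endpoint, port, database,
user and password are placeholders, not values read from the component.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class JdbcConnectExample {
        public static void main(String[] args) throws Exception {
            // Load the class named in Driver Class (often optional with JDBC 4+ drivers).
            Class.forName("com.amazon.redshift.jdbc41.Driver");
            // Placeholder URL: substitute your real endpoint, port and database.
            String url = "jdbc:redshift://endpoint:port/database";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }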

Table Name The name of the table into which data will be written.

Data Action Select an action to be performed on the data of the defined
table.
• Insert: Add new entries to the table. If duplicates are
found, the Job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record in the index pool.
If the record with the given reference already exists, an
update is made instead (see the sketch below).
• Update or insert: Update the record with the given
reference. If the record does not exist in the index pool,
a new record is inserted.
• Delete: Remove entries corresponding to the input
flow.

Warning:
It is necessary to specify at least one column as a
primary key on which the Update and Delete operations
are based. You can do that by clicking Edit Schema
and selecting the check box(es) next to the column(s)
you want to set as primary key(s). For an advanced
use, click the Advanced settings view where you can
simultaneously define primary keys for the Update
and Delete operations. To do that: Select the Use field
options check box and then in the Key in update column,
select the check boxes next to the column names you
want to use as a base for the Update operation. Do
the same in the Key in delete column for the Delete
operation.
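The following hedged sketch only illustrates the idea behind Insert or update (try the insert first,
fall back to an update keyed on the primary key). The person table and id key are hypothetical, and
the SQL actually generated by tJDBCOutput may differ.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class InsertOrUpdateSketch {
        // Assumes an already open java.sql.Connection; table and columns are hypothetical.
        static void insertOrUpdate(Connection conn, int id, String name) throws SQLException {
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO person (id, name) VALUES (?, ?)")) {
                insert.setInt(1, id);
                insert.setString(2, name);
                insert.executeUpdate();
            } catch (SQLException duplicateKey) {
                // A row with this key already exists, so update it instead.
                try (PreparedStatement update = conn.prepareStatement(
                        "UPDATE person SET name = ? WHERE id = ?")) {
                    update.setString(1, name);
                    update.setInt(2, id);
                    update.executeUpdate();
                }
            }
        }
    }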

Clear data in table Select this check box to clear data in the table before
performing the action defined.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically
becomes built-in.

• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Guess Schema Click this button to generate schema columns based on the
settings of database table columns.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
When errors are skipped, you can collect the rows on error
using a Row > Reject connection.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Advanced settings

Commit every Specify the number of rows to be processed before
committing batches of rows together into the database.
This option ensures transaction quality (but not rollback)
and, above all, better performance at executions.

Additional Columns This option allows you to call SQL functions to perform
actions on columns, which are not insert, update or delete
actions, or actions that require particular preprocessing. It is
not offered if you create (with or without drop) the database
table.
• Name: The name of the schema column to be inserted,
or the name of the schema column used to replace an
existing column.
• SQL expression: The SQL statement to be executed in
order to insert or replace relevant column.
• Position: Select Before, After, or Replace
according to the action to be performed on the
reference column.
• Reference column: The name of the reference column
that can be used to locate the new column to be
inserted or that will be replaced.

Use field options Select this check box and in the Fields options table
displayed, select the check box for the corresponding
column to customize a request, particularly if multiple
actions are being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which data is updated.
• Key in delete: Select the check box for the
corresponding column based on which data is deleted.
• Updatable: Select the check box if data in the
corresponding column can be updated.
• Insertable: Select the check box if data in the
corresponding column can be inserted.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing, and in the Batch Size field displayed, specify the
number of records to be processed in each batch.
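As a hedged illustration of what Batch Size and Commit every roughly correspond to at the JDBC
level, the sketch below adds rows to a batch, executes the batch every batchSize rows and commits
every commitEvery rows. The person table and the method names are hypothetical, and this is not
the component's generated code.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchWriteSketch {
        // Assumes an already open Connection; the table and the input list are hypothetical.
        static void write(Connection conn, List<String> names, int batchSize, int commitEvery)
                throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO person (name) VALUES (?)")) {
                int count = 0;
                for (String name : names) {
                    stmt.setString(1, name);
                    stmt.addBatch();
                    count++;
                    if (count % batchSize == 0) {
                        stmt.executeBatch();   // roughly what Batch Size controls
                    }
                    if (count % commitEvery == 0) {
                        stmt.executeBatch();   // flush pending rows before committing
                        conn.commit();         // roughly what Commit every controls
                    }
                }
                stmt.executeBatch();           // flush the remaining rows
                conn.commit();
            }
        }
    }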

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Enable parallel execution Select this check box to perform high-speed data processing
by treating multiple data flows simultaneously. This feature
depends on the database or the application ability to handle
multiple inserts in parallel as well as the number of CPU
affected. With this check box selected, you need to specify
the number of parallel executions desired in the Number of
parallel executions field displayed.

Note: When parallel execution is enabled, it is not
possible to use global variables to retrieve return values.

Global Variables

ERROR_MESSAGE The error message generated by the component when an
error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows processed. This is an After variable and
it returns an integer.

NB_LINE_INSERTED The number of rows inserted. This is an After variable and it
returns an integer.

NB_LINE_UPDATED The number of rows updated. This is an After variable and it
returns an integer.

NB_LINE_DELETED The number of rows deleted. This is an After variable and it
returns an integer.

NB_LINE_REJECTED The number of rows rejected. This is an After variable and it
returns an integer.

QUERY The query statement being processed. This is a Flow
variable and it returns a string.

Usage

Usage rule This component offers the flexibility of the database
query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a JDBC database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tJDBCOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.

tJDBCRollback
Avoids accidentally committing part of a transaction by canceling the transaction committed in the
connected database.
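As a hedged plain-JDBC sketch of the same idea, the example below groups several statements in one
transaction and rolls the transaction back if any statement fails, so nothing is partially committed.
The table, statements and connection details are hypothetical, and this is not the component's
generated code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class RollbackSketch {
        public static void main(String[] args) throws SQLException {
            // Placeholder URL and credentials.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:redshift://endpoint:port/database", "user", "password")) {
                conn.setAutoCommit(false);     // open a transaction, as tJDBCConnection can do
                try (Statement stmt = conn.createStatement()) {
                    stmt.executeUpdate("INSERT INTO person (id, name) VALUES (1, 'a')");
                    stmt.executeUpdate("INSERT INTO person (id, name) VALUES (2, 'b')");
                    conn.commit();             // the role of tJDBCCommit
                } catch (SQLException e) {
                    conn.rollback();           // the role of tJDBCRollback
                    throw e;
                }
            }
        }
    }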

tJDBCRollback Standard properties


These properties are used to configure tJDBCRollback running in the Standard Job framework.
The Standard tJDBCRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Connection Component Select the component that opens the database connection
to be reused by this component.

Close Connection Select this check box to close the database connection once
the component has performed its task.
Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an
error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is more commonly used with other tJDBC* components, especially with the
tJDBCConnection and tJDBCCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature
is useful when you need to access database tables having the same data structure but in different
databases, especially when you are working in an environment where you cannot change your Job
settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
For examples on using dynamic parameters, see Reading data from databases through context-
based dynamic connections on page 2446 and Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497. For more information on Dynamic settings
and context variables, see Talend Studio User Guide.

Related scenario
For a tJDBCRollback related scenario, see tMysqlRollback on page 2491.

tJDBCRow
Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder
tool to easily write your SQL statements.
tJDBCRow is the component for any type of database using a JDBC API. It executes the SQL query stated
on the specified database. The row suffix means the component implements a flow in the Job
design although it doesn't provide output.
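Functionally, this is comparable to executing a free-form SQL statement over a JDBC connection, as
in the hedged Java sketch below. The DDL statement, table and connection details are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ExecuteSqlSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URL and credentials.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:redshift://endpoint:port/database", "user", "password");
                 Statement stmt = conn.createStatement()) {
                // Any query or DDL statement can be executed; no output flow is produced.
                stmt.execute("CREATE INDEX idx_person_name ON person (name)");
            }
        }
    }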

tJDBCRow Standard properties


These properties are used to configure tJDBCRow running in the Standard Job framework.
The Standard tJDBCRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component that opens the database connection
to be reused by this component.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name The name of the table to be processed.

Query Type and Query Specify the database query statement, paying particular
attention to the proper sequencing of the fields, which must
correspond to the schema definition.
• Built-In: Fill in the query statement in the Query field
manually or click the [...] button next to the Query
field to build the statement graphically using the
SQLBuilder.
• Repository: Select the relevant query stored in the
Repository by clicking the [...] button next to it and
in the pop-up Repository Content dialog box, select
the query to be used, and the Query field will be
automatically filled in.

Guess Query Click this button to generate a query in the Query field based
on the defined table and schema.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
When errors are skipped, you can collect the rows on error
using a Row > Reject connection.

Advanced settings

Propagate QUERY's recordset Select this check box to propagate the result of the query
to the output flow. From the use column list displayed, you
need to select a column into which the query result will be
inserted.
This option allows the component to have a different
schema from that of the preceding component. Moreover,
the column that holds the query's recordset should be set to
the Object type and this component is usually followed by a
tParseRecordSet component.

Use PreparedStatement Select this check box if you want to query the database
using a prepared statement. In the Set PreparedStatement
Parameters table displayed, specify the value for each
parameter represented by a question mark ? in the SQL
statement defined in the Query field.
• Parameter Index: the position of the parameter in the
SQL statement.
• Parameter Type: the data type of the parameter.
• Parameter Value: the value of the parameter.
For a related use case of this property, see Using
PreparedStatement objects to query data on page 2498.

Commit every Specify the number of rows to be processed before
committing batches of rows together into the database.
This option ensures transaction quality (but not rollback)
and, above all, better performance at executions.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an
error occurs. This is an After variable and it returns a string.

QUERY The query statement being processed. This is a Flow
variable and it returns a string.

Usage

Usage rule This component offers the flexibility of the DB query for any
database using a JDBC connection and covers all possible
SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.

tJDBCSCDELT
Tracks data changes in a source database table using SCD (Slowly Changing Dimensions) Type 1
method and/or Type 2 method and writes both the current and historical data into a specified SCD
dimension table.
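As a hedged illustration of the Type 2 mechanism (close the current dimension row, then insert a new
version so that history is preserved), the Java sketch below uses a hypothetical customer_dim table
with scd_start, scd_end and scd_active columns. The real table and column names are whatever you
configure in the component, and the SQL it generates may differ.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class ScdType2Sketch {
        // Assumes an already open Connection; table and columns are hypothetical.
        static void applyType2Change(Connection conn, int customerId, String newCity)
                throws SQLException {
            Timestamp now = new Timestamp(System.currentTimeMillis());
            // Close the currently active version of the record.
            try (PreparedStatement close = conn.prepareStatement(
                    "UPDATE customer_dim SET scd_end = ?, scd_active = 0 "
                    + "WHERE customer_id = ? AND scd_active = 1")) {
                close.setTimestamp(1, now);
                close.setInt(2, customerId);
                close.executeUpdate();
            }
            // Insert the new version as the active record, preserving the old row as history.
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO customer_dim (customer_id, city, scd_start, scd_end, scd_active) "
                    + "VALUES (?, ?, ?, NULL, 1)")) {
                insert.setInt(1, customerId);
                insert.setString(2, newCity);
                insert.setTimestamp(3, now);
                insert.executeUpdate();
            }
        }
    }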

tJDBCSCDELT Standard properties


These properties are used to configure tJDBCSCDELT running in the Standard Job framework.
The Standard tJDBCSCDELT component belongs to two families: Business Intelligence and Databases.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Driver JAR Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
