Talend Open Studio Components RG en 7.3.1
Reference Guide
7.3.1
Last updated: 2020-02-23
Contents
Copyleft........................................................................................................................ 77
tAccessBulkExec.......................................................................................................... 79
tAccessBulkExec Standard properties.......................................................................................................................79
Related scenarios............................................................................................................................................................. 81
tAccessClose................................................................................................................ 82
tAccessClose Standard properties..............................................................................................................................82
Related scenarios............................................................................................................................................................. 83
tAccessCommit............................................................................................................ 84
tAccessCommit Standard properties......................................................................................................................... 84
Related scenario............................................................................................................................................................... 85
tAccessConnection...................................................................................................... 86
tAccessConnection Standard properties.................................................................................................................. 86
Inserting data in parent/child tables........................................................................................................................87
tAccessInput.................................................................................................................91
tAccessInput Standard properties.............................................................................................................................. 91
Related scenarios............................................................................................................................................................. 94
tAccessOutput..............................................................................................................95
tAccessOutput Standard properties...........................................................................................................................95
Related scenarios...........................................................................................................................................................100
tAccessOutputBulk....................................................................................................101
tAccessOutputBulk Standard properties................................................................................................................101
Related scenarios...........................................................................................................................................................103
tAccessOutputBulkExec............................................................................................104
tAccessOutputBulkExec Standard properties...................................................................................................... 104
Related scenarios...........................................................................................................................................................107
tAccessRollback.........................................................................................................108
tAccessRollback Standard properties..................................................................................................................... 108
Related scenarios...........................................................................................................................................................109
tAccessRow................................................................................................................ 110
tAccessRow Standard properties..............................................................................................................................110
Related scenarios...........................................................................................................................................................113
tAddCRCRow..............................................................................................................114
tAddCRCRow Standard properties...........................................................................................................................114
Adding a surrogate key to a file..............................................................................................................................115
tAddLocationFromIP.................................................................................................118
tAddLocationFromIP Standard properties............................................................................................................ 118
Identifying a real-world geographic location of an IP.................................................................................... 119
tAdvancedFileOutputXML........................................................................................122
tAdvancedFileOutputXML Standard properties.................................................................................................. 122
Defining the XML tree.................................................................................................................................................125
Mapping XML data........................................................................................................................................................127
Defining the node status............................................................................................................................................ 127
Creating an XML file using a loop......................................................................................................................... 128
tAggregateRow..........................................................................................................133
tAggregateRow Standard properties...................................................................................................................... 133
Aggregating values and sorting data.....................................................................................................................135
tAggregateSortedRow.............................................................................................. 139
tAggregateSortedRow Standard properties......................................................................................................... 139
Sorting and aggregating the input data...............................................................................................................141
tAmazonAuroraClose................................................................................................ 146
tAmazonAuroraClose Standard properties........................................................................................................... 146
Related scenario.............................................................................................................................................................147
tAmazonAuroraCommit............................................................................................148
tAmazonAuroraCommit Standard properties.......................................................................................................148
Related scenario.............................................................................................................................................................149
tAmazonAuroraConnection......................................................................................150
tAmazonAuroraConnection Standard properties................................................................................................150
Related scenario.............................................................................................................................................................152
tAmazonAuroraInput................................................................................................ 153
tAmazonAuroraInput Standard properties............................................................................................................153
Handling data with Amazon Aurora....................................................................................................................... 156
tAmazonAuroraOutput............................................................................................. 163
tAmazonAuroraOutput Standard properties........................................................................................................ 163
Related scenario.............................................................................................................................................................169
tAmazonAuroraRollback.......................................................................................... 170
tAmazonAuroraRollback Standard properties..................................................................................................... 170
Related scenario............................................................................................................................................................. 171
tAmazonEMRListInstances.......................................................................................172
tAmazonEMRListInstances Standard properties.................................................................................................172
Related scenario.............................................................................................................................................................173
tAmazonEMRManage................................................................................................174
tAmazonEMRManage Standard properties...........................................................................................................174
Managing an Amazon EMR cluster.........................................................................................................................178
tAmazonEMRResize.................................................................................................. 182
tAmazonEMRResize Standard properties..............................................................................................................182
Related scenario.............................................................................................................................................................184
tAmazonMysqlClose................................................................................................. 185
tAmazonMysqlClose Standard properties............................................................................................................. 185
Related scenarios...........................................................................................................................................................186
tAmazonMysqlCommit............................................................................................. 187
tAmazonMysqlCommit Standard properties........................................................................................................ 187
Related scenario.............................................................................................................................................................188
tAmazonMysqlConnection....................................................................................... 189
tAmazonMysqlConnection Standard properties................................................................................................. 189
Related scenario.............................................................................................................................................................191
tAmazonMysqlInput..................................................................................................192
tAmazonMysqlInput Standard properties............................................................................................................. 192
Related scenarios...........................................................................................................................................................194
tAmazonMysqlOutput...............................................................................................195
tAmazonMysqlOutput Standard properties.......................................................................................................... 195
Related scenarios...........................................................................................................................................................200
tAmazonMysqlRollback............................................................................................201
tAmazonMysqlRollback Standard properties.......................................................................................................201
Related scenario.............................................................................................................................................................202
tAmazonMysqlRow................................................................................................... 203
tAmazonMysqlRow Standard properties............................................................................................................... 203
Related scenario.............................................................................................................................................................206
tAmazonOracleClose................................................................................................ 207
tAmazonOracleClose Standard properties............................................................................................................207
Related scenario.............................................................................................................................................................208
tAmazonOracleCommit............................................................................................ 209
tAmazonOracleCommit Standard properties....................................................................................................... 209
Related scenario.............................................................................................................................................................210
tAmazonOracleConnection...................................................................................... 211
tAmazonOracleConnection Standard properties................................................................................................ 211
Related scenario.............................................................................................................................................................213
tAmazonOracleInput.................................................................................................214
tAmazonOracleInput Standard properties............................................................................................................ 214
Related scenarios...........................................................................................................................................................217
tAmazonOracleOutput..............................................................................................218
tAmazonOracleOutput Standard properties.........................................................................................................218
Related scenarios...........................................................................................................................................................223
tAmazonOracleRollback........................................................................................... 224
tAmazonOracleRollback Standard properties......................................................................................................224
Related scenario.............................................................................................................................................................225
tAmazonOracleRow.................................................................................................. 226
tAmazonOracleRow Standard properties..............................................................................................................226
Related scenarios...........................................................................................................................................................229
tAmazonRedshiftManage.........................................................................................230
tAmazonRedshiftManage Standard properties................................................................................................... 230
Related scenario.............................................................................................................................................................233
tApacheLogInput.......................................................................................................234
tApacheLogInput Standard properties...................................................................................................................234
Reading an Apache access-log file.........................................................................................................................235
tAS400Close.............................................................................................................. 237
tAS400Close Standard properties............................................................................................................................237
Related scenario.............................................................................................................................................................238
tAS400Commit.......................................................................................................... 239
tAS400Commit Standard properties....................................................................................................................... 239
Related scenario.............................................................................................................................................................240
tAS400Connection.................................................................................................... 241
tAS400Connection Standard properties................................................................................................................ 241
Related scenario.............................................................................................................................................................242
tAS400Input.............................................................................................................. 243
tAS400Input Standard properties............................................................................................................................ 243
Handling data with AS/400....................................................................................................................................... 245
Related scenarios...........................................................................................................................................................249
tAS400LastInsertId................................................................................................... 250
tAS400LastInsertId Standard properties............................................................................................................... 250
Related scenario.............................................................................................................................................................251
tAS400Output........................................................................................................... 252
tAS400Output Standard properties.........................................................................................................................252
Related scenarios...........................................................................................................................................................256
tAS400Rollback.........................................................................................................257
tAS400Rollback Standard properties..................................................................................................................... 257
Related scenarios...........................................................................................................................................................258
tAS400Row................................................................................................................ 259
tAS400Row Standard properties..............................................................................................................................259
Related scenarios...........................................................................................................................................................262
tAssert........................................................................................................................ 263
tAssert Standard properties....................................................................................................................................... 263
Viewing product orders status (on a daily basis) against a benchmark number....................................264
Setting up the assertive condition for a Job execution.................................................................................. 267
tAssertCatcher........................................................................................................... 273
tAssertCatcher Standard properties........................................................................................................................ 273
Related scenarios...........................................................................................................................................................274
tAzureAdlsGen2Input............................................................................................... 275
tAzureAdlsGen2Input Standard properties...........................................................................................................275
Related scenario.............................................................................................................................................................277
tAzureAdlsGen2Output............................................................................................ 278
tAzureAdlsGen2Output Standard properties....................................................................................................... 278
Accessing Azure ADLS Gen2 storage..................................................................................................................... 280
tAzureStorageConnection........................................................................................ 283
tAzureStorageConnection Standard properties.................................................................................................. 283
Related scenario.............................................................................................................................................................284
tAzureStorageContainerCreate............................................................................... 285
tAzureStorageContainerCreate Standard properties.........................................................................................285
Creating a container in Azure Storage.................................................................................................................. 286
tAzureStorageContainerDelete............................................................................... 291
tAzureStorageContainerDelete Standard properties.........................................................................................291
Related scenarios...........................................................................................................................................................292
tAzureStorageContainerExist.................................................................................. 293
tAzureStorageContainerExist Standard properties............................................................................................293
Related scenario.............................................................................................................................................................294
tAzureStorageContainerList.................................................................................... 295
tAzureStorageContainerList Standard properties.............................................................................................. 295
Related scenario.............................................................................................................................................................297
tAzureStorageDelete................................................................................................ 298
tAzureStorageDelete Standard properties............................................................................................................298
Related scenarios...........................................................................................................................................................300
tAzureStorageGet......................................................................................................301
tAzureStorageGet Standard properties.................................................................................................................. 301
Retrieving files from an Azure Storage container............................................................................................. 303
tAzureStorageInputTable.........................................................................................310
tAzureStorageInputTable Standard properties................................................................................................... 310
Handling data with Microsoft Azure Table storage..........................................................................................313
tAzureStorageList..................................................................................................... 320
tAzureStorageList Standard properties..................................................................................................................320
Related scenario.............................................................................................................................................................322
tAzureStorageOutputTable......................................................................................323
tAzureStorageOutputTable Standard properties................................................................................................323
Related scenario.............................................................................................................................................................326
tAzureStoragePut......................................................................................................327
tAzureStoragePut Standard properties.................................................................................................................. 327
Related scenario.............................................................................................................................................................329
tAzureStorageQueueCreate..................................................................................... 330
tAzureStorageQueueCreate Standard properties............................................................................................... 330
Related scenario.............................................................................................................................................................331
tAzureStorageQueueDelete..................................................................................... 332
tAzureStorageQueueDelete Standard properties...............................................................................................332
Related scenario.............................................................................................................................................................333
tAzureStorageQueueInput....................................................................................... 334
tAzureStorageQueueInput Standard properties................................................................................................. 334
Related scenario.............................................................................................................................................................336
tAzureStorageQueueInputLoop...............................................................................337
tAzureStorageQueueInputLoop Standard properties........................................................................................337
Related scenario.............................................................................................................................................................339
tAzureStorageQueueList.......................................................................................... 340
tAzureStorageQueueList Standard properties.....................................................................................................340
Related scenario.............................................................................................................................................................342
tAzureStorageQueueOutput.................................................................................... 343
tAzureStorageQueueOutput Standard properties.............................................................................................. 343
Related scenario.............................................................................................................................................................345
tAzureStorageQueuePurge...................................................................................... 346
tAzureStorageQueuePurge Standard properties................................................................................................ 346
Related scenario.............................................................................................................................................................347
tBarChart....................................................................................................................348
tBarChart Standard properties.................................................................................................................................. 348
Creating a bar chart from the input data.............................................................................................................350
tBigQueryBulkExec................................................................................................... 357
tBigQueryBulkExec Standard properties............................................................................................................... 357
Related scenario............................................................................................................................................ 360
tBigQueryInput..........................................................................................................361
tBigQueryInput Standard properties.......................................................................................................................361
Performing a query in Google BigQuery.............................................................................................................. 364
tBigQueryOutput....................................................................................................... 368
tBigQueryOutput Standard properties................................................................................................................... 368
Writing data in Google BigQuery............................................................................................................................ 371
tBigQueryOutputBulk............................................................................................... 379
tBigQueryOutputBulk Standard properties.......................................................................................................... 379
Related scenario............................................................................................................................................ 381
tBigQuerySQLRow.....................................................................................................382
tBigQuerySQLRow Standard properties................................................................................................................ 382
tBonitaDeploy............................................................................................................385
tBonitaDeploy Standard properties.........................................................................................................................385
Related scenario............................................................................................................................................ 386
tBonitaInstantiateProcess........................................................................................387
tBonitaInstantiateProcess Standard properties.................................................................................................. 387
Executing a Bonita process via a Talend Job..................................................................................................... 390
Outputting the process instance UUID over the Row > Main link.............................................................. 395
tBoxConnection.........................................................................................................398
tBoxConnection Standard properties..................................................................................................................... 398
Related scenario.............................................................................................................................................................399
tBoxCopy....................................................................................................................400
tBoxCopy Standard properties..................................................................................................................................400
Related scenarios...........................................................................................................................................................402
tBoxDelete................................................................................................................. 403
tBoxDelete Standard properties...............................................................................................................................403
Related scenarios...........................................................................................................................................................404
tBoxGet...................................................................................................................... 405
tBoxGet Standard properties..................................................................................................................................... 405
Related scenario.............................................................................................................................................................406
tBoxList...................................................................................................................... 407
tBoxList Standard properties.....................................................................................................................................407
Related scenarios...........................................................................................................................................................408
tBoxPut...................................................................................................................... 409
tBoxPut Standard properties..................................................................................................................................... 409
Uploading and downloading files from Box....................................................................................................... 411
tBufferInput............................................................................................................... 414
tBufferInput Standard properties.............................................................................................................................414
Retrieving bufferized data..........................................................................................................................................415
tBufferOutput............................................................................................................ 417
tBufferOutput Standard properties......................................................................................................................... 417
Buffering data..................................................................................................................................................................418
Buffering data to be used as a source system...................................................................................................420
Buffering output data on the webapp server..................................................................................................... 421
Calling a Job with context variables from a browser...................................................................................... 424
Calling a Job exported as Webservice in another Job..................................................................................... 426
tCassandraBulkExec..................................................................................................429
tCassandraBulkExec Standard properties............................................................................................................. 429
Related scenarios...........................................................................................................................................................430
tCassandraClose........................................................................................................ 431
tCassandraClose Standard properties.................................................................................................................... 431
Related scenario............................................................................................................................................ 431
tCassandraConnection..............................................................................................432
tCassandraConnection Standard properties.........................................................................................................432
Related scenario.............................................................................................................................................................433
tCassandraInput........................................................................................................ 434
Mapping between Cassandra types and Talend data types.................................................................434
tCassandraInput Standard properties.....................................................................................................................435
Handling data with Cassandra..................................................................................................................................439
tCassandraOutput..................................................................................................... 445
tCassandraOutput Standard properties................................................................................................................. 445
Related scenario............................................................................................................................................ 450
tCassandraOutputBulk..............................................................................................451
tCassandraOutputBulk Standard properties.........................................................................................................451
Related scenarios...........................................................................................................................................................454
tCassandraOutputBulkExec......................................................................................455
tCassandraOutputBulkExec Standard properties............................................................................................... 455
Related scenarios...........................................................................................................................................................458
tCassandraRow.......................................................................................................... 459
tCassandraRow Standard properties.......................................................................................................................459
Related scenario.............................................................................................................................................................460
tChangeFileEncoding................................................................................................462
tChangeFileEncoding Standard properties...........................................................................................................462
Transforming the character encoding of a file.................................................................................................. 463
tChronometerStart.................................................................................................... 465
tChronometerStart Standard properties................................................................................................................465
Related scenario.............................................................................................................................................................465
tChronometerStop.................................................................................................... 466
tChronometerStop Standard properties................................................................................................................ 466
Measuring the processing time of a subJob and part of a subJob.............................................................. 467
tCloudStart.................................................................................................................471
tCloudStart Standard properties.............................................................................................................................. 471
Related scenarios...........................................................................................................................................................473
tCloudStop................................................................................................................. 474
tCloudStop Standard properties...............................................................................................................................474
Related scenarios...........................................................................................................................................................475
tCombinedSQLAggregate.........................................................................................476
tCombinedSQLAggregate Standard properties...................................................................................................476
Filtering and aggregating table columns directly on the DBMS................................................................. 478
tCombinedSQLFilter................................................................................................. 488
tCombinedSQLFilter Standard properties.............................................................................................................488
Related scenario............................................................................................................................................ 489
tCombinedSQLInput................................................................................................. 490
tCombinedSQLInput Standard properties.............................................................................................................490
Related scenario.............................................................................................................................................................491
tCombinedSQLOutput...............................................................................................492
tCombinedSQLOutput Standard properties......................................................................................................... 492
Related scenario.............................................................................................................................................................493
tContextDump........................................................................................................... 494
tContextDump Standard properties........................................................................................................................494
Related scenarios...........................................................................................................................................................495
tContextLoad.............................................................................................................496
tContextLoad Standard properties.......................................................................................................................... 496
Reading data from different MySQL databases using dynamically loaded connection parameters..497
tConvertType............................................................................................................. 504
tConvertType Standard properties.......................................................................................................................... 504
Converting Java types................................................................................................................................. 505
tCosmosDBBulkLoad................................................................................................ 510
tCosmosDBBulkLoad Standard properties............................................................................................................510
tCosmosDBConnection............................................................................................. 513
tCosmosDBConnection Standard properties........................................................................................................513
tCosmosDBInput........................................................................................................515
tCosmosDBInput Standard properties....................................................................................................................515
tCosmosDBOutput.....................................................................................................519
tCosmosDBOutput Standard properties................................................................................................................ 519
tCosmosDBRow......................................................................................................... 524
tCosmosDBRow Standard properties......................................................................................................................524
tCouchbaseDCPInput................................................................................................ 527
tCouchbaseDCPInput Standard properties........................................................................................................... 527
tCouchbaseDCPOutput............................................................................................. 529
tCouchbaseDCPOutput Standard properties........................................................................................................529
tCouchbaseInput....................................................................................................... 532
tCouchbaseInput Standard properties................................................................................................................... 532
tCouchbaseOutput.................................................................................................... 537
tCouchbaseOutput Standard properties................................................................................................................ 537
tCreateTable.............................................................................................................. 540
tCreateTable Standard properties........................................................................................................................... 540
Creating a new table in a MySQL database...................................................................................... 544
tCreateTemporaryFile...............................................................................................546
tCreateTemporaryFile Standard properties..........................................................................................................546
Creating a temporary file and writing data into it........................................................................................... 547
tDB2BulkExec............................................................................................................553
tDB2BulkExec Standard properties.........................................................................................................................553
Related scenarios...........................................................................................................................................................558
tDB2Close.................................................................................................................. 559
tDB2Close Standard properties................................................................................................................................ 559
Related scenarios...........................................................................................................................................................560
tDB2Commit.............................................................................................................. 561
tDB2Commit Standard properties............................................................................................................................561
Related scenario.............................................................................................................................................................562
tDB2Connection........................................................................................................ 563
tDB2Connection Standard properties.....................................................................................................................563
Related scenarios...........................................................................................................................................................565
tDB2Input...................................................................................................................566
tDB2Input Standard properties.................................................................................................................................566
Related scenarios...........................................................................................................................................................569
tDB2Output................................................................................................................570
tDB2Output Standard properties............................................................................................................................. 570
Related scenarios...........................................................................................................................................................575
tDB2Rollback.............................................................................................................576
tDB2Rollback Standard properties.......................................................................................................................... 576
Related scenarios...........................................................................................................................................................577
tDB2Row.................................................................................................................... 578
tDB2Row Standard properties.................................................................................................................................. 578
Related scenarios...........................................................................................................................................................581
tDB2SCD.....................................................................................................................582
tDB2SCD Standard properties...................................................................................................................................582
Related scenarios...........................................................................................................................................................585
tDB2SCDELT.............................................................................................................. 586
tDB2SCDELT Standard properties........................................................................................................................... 586
Related scenarios.......................................................................................................................................... 590
tDB2SP....................................................................................................................... 591
tDB2SP Standard properties...................................................................................................................................... 591
Related scenarios...........................................................................................................................................................593
tDBBulkExec.............................................................................................................. 596
tDBBulkExec Standard properties........................................................................................................................... 596
tDBClose.....................................................................................................................597
tDBClose Standard properties...................................................................................................................................597
tDBColumnList.......................................................................................................... 598
tDBColumnList Standard properties....................................................................................................................... 598
tDBCommit.................................................................................................................599
tDBCommit Standard properties.............................................................................................................................. 599
tDBConnection.......................................................................................................... 600
tDBConnection Standard properties....................................................................................................................... 600
tDBInput.....................................................................................................................601
tDBInput Standard properties................................................................................................................................... 601
tDBLastInsertId......................................................................................................... 603
tDBLastInsertId Standard properties...................................................................................................................... 603
tDBOutput.................................................................................................................. 604
tDBOutput Standard properties................................................................................................................................604
tDBOutputBulk.......................................................................................................... 606
tDBOutputBulk Standard properties....................................................................................................................... 606
tDBOutputBulkExec.................................................................................................. 607
tDBOutputBulkExec Standard properties..............................................................................................................607
tDBRollback............................................................................................................... 608
tDBRollback Standard properties.............................................................................................................................608
tDBRow...................................................................................................................... 609
tDBRow Standard properties.....................................................................................................................................609
tDBSCD....................................................................................................................... 610
tDBSCD Standard properties..................................................................................................................................... 610
tDBSCDELT................................................................................................................ 611
tDBSCDELT Standard properties.............................................................................................................................. 611
tDBSP..........................................................................................................................612
tDBSP Standard properties.........................................................................................................................................612
tDBTableList.............................................................................................................. 613
tDBTableList Standard properties........................................................................................................................... 613
tDBFSConnection...................................................................................................... 614
tDBFSConnection Standard properties.................................................................................................................. 614
tDBFSGet....................................................................................................................615
tDBFSGet Standard properties..................................................................................................................................615
tDBFSPut....................................................................................................................617
tDBFSPut Standard properties.................................................................................................................................. 617
tDBSQLRow................................................................................................................619
tDBSQLRow Standard properties.............................................................................................................................619
Resetting a DB auto-increment................................................................................................................................621
tDenormalize............................................................................................................. 623
tDenormalize Standard properties.......................................................................................................................... 623
Denormalizing on one column................................................................................................................................. 624
Denormalizing on multiple columns......................................................................................................................626
tDenormalizeSortedRow.......................................................................................... 629
tDenormalizeSortedRow Standard properties.....................................................................................................629
Regrouping sorted rows.............................................................................................................................................. 630
tDie............................................................................................................................. 634
tDie Standard properties.............................................................................................................................................634
Related scenarios...........................................................................................................................................................635
tDotNETInstantiate................................................................................................... 636
tDotNETInstantiate Standard properties...............................................................................................................636
Related scenario.............................................................................................................................................................637
tDotNETRow.............................................................................................................. 638
tDotNETRow Standard properties........................................................................................................................... 638
Integrating .Net into Talend Studio: Introduction............................................................................................ 640
Integrating .Net into Talend Studio: Prerequisites........................................................................................... 640
Integrating .Net into Talend Studio: Configuring the Job...............................................................641
Utilizing .NET in Talend..............................................................................................................................................643
tDropboxConnection.................................................................................................647
tDropboxConnection Standard properties............................................................................................................647
Related scenario.............................................................................................................................................................647
tDropboxDelete.........................................................................................................648
tDropboxDelete Standard properties..................................................................................................................... 648
Related scenarios...........................................................................................................................................................649
tDropboxGet.............................................................................................................. 650
tDropboxGet Standard properties............................................................................................................................650
Related scenarios...........................................................................................................................................................651
tDropboxList..............................................................................................................652
tDropboxList Standard properties........................................................................................................................... 652
Related scenarios...........................................................................................................................................................653
tDropboxPut.............................................................................................................. 654
tDropboxPut Standard properties............................................................................................................................654
Uploading files to Dropbox....................................................................................................................................... 655
tDTDValidator............................................................................................................661
tDTDValidator Standard properties.........................................................................................................................661
Validating XML files..................................................................................................................................................... 662
tDynamoDBInput.......................................................................................................665
tDynamoDBInput Standard properties...................................................................................................................665
Writing and extracting JSON documents from DynamoDB............................................................................668
tDynamoDBOutput....................................................................................................675
tDynamoDBOutput Standard properties............................................................................................................... 675
Related scenarios...........................................................................................................................................................677
tEDIFACTtoXML.........................................................................................................678
tEDIFACTtoXML Standard properties..................................................................................................................... 678
Reading an EDIFACT message file and saving it to XML...............................................................................679
tELTGreenplumInput................................................................................................ 682
tELTGreenplumInput Standard properties............................................................................................................682
Related scenarios...........................................................................................................................................................683
tELTGreenplumMap.................................................................................................. 684
tELTGreenplumMap Standard properties..............................................................................................................684
Mapping data using a simple implicit join..........................................................................................................686
Related scenarios...........................................................................................................................................................693
tELTGreenplumOutput............................................................................................. 694
tELTGreenplumOutput Standard properties........................................................................................................ 694
Related scenarios...........................................................................................................................................................696
tELTHiveInput............................................................................................................697
tELTHiveInput Standard properties........................................................................................................................ 697
Related scenarios...........................................................................................................................................................698
tELTHiveMap............................................................................................................. 699
tELTHiveMap Standard properties.......................................................................................................................... 699
Joining table columns and writing them into Hive.......................................................................................... 710
Related scenarios...........................................................................................................................................................717
tELTHiveOutput.........................................................................................................718
tELTHiveOutput Standard properties..................................................................................................................... 718
Related scenarios...........................................................................................................................................................720
tELTInput................................................................................................................... 721
tELTInput Standard properties................................................................................................................................. 721
Related scenarios...........................................................................................................................................................722
tELTMap..................................................................................................................... 723
tELTMap Standard properties................................................................................................................................... 723
Aggregating Snowflake data using context variables as table and connection names.......................725
Related scenarios...........................................................................................................................................................729
tELTOutput................................................................................................................ 730
tELTOutput Standard properties.............................................................................................................................. 730
Related scenarios...........................................................................................................................................................732
tELTMSSqlInput........................................................................................................ 733
tELTMSSqlInput Standard properties.....................................................................................................................733
Related scenarios...........................................................................................................................................................734
tELTMSSqlMap.......................................................................................................... 735
tELTMSSqlMap Standard properties.......................................................................................................................735
Related scenarios...........................................................................................................................................................737
tELTMSSqlOutput......................................................................................................738
tELTMSSqlOutput Standard properties..................................................................................................................738
Related scenarios...........................................................................................................................................................740
tELTMysqlInput......................................................................................................... 741
tELTMysqlInput Standard properties......................................................................................................................741
Related scenarios...........................................................................................................................................................742
tELTMysqlMap...........................................................................................................743
tELTMysqlMap Standard properties........................................................................................................................743
Aggregating table columns and filtering............................................................................................................. 745
Mapping data using an Alias table....................................................................................................... 749
Related scenarios...........................................................................................................................................................753
tELTMysqlOutput...................................................................................................... 754
tELTMysqlOutput Standard properties.................................................................................................................. 754
Related scenarios...........................................................................................................................................................756
tELTNetezzaInput..................................................................................................... 757
tELTNetezzaInput Standard properties..................................................................................................................757
Related scenarios...........................................................................................................................................................758
tELTNetezzaMap....................................................................................................... 759
tELTNetezzaMap Standard properties....................................................................................................................759
Related scenarios...........................................................................................................................................................761
tELTNetezzaOutput.................................................................................................. 762
tELTNetezzaOutput Standard properties.............................................................................................................. 762
Related scenarios...........................................................................................................................................................764
tELTOracleInput........................................................................................................ 765
tELTOracleInput Standard properties.....................................................................................................................765
Related scenarios...........................................................................................................................................................766
tELTOracleMap.......................................................................................................... 767
tELTOracleMap Standard properties.......................................................................................................................767
Updating Oracle database entries...........................................................................................................................769
Related scenario.............................................................................................................................................................772
tELTOracleOutput..................................................................................................... 773
tELTOracleOutput Standard properties................................................................................................................. 773
Managing data using the Oracle MERGE function............................................................................................775
tELTPostgresqlInput................................................................................................. 780
tELTPostgresqlInput Standard properties.............................................................................................................780
Related scenarios...........................................................................................................................................................781
tELTPostgresqlMap...................................................................................................782
tELTPostgresqlMap Standard properties...............................................................................................................782
Related scenarios...........................................................................................................................................................784
tELTPostgresqlOutput.............................................................................................. 785
tELTPostgresqlOutput Standard properties......................................................................................................... 785
Related scenarios...........................................................................................................................................................787
tELTSybaseInput....................................................................................................... 788
tELTSybaseInput Standard properties....................................................................................................................788
Related scenarios...........................................................................................................................................................789
tELTSybaseMap......................................................................................................... 790
tELTSybaseMap Standard properties......................................................................................................................790
Related scenarios...........................................................................................................................................................792
tELTSybaseOutput.................................................................................................... 793
tELTSybaseOutput Standard properties................................................................................................................ 793
Related scenarios...........................................................................................................................................................795
tELTTeradataInput.................................................................................................... 796
tELTTeradataInput Standard properties................................................................................................................796
Related scenarios...........................................................................................................................................................797
tELTTeradataMap......................................................................................................798
tELTTeradataMap Standard properties..................................................................................................................798
Mapping data using a subquery.............................................................................................................................. 800
Related scenarios...........................................................................................................................................................809
tELTTeradataOutput................................................................................................. 810
tELTTeradataOutput Standard properties.............................................................................................................810
Related scenarios...........................................................................................................................................................812
tELTVerticaInput....................................................................................................... 813
tELTVerticaInput Standard properties....................................................................................................................813
Related scenarios...........................................................................................................................................................814
tELTVerticaMap.........................................................................................................815
tELTVerticaMap Standard properties......................................................................................................................815
Related scenarios...........................................................................................................................................................817
tELTVerticaOutput.................................................................................................... 818
tELTVerticaOutput Standard properties................................................................................................................ 818
Related scenarios...........................................................................................................................................................820
tESBConsumer........................................................................................................... 821
tESBConsumer Standard properties........................................................................................................................821
Using tESBConsumer to retrieve the valid email..............................................................................................826
Using tESBConsumer with custom SOAP Headers............................................................................................833
tESBProviderFault.....................................................................................................844
tESBProviderFault Standard properties................................................................................................................. 844
Requesting airport names based on country codes......................................................................................... 845
tESBProviderRequest................................................................................................857
tESBProviderRequest Standard properties........................................................................................................... 857
Sending a message without expecting a response.......................................................................................... 859
tESBProviderResponse............................................................................................. 869
tESBProviderResponse Standard properties........................................................................................................ 869
Returning Hello world response..............................................................................................................................870
tEXABulkExec............................................................................................................ 881
tEXABulkExec Standard properties......................................................................................................................... 881
Settings for different sources of import data..................................................................................................... 886
Importing data into an EXASolution database table from a local CSV file..............................................889
tEXAClose...................................................................................................................895
tEXAClose Standard properties................................................................................................................................ 895
Related scenario.............................................................................................................................................................896
tEXACommit.............................................................................................................. 897
tEXACommit Standard properties............................................................................................................................897
Related scenario.............................................................................................................................................................898
tEXAConnection........................................................................................................ 899
tEXAConnection Standard properties.....................................................................................................................899
Related scenario.............................................................................................................................................................901
tEXAInput...................................................................................................................902
tEXAInput Standard properties.................................................................................................................................902
Related scenario.............................................................................................................................................................905
tEXAOutput................................................................................................................906
tEXAOutput Standard properties............................................................................................................................. 906
Related scenario.............................................................................................................................................................911
tEXARollback............................................................................................................. 912
tEXARollback Standard properties.......................................................................................................................... 912
Related scenario.......................................................................................................................................... 913
tEXARow.................................................................................................................... 914
tEXARow Standard properties...................................................................................................................................914
Related scenario.......................................................................................................................................... 917
tEXistConnection.......................................................................................................918
tEXistConnection Standard properties...................................................................................................................918
Related scenarios...........................................................................................................................................................919
tEXistDelete...............................................................................................................920
tEXistDelete Standard properties............................................................................................................................ 920
Related scenarios...........................................................................................................................................................921
tEXistGet.................................................................................................................... 922
tEXistGet Standard properties.................................................................................................................................. 922
Retrieving resources from a remote eXist DB server...................................................................................... 923
tEXistList....................................................................................................................926
tEXistList Standard properties.................................................................................................................................. 926
Related scenario.............................................................................................................................................................927
tEXistPut.................................................................................................................... 928
tEXistPut Standard properties...................................................................................................................................928
Related scenarios...........................................................................................................................................................929
tEXistXQuery............................................................................................................. 930
tEXistXQuery Standard properties...........................................................................................................................930
Related scenarios...........................................................................................................................................................931
tEXistXUpdate........................................................................................................... 932
tEXistXUpdate Standard properties........................................................................................................................ 932
Related scenarios...........................................................................................................................................................933
tExternalSortRow......................................................................................................934
tExternalSortRow Standard properties.................................................................................................................. 934
Related scenario.............................................................................................................................................................936
tExtractDelimitedFields........................................................................................... 937
tExtractDelimitedFields Standard properties...................................................................................................... 937
Extracting a delimited string column of a database table............................................................................ 939
tExtractJSONFields....................................................................................................945
tExtractJSONFields Standard properties............................................................................................................... 945
Retrieving error messages while extracting data from JSON fields........................................................... 947
Collecting data from your favorite online social network............................................................................. 952
Extracting data from a JSON file through looping........................................................................................... 956
tExtractPositionalFields........................................................................................... 963
tExtractPositionalFields Standard properties......................................................................................................963
Related scenario.............................................................................................................................................................965
tExtractRegexFields..................................................................................................966
tExtractRegexFields Standard properties............................................................................................................. 966
Extracting name, domain and TLD from e-mail addresses............................................................................967
tExtractXMLField...................................................................................................... 971
tExtractXMLField Standard properties...................................................................................................................971
Extracting XML data from a field in a database table....................................................................................973
Extracting correct and erroneous data from an XML field in a delimited file........................................975
tFileArchive................................................................................................................979
tFileArchive Standard properties............................................................................................................................. 979
Zipping files using a tFileArchive........................................................................................................................... 981
tFileCompare............................................................................................................. 984
tFileCompare Standard properties.......................................................................................................................... 984
Comparing unzipped files...........................................................................................................................................985
tFileCopy.................................................................................................................... 988
tFileCopy Standard properties.................................................................................................................................. 988
Restoring files from bin.............................................................................................................................................. 990
tFileDelete................................................................................................................. 992
tFileDelete Standard properties...............................................................................................................................992
Deleting files................................................................................................................................................................... 993
tFileExist.................................................................................................................... 995
tFileExist Standard properties.................................................................................................................................. 995
Checking for the presence of a file and creating it if it does not exist.................................................... 996
tFileFetch.................................................................................................................1000
tFileFetch Standard properties.............................................................................................................................. 1000
Fetching data through HTTP.................................................................................................................................. 1003
Reusing stored cookie to fetch files through HTTP...................................................................................... 1005
Related scenario.......................................................................................................................................................... 1009
tFileInputARFF........................................................................................................ 1010
tFileInputARFF Standard properties.....................................................................................................................1010
Displaying the content of an ARFF file................................................................................................................ 1011
tFileInputDelimited................................................................................................ 1015
tFileInputDelimited Standard properties............................................................................................................1015
Reading data from a delimited file and displaying the output...................................................................1018
Reading data from a remote file in streaming mode....................................................................................1020
tFileInputExcel........................................................................................................ 1024
tFileInputExcel Standard properties.................................................................................................................... 1024
Related scenarios........................................................................................................................................................ 1027
tFileInputFullRow................................................................................................... 1028
tFileInputFullRow Standard properties...............................................................................................................1028
Reading full rows in a delimited file.................................................................................................................. 1029
tFileInputJSON........................................................................................................ 1032
tFileInputJSON Standard properties.....................................................................................................................1032
Extracting JSON data from a file using JSONPath without setting a loop node..................................1034
Extracting JSON data from a file using JSONPath..........................................................................................1037
Extracting JSON data from a file using XPath.................................................................................................1039
Extracting JSON data from a URL.........................................................................................................................1040
tFileInputLDIF......................................................................................................... 1045
tFileInputLDIF Standard properties......................................................................................................................1045
Related scenario.......................................................................................................................................................... 1047
tFileInputMail..........................................................................................................1048
tFileInputMail Standard properties...................................................................................................................... 1048
Extracting key fields from an email.................................................................................................................... 1050
tFileInputMSDelimited...........................................................................................1052
tFileInputMSDelimited Standard properties..................................................................................................... 1052
The Multi Schema Editor......................................................................................................................................... 1053
Reading a multi-structure delimited file............................................................................................................1054
tFileInputMSPositional.......................................................................................... 1061
tFileInputMSPositional Standard properties..................................................................................................... 1061
Reading data from a positional file.....................................................................................................................1063
tFileInputMSXML....................................................................................................1067
tFileInputMSXML Standard properties................................................................................................................1067
Reading a multi-structure XML file..................................................................................................................... 1068
tFileInputPositional................................................................................................1072
tFileInputPositional Standard properties........................................................................................................... 1072
Reading a Positional file and saving filtered results to XML.....................................................................1075
tFileInputProperties............................................................................................... 1079
tFileInputProperties Standard properties...........................................................................................................1079
Reading and matching the keys and the values of different .properties files and outputting the results in a glossary...................................................................................................................................................1080
tFileInputRaw..........................................................................................................1085
tFileInputRaw Standard properties.......................................................................................................................1085
Related Scenario..........................................................................................................................................................1086
tFileInputRegex...................................................................................................... 1087
tFileInputRegex Standard properties...................................................................................................................1087
Reading data using a Regex and outputting the result to Positional file............................................. 1089
tFileInputXML......................................................................................................... 1092
tFileInputXML Standard properties...................................................................................................................... 1092
Reading and extracting data from an XML structure....................................................................................1095
Extracting erroneous XML data via a reject flow...........................................................................................1096
tFileList.................................................................................................................... 1100
tFileList Standard properties.................................................................................................................................. 1100
Iterating on a file directory.....................................................................................................................................1102
Finding duplicate files between two folders....................................................................................................1104
tFileOutputARFF..................................................................................................... 1110
tFileOutputARFF Standard properties................................................................................................................. 1110
Related scenario.......................................................................................................................................................... 1112
tFileOutputDelimited............................................................................................. 1113
tFileOutputDelimited Standard properties........................................................................................................ 1113
Writing data in a delimited file.............................................................................................................................1116
Utilizing Output Stream to save filtered data to a local file......................................................................1120
tFileOutputExcel..................................................................................................... 1123
tFileOutputExcel Standard properties.................................................................................................................1123
Related scenario.......................................................................................................................................................... 1126
tFileOutputJSON..................................................................................................... 1127
tFileOutputJSON Standard properties..................................................................................................................1127
Writing a JSON structured file............................................................................................................................... 1128
tFileOutputLDIF...................................................................................................... 1131
tFileOutputLDIF Standard properties.................................................................................................................. 1131
Writing data from a database table into an LDIF file...................................................................................1133
tFileOutputMSDelimited........................................................................................1138
tFileOutputMSDelimited Standard properties.................................................................................................. 1138
Related scenarios........................................................................................................................................................ 1139
tFileOutputMSPositional....................................................................................... 1140
tFileOutputMSPositional Standard properties..................................................................................................1140
Related scenarios........................................................................................................................................................ 1141
tFileOutputMSXML................................................................................................. 1142
tFileOutputMSXML Standard properties.............................................................................................................1142
Defining the MultiSchema XML tree................................................................................................................... 1143
Mapping XML data from multiple schema sources....................................................................................... 1144
Defining the node status......................................................................................................................................... 1145
Related scenarios........................................................................................................................................................ 1146
tFileOutputPositional.............................................................................................1147
tFileOutputPositional Standard properties........................................................................................................1147
Related scenario.......................................................................................................................................................... 1150
tFileOutputProperties............................................................................................ 1151
tFileOutputProperties Standard properties....................................................................................................... 1151
Related scenarios........................................................................................................................................................ 1152
tFileOutputRaw.......................................................................................................1153
tFileOutputRaw Standard properties................................................................................................................... 1153
tFileOutputXML...................................................................................................... 1155
tFileOutputXML Standard properties...................................................................................................................1155
Related scenarios........................................................................................................................................................ 1157
tFileProperties........................................................................................................ 1158
tFileProperties Standard properties..................................................................................................................... 1158
Displaying the properties of a processed file.................................................................................................. 1159
tFileRowCount.........................................................................................................1161
tFileRowCount Standard properties..................................................................................................................... 1161
Writing a file to MySQL if the number of its records matches a reference value............................... 1162
tFileTouch................................................................................................................1166
tFileTouch Standard properties............................................................................................................................. 1166
Related scenarios........................................................................................................................................................ 1167
tFileUnarchive......................................................................................................... 1168
tFileUnarchive Standard properties......................................................................................................................1168
Related scenario.......................................................................................................................................................... 1169
tFilterColumns........................................................................................................ 1170
tFilterColumns Standard properties..................................................................................................................... 1170
Related Scenario..........................................................................................................................................................1171
tFilterRow................................................................................................................ 1172
tFilterRow Standard properties..............................................................................................................................1172
Filtering a list of names using simple conditions.......................................................................................... 1173
Filtering a list of names through different logical operations.................................................................. 1177
tFirebirdClose..........................................................................................................1179
tFirebirdClose Standard properties.......................................................................................................................1179
Related scenarios........................................................................................................................................................ 1180
tFirebirdCommit......................................................................................................1181
tFirebirdCommit Standard properties..................................................................................................................1181
Related scenario.......................................................................................................................................................... 1182
tFirebirdConnection................................................................................................1183
tFirebirdConnection Standard properties...........................................................................................................1183
Related scenarios........................................................................................................................................................ 1184
tFirebirdInput.......................................................................................................... 1185
tFirebirdInput Standard properties.......................................................................................................................1185
Related scenarios........................................................................................................................................................ 1187
tFirebirdOutput....................................................................................................... 1189
tFirebirdOutput Standard properties....................................................................................................................1189
Related scenarios........................................................................................................................................................ 1193
tFirebirdRollback.................................................................................................... 1194
tFirebirdRollback Standard properties................................................................................................................ 1194
Related scenario.......................................................................................................................................................... 1195
tFirebirdRow............................................................................................................1196
tFirebirdRow Standard properties.........................................................................................................................1196
Related scenarios........................................................................................................................................................ 1199
tFixedFlowInput......................................................................................................1200
tFixedFlowInput Standard properties..................................................................................................................1200
Related scenarios........................................................................................................................................................ 1201
tFlowMeter.............................................................................................................. 1202
tFlowMeter Standard properties............................................................................................................................1202
Related scenario.......................................................................................................................................................... 1203
tFlowMeterCatcher................................................................................................. 1204
tFlowMeterCatcher Standard properties.............................................................................................................1204
Catching flow metrics from a Job.........................................................................................................................1205
tFlowToIterate........................................................................................................ 1209
tFlowToIterate Standard properties..................................................................................................................... 1209
Transforming data flow to a list...........................................................................................................................1210
tForeach................................................................................................................... 1214
tForeach Standard properties................................................................................................................................. 1214
Iterating on a list and retrieving the values.................................................................................................... 1214
tFTPClose.................................................................................................................1217
tFTPClose Standard properties.............................................................................................................................. 1217
Related scenarios........................................................................................................................................................ 1217
tFTPConnection...................................................................................................... 1218
tFTPConnection Standard properties...................................................................................................................1218
Related scenarios........................................................................................................................................................ 1220
tFTPDelete...............................................................................................................1221
tFTPDelete Standard properties............................................................................................................................ 1221
Related scenario.......................................................................................................................................................... 1224
tFTPFileExist........................................................................................................... 1225
tFTPFileExist Standard properties........................................................................................................................ 1225
Related scenario.......................................................................................................................................................... 1227
tFTPFileList............................................................................................................. 1228
tFTPFileList Standard properties...........................................................................................................................1228
Listing and getting files/folders on an FTP directory...................................................................................1230
tFTPFileProperties..................................................................................................1236
tFTPFileProperties Standard properties..............................................................................................................1236
Related scenario.......................................................................................................................................................... 1238
tFTPGet.................................................................................................................... 1239
tFTPGet Standard properties.................................................................................................................................. 1239
Related scenario.......................................................................................................................................................... 1242
tFTPPut.................................................................................................................... 1243
tFTPPut Standard properties...................................................................................................................................1243
Putting files onto an FTP server...........................................................................................................................1246
tFTPRename............................................................................................................ 1250
tFTPRename Standard properties......................................................................................................................... 1250
Renaming a file located on an FTP server........................................................................................................1253
tFTPTruncate...........................................................................................................1256
tFTPTruncate Standard properties........................................................................................................................1256
Related scenario.......................................................................................................................................................... 1258
tFuzzyMatch............................................................................................................ 1259
tFuzzyMatch Standard properties......................................................................................................................... 1259
Checking the Levenshtein distance of 0 in first names............................................................................... 1260
Checking the Levenshtein distance of 1 or 2 in first names......................................................................1263
Checking the Metaphone distance in first names........................................................................................... 1264
tGoogleDataprocManage....................................................................................... 1266
tGoogleDataprocManage Standard properties................................................................................................. 1266
tGoogleDriveConnection........................................................................................1268
tGoogleDriveConnection Standard properties..................................................................................................1268
OAuth methods for accessing Google Drive.....................................................................................................1270
Related scenario.......................................................................................................................................................... 1279
tGoogleDriveCopy...................................................................................................1280
tGoogleDriveCopy Standard properties...............................................................................................................1280
Related scenario.......................................................................................................................................................... 1282
tGoogleDriveCreate................................................................................................ 1283
tGoogleDriveCreate Standard properties............................................................................................................1283
Related scenario.......................................................................................................................................................... 1285
tGoogleDriveDelete................................................................................................ 1286
tGoogleDriveDelete Standard properties........................................................................................................... 1286
Related scenario.......................................................................................................................................................... 1288
tGoogleDriveGet..................................................................................................... 1289
tGoogleDriveGet Standard properties..................................................................................................................1289
Related scenario.......................................................................................................................................................... 1291
tGoogleDriveList..................................................................................................... 1292
tGoogleDriveList Standard properties................................................................................................................. 1292
Related scenario.......................................................................................................................................................... 1294
tGoogleDrivePut..................................................................................................... 1295
tGoogleDrivePut Standard properties..................................................................................................................1295
Managing files with Google Drive........................................................................................................................1297
tGPGDecrypt............................................................................................................ 1306
tGPGDecrypt Standard properties......................................................................................................................... 1306
Decrypting a GnuPG-encrypted file and displaying its content.................................................................1307
tGreenplumBulkExec..............................................................................................1311
tGreenplumBulkExec Standard properties.........................................................................................................1311
Related scenarios........................................................................................................................................................ 1314
tGreenplumClose.................................................................................................... 1315
tGreenplumClose Standard properties................................................................................................................ 1315
Related scenarios........................................................................................................................................................ 1316
tGreenplumCommit................................................................................................ 1317
tGreenplumCommit Standard properties............................................................................................................1317
Related scenarios........................................................................................................................................................ 1318
tGreenplumConnection.......................................................................................... 1319
tGreenplumConnection Standard properties.................................................................................................... 1319
Related scenarios........................................................................................................................................................ 1320
tGreenplumGPLoad................................................................................................ 1321
tGreenplumGPLoad Standard properties............................................................................................................1321
Related scenario.......................................................................................................................................................... 1326
tGreenplumInput.....................................................................................................1327
tGreenplumInput Standard properties.................................................................................................................1327
Related scenarios........................................................................................................................................................ 1329
tGreenplumOutput..................................................................................................1330
tGreenplumOutput Standard properties............................................................................................................. 1330
Related scenarios........................................................................................................................................................ 1334
tGreenplumOutputBulk..........................................................................................1336
tGreenplumOutputBulk Standard properties.................................................................................................... 1336
Related scenarios........................................................................................................................................................ 1338
tGreenplumOutputBulkExec..................................................................................1339
tGreenplumOutputBulkExec Standard properties........................................................................................... 1339
Related scenarios........................................................................................................................................................ 1341
tGreenplumRollback...............................................................................................1342
tGreenplumRollback Standard properties..........................................................................................................1342
Related scenarios........................................................................................................................................................ 1343
tGreenplumRow...................................................................................................... 1344
tGreenplumRow Standard properties.................................................................................................................. 1344
Related scenarios........................................................................................................................................................ 1347
tGreenplumSCD.......................................................................................................1348
tGreenplumSCD Standard properties...................................................................................................................1348
Related scenario.......................................................................................................................................................... 1351
tGroovy.....................................................................................................................1352
tGroovy Standard properties................................................................................................................................... 1352
Related scenarios........................................................................................................................................1353
tGroovyFile.............................................................................................................. 1354
tGroovyFile Standard properties............................................................................................................................1354
Calling a file which contains Groovy code........................................................................................................1355
tGSBucketCreate..................................................................................................... 1357
tGSBucketCreate Standard properties................................................................................................................. 1357
Related scenario.......................................................................................................................................................... 1358
tGSBucketDelete.....................................................................................................1359
tGSBucketDelete Standard properties................................................................................................................. 1359
Related scenarios........................................................................................................................................................ 1360
tGSBucketExist........................................................................................................1361
tGSBucketExist Standard properties.................................................................................................................... 1361
Related scenario.......................................................................................................................................................... 1362
tGSBucketList.......................................................................................................... 1363
tGSBucketList Standard properties.......................................................................................................................1363
Related scenario.......................................................................................................................................................... 1364
tGSClose...................................................................................................................1365
tGSClose Standard properties.................................................................................................................................1365
Related scenario.......................................................................................................................................................... 1365
tGSConnection.........................................................................................................1366
tGSConnection Standard properties..................................................................................................................... 1366
Related scenario.......................................................................................................................................................... 1367
tGSCopy....................................................................................................................1368
tGSCopy Standard properties..................................................................................................................................1368
Related scenario.......................................................................................................................................................... 1369
tGSDelete.................................................................................................................1370
tGSDelete Standard properties.............................................................................................................................. 1370
Related scenario.......................................................................................................................................................... 1371
tGSGet...................................................................................................................... 1372
tGSGet Standard properties.....................................................................................................................................1372
Related scenarios........................................................................................................................................................ 1374
tGSList......................................................................................................................1375
tGSList Standard properties.................................................................................................................................... 1375
Related scenario.......................................................................................................................................................... 1376
tGSPut...................................................................................................................... 1377
tGSPut Standard properties.....................................................................................................................................1377
Managing files with Google Cloud Storage...................................................................................................... 1378
tHashInput............................................................................................................... 1386
tHashInput Standard properties............................................................................................................................ 1386
Reading data from the cache memory for high-speed data access......................................................... 1387
Clearing the memory before loading data to it in case an iterator exists in the same subJob....... 1391
tHashOutput............................................................................................................ 1395
tHashOutput Standard properties......................................................................................................................... 1395
Related scenarios........................................................................................................................................................ 1397
tHBaseClose.............................................................................................................1398
tHBaseClose Standard properties..........................................................................................................................1398
Related scenario.......................................................................................................................................................... 1399
tHBaseConnection.................................................................................................. 1400
tHBaseConnection Standard properties..............................................................................................................1400
Related scenario.......................................................................................................................................................... 1404
tHBaseInput.............................................................................................................1405
HBase filters.................................................................................................................................................................. 1405
tHBaseInput Standard properties..........................................................................................................................1406
Exchanging customer data with HBase..............................................................................................................1411
tHBaseOutput..........................................................................................................1419
tHBaseOutput Standard properties.......................................................................................................................1419
Related scenario.......................................................................................................................................................... 1424
tHCatalogInput........................................................................................................1425
tHCatalogInput Standard properties.................................................................................................................... 1425
Related scenario.......................................................................................................................................................... 1430
tHCatalogLoad........................................................................................................ 1431
tHCatalogLoad Standard properties.....................................................................................................................1431
Related scenario.......................................................................................................................................................... 1435
tHCatalogOperation................................................................................................1436
tHCatalogOperation Standard properties...........................................................................................................1436
Managing HCatalog tables on Hortonworks Data Platform........................................................................1444
tHCatalogOutput.....................................................................................................1453
tHCatalogOutput Standard properties.................................................................................................................1453
Related scenario.......................................................................................................................................................... 1459
tHDFSCompare........................................................................................................1460
tHDFSCompare Standard properties.................................................................................................................... 1460
Related scenarios........................................................................................................................................................ 1465
tHDFSConnection....................................................................................................1466
tHDFSConnection Standard properties............................................................................................................... 1466
Related scenarios........................................................................................................................................................ 1472
tHDFSCopy...............................................................................................................1473
tHDFSCopy Standard properties............................................................................................................................ 1473
Related scenario.......................................................................................................................................................... 1478
tHDFSDelete............................................................................................................1479
tHDFSDelete Standard properties.........................................................................................................................1479
Related scenarios........................................................................................................................................................ 1483
tHDFSExist...............................................................................................................1484
tHDFSExist Standard properties............................................................................................................................ 1484
Checking the existence of a file in HDFS......................................................................................................... 1489
tHDFSGet................................................................................................................. 1493
tHDFSGet Standard properties............................................................................................................................... 1493
Computing data with Hadoop distributed file system..................................................................................1498
tHDFSInput.............................................................................................................. 1505
tHDFSInput Standard properties........................................................................................................................... 1505
Using HDFS components to work with Azure Data Lake Storage (ADLS)..............................................1511
tHDFSList................................................................................................................. 1517
tHDFSList Standard properties...............................................................................................................................1517
Iterating on an HDFS directory.............................................................................................................. 1523
tHDFSOutput........................................................................................................... 1528
tHDFSOutput Standard properties........................................................................................................................1528
Related scenario.......................................................................................................................................................... 1534
tHDFSOutputRaw....................................................................................................1535
tHDFSOutputRaw Standard properties............................................................................................................... 1535
Related scenario..........................................................................................................................................1541
tHDFSProperties..................................................................................................... 1542
tHDFSProperties Standard properties..................................................................................................................1542
Related scenario.......................................................................................................................................................... 1547
tHDFSPut................................................................................................................. 1548
tHDFSPut Standard properties............................................................................................................................... 1548
Related scenario.......................................................................................................................................................... 1553
tHDFSRename......................................................................................................... 1554
tHDFSRename Standard properties......................................................................................................................1554
Related scenario.......................................................................................................................................................... 1559
tHDFSRowCount..................................................................................................... 1560
tHDFSRowCount Standard properties................................................................................................................. 1560
Related scenarios........................................................................................................................................................ 1565
tHiveClose................................................................................................................1566
tHiveClose Standard properties............................................................................................................................. 1566
Related scenarios........................................................................................................................................................ 1567
tHiveConnection..................................................................................................... 1568
tHiveConnection Standard properties................................................................................................................. 1568
Connecting to a custom Hadoop distribution.................................................................................................. 1579
Creating a partitioned Hive table......................................................................................................................... 1582
Creating a JDBC Connection to Azure HDInsight Hive................................................................................. 1589
tHiveCreateTable.................................................................................................... 1596
tHiveCreateTable Standard properties................................................................................................................1596
Related scenario.......................................................................................................................................................... 1608
tHiveInput................................................................................................................1609
tHiveInput Standard properties............................................................................................................................. 1609
Related scenarios........................................................................................................................................................ 1621
tHiveLoad.................................................................................................................1622
tHiveLoad Standard properties.............................................................................................................................. 1622
Related scenario.......................................................................................................................................................... 1633
tHiveRow................................................................................................................. 1634
tHiveRow Standard properties............................................................................................................................... 1634
Connecting to a security-enabled MapR............................................................................................................1646
Related scenarios........................................................................................................................................................ 1649
tHSQLDbInput......................................................................................................... 1650
tHSQLDbInput Standard properties......................................................................................................................1650
Related scenarios........................................................................................................................................................ 1652
tHSQLDbOutput...................................................................................................... 1653
tHSQLDbOutput Standard properties.................................................................................................................. 1653
Related scenarios........................................................................................................................................................ 1657
tHSQLDbRow...........................................................................................................1658
tHSQLDbRow Standard properties....................................................................................................................... 1658
Related scenarios........................................................................................................................................................ 1661
tHttpRequest........................................................................................................... 1662
tHttpRequest Standard properties........................................................................................................................ 1662
Sending an HTTP request to the server and saving the response information to a local file........ 1664
Sending a POST request from a local JSON file............................................................................................. 1666
tImpalaClose........................................................................................................... 1670
tImpalaClose Standard properties........................................................................................................................ 1670
Related scenarios........................................................................................................................................................ 1671
tImpalaConnection................................................................................................. 1672
tImpalaConnection Standard properties.............................................................................................................1672
Related scenario.......................................................................................................................................................... 1675
tImpalaCreateTable................................................................................................1676
tImpalaCreateTable Standard properties........................................................................................................... 1676
Related scenario.......................................................................................................................................................... 1682
tImpalaInput............................................................................................................1683
tImpalaInput Standard properties.........................................................................................................................1683
Related scenarios........................................................................................................................................................ 1687
tImpalaLoad............................................................................................................ 1688
tImpalaLoad Standard properties..........................................................................................................................1688
Related scenario.......................................................................................................................................................... 1692
tImpalaOutput.........................................................................................................1693
tImpalaOutput Standard properties..................................................................................................................... 1693
Related scenarios........................................................................................................................................................ 1697
tImpalaRow............................................................................................................. 1698
tImpalaRow Standard properties...........................................................................................................................1698
Related scenarios........................................................................................................................................................ 1702
tInfiniteLoop............................................................................................................1704
tInfiniteLoop Standard properties.........................................................................................................................1704
Related scenario.......................................................................................................................................................... 1705
tInformixBulkExec.................................................................................................. 1706
tInformixBulkExec Standard properties..............................................................................................................1706
Related scenario.......................................................................................................................................................... 1710
tInformixClose.........................................................................................................1711
tInformixClose Standard properties..................................................................................................................... 1711
Related scenario.......................................................................................................................................................... 1712
tInformixCommit.................................................................................................... 1713
tInformixCommit Standard properties................................................................................................................ 1713
Related scenario..........................................................................................................................................................1714
tInformixConnection.............................................................................................. 1715
tInformixConnection Standard properties......................................................................................................... 1715
Related scenario.......................................................................................................................................................... 1716
tInformixInput.........................................................................................................1717
tInformixInput Standard properties..................................................................................................................... 1717
Related scenarios........................................................................................................................................................ 1719
tInformixOutput......................................................................................................1720
tInformixOutput Standard properties.................................................................................................................. 1720
Related scenarios........................................................................................................................................................ 1725
tInformixOutputBulk.............................................................................................. 1726
tInformixOutputBulk Standard properties......................................................................................................... 1726
Related scenario.......................................................................................................................................................... 1728
tInformixOutputBulkExec...................................................................................... 1729
tInformixOutputBulkExec Standard properties................................................................................................ 1729
Related scenario.......................................................................................................................................................... 1732
tInformixRollback................................................................................................... 1733
tInformixRollback Standard properties...............................................................................................................1733
Related scenario..........................................................................................................................................................1734
tInformixRow.......................................................................................................... 1735
tInformixRow Standard properties....................................................................................................................... 1735
Related scenarios........................................................................................................................................................ 1738
tInformixSCD........................................................................................................... 1739
tInformixSCD Standard properties........................................................................................................................1739
Related scenario.......................................................................................................................................................... 1742
tInformixSP..............................................................................................................1743
tInformixSP Standard properties...........................................................................................................................1743
Related scenarios........................................................................................................................................................ 1745
tIngresBulkExec...................................................................................................... 1747
tIngresBulkExec Standard properties.................................................................................................................. 1747
Related scenarios........................................................................................................................................................ 1750
tIngresClose.............................................................................................................1751
tIngresClose Standard properties..........................................................................................................................1751
Related scenarios........................................................................................................................................................ 1752
tIngresCommit.........................................................................................................1753
tIngresCommit Standard properties..................................................................................................................... 1753
Related scenario.......................................................................................................................................................... 1754
tIngresConnection.................................................................................................. 1755
tIngresConnection Standard properties.............................................................................................................. 1755
Related scenarios........................................................................................................................................................ 1756
tIngresInput.............................................................................................................1757
tIngresInput Standard properties.......................................................................................................................... 1757
Related scenarios........................................................................................................................................................ 1759
tIngresOutput..........................................................................................................1761
tIngresOutput Standard properties.......................................................................................................................1761
Related scenarios........................................................................................................................................................ 1765
tIngresOutputBulk.................................................................................................. 1766
tIngresOutputBulk Standard properties..............................................................................................................1766
Related scenarios........................................................................................................................................................ 1768
tIngresOutputBulkExec.......................................................................................... 1769
tIngresOutputBulkExec Standard properties.....................................................................................................1769
Loading data to a table in the Ingres DBMS................................................................................................... 1772
Related scenarios........................................................................................................................................................ 1774
tIngresRollback....................................................................................................... 1775
tIngresRollback Standard properties................................................................................................................... 1775
Related scenarios........................................................................................................................................................ 1776
tIngresRow.............................................................................................................. 1777
tIngresRow Standard properties............................................................................................................................1777
Related scenarios........................................................................................................................................................ 1780
tIngresSCD............................................................................................................... 1781
tIngresSCD Standard properties............................................................................................................................ 1781
Related scenario.......................................................................................................................................................... 1783
tInterbaseClose....................................................................................................... 1784
tInterbaseClose Standard properties................................................................................................................... 1784
Related scenarios........................................................................................................................................................ 1785
tInterbaseCommit................................................................................................... 1786
tInterbaseCommit Standard properties...............................................................................................................1786
Related scenario.......................................................................................................................................................... 1787
tInterbaseConnection.............................................................................................1788
tInterbaseConnection Standard properties........................................................................................................1788
Related scenarios........................................................................................................................................................ 1789
tInterbaseInput....................................................................................................... 1790
tInterbaseInput Standard properties....................................................................................................................1790
Related scenarios........................................................................................................................................................ 1793
tInterbaseOutput.................................................................................................... 1794
tInterbaseOutput Standard properties................................................................................................................ 1794
Related scenarios........................................................................................................................................................ 1799
tInterbaseRollback..................................................................................................1800
tInterbaseRollback Standard properties............................................................................................................. 1800
Related scenarios........................................................................................................................................................ 1801
tInterbaseRow......................................................................................................... 1802
tInterbaseRow Standard properties......................................................................................................................1802
Related scenarios........................................................................................................................................................ 1805
tIntervalMatch.........................................................................................................1806
tIntervalMatch Standard properties..................................................................................................................... 1806
Identifying server locations based on their IP addresses............................................................................ 1807
tIterateToFlow........................................................................................................ 1811
tIterateToFlow Standard properties..................................................................................................................... 1811
Transforming a list of files into a data flow.................................................................................................... 1812
tJasperOutput..........................................................................................................1815
tJasperOutput Standard properties.......................................................................................................................1815
Generating a report against a .jrxml template................................................................................................ 1817
tJasperOutputExec..................................................................................................1820
tJasperOutputExec Standard properties..............................................................................................................1820
Related scenario..........................................................................................................................................................1821
tJava......................................................................................................................... 1822
tJava Standard properties.........................................................................................................................................1822
Printing out a variable content............................................................................................................................. 1823
tJavaDBInput........................................................................................................... 1827
tJavaDBInput Standard properties........................................................................................................................ 1827
Related scenarios........................................................................................................................................................ 1829
tJavaDBOutput........................................................................................................ 1830
tJavaDBOutput Standard properties..................................................................................................................... 1830
Related scenarios........................................................................................................................................................ 1833
tJavaDBRow.............................................................................................................1834
tJavaDBRow Standard properties.......................................................................................................................... 1834
Related scenarios........................................................................................................................................................ 1836
tJavaFlex.................................................................................................................. 1837
tJavaFlex Standard properties................................................................................................................................ 1837
Generating data flow................................................................................................................................................. 1838
Processing rows of data with tJavaFlex............................................................................................................. 1841
tJavaRow..................................................................................................................1845
tJavaRow Standard properties................................................................................................................................1845
Transforming data line by line using tJavaRow.............................................................................................. 1847
tJDBCClose...............................................................................................................1850
tJDBCClose Standard properties............................................................................................................................ 1850
Related scenarios........................................................................................................................................................ 1851
tJDBCColumnList.................................................................................................... 1852
tJDBCColumnList Standard properties.................................................................................................................1852
Related scenario.......................................................................................................................................................... 1853
tJDBCCommit...........................................................................................................1854
tJDBCCommit Standard properties........................................................................................................................1854
Related scenario.......................................................................................................................................................... 1855
tJDBCConnection.................................................................................................... 1856
tJDBCConnection Standard properties.................................................................................................................1856
Importing a database driver................................................................................................................................... 1858
Related scenario.......................................................................................................................................................... 1860
tJDBCInput............................................................................................................... 1861
tJDBCInput Standard properties.............................................................................................................................1861
Related scenarios........................................................................................................................................................ 1864
tJDBCOutput............................................................................................................ 1865
tJDBCOutput Standard properties......................................................................................................................... 1865
Related scenarios........................................................................................................................................................ 1869
tJDBCRollback......................................................................................................... 1870
tJDBCRollback Standard properties...................................................................................................................... 1870
Related scenario.......................................................................................................................................................... 1871
tJDBCRow.................................................................................................................1872
tJDBCRow Standard properties.............................................................................................................................. 1872
Related scenarios........................................................................................................................................................ 1875
tJDBCSCDELT...........................................................................................................1876
tJDBCSCDELT Standard properties....................................................................................................................... 1876
Tracking data changes in a Snowflake table using the tJDBCSCDELT component............................ 1879
tJDBCSP....................................................................................................................1889
tJDBCSP Standard properties.................................................................................................................................. 1889
Related scenario.......................................................................................................................................................... 1891
tJDBCTableList........................................................................................................ 1893
tJDBCTableList Standard properties.....................................................................................................................1893
Related scenario.......................................................................................................................................................... 1894
tJIRAInput................................................................................................................ 1895
tJIRAInput Standard properties.............................................................................................................................. 1895
Retrieving the project information from the JIRA application...................................................................1896
tJIRAOutput............................................................................................................. 1899
tJIRAOutput Standard properties...........................................................................................................................1899
Creating an issue in the JIRA application..........................................................................................................1900
Updating an issue in the JIRA application........................................................................................................ 1903
tJMSInput.................................................................................................................1908
tJMSInput Standard properties...............................................................................................................................1908
Related scenarios........................................................................................................................................................ 1910
tJMSOutput..............................................................................................................1911
tJMSOutput Standard properties........................................................................................................................... 1911
Enqueuing/dequeuing a message on the ActiveMQ server.........................................................................1912
Related scenarios........................................................................................................................................................ 1915
tJoin.......................................................................................................................... 1916
tJoin Standard properties......................................................................................................................................... 1916
Doing an exact match on two columns and outputting the main and rejected data........................ 1917
tKafkaCommit......................................................................................................... 1922
tKafkaCommit Standard properties...................................................................................................................... 1922
Related scenarios........................................................................................................................................................ 1922
tKafkaConnection................................................................................................... 1923
tKafkaConnection Standard properties............................................................................................................... 1923
Related scenarios........................................................................................................................................................ 1924
Kafka and AVRO in a Job......................................................................................................................................... 1924
tKafkaCreateTopic.................................................................................................. 1926
tKafkaCreateTopic Standard properties..............................................................................................................1926
Related scenarios........................................................................................................................................................ 1927
tKafkaInput..............................................................................................................1928
tKafkaInput Standard properties........................................................................................................................... 1928
Related scenarios........................................................................................................................................................ 1931
tKafkaOutput...........................................................................................................1932
tKafkaOutput Standard properties........................................................................................................................1932
Related scenarios........................................................................................................................................................ 1934
tLDAPAttributesInput.............................................................................................1935
tLDAPAttributesInput Standard properties........................................................................................................ 1935
Related scenario.......................................................................................................................................................... 1938
tLDAPClose..............................................................................................................1939
tLDAPClose Standard properties........................................................................................................................... 1939
Related scenarios........................................................................................................................................................ 1939
tLDAPConnection....................................................................................................1940
tLDAPConnection Standard properties................................................................................................................1940
Related scenarios........................................................................................................................................................ 1941
tLDAPInput.............................................................................................................. 1942
tLDAPInput Standard properties............................................................................................................................1942
Displaying LDAP directory's filtered content................................................................................................... 1944
tLDAPOutput........................................................................................................... 1947
tLDAPOutput Standard properties........................................................................................................................ 1947
Editing data in an LDAP directory........................................................................................................ 1950
tLDAPRenameEntry................................................................................................ 1953
tLDAPRenameEntry Standard properties............................................................................................................1953
Related scenarios........................................................................................................................................................ 1955
tLibraryLoad............................................................................................................ 1956
tLibraryLoad Standard properties......................................................................................................................... 1956
Importing an external library................................................................................................................................. 1957
Checking the format of an e-mail address........................................................................................................1958
tLineChart................................................................................................................ 1961
tLineChart Standard properties..............................................................................................................................1961
Creating a line chart to ease trend analysis.................................................................................................... 1963
tLogCatcher............................................................................................................. 1970
tLogCatcher Standard properties.......................................................................................................................... 1970
Catching messages triggered by a tWarn component.................................................................................. 1971
Catching the message triggered by a tDie component................................................................................ 1973
tLogRow...................................................................................................................1977
tLogRow Standard properties.................................................................................................................................1977
Related scenarios........................................................................................................................................................ 1978
tLoop........................................................................................................................ 1979
tLoop Standard properties.......................................................................................................................................1979
Executing a Job multiple times using a loop...................................................................................................1980
tMap......................................................................................................................... 1983
tMap Standard properties........................................................................................................................................ 1983
Mapping data using a filter and a simple explicit join................................................................................ 1985
Advanced mapping with lookup reload at each row.....................................................................................2003
Mapping with join output tables.......................................................................................................................... 2010
tMapRDBClose........................................................................................................ 2015
tMapRDBClose Standard properties..................................................................................................................... 2015
Related scenario.......................................................................................................................................................... 2016
tMapRDBConnection.............................................................................................. 2017
tMapRDBConnection Standard properties......................................................................................................... 2017
Related scenario.......................................................................................................................................................... 2021
tMapRDBInput.........................................................................................................2022
tMapRDBInput Standard properties..................................................................................................................... 2022
Related scenario.......................................................................................................................................................... 2027
tMapRDBOutput......................................................................................................2028
tMapRDBOutput Standard properties.................................................................................................................. 2028
Related scenario.......................................................................................................................................................... 2032
tMapROjaiInput.......................................................................................................2033
tMapROjaiInput Standard properties................................................................................................................... 2033
tMapROjaiOutput....................................................................................................2036
tMapROjaiOutput Standard properties................................................................................................................2036
Writing candidate data in a MapR-DB OJAI database................................................................................... 2039
tMapRStreamsCommit........................................................................................... 2043
tMapRStreamsCommit Standard properties...................................................................................................... 2043
Related scenarios........................................................................................................................................................ 2043
tMapRStreamsConnection..................................................................................... 2044
tMapRStreamsConnection Standard properties............................................................................................... 2044
Related scenarios........................................................................................................................................................ 2046
tMapRStreamsCreateStream................................................................................. 2047
tMapRStreamsCreateStream Standard properties...........................................................................................2047
Related scenarios........................................................................................................................................................ 2049
tMapRStreamsInput................................................................................................2050
tMapRStreamsInput Standard properties........................................................................................................... 2050
Related scenarios........................................................................................................................................................ 2054
tMapRStreamsOutput.............................................................................................2055
tMapRStreamsOutput Standard properties........................................................................................................2055
Related scenarios........................................................................................................................................................ 2057
tMarketoBulkExec...................................................................................................2058
tMarketoBulkExec Standard properties.............................................................................................................. 2058
Related scenario.......................................................................................................................................................... 2060
tMarketoConnection...............................................................................................2061
tMarketoConnection Standard properties.......................................................................................................... 2061
Related scenario.......................................................................................................................................................... 2062
tMarketoCampaign................................................................................................. 2063
tMarketoCampaign Standard properties.............................................................................................................2063
tMarketoInput......................................................................................................... 2067
tMarketoInput Standard properties...................................................................................................................... 2067
Related scenario..........................................................................................................................................2072
tMarketoListOperation...........................................................................................2073
tMarketoListOperation Standard properties......................................................................................................2073
Adding a lead record to a Marketo list using SOAP API.............................................................................. 2075
tMarketoOutput...................................................................................................... 2078
tMarketoOutput Standard properties...................................................................................................................2078
Transmitting data with Marketo using REST API........................................................................................... 2081
tMarkLogicBulkLoad...............................................................................................2087
tMarkLogicBulkLoad Standard properties..........................................................................................................2087
Related scenario.......................................................................................................................................................... 2089
tMarkLogicClose..................................................................................................... 2090
tMarkLogicClose Standard properties..................................................................................................................2090
Related scenario.......................................................................................................................................................... 2091
tMarkLogicConnection........................................................................................... 2092
tMarkLogicConnection Standard properties......................................................................................................2092
Related scenario.......................................................................................................................................................... 2093
tMarkLogicInput......................................................................................................2094
tMarkLogicInput Standard properties..................................................................................................................2094
Related scenario.......................................................................................................................................................... 2096
tMarkLogicOutput...................................................................................................2097
tMarkLogicOutput Standard properties.............................................................................................................. 2097
Related scenario.......................................................................................................................................................... 2099
tMaxDBInput........................................................................................................... 2100
tMaxDBInput Standard properties........................................................................................................................ 2100
Related scenario.......................................................................................................................................................... 2102
tMaxDBOutput........................................................................................................ 2103
tMaxDBOutput Standard properties.....................................................................................................................2103
Related scenario.......................................................................................................................................................... 2106
tMaxDBRow.............................................................................................................2107
tMaxDBRow Standard properties.......................................................................................................................... 2107
Related scenario.......................................................................................................................................................... 2109
tMDMBulkLoad....................................................................................................... 2110
tMDMBulkLoad Standard properties....................................................................................................................2110
Loading records into a business entity.............................................................................................................. 2113
tMDMClose.............................................................................................................. 2118
tMDMClose Standard properties............................................................................................................................2118
Related scenario.......................................................................................................................................................... 2119
tMDMCommit.......................................................................................................... 2120
tMDMCommit Standard properties.......................................................................................................................2120
Related scenario.......................................................................................................................................................... 2121
tMDMConnection.................................................................................................... 2122
tMDMConnection Standard properties................................................................................................................2122
Related scenario.......................................................................................................................................................... 2123
tMDMDelete............................................................................................................ 2124
tMDMDelete Standard properties......................................................................................................................... 2124
Deleting master data from an MDM Hub.......................................................................................................... 2128
tMDMInput.............................................................................................................. 2135
tMDMInput Standard properties............................................................................................................................2135
Reading master data from an MDM hub........................................................................................................... 2139
tMDMOutput............................................................................................................2142
tMDMOutput Standard properties.........................................................................................................................2142
Examples of partial update operations using tMDMOutput....................................................................... 2147
Writing master data in an MDM hub...................................................................................................................2153
Removing master data partially from the MDM hub.....................................................................................2158
tMDMReceive.......................................................................................................... 2165
tMDMReceive Standard properties....................................................................................................................... 2165
Extracting information from an MDM record in XML................................................................................... 2167
tMDMRollback.........................................................................................................2171
tMDMRollback Standard properties..................................................................................................................... 2171
Related scenario.......................................................................................................................................................... 2172
tMDMRouteRecord................................................................................................. 2173
tMDMRouteRecord Standard properties............................................................................................................. 2173
Routing an update report record to Event Manager..................................................................................... 2175
tMDMSP................................................................................................................... 2179
tMDMSP Standard properties................................................................................................................................. 2179
Executing a stored procedure using tMDMSP..................................................................................................2180
tMDMTriggerInput..................................................................................................2186
tMDMTriggerInput Standard properties..............................................................................................................2186
Exchanging the event information about an MDM record..........................................................................2188
tMDMTriggerOutput...............................................................................................2197
tMDMTriggerOutput Standard properties.......................................................................................................... 2197
Related scenario.......................................................................................................................................................... 2198
tMDMViewSearch................................................................................................... 2199
tMDMViewSearch Standard properties............................................................................................................... 2199
Retrieving records from an MDM hub via an existing view....................................................................... 2203
tMemorizeRows...................................................................................................... 2206
tMemorizeRows Standard properties...................................................................................................................2206
Retrieving the different ages and lowest age data....................................................................................... 2207
tMicrosoftCrmInput................................................................................................ 2213
tMicrosoftCrmInput Standard properties............................................................................................................2213
Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows....2217
tMicrosoftCrmOutput............................................................................................. 2223
tMicrosoftCrmOutput Standard properties........................................................................................................ 2223
Related scenario..........................................................................................................................................2226
tMicrosoftMQInput................................................................................................. 2227
tMicrosoftMQInput Standard properties.............................................................................................................2227
Writing and fetching queuing messages from Microsoft message queue............................................. 2228
tMicrosoftMQOutput.............................................................................................. 2233
tMicrosoftMQOutput Standard properties..........................................................................................................2233
Related scenario.......................................................................................................................................................... 2234
tMomCommit...........................................................................................................2235
tMomCommit Standard properties....................................................................................................................... 2235
Related scenario.......................................................................................................................................................... 2236
tMomConnection.................................................................................................... 2237
tMomConnection Standard properties................................................................................................................ 2237
Related scenario.......................................................................................................................................................... 2239
tMomInput...............................................................................................................2240
tMomInput Standard properties............................................................................................................................ 2240
Asynchronous communication via a MOM server...........................................................................................2246
Transmitting XML files via a MOM server.........................................................................................................2249
tMomMessageIdList............................................................................................... 2255
tMomMessageIdList Standard properties...........................................................................................................2255
Related scenario.......................................................................................................................................................... 2256
tMomOutput............................................................................................................ 2257
tMomOutput Standard properties......................................................................................................................... 2257
Related scenario.......................................................................................................................................................... 2262
tMomRollback......................................................................................................... 2263
tMomRollback Standard properties......................................................................................................................2263
Related scenario.......................................................................................................................................................... 2264
tMondrianInput....................................................................................................... 2265
tMondrianInput Standard properties................................................................................................................... 2265
Extracting multi-dimensional datasets from a MySQL database (Cross-join tables)..........................2267
tMongoDBBulkLoad............................................................................................... 2270
tMongoDBBulkLoad Standard properties...........................................................................................................2270
Importing data into MongoDB database............................................................................................................2273
tMongoDBClose...................................................................................................... 2281
tMongoDBClose Standard properties...................................................................................................................2281
Related scenario.......................................................................................................................................................... 2281
tMongoDBConnection............................................................................................ 2282
tMongoDBConnection Standard properties.......................................................................................................2282
Related scenario.......................................................................................................................................................... 2284
tMongoDBGridFSDelete.........................................................................................2285
tMongoDBGridFSDelete Standard properties................................................................................................... 2285
Related scenario.......................................................................................................................................................... 2287
tMongoDBGridFSGet.............................................................................................. 2288
tMongoDBGridFSGet Standard properties..........................................................................................................2288
Related scenario.......................................................................................................................................................... 2291
tMongoDBGridFSList.............................................................................................. 2292
tMongoDBGridFSList Standard properties......................................................................................................... 2292
Related scenario.......................................................................................................................................................... 2295
tMongoDBGridFSProperties.................................................................................. 2296
tMongoDBGridFSProperties Standard properties............................................................................................ 2296
Related scenario.......................................................................................................................................................... 2299
tMongoDBGridFSPut.............................................................................................. 2300
tMongoDBGridFSPut Standard properties..........................................................................................................2300
Managing files using MongoDB GridFS..............................................................................................................2302
tMongoDBInput.......................................................................................................2311
tMongoDBInput Standard properties...................................................................................................................2311
Retrieving data from a collection by advanced queries...............................................................................2315
Related scenarios........................................................................................................................................................ 2318
tMongoDBOutput....................................................................................................2319
tMongoDBOutput Standard properties................................................................................................................2319
Creating a collection and writing data to it.....................................................................................................2323
Upserting records in a collection..........................................................................................................................2328
tMongoDBRow........................................................................................................ 2336
tMongoDBRow Standard properties.....................................................................................................................2336
Using MongoDB functions to create a collection and write data to it................................................... 2339
tMsgBox................................................................................................................... 2345
tMsgBox Standard properties................................................................................................................................. 2345
'Hello world!' type test............................................................................................................................................. 2346
tMSSqlBulkExec...................................................................................................... 2348
tMSSqlBulkExec Standard properties.................................................................................................................. 2348
Related scenarios........................................................................................................................................................ 2352
tMSSqlClose............................................................................................................ 2353
tMSSqlClose Standard properties..........................................................................................................................2353
Related scenarios........................................................................................................................................................ 2354
tMSSqlColumnList.................................................................................................. 2355
tMSSqlColumnList Standard properties..............................................................................................................2355
Related scenario.......................................................................................................................................................... 2357
tMSSqlCommit........................................................................................................ 2358
tMSSqlCommit Standard properties.....................................................................................................................2358
Related scenarios........................................................................................................................................................ 2359
tMSSqlConnection.................................................................................................. 2360
tMSSqlConnection Standard properties..............................................................................................................2360
Inserting data into a database table and extracting useful information from it.................................2362
tMSSqlInput.............................................................................................................2368
tMSSqlInput Standard properties..........................................................................................................................2368
Related scenarios........................................................................................................................................................ 2371
tMSSqlLastInsertId................................................................................................. 2372
tMSSqlLastInsertId Standard properties.............................................................................................................2372
Related scenario.......................................................................................................................................................... 2374
tMSSqlOutput..........................................................................................................2375
tMSSqlOutput Standard properties.......................................................................................................................2375
Related scenarios........................................................................................................................................................ 2381
tMSSqlOutputBulk.................................................................................................. 2382
tMSSqlOutputBulk Standard properties..............................................................................................................2382
Related scenarios........................................................................................................................................................ 2384
tMSSqlOutputBulkExec..........................................................................................2385
tMSSqlOutputBulkExec Standard properties.................................................................................................... 2385
Related scenarios........................................................................................................................................................ 2389
tMSSqlRollback.......................................................................................................2390
tMSSqlRollback Standard properties................................................................................................................... 2390
Related scenario.......................................................................................................................................................... 2391
tMSSqlRow.............................................................................................................. 2392
tMSSqlRow Standard properties............................................................................................................................2392
Related scenarios........................................................................................................................................................ 2396
tMSSqlSCD...............................................................................................................2397
tMSSqlSCD Standard properties............................................................................................................................ 2397
Related scenario.......................................................................................................................................................... 2400
tMSSqlSP................................................................................................................. 2401
tMSSqlSP Standard properties............................................................................................................................... 2401
Retrieving personal information using a stored procedure........................................................................ 2404
Related scenarios........................................................................................................................................................ 2409
tMSSqlTableList......................................................................................................2410
tMSSqlTableList Standard properties.................................................................................................................. 2410
Related scenario.......................................................................................................................................................... 2411
tMysqlBulkExec.......................................................................................................2412
tMysqlBulkExec Standard properties................................................................................................................... 2412
Related scenarios........................................................................................................................................................ 2415
tMysqlClose............................................................................................................. 2416
tMysqlClose Standard properties.......................................................................................................................... 2416
Related scenario.......................................................................................................................................................... 2417
tMysqlColumnList...................................................................................................2418
tMysqlColumnList Standard properties...............................................................................................................2418
Iterating on a DB table and listing its column names................................................................................. 2419
tMysqlCommit......................................................................................................... 2423
tMysqlCommit Standard properties......................................................................................................................2423
Related scenario.......................................................................................................................................................... 2424
tMysqlConnection...................................................................................................2425
tMysqlConnection Standard properties...............................................................................................................2425
Inserting data in mother/daughter tables......................................................................................................... 2426
Sharing a database connection between a parent Job and child Job......................................................2430
tMysqlInput............................................................................................................. 2437
tMysqlInput Standard properties...........................................................................................................................2437
Writing columns from a MySQL database to an output file using tMysqlInput...................................2440
Using context parameters when reading a table from a database.......................................................... 2443
Reading data from databases through context-based dynamic connections....................................... 2446
tMysqlLastInsertId.................................................................................................. 2453
tMysqlLastInsertId Standard properties..............................................................................................................2453
Getting the ID for the last inserted record with tMysqlLastInsertId........................................................2455
tMysqlLookupInput................................................................................................ 2459
tMysqlOutput.......................................................................................................... 2460
tMysqlOutput Standard properties....................................................................................................................... 2460
Inserting a column and altering data using tMysqlOutput......................................................................... 2466
Updating data using tMysqlOutput...................................................................................................................... 2471
Retrieving data in error with a Reject link....................................................................................................... 2474
tMysqlOutputBulk...................................................................................................2480
tMysqlOutputBulk Standard properties...............................................................................................................2480
Inserting transformed data in MySQL database..............................................................................................2482
tMysqlOutputBulkExec...........................................................................................2486
tMysqlOutputBulkExec Standard properties..................................................................................................... 2486
Inserting data in bulk in MySQL database........................................................................................................2489
tMysqlRollback........................................................................................................2491
tMysqlRollback Standard properties.................................................................................................................... 2491
tMysqlRow............................................................................................................... 2493
tMysqlRow Standard properties.............................................................................................................................2493
Removing and regenerating a MySQL table index........................................................................................ 2497
Using PreparedStatement objects to query data............................................................................................ 2498
Combining two flows for selective output........................................................................................................2503
tMysqlSCD............................................................................................................... 2508
tMysqlSCD Standard properties............................................................................................................................. 2508
SCD management methodology............................................................................................................................2511
Tracking data changes using Slowly Changing Dimensions (type 0 through type 3)........................ 2514
tMysqlSCDELT......................................................................................................... 2522
tMysqlSCDELT Standard properties......................................................................................................................2522
Related scenarios........................................................................................................................................................ 2525
tMysqlSP.................................................................................................................. 2526
tMysqlSP Standard properties................................................................................................................................ 2526
Using tMysqlSP to find a State Label using a stored procedure...............................................................2528
Related scenarios........................................................................................................................................................ 2531
tMysqlTableList...................................................................................................... 2532
tMysqlTableList Standard properties...................................................................................................................2532
Iterating on DB tables and deleting their content using a user-defined SQL template................... 2533
Related scenario.......................................................................................................................................................... 2537
tNamedPipeClose................................................................................................... 2538
tNamedPipeClose Standard properties............................................................................................................... 2538
Related scenario.......................................................................................................................................................... 2539
tNamedPipeOpen....................................................................................................2540
tNamedPipeOpen Standard properties............................................................................................................... 2540
Related scenario.......................................................................................................................................................... 2541
tNamedPipeOutput.................................................................................................2542
tNamedPipeOutput Standard properties............................................................................................................ 2542
tNeo4jBatchOutput.................................................................................................2545
tNeo4jBatchOutput Standard properties............................................................................................................ 2545
tNeo4jBatchOutputRelationship...........................................................................2548
tNeo4jBatchOutputRelationship Standard properties................................................................................... 2548
Writing information of actors and movies to Neo4j with hierarchical relationship using Neo4j Batch components.... 2550
tNeo4jBatchSchema............................................................................................... 2560
tNeo4jBatchSchema Standard properties.......................................................................................................... 2560
tNeo4jClose............................................................................................................. 2562
tNeo4jClose Standard properties.......................................................................................................................... 2562
Related scenarios........................................................................................................................................................ 2562
tNeo4jConnection...................................................................................................2564
tNeo4jConnection Standard properties...............................................................................................................2564
Related scenarios........................................................................................................................................................ 2565
tNeo4jImportTool................................................................................................... 2567
tNeo4jImportTool Standard properties...............................................................................................................2567
tNeo4jInput............................................................................................................. 2569
tNeo4jInput Standard properties...........................................................................................................................2569
Related scenarios........................................................................................................................................................ 2571
tNeo4jOutput.......................................................................................................... 2572
tNeo4jOutput Standard properties....................................................................................................................... 2572
Writing data to a Neo4j database and reading specific data from it...................................................... 2576
Writing family information to Neo4j and creating relationships.............................................................. 2580
tNeo4jOutputRelationship.................................................................................... 2586
tNeo4jOutputRelationship Standard properties.............................................................................................. 2586
Writing information of actors and movies to Neo4j with hierarchical relationship........................... 2589
tNeo4jRow............................................................................................................... 2599
tNeo4jRow Standard properties............................................................................................................................ 2599
Creating nodes with a label using a Cypher query........................................................................................2602
Importing data from a CSV file to Neo4j using a Cypher query................................................................2606
Importing data from a CSV file to Neo4j and creating relationships using a single Cypher query.. 2612
tNetezzaBulkExec................................................................................................... 2616
tNetezzaBulkExec Standard properties...............................................................................................................2616
Related scenarios........................................................................................................................................................ 2619
tNetezzaClose......................................................................................................... 2620
tNetezzaClose Standard properties...................................................................................................................... 2620
Related scenarios........................................................................................................................................................ 2621
tNetezzaCommit..................................................................................................... 2622
tNetezzaCommit Standard properties................................................................................................................. 2622
Related scenario.......................................................................................................................................................... 2623
tNetezzaConnection............................................................................................... 2624
tNetezzaConnection Standard properties.......................................................................................................... 2624
Related scenarios........................................................................................................................................................ 2625
tNetezzaInput..........................................................................................................2626
tNetezzaInput Standard properties...................................................................................................................... 2626
Related scenarios........................................................................................................................................................ 2629
tNetezzaNzLoad......................................................................................................2630
tNetezzaNzLoad Standard properties.................................................................................................................. 2630
Related scenario.......................................................................................................................................................... 2636
tNetezzaOutput.......................................................................................................2637
tNetezzaOutput Standard properties................................................................................................................... 2637
Related scenarios........................................................................................................................................................ 2642
tNetezzaRollback....................................................................................................2643
tNetezzaRollback Standard properties................................................................................................................2643
Related scenarios........................................................................................................................................................ 2644
tNetezzaRow........................................................................................................... 2645
tNetezzaRow Standard properties........................................................................................................................ 2645
Related scenarios........................................................................................................................................................ 2648
tNetezzaSCD............................................................................................................2649
tNetezzaSCD Standard properties.........................................................................................................................2649
Related scenario.......................................................................................................................................................... 2652
tNetsuiteConnection...............................................................................................2653
tNetsuiteConnection Standard properties..........................................................................................................2653
Related scenario.......................................................................................................................................................... 2654
tNetsuiteInput......................................................................................................... 2655
tNetsuiteInput Standard properties......................................................................................................................2655
Handling data with NetSuite..................................................................................................................................2657
tNetsuiteOutput...................................................................................................... 2663
tNetsuiteOutput Standard properties.................................................................................................................. 2663
Related scenario.......................................................................................................................................................... 2666
tNormalize............................................................................................................... 2667
tNormalize Standard properties.............................................................................................................................2667
Normalizing data......................................................................................................................................................... 2669
tOpenbravoERPInput..............................................................................................2672
tOpenbravoERPInput Standard properties.........................................................................................................2672
Related scenario.......................................................................................................................................................... 2673
tOpenbravoERPOutput...........................................................................................2674
tOpenbravoERPOutput Standard properties..................................................................................................... 2674
Related scenario.......................................................................................................................................................... 2675
tOracleBulkExec......................................................................................................2676
tOracleBulkExec Standard properties..................................................................................................................2676
Truncating and inserting file data into an Oracle database.......................................................................2681
tOracleClose............................................................................................................ 2684
tOracleClose Standard properties......................................................................................................................... 2684
Related scenarios........................................................................................................................................................ 2685
tOracleCommit........................................................................................................ 2686
tOracleCommit Standard properties.................................................................................................................... 2686
Related scenario.......................................................................................................................................................... 2687
tOracleConnection.................................................................................................. 2688
tOracleConnection Standard properties............................................................................................................. 2688
Related scenario.......................................................................................................................................................... 2691
tOracleInput............................................................................................................ 2692
tOracleInput Standard properties......................................................................................................................... 2692
Using context parameters when reading a table from an Oracle database..........................................2695
tOracleOutput......................................................................................................... 2699
tOracleOutput Standard properties...................................................................................................................... 2699
Related scenarios........................................................................................................................................................ 2705
tOracleOutputBulk..................................................................................................2706
tOracleOutputBulk Standard properties............................................................................................................. 2706
Related scenarios........................................................................................................................................................ 2708
tOracleOutputBulkExec..........................................................................................2709
tOracleOutputBulkExec Standard properties.................................................................................................... 2709
Related scenarios........................................................................................................................................................ 2714
tOracleRollback.......................................................................................................2715
tOracleRollback Standard properties...................................................................................................................2715
Related scenario.......................................................................................................................................................... 2716
tOracleRow.............................................................................................................. 2717
tOracleRow Standard properties........................................................................................................................... 2717
Related scenarios........................................................................................................................................................ 2721
tOracleSCD...............................................................................................................2722
tOracleSCD Standard properties............................................................................................................................2722
Related scenario.......................................................................................................................................................... 2725
tOracleSCDELT........................................................................................................ 2726
tOracleSCDELT Standard properties.................................................................................................................... 2726
Related scenarios........................................................................................................................................2730
tOracleSP................................................................................................................. 2731
tOracleSP Standard properties...............................................................................................................................2731
Checking number format using a stored procedure...................................................................................... 2735
Related scenarios........................................................................................................................................................ 2738
tOracleTableList......................................................................................................2739
tOracleTableList Standard properties..................................................................................................................2739
Related scenarios........................................................................................................................................................ 2740
tPaloCheckElements...............................................................................................2741
tPaloCheckElements Standard properties..........................................................................................................2741
Related scenario.......................................................................................................................................................... 2743
tPaloClose................................................................................................................2744
tPaloClose Standard properties............................................................................................................................. 2744
Related scenarios........................................................................................................................................................ 2745
tPaloConnection..................................................................................................... 2746
tPaloConnection Standard properties..................................................................................................................2746
Related scenario.......................................................................................................................................................... 2747
tPaloCube................................................................................................................ 2748
tPaloCube Standard properties.............................................................................................................................. 2748
Creating a cube in an existing database........................................................................................................... 2750
tPaloCubeList.......................................................................................................... 2752
Discovering the read-only output schema of tPaloCubeList...................................................................... 2752
tPaloCubeList Standard properties.......................................................................................................................2752
Retrieving detailed cube information from a given database................................................................... 2754
tPaloDatabase......................................................................................................... 2756
tPaloDatabase Standard properties......................................................................................................................2756
Creating a database................................................................................................................................................... 2757
tPaloDatabaseList...................................................................................................2759
Discovering the read-only output schema of tPaloDatabaseList..............................................................2759
tPaloDatabaseList Standard properties...............................................................................................................2759
Retrieving detailed database information from a given Palo server.......................................................2761
tPaloDimension.......................................................................................................2763
tPaloDimension Standard properties...................................................................................................................2763
Creating a dimension with elements.................................................................................................................. 2766
tPaloDimensionList................................................................................................ 2771
Discovering the read-only output schema of tPaloDimensionList........................................................... 2771
tPaloDimensionList Standard properties............................................................................................................2771
Retrieving detailed dimension information from a given database........................................................ 2773
tPaloInputMulti.......................................................................................................2776
tPaloInputMulti Standard properties................................................................................................................... 2776
Retrieving dimension elements from a given cube....................................................................................... 2778
tPaloOutput............................................................................................................. 2782
tPaloOutput Standard properties.......................................................................................................................... 2782
Related scenario.......................................................................................................................................................... 2784
tPaloOutputMulti....................................................................................................2785
tPaloOutputMulti Standard properties................................................................................................................2785
Writing data into a given cube..............................................................................................................................2787
Rejecting inflow data when the elements to be written do not exist in a given cube..................... 2790
tPaloRule................................................................................................................. 2795
tPaloRule Standard properties............................................................................................................................... 2795
Creating a rule in a given cube............................................................................................................................ 2796
tPaloRuleList...........................................................................................................2799
Discovering the read-only output schema of tPaloRuleList....................................................................... 2799
tPaloRuleList Standard properties........................................................................................................................2799
Retrieving detailed rule information from a given cube............................................................................. 2801
tParAccelBulkExec.................................................................................................. 2803
tParAccelBulkExec Standard properties..............................................................................................................2803
Related scenarios........................................................................................................................................................ 2806
tParAccelClose........................................................................................................ 2807
tParAccelClose Standard properties.....................................................................................................................2807
Related scenarios........................................................................................................................................................ 2808
tParAccelCommit.................................................................................................... 2809
tParAccelCommit Standard properties................................................................................................................ 2809
Related scenario.......................................................................................................................................................... 2810
tParAccelConnection.............................................................................................. 2811
tParAccelConnection Standard properties......................................................................................................... 2811
Related scenario.......................................................................................................................................................... 2812
tParAccelInput.........................................................................................................2813
tParAccelInput Standard properties..................................................................................................................... 2813
Related scenarios........................................................................................................................................................ 2816
tParAccelOutput......................................................................................................2817
tParAccelOutput Standard properties..................................................................................................................2817
Related scenarios........................................................................................................................................................ 2822
tParAccelOutputBulk..............................................................................................2823
tParAccelOutputBulk Standard properties......................................................................................................... 2823
Related scenarios........................................................................................................................................................ 2825
tParAccelOutputBulkExec......................................................................................2826
tParAccelOutputBulkExec Standard properties................................................................................................2826
Related scenarios........................................................................................................................................................ 2829
tParAccelRollback...................................................................................................2830
tParAccelRollback Standard properties...............................................................................................................2830
Related scenario.......................................................................................................................................................... 2831
tParAccelRow.......................................................................................................... 2832
tParAccelRow Standard properties....................................................................................................................... 2832
Related scenarios........................................................................................................................................................ 2835
tParAccelSCD...........................................................................................................2836
tParAccelSCD Standard properties........................................................................................................................2836
Related scenario.......................................................................................................................................................... 2839
tParseRecordSet......................................................................................................2840
tParseRecordSet Standard properties..................................................................................................................2840
Related scenario..........................................................................................................................................2841
tPatternUnmasking.................................................................................................2842
tPatternUnmasking Standard properties............................................................................................................ 2842
Unmasking Australian phone numbers...............................................................................................................2845
tPatternUnmasking properties for Apache Spark Batch............................................................................... 2849
tPatternUnmasking properties for Apache Spark Streaming...................................................................... 2853
tPivotToColumnsDelimited................................................................................... 2857
tPivotToColumnsDelimited Standard properties.............................................................................................2857
Using a pivot column to aggregate data...........................................................................................................2858
tPOP......................................................................................................................... 2861
tPOP Standard properties........................................................................................................................................ 2861
Retrieving a selection of email messages from an email server.............................................................. 2863
tPostgresPlusBulkExec.......................................................................................... 2865
tPostgresPlusBulkExec Standard properties..................................................................................................... 2865
Related scenarios........................................................................................................................................................ 2868
tPostgresPlusClose.................................................................................................2869
tPostgresPlusClose Standard properties.............................................................................................................2869
Related scenarios........................................................................................................................................................ 2870
tPostgresPlusCommit.............................................................................................2871
tPostgresPlusCommit Standard properties........................................................................................................2871
Related scenario.......................................................................................................................................................... 2872
tPostgresPlusConnection.......................................................................................2873
tPostgresPlusConnection Standard properties.................................................................................................2873
Related scenario.......................................................................................................................................................... 2874
tPostgresPlusInput................................................................................................. 2875
tPostgresPlusInput Standard properties.............................................................................................................2875
Related scenarios........................................................................................................................................................ 2878
tPostgresPlusOutput.............................................................................................. 2879
tPostgresPlusOutput Standard properties..........................................................................................................2879
Related scenarios........................................................................................................................................................ 2884
tPostgresPlusOutputBulk...................................................................................... 2885
tPostgresPlusOutputBulk Standard properties.................................................................................................2885
Related scenarios........................................................................................................................................................ 2887
tPostgresPlusOutputBulkExec.............................................................................. 2888
tPostgresPlusOutputBulkExec Standard properties....................................................................................... 2888
Related scenarios........................................................................................................................................................ 2890
tPostgresPlusRollback........................................................................................... 2891
tPostgresPlusRollback Standard properties...................................................................................................... 2891
Related scenarios........................................................................................................................................................ 2892
tPostgresPlusRow...................................................................................................2893
tPostgresPlusRow Standard properties...............................................................................................................2893
Related scenarios........................................................................................................................................................ 2896
tPostgresPlusSCD................................................................................................... 2897
tPostgresPlusSCD Standard properties............................................................................................................... 2897
Related scenario.......................................................................................................................................................... 2900
tPostgresPlusSCDELT.............................................................................................2901
tPostgresPlusSCDELT Standard properties........................................................................................................2901
Related scenarios........................................................................................................................................2905
tPostgresqlBulkExec...............................................................................................2906
tPostgresqlBulkExec Standard properties..........................................................................................................2906
Related scenarios........................................................................................................................................................ 2909
tPostgresqlClose..................................................................................................... 2910
tPostgresqlClose Standard properties................................................................................................................. 2910
Related scenarios........................................................................................................................................................ 2911
tPostgresqlCommit.................................................................................................2912
tPostgresqlCommit Standard properties............................................................................................................ 2912
Related scenario.......................................................................................................................................................... 2913
tPostgresqlConnection...........................................................................................2914
tPostgresqlConnection Standard properties..................................................................................................... 2914
Related scenario.......................................................................................................................................................... 2915
tPostgresqlInput..................................................................................................... 2916
tPostgresqlInput Standard properties................................................................................................................. 2916
Related scenarios........................................................................................................................................................ 2919
tPostgresqlOutput.................................................................................................. 2920
tPostgresqlOutput Standard properties.............................................................................................................. 2920
Related scenarios........................................................................................................................................................ 2926
tPostgresqlOutputBulk.......................................................................................... 2927
tPostgresqlOutputBulk Standard properties..................................................................................................... 2927
Related scenarios........................................................................................................................................................ 2929
tPostgresqlOutputBulkExec.................................................................................. 2930
tPostgresqlOutputBulkExec Standard properties............................................................................................ 2930
Related scenarios........................................................................................................................................................ 2933
tPostgresqlRollback............................................................................................... 2934
tPostgresqlRollback Standard properties...........................................................................................................2934
Related scenario.......................................................................................................................................................... 2935
tPostgresqlRow.......................................................................................................2936
tPostgresqlRow Standard properties................................................................................................................... 2936
Related scenarios........................................................................................................................................................ 2939
tPostgresqlSCD....................................................................................................... 2940
tPostgresqlSCD Standard properties....................................................................................................................2940
Related scenario.......................................................................................................................................................... 2943
tPostgresqlSCDELT.................................................................................................2944
tPostgresqlSCDELT Standard properties............................................................................................................ 2944
Tracking data changes in a PostgreSQL table using the tPostgresqlSCDELT component............. 2948
Related scenario..........................................................................................................................................2957
tPostjob....................................................................................................................2958
tPostjob Standard properties.................................................................................................................................. 2958
Related scenarios........................................................................................................................................................ 2958
tPrejob......................................................................................................................2959
tPrejob Standard properties.................................................................................................................................... 2959
Handling files before and after the execution of a data Job..................................................................... 2959
Related scenario.......................................................................................................................................................... 2962
tPubSubOutput....................................................................................................... 2963
tRedshiftBulkExec.................................................................................................. 2964
tRedshiftBulkExec Standard properties.............................................................................................................. 2964
Loading/unloading data to/from Amazon S3................................................................................................... 2970
tRedshiftClose.........................................................................................................2980
tRedshiftClose Standard properties......................................................................................................................2980
Related scenario.......................................................................................................................................................... 2981
tRedshiftCommit.....................................................................................................2982
tRedshiftCommit Standard properties.................................................................................................................2982
Related scenario.......................................................................................................................................................... 2983
tRedshiftConnection...............................................................................................2984
tRedshiftConnection Standard properties..........................................................................................................2984
Related scenario.......................................................................................................................................................... 2986
tRedshiftInput......................................................................................................... 2987
tRedshiftInput Standard properties......................................................................................................................2987
Handling data with Redshift...................................................................................................................................2991
tRedshiftOutput...................................................................................................... 2996
tRedshiftOutput Standard properties...................................................................................................................2996
Related scenarios........................................................................................................................................................ 3001
tRedshiftOutputBulk.............................................................................................. 3002
tRedshiftOutputBulk Standard properties..........................................................................................................3002
Related scenario.......................................................................................................................................................... 3006
tRedshiftOutputBulkExec...................................................................................... 3007
tRedshiftOutputBulkExec Standard properties................................................................................................ 3007
Related scenario.......................................................................................................................................................... 3013
tRedshiftRollback................................................................................................... 3014
tRedshiftRollback Standard properties............................................................................................................... 3014
Related scenario.......................................................................................................................................................... 3015
tRedshiftRow...........................................................................................................3016
tRedshiftRow Standard properties........................................................................................................................3016
Related scenarios........................................................................................................................................................ 3020
tRedshiftUnload...................................................................................................... 3021
tRedshiftUnload Standard properties.................................................................................................................. 3021
Related scenario..........................................................................................................................3025
tReplace................................................................................................................... 3026
tReplace Standard properties................................................................................................................................. 3026
Cleaning up and filtering a CSV file....................................................................................................................3027
tReplaceList.............................................................................................................3031
tReplaceList Standard properties..........................................................................................................................3031
Replacing state names with their two-letter codes.......................................................................................3032
tReplicate.................................................................................................................3036
tReplicate Standard properties.............................................................................................................................. 3036
Replicating a flow and sorting two identical flows respectively..............................................................3037
tREST........................................................................................................................3041
tREST Standard properties.......................................................................................................................................3041
Creating and retrieving data by invoking REST Web service..................................................................... 3042
tRESTClient............................................................................................................. 3045
tRESTClient Standard properties...........................................................................................................................3045
Getting user information by interacting with a RESTful service...............................................................3050
Updating user information by interacting with a RESTful service........................................................... 3056
tRESTRequest..........................................................................................................3063
tRESTRequest Standard properties.......................................................................................................................3063
Using a REST service to accept HTTP GET requests and send responses............................................. 3066
Using URI Query parameters to explore the data of a database.............................................................. 3072
Using a REST service to accept HTTP POST requests.................................................................................. 3080
Using a REST service to accept HTTP POST requests and send responses...........................................3085
Using a REST service to accept HTTP POST requests in an HTML form................................................3093
tRESTResponse....................................................................................................... 3100
tRESTResponse Standard properties....................................................................................................................3100
Related scenario.......................................................................................................................................................... 3101
tRiakBucketList....................................................................................................... 3102
tRiakBucketList Standard properties....................................................................................................................3102
Related scenarios........................................................................................................................................................ 3103
tRiakClose................................................................................................................3104
tRiakClose Standard properties............................................................................................................................. 3104
Related scenario..........................................................................................................................3104
tRiakConnection......................................................................................................3105
tRiakConnection Standard properties..................................................................................................................3105
Related scenario.......................................................................................................................................................... 3106
tRiakInput................................................................................................................ 3107
tRiakInput Standard properties..............................................................................................................................3107
Exporting data from a Riak bucket to a local file..........................................................................................3108
tRiakKeyList............................................................................................................ 3113
tRiakKeyList Standard properties..........................................................................................................................3113
Related scenarios........................................................................................................................................................ 3114
tRiakOutput............................................................................................................. 3115
tRiakOutput Standard properties.......................................................................................................................... 3115
Related scenarios........................................................................................................................................................ 3117
tRouteFault..............................................................................................................3118
tRouteFault Standard properties........................................................................................................................... 3118
Exchanging messages between a Job and a Route....................................................................................... 3119
tRouteInput............................................................................................................. 3126
tRouteInput Standard properties...........................................................................................................................3126
Exchanging messages between a Job and a Route....................................................................................... 3127
tRouteOutput.......................................................................................................... 3132
tRouteOutput Standard properties....................................................................................................................... 3132
Related scenario.......................................................................................................................................................... 3133
tRowGenerator........................................................................................................ 3134
tRowGenerator Standard properties.....................................................................................................................3134
Generating random Java data.................................................................................................................3136
tRSSInput.................................................................................................................3138
tRSSInput Standard properties...............................................................................................................................3138
Fetching frequently updated blog entries.........................................................................................................3139
tRSSOutput..............................................................................................................3141
tRSSOutput Standard properties........................................................................................................................... 3141
Creating an RSS flow and storing files on an FTP server........................................................................... 3142
Creating an RSS flow that contains metadata.................................................................................................3147
Creating an ATOM feed XML file..........................................................................................................................3149
tRunJob.................................................................................................................... 3153
tRunJob Standard properties...................................................................................................................................3153
Calling a Job and passing the parameter needed to the called Job........................................................ 3156
Running a list of child Jobs dynamically........................................................................................................... 3160
Propagating the buffered output data from the child Job to the parent Job........................................3164
tS3BucketCreate..................................................................................................... 3169
tS3BucketCreate Standard properties..................................................................................................................3169
Related scenario.......................................................................................................................................................... 3171
tS3BucketDelete..................................................................................................... 3172
tS3BucketDelete Standard properties................................................................................................................. 3172
Related scenario.......................................................................................................................................................... 3173
tS3BucketExist........................................................................................................ 3174
tS3BucketExist Standard properties.....................................................................................................................3174
Verifying the absence of a bucket, creating it and listing all the S3 buckets........................................ 3176
tS3BucketList.......................................................................................................... 3180
tS3BucketList Standard properties....................................................................................................................... 3180
Related scenario.......................................................................................................................................................... 3181
tS3Close...................................................................................................................3182
tS3Close Standard properties................................................................................................................................. 3182
Related scenario.......................................................................................................................................................... 3183
tS3Connection.........................................................................................................3184
tS3Connection Standard properties..................................................................................................................... 3184
Creating an IAM role on AWS................................................................................................................................ 3187
Setting up SSE KMS for your EMR cluster........................................................................................................ 3187
Setting up SSE KMS for your S3 bucket............................................................................................................ 3189
Related scenario.......................................................................................................................................................... 3191
tS3Copy....................................................................................................................3192
tS3Copy Standard properties.................................................................................................................................. 3192
Copying an S3 object from one bucket to another........................................................................................3194
tS3Delete.................................................................................................................3199
tS3Delete Standard properties...............................................................................................................................3199
Related scenario.......................................................................................................................................................... 3201
tS3Get...................................................................................................................... 3202
tS3Get Standard properties..................................................................................................................................... 3202
Related scenario.......................................................................................................................................................... 3205
tS3List...................................................................................................................... 3206
tS3List Standard properties.....................................................................................................................................3206
Listing files with the same prefix from a bucket........................................................................................... 3208
Tagging S3 objects................................................................................................ 3212
Tagging S3 objects: linking the components...................................................................................................3212
Tagging S3 objects: configuring the components..........................................................................................3212
Tagging S3 objects: executing the Job...............................................................................................................3213
tS3Put...................................................................................................................... 3215
tS3Put Standard properties..................................................................................................................................... 3215
Exchanging files with Amazon S3............................................................................................................3218
tSalesforceBulkExec............................................................................................... 3222
tSalesforceBulkExec Standard properties.......................................................................................................... 3222
Related scenario.......................................................................................................................................................... 3226
tSalesforceConnection........................................................................................... 3227
tSalesforceConnection Standard properties......................................................................................................3227
Connecting to Salesforce using OAuth implicit flow to authenticate the user (deprecated).......... 3230
Related scenario.......................................................................................................................................................... 3234
tSalesforceGetDeleted........................................................................................... 3235
tSalesforceGetDeleted Standard properties...................................................................................................... 3235
Recovering deleted data from Salesforce..........................................................................................................3238
tSalesforceGetServerTimestamp...........................................................................3243
tSalesforceGetServerTimestamp Standard properties................................................................................... 3243
Related scenario.......................................................................................................................................................... 3246
tSalesforceGetUpdated.......................................................................................... 3247
tSalesforceGetUpdated Standard properties.....................................................................................................3247
Related scenario.......................................................................................................................................................... 3251
tSalesforceInput......................................................................................................3252
tSalesforceInput Standard properties..................................................................................................................3252
How to set schema for the guess query feature of tSalesforceInput...................................................... 3257
Related scenario.......................................................................................................................................................... 3262
tSalesforceOutput...................................................................................................3263
tSalesforceOutput Standard properties...............................................................................................................3263
Upserting Salesforce data based on external IDs.......................................................................................... 3268
tSalesforceOutputBulk........................................................................................... 3279
tSalesforceOutputBulk Standard properties......................................................................................................3279
Related scenario.......................................................................................................................................................... 3280
tSalesforceOutputBulkExec...................................................................................3281
tSalesforceOutputBulkExec Standard properties............................................................................................ 3281
Inserting bulk data into Salesforce......................................................................................................................3286
tSalesforceEinsteinBulkExec................................................................................. 3290
tSalesforceEinsteinBulkExec Standard properties.......................................................................................... 3290
Related scenario.......................................................................................................................................................... 3293
tSalesforceEinsteinOutputBulkExec..................................................................... 3294
tSalesforceEinsteinOutputBulkExec Standard properties............................................................................ 3294
Related scenario.......................................................................................................................................................... 3298
tSampleRow............................................................................................................ 3299
tSampleRow Standard properties......................................................................................................................... 3299
Filtering rows and groups of rows.......................................................................................................................3300
tSAPHanaClose........................................................................................................3303
tSAPHanaClose Standard properties.................................................................................................................... 3303
Related scenarios........................................................................................................................................................ 3303
tSAPHanaCommit................................................................................................... 3304
tSAPHanaCommit Standard properties............................................................................................................... 3304
Related scenario.......................................................................................................................................................... 3305
tSAPHanaConnection............................................................................................. 3306
tSAPHanaConnection Standard properties........................................................................................................ 3306
Related scenarios........................................................................................................................................................ 3307
tSAPHanaInput........................................................................................................3308
tSAPHanaInput Standard properties.................................................................................................................... 3308
Related scenarios........................................................................................................................................................ 3311
tSAPHanaOutput.....................................................................................................3312
tSAPHanaOutput Standard properties.................................................................................................................3312
Related scenarios........................................................................................................................................................ 3317
tSAPHanaRollback.................................................................................................. 3318
tSAPHanaRollback Standard properties..............................................................................................................3318
Related scenarios........................................................................................................................................................ 3318
tSAPHanaRow......................................................................................................... 3319
tSAPHanaRow Standard properties...................................................................................................................... 3319
Related scenarios........................................................................................................................................................ 3322
tSCPClose.................................................................................................................3326
tSCPClose Standard properties.............................................................................................................................. 3326
Related scenario.......................................................................................................................................................... 3327
tSCPConnection...................................................................................................... 3328
tSCPConnection Standard properties...................................................................................................................3328
Related scenarios........................................................................................................................................................ 3329
tSCPDelete...............................................................................................................3330
tSCPDelete Standard properties............................................................................................................................ 3330
Related scenarios........................................................................................................................................................ 3331
tSCPFileExists......................................................................................................... 3332
tSCPFileExists Standard properties...................................................................................................................... 3332
Handling a file using SCP........................................................................................................................................3333
tSCPFileList............................................................................................................. 3338
tSCPFileList Standard properties...........................................................................................................................3338
Related scenario.......................................................................................................................................................... 3339
tSCPGet.................................................................................................................... 3340
tSCPGet Standard properties.................................................................................................................................. 3340
Related scenario.......................................................................................................................................................... 3341
tSCPPut.................................................................................................................... 3342
tSCPPut Standard properties.................................................................................................................................. 3342
Related scenario.......................................................................................................................................................... 3343
tSCPRename............................................................................................................ 3344
tSCPRename Standard properties......................................................................................................................... 3344
Related scenario.......................................................................................................................................................... 3345
tSCPTruncate...........................................................................................................3346
tSCPTruncate Standard properties........................................................................................................................3346
Related scenarios........................................................................................................................................................ 3347
tSendMail.................................................................................................................3348
tSendMail Standard properties.............................................................................................................................. 3348
Sending an email on error...................................................................................................................................... 3350
tServerAlive............................................................................................................. 3352
tServerAlive Standard properties.......................................................................................................................... 3352
Validating the status of the connection to a remote host.......................................................................... 3353
tServiceNowConnection.........................................................................................3356
tServiceNowConnection Standard properties...................................................................................................3356
Related scenario.......................................................................................................................................................... 3357
tServiceNowInput................................................................................................... 3358
tServiceNowInput Standard properties............................................................................................................... 3358
Related scenario.......................................................................................................................................................... 3360
tServiceNowOutput................................................................................................ 3361
tServiceNowOutput Standard properties............................................................................................................3361
Related scenario.......................................................................................................................................................... 3363
tSetEnv.....................................................................................................................3364
tSetEnv Standard properties................................................................................................................................... 3364
Modifying a variable during a Job execution................................................................................................... 3365
tSetGlobalVar.......................................................................................................... 3368
tSetGlobalVar Standard properties....................................................................................................................... 3368
Printing out the content of a global variable..................................................................................................3369
tSetKerberosConfiguration....................................................................................3371
tSetKerberosConfiguration Standard properties..............................................................................................3371
Related scenarios........................................................................................................................................................ 3372
tSetKeystore............................................................................................................3373
tSetKeystore Standard properties......................................................................................................................... 3373
Extracting customer information from a private WSDL file........................................................................3374
tSetProxy................................................................................................................. 3379
tSetProxy Standard properties............................................................................................................................... 3379
Related scenarios........................................................................................................................................................ 3381
tSleep....................................................................................................................... 3382
tSleep Standard properties......................................................................................................................................3382
Related scenarios........................................................................................................................................................ 3383
tSnowflakeBulkExec...............................................................................................3384
tSnowflakeBulkExec Standard properties.......................................................................................................... 3384
Loading data in a Snowflake table using custom stage path.................................................................... 3390
Related scenarios........................................................................................................................................................ 3397
tSnowflakeClose..................................................................................................... 3398
tSnowflakeClose Standard properties................................................................................................................. 3398
Related scenario.......................................................................................................................................................... 3398
tSnowflakeCommit................................................................................................. 3399
tSnowflakeCommit Standard properties.............................................................................................................3399
Related scenario for tSnowflakeCommit............................................................................................................3400
tSnowflakeConnection........................................................................................... 3401
tSnowflakeConnection Standard properties......................................................................................................3401
Related scenario.......................................................................................................................................................... 3403
tSnowflakeInput......................................................................................................3404
tSnowflakeInput Standard properties..................................................................................................................3404
Writing data into and reading data from a Snowflake table......................................................................3407
tSnowflakeOutput...................................................................................................3412
tSnowflakeOutput Standard properties.............................................................................................................. 3412
Related scenario.......................................................................................................................................................... 3415
tSnowflakeOutputBulk...........................................................................................3416
tSnowflakeOutputBulk Standard properties......................................................................................................3416
Related scenarios........................................................................................................................................................ 3422
tSnowflakeOutputBulkExec...................................................................................3423
tSnowflakeOutputBulkExec Standard properties............................................................................................ 3423
Loading Data Using COPY Command..................................................................................................................3430
Related scenarios........................................................................................................................................................ 3437
tSnowflakeRollback................................................................................................3438
tSnowflakeRollback Standard properties........................................................................................................... 3438
Related scenario: tSnowflakeRollback................................................................................................................ 3439
tSnowflakeRow....................................................................................................... 3440
tSnowflakeRow Standard properties....................................................................................................................3440
Querying data in a cloud file through a Snowflake external table and a materialized view......... 3443
Related scenario.......................................................................................................................................................... 3449
tSOAP....................................................................................................................... 3450
tSOAP Standard properties......................................................................................................................................3450
Fetching the country name information using a Web service................................................................... 3452
Using a SOAP message from an XML file to get country name information and saving the information to an XML file......3454
tSocketInput............................................................................................................ 3458
tSocketInput Standard properties......................................................................................................................... 3458
Passing on data to the listening port................................................................................................................. 3460
tSocketOutput......................................................................................................... 3463
tSocketOutput Standard properties......................................................................................................................3463
Related scenario..........................................................................................................................................................3464
tSortRow.................................................................................................................. 3465
tSortRow Standard properties................................................................................................................................ 3465
Sorting entries.............................................................................................................................................................. 3466
tSplitRow................................................................................................................. 3469
tSplitRow Standard properties............................................................................................................................... 3469
Splitting one row into two rows...........................................................................................................................3470
tSplunkEventCollector........................................................................................... 3474
tSplunkEventCollector Standard properties...................................................................................................... 3474
Related scenario.......................................................................................................................................................... 3475
tSQLDWHBulkExec................................................................................................. 3476
tSQLDWHBulkExec Standard properties.............................................................................................................3476
Related scenario.......................................................................................................................................................... 3480
tSQLDWHClose........................................................................................................3481
tSQLDWHClose Standard properties.................................................................................................................... 3481
Related scenario.......................................................................................................................................................... 3482
tSQLDWHCommit....................................................................................................3483
tSQLDWHCommit Standard properties............................................................................................................... 3483
Related scenario.......................................................................................................................................................... 3484
tSQLDWHConnection............................................................................................. 3485
tSQLDWHConnection Standard properties........................................................................................................ 3485
Related scenario.......................................................................................................................................................... 3487
tSQLDWHInput........................................................................................................3488
tSQLDWHInput Standard properties.................................................................................................................... 3488
Related scenario.......................................................................................................................................................... 3491
tSQLDWHOutput.....................................................................................................3492
tSQLDWHOutput Standard properties.................................................................................................................3492
Related scenario.......................................................................................................................................................... 3497
tSQLDWHRollback.................................................................................................. 3498
tSQLDWHRollback Standard properties..............................................................................................................3498
Related scenario.......................................................................................................................................................... 3499
tSQLDWHRow......................................................................................................... 3500
tSQLDWHRow Standard properties...................................................................................................................... 3500
Related scenario.......................................................................................................................................................... 3503
tSQLiteClose............................................................................................................3504
tSQLiteClose Standard properties.........................................................................................................................3504
Related scenarios........................................................................................................................................................ 3505
tSQLiteCommit........................................................................................................3506
tSQLiteCommit Standard properties.................................................................................................................... 3506
Related scenario.......................................................................................................................................................... 3507
tSQLiteConnection..................................................................................................3508
tSQLiteConnection Standard properties............................................................................................................. 3508
Related scenarios........................................................................................................................................................ 3509
tSQLiteInput............................................................................................................ 3510
tSQLiteInput Standard properties......................................................................................................................... 3510
Filtering SQLite data...................................................................................................................................................3512
tSQLiteOutput......................................................................................................... 3515
tSQLiteOutput Standard properties......................................................................................................................3515
Related scenario..........................................................................................................................................................3519
tSQLiteRollback...................................................................................................... 3520
tSQLiteRollback Standard properties...................................................................................................................3520
Related scenarios........................................................................................................................................................ 3521
tSQLiteRow..............................................................................................................3522
tSQLiteRow Standard properties...........................................................................................................................3522
Updating SQLite rows............................................................................................................................................... 3525
Related scenarios........................................................................................................................................................ 3527
tSQLTemplate......................................................................................................... 3528
tSQLTemplate Standard properties...................................................................................................................... 3528
Related scenarios........................................................................................................................................................ 3530
tSQLTemplateAggregate....................................................................................... 3531
tSQLTemplateAggregate Standard properties..................................................................................................3531
Filtering and aggregating table columns directly on the DBMS...............................................................3533
tSQLTemplateCommit............................................................................................3537
tSQLTemplateCommit Standard properties...................................................................................................... 3537
Related scenario.......................................................................................................................................................... 3538
tSQLTemplateFilterColumns................................................................................. 3539
tSQLTemplateFilterColumns Standard properties.......................................................................................... 3539
Related scenario..........................................................................................................................................................3540
tSQLTemplateFilterRows.......................................................................................3541
tSQLTemplateFilterRows Standard properties.................................................................................................3541
Related scenario..........................................................................................................................................................3542
tSQLTemplateMerge.............................................................................................. 3543
tSQLTemplateMerge Standard properties..........................................................................................................3543
Merging data directly on the DBMS.................................................................................................................... 3545
tSQLTemplateRollback.......................................................................................... 3552
tSQLTemplateRollback Standard properties.....................................................................................................3552
Related scenarios........................................................................................................................................................ 3553
tSqoopExport.......................................................................................................... 3554
Additional arguments................................................................................................................................................ 3554
tSqoopExport Standard properties....................................................................................................................... 3555
Related scenarios........................................................................................................................................................ 3564
tSqoopImport.......................................................................................................... 3565
tSqoopImport Standard properties....................................................................................................................... 3565
Importing a MySQL table to HDFS.......................................................................................................................3574
tSqoopImportAllTables..........................................................................................3580
tSqoopImportAllTables Standard properties.....................................................................................................3580
Related scenarios........................................................................................................................................................ 3587
tSqoopMerge...........................................................................................................3588
tSqoopMerge Standard properties........................................................................................................................3588
Merging two datasets in HDFS..............................................................................................................................3595
tSQSConnection...................................................................................................... 3600
tSQSConnection Standard properties.................................................................................................................. 3600
Related scenarios........................................................................................................................................................ 3602
tSQSInput.................................................................................................................3603
tSQSInput Standard properties.............................................................................................................................. 3603
Retrieving messages from an Amazon SQS queue........................................................................................ 3606
tSQSMessageChangeVisibility...............................................................................3611
tSQSMessageChangeVisibility Standard properties........................................................................................3611
Related scenario.......................................................................................................................................................... 3613
tSQSMessageDelete............................................................................................... 3614
tSQSMessageDelete Standard properties...........................................................................................................3614
Related scenario.......................................................................................................................................................... 3616
tSQSOutput..............................................................................................................3617
tSQSOutput Standard properties...........................................................................................................................3617
Delivering messages to an Amazon SQS queue............................................................................................. 3620
tSQSQueueAttributes............................................................................................. 3626
tSQSQueueAttributes Standard properties........................................................................................................ 3626
Related scenario.......................................................................................................................................................... 3628
tSQSQueueCreate................................................................................................... 3629
tSQSQueueCreate Standard properties............................................................................................................... 3629
Related scenario.......................................................................................................................................................... 3631
tSQSQueueDelete................................................................................................... 3632
tSQSQueueDelete Standard properties...............................................................................................................3632
Related scenario.......................................................................................................................................................... 3634
tSQSQueueList........................................................................................................ 3635
tSQSQueueList Standard properties.....................................................................................................................3635
Listing Amazon SQS queues in an AWS region...............................................................................................3637
tSQSQueuePurge.................................................................................................... 3641
tSQSQueuePurge Standard properties................................................................................................................ 3641
Related scenario.......................................................................................................................................................... 3643
tSSH..........................................................................................................................3644
tSSH Standard properties.........................................................................................................................................3644
Displaying remote system information via SSH..............................................................................................3647
tStatCatcher.............................................................................................................3649
tStatCatcher Standard properties..........................................................................................................................3649
Displaying the statistics log of Job execution................................................................................................. 3650
tSVNLogInput..........................................................................................................3654
tSVNLogInput Standard properties.......................................................................................................................3654
Retrieving a log message from an SVN repository........................................................................................ 3655
tSybaseBulkExec.....................................................................................................3658
tSybaseBulkExec Standard properties.................................................................................................................3658
Related scenarios........................................................................................................................................................ 3662
tSybaseClose........................................................................................................... 3663
tSybaseClose Standard properties........................................................................................................................ 3663
Related scenario.......................................................................................................................................................... 3664
tSybaseCommit....................................................................................................... 3665
tSybaseCommit Standard properties....................................................................................................................3665
Related scenario.......................................................................................................................................................... 3666
tSybaseConnection................................................................................................. 3667
tSybaseConnection Standard properties.............................................................................................................3667
Related scenarios........................................................................................................................................................ 3668
tSybaseInput............................................................................................................3669
tSybaseInput Standard properties.........................................................................................................................3669
Related scenarios........................................................................................................................................................ 3672
tSybaseIQBulkExec................................................................................................. 3673
tSybaseIQBulkExec Standard properties............................................................................................................ 3673
Related scenarios........................................................................................................................................................ 3680
tSybaseIQOutputBulkExec.....................................................................................3681
tSybaseIQOutputBulkExec Standard properties...............................................................................................3681
Bulk-loading data to a Sybase IQ 12 database............................................................................................... 3685
Related scenarios........................................................................................................................................................ 3688
tSybaseOutput.........................................................................................................3689
tSybaseOutput Standard properties..................................................................................................................... 3689
Related scenarios........................................................................................................................................................ 3694
tSybaseOutputBulk.................................................................................................3695
tSybaseOutputBulk Standard properties............................................................................................................ 3695
Related scenarios........................................................................................................................................................ 3697
tSybaseOutputBulkExec.........................................................................................3698
tSybaseOutputBulkExec Standard properties................................................................................................... 3698
Related scenarios........................................................................................................................................................ 3702
tSybaseRollback......................................................................................................3703
tSybaseRollback Standard properties.................................................................................................................. 3703
Related scenarios........................................................................................................................................................ 3704
tSybaseRow............................................................................................................. 3705
tSybaseRow Standard properties.......................................................................................................................... 3705
Related scenarios........................................................................................................................................................ 3708
tSybaseSCD..............................................................................................................3709
tSybaseSCD Standard properties...........................................................................................................................3709
Related scenarios........................................................................................................................................................ 3712
tSybaseSCDELT....................................................................................................... 3713
tSybaseSCDELT Standard properties................................................................................................................... 3713
Related scenario.......................................................................................................................................................... 3717
tSybaseSP................................................................................................................ 3718
tSybaseSP Standard properties.............................................................................................................................. 3718
Related scenarios........................................................................................................................................................ 3720
tSystem.................................................................................................................... 3722
tSystem Standard properties...................................................................................................................................3722
Echoing 'Hello World!'...............................................................................................................................................3724
tTeradataClose........................................................................................................3726
tTeradataClose Standard properties.....................................................................................................................3726
Related scenarios........................................................................................................................................................ 3727
tTeradataCommit....................................................................................................3728
tTeradataCommit Standard properties................................................................................................................3728
Related scenario.......................................................................................................................................................... 3729
tTeradataConnection..............................................................................................3730
tTeradataConnection Standard properties.........................................................................................................3730
Related scenario.......................................................................................................................................................... 3732
tTeradataFastExport...............................................................................................3733
tTeradataFastExport Standard properties.......................................................................................................... 3733
Related scenarios........................................................................................................................................................ 3735
tTeradataFastLoad..................................................................................................3736
tTeradataFastLoad Standard properties............................................................................................................. 3736
Related scenarios........................................................................................................................................................ 3738
tTeradataFastLoadUtility....................................................................................... 3739
tTeradataFastLoadUtility Standard properties................................................................................................. 3739
Related scenario.......................................................................................................................................................... 3741
tTeradataInput........................................................................................................ 3742
tTeradataInput Standard properties.....................................................................................................................3742
Related scenarios........................................................................................................................................................ 3745
tTeradataMultiLoad................................................................................................3746
tTeradataMultiLoad Standard properties........................................................................................................... 3746
Related scenario.......................................................................................................................................................... 3748
tTeradataOutput..................................................................................................... 3749
tTeradataOutput Standard properties................................................................................................................. 3749
Related scenarios........................................................................................................................................................ 3754
tTeradataRollback.................................................................................................. 3755
tTeradataRollback Standard properties.............................................................................................................. 3755
Related scenario.......................................................................................................................................................... 3756
tTeradataRow..........................................................................................................3757
tTeradataRow Standard properties.......................................................................................................................3757
Related scenarios........................................................................................................................................................ 3761
tTeradataSCD.......................................................................................................... 3762
tTeradataSCD Standard properties....................................................................................................................... 3762
Related scenario.......................................................................................................................................................... 3765
tTeradataSCDELT....................................................................................................3766
tTeradataSCDELT Standard properties................................................................................................................3766
Related scenario.......................................................................................................................................................... 3770
tTeradataTPTExec.................................................................................................. 3771
tTeradataTPTExec Standard properties.............................................................................................................. 3771
Supported optional attributes for each consumer operator....................................................................... 3775
Loading data into a Teradata database............................................................................................................. 3776
tTeradataTPTUtility................................................................................................3783
tTeradataTPTUtility Standard properties........................................................................................................... 3783
Related scenario.......................................................................................................................................................... 3787
tTeradataTPump..................................................................................................... 3788
tTeradataTPump Standard properties................................................................................................................. 3788
Inserting data into a Teradata database table................................................................................................ 3790
tUniqRow................................................................................................................. 3794
tUniqRow Standard properties...............................................................................................................................3794
Deduplicating entries.................................................................................................................................................3795
tUnite....................................................................................................................... 3799
tUnite Standard properties...................................................................................................................................... 3799
Iterating on files and merge the content.......................................................................................................... 3800
tVectorWiseCommit................................................................................................3803
tVectorWiseCommit Standard properties........................................................................................................... 3803
Related scenario.......................................................................................................................................................... 3804
tVectorWiseConnection......................................................................................... 3805
tVectorWiseConnection Standard properties.................................................................................................... 3805
Related scenario.......................................................................................................................................................... 3806
tVectorWiseInput.................................................................................................... 3807
tVectorWiseInput Standard properties................................................................................................................ 3807
Related scenario.......................................................................................................................................................... 3810
tVectorWiseOutput................................................................................................. 3811
tVectorWiseOutput Standard properties.............................................................................................................3811
Related scenario.......................................................................................................................................................... 3815
tVectorWiseRollback.............................................................................................. 3816
tVectorWiseRollback Standard properties..........................................................................................................3816
Related scenario.......................................................................................................................................................... 3817
tVectorWiseRow......................................................................................................3818
tVectorWiseRow Standard properties.................................................................................................................. 3818
Related scenario.......................................................................................................................................................... 3821
tVerticaBulkExec.....................................................................................................3822
tVerticaBulkExec Standard properties.................................................................................................................3822
Related scenarios........................................................................................................................................................ 3827
tVerticaClose........................................................................................................... 3828
tVerticaClose Standard properties........................................................................................................................ 3828
Related scenarios........................................................................................................................................................ 3829
tVerticaCommit....................................................................................................... 3830
tVerticaCommit Standard properties....................................................................................................................3830
Related scenario.......................................................................................................................................................... 3831
tVerticaConnection................................................................................................. 3832
tVerticaConnection Standard properties.............................................................................................................3832
Related scenario.......................................................................................................................................................... 3833
tVerticaInput........................................................................................................... 3834
tVerticaInput Standard properties.........................................................................................................................3834
Related scenarios........................................................................................................................................................ 3837
tVerticaOutput........................................................................................................ 3838
tVerticaOutput Standard properties..................................................................................................................... 3838
Related scenarios........................................................................................................................................................ 3843
tVerticaOutputBulk.................................................................................................3844
tVerticaOutputBulk Standard properties............................................................................................................ 3844
Related scenarios........................................................................................................................................................ 3846
tVerticaOutputBulkExec.........................................................................................3847
tVerticaOutputBulkExec Standard properties................................................................................................... 3847
Related scenarios........................................................................................................................................................ 3851
tVerticaRollback......................................................................................................3852
tVerticaRollback Standard properties..................................................................................................................3852
Related scenario.......................................................................................................................................................... 3853
tVerticaRow............................................................................................................. 3854
tVerticaRow Standard properties.......................................................................................................................... 3854
Related scenario.......................................................................................................................................................... 3857
tVerticaSCD............................................................................................................. 3858
tVerticaSCD Standard properties...........................................................................................................................3858
Related scenarios........................................................................................................................................................ 3861
tVtigerCRMInput..................................................................................................... 3862
tVtigerCRMInput Standard properties................................................................................................................. 3862
Related scenarios........................................................................................................................................................ 3863
tVtigerCRMOutput.................................................................................................. 3864
tVtigerCRMOutput Standard properties.............................................................................................................. 3864
Related scenarios........................................................................................................................................................ 3866
tWaitForFile.............................................................................................................3867
tWaitForFile Standard properties.......................................................................................................................... 3867
Waiting for a file to be created and stopping the iteration loop after a message is triggered.......3869
Waiting for a file to be created and continuing the iteration loop after a message is triggered....3871
tWaitForSocket....................................................................................................... 3873
tWaitForSocket Standard properties.................................................................................................................... 3873
Related scenarios........................................................................................................................................................ 3874
tWaitForSqlData..................................................................................................... 3875
tWaitForSqlData Standard properties..................................................................................................................3875
Waiting for insertion of rows in a table............................................................................................................ 3876
tWarn........................................................................................................................3879
tWarn Standard properties.......................................................................................................................................3879
Related scenarios........................................................................................................................................................ 3880
tWebService.............................................................................................................3881
tWebService Standard properties..........................................................................................................................3881
Getting country names using tWebService....................................................................................................... 3883
tWebServiceInput................................................................................................... 3890
tWebServiceInput Standard properties............................................................................................................... 3890
Getting country names using tWebServiceInput.............................................................................................3892
tWorkdayInput........................................................................................................ 3895
tWorkdayInput Standard properties..................................................................................................................... 3895
Related scenario.......................................................................................................................................................... 3896
tWriteJSONField......................................................................................................3897
Configuring a JSON Tree.......................................................................................................................................... 3897
tWriteJSONField Standard properties.................................................................................................................. 3897
Writing flat data into JSON fields.........................................................................................................................3899
Related Scenarios........................................................................................................................................................3903
tWriteXMLField.......................................................................................................3904
tWriteXMLField Standard properties................................................................................................................... 3904
Extracting the structure of an XML file and inserting it into the fields of a database table...........3906
tXMLMap................................................................................................................. 3910
tXMLMap Standard properties............................................................................................................................... 3910
Mapping and transforming XML data................................................................................................................. 3911
Restructuring products data using multiple loop elements....................................................................... 3933
tXMLRPCInput.........................................................................................................3943
tXMLRPCInput Standard properties..................................................................................................................... 3943
Guessing the State name from an XMLRPC..................................................................................................... 3944
tXSDValidator......................................................................................................... 3946
tXSDValidator Standard properties...................................................................................................................... 3946
Validating data flows against an XSD file........................................................................................................ 3948
tXSLT........................................................................................................................3953
tXSLT Standard properties.......................................................................................................................................3953
Transforming XML to html using an XSL stylesheet.................................................................................... 3954
Copyleft
Adapted for 7.3.1. Supersedes previous releases.
The content of this document is correct at the time of publication.
However, more recent updates may be available in the online version that can be found on Talend
Help Center.
This documentation is provided under the terms of the Creative Commons Public License (CCPL).
For more information about what you can and cannot do with this documentation in accordance with
the CCPL, please read: http://creativecommons.org/licenses/by-nc-sa/2.0/.
Notices
Talend is a trademark of Talend, Inc.
All brands, product names, company names, trademarks and service marks are the properties of their
respective owners.
License Agreement
The software described in this documentation is licensed under the Apache License, Version 2.0 (the
"License"); you may not use this software except in compliance with the License. You may obtain
a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.html. Unless required by
applicable law or agreed to in writing, software distributed under the License is distributed on an "AS
IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under the License.
This product includes software developed at AOP Alliance (Java/J2EE AOP standards), ASM, Amazon,
AntlR, Apache ActiveMQ, Apache Ant, Apache Avro, Apache Axiom, Apache Axis, Apache Axis 2,
Apache Batik, Apache CXF, Apache Cassandra, Apache Chemistry, Apache Common Http Client, Apache
Common Http Core, Apache Commons, Apache Commons Bcel, Apache Commons JxPath, Apache
Commons Lang, Apache Datafu, Apache Derby Database Engine and Embedded JDBC Driver, Apache
Geronimo, Apache HCatalog, Apache Hadoop, Apache Hbase, Apache Hive, Apache HttpClient, Apache
HttpComponents Client, Apache JAMES, Apache Log4j, Apache Lucene Core, Apache Neethi, Apache
Oozie, Apache POI, Apache Parquet, Apache Pig, Apache PiggyBank, Apache ServiceMix, Apache
Sqoop, Apache Thrift, Apache Tomcat, Apache Velocity, Apache WSS4J, Apache WebServices Common
Utilities, Apache Xml-RPC, Apache Zookeeper, Box Java SDK (V2), CSV Tools, Cloudera HTrace,
ConcurrentLinkedHashMap for Java, Couchbase Client, DataNucleus, DataStax Java Driver for Apache
Cassandra, Ehcache, Ezmorph, Ganymed SSH-2 for Java, Google APIs Client Library for Java, Google
Gson, Groovy, Guava: Google Core Libraries for Java, H2 Embedded Database and JDBC Driver, Hector:
A high level Java client for Apache Cassandra, Hibernate BeanValidation API, Hibernate Validator,
HighScale Lib, HsqlDB, Ini4j, JClouds, JDO-API, JLine, JSON, JSR 305: Annotations for Software Defect
Detection in Java, JUnit, Jackson Java JSON-processor, Java API for RESTful Services, Java Agent for
Memory Measurements, Jaxb, Jaxen, JetS3T, Jettison, Jetty, Joda-Time, Json Simple, LZ4: Extremely
Fast Compression algorithm, LightCouch, MetaStuff, Metrics API, Metrics Reporter Config, Microsoft
Azure SDK for Java, Mondrian, MongoDB Java Driver, Netty, Ning Compression codec for LZF encoding,
OpenSAML, Paraccel JDBC Driver, Parboiled, PostgreSQL JDBC Driver, Protocol Buffers - Google's
data interchange format, Resty: A simple HTTP REST client for Java, Riak Client, Rocoto, SDSU Java
Library, SL4J: Simple Logging Facade for Java, SQLite JDBC Driver, Scala Lang, Simple API for CSS,
Snappy for Java a fast compressor/decompresser, SpyMemCached, SshJ, StAX API, StAXON - JSON via
StAX, Super SCV, The Castor Project, The Legion of the Bouncy Castle, Twitter4J, Uuid, W3C, Windows
Azure Storage libraries for Java, Woden, Woodstox: High-performance XML processor, Xalan-J, Xerces2,
XmlBeans, XmlSchema Core, Xmlsec - Apache Santuario, YAML parser and emitter for Java, Zip4J,
atinject, dropbox-sdk-java: Java library for the Dropbox Core API, google-guice. Licensed under their
respective license.
tAccessBulkExec
Offers gains in performance when carrying out Insert operations in an Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together in a two-step
process: first to output data to a delimited file, and then to perform various actions on that file
in an Access database. These two steps are fused in the tAccessOutputBulkExec component, detailed
in a separate section. The advantage of the two-step process is that it lets you transform the
data before loading it into the database.
This component executes an Insert action on the data provided.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.
Table Name of the table to be written. Note that only one table
can be written at a time and that the table must exist
already for the insert operation to succeed.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Advanced settings
Include header Select this check box to include the column header.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
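The idea behind a context-driven connection can be sketched as follows. This is a conceptual illustration only, not the actual Talend mechanism: the connection names, the parameter dictionary and the resolve_connection helper are all invented.

```python
# Conceptual sketch: a context variable's value selects one of several
# pre-planned connections, so the same Job can target different
# databases without being edited. All names here are illustrative.
connections = {
    "conn_prod": {"database": "//srv/prod.mdb"},
    "conn_test": {"database": "//srv/test.mdb"},
}

def resolve_connection(context_value):
    """Pick connection parameters based on the context variable's value."""
    try:
        return connections[context_value]
    except KeyError:
        raise ValueError("no connection registered for %r" % context_value)

print(resolve_connection("conn_test")["database"])  # → //srv/test.mdb
```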
Limitation If you are using an ODBC driver, make sure that your JVM
and ODBC versions match up: both 64-bit or 32-bit.
Related scenarios
For use cases in relation with tAccessBulkExec, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482
• Inserting data in bulk in MySQL database on page 2489
tAccessClose
Closes an active connection to the Access database so as to release occupied resources.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
Related scenarios
No scenario is available for the Standard version of this component yet.
tAccessCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and
provides a performance gain, using a unique connection.
tAccessCommit validates the data processed through the Job into the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tAccessCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.
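The difference between one global commit and a per-row commit can be illustrated with Python's sqlite3 standing in for the Access connection (Talend itself generates Java; the table name is invented):

```python
# Sketch of the commit semantics: many inserts, one commit at the end.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.commit()

# All inserts below belong to a single global transaction ...
for i in range(100):
    conn.execute("INSERT INTO t VALUES (?)", (i,))
# ... validated by one commit instead of one commit per row.
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # → 100
conn.close()  # the "Close Connection" step; clear it to keep the connection open
```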
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tAccess*
components, especially with the tAccessConnection and
tAccessRollback components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
Related scenario
For a tAccessCommit related scenario, see Inserting data in mother/daughter tables on page 2426.
tAccessConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAccessConnection opens a connection to the database for a current transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
Advanced settings
Usage
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.
Inserting data in parent/child tables
• Drop the following components from the Palette to the design workspace: tFileList,
tFileInputDelimited, tMap, tAccessOutput (x2), tAccessInput (x2), tAccessCommit, tAccessClose and
tLogRow (x2).
• Connect the tFileList component to the input file component using an Iterate link. Thus, the name
of the file to be processed will be dynamically filled in from the tFileList directory using a global
variable.
• Connect the tFileInputDelimited component to the tMap component and dispatch the flow
between the two output Access components. Use a Row link for each of these connections
representing the main data flow.
• Set the tFileList component properties, such as the directory where files will be fetched from.
• Add a tAccessConnection component and connect it to the starter component of this Job. In this
example, the tFileList component uses an OnComponentOk link to define the execution order.
• In the tAccessConnection Component view, set the connection details manually or fetch them
from the Repository if you centrally store them as a Metadata DB connection entry. For more
information about Metadata, see Talend Studio User Guide .
• In the tFileInputDelimited component's Basic settings view, press Ctrl+Space bar to access the
variable list. Set the File Name field to the global variable: tFileList_1.CURRENT_FILEPATH. For
more information about using variables, see Talend Studio User Guide.
• Set the rest of the fields as usual, defining the row and field separators according to your file
structure.
• Then set the schema manually through the Edit schema dialog box or select the schema from the
Repository . Make sure the data type is correctly set, in accordance with the nature of the data
processed.
• In the tMap Output area, add two output tables, one called Name for the Name table, the second
called Birthday, for the Birthday table. For more information about the tMap component, see
Talend Studio User Guide.
• Drag the Name column from the Input area, and drop it to the Name table.
• Drag the Birthday column from the Input area, and drop it to the Birthday table.
• Then connect the output row links to distribute the flow correctly to the relevant DB output
components.
• In each of the tAccessOutput components' Basic settings view, select the Use an existing
connection check box to retrieve the tAccessConnection details.
• Set the Table name making sure it corresponds to the correct table, in this example either Name
or Birthday.
• There is no action on the table as they are already created.
• Select Insert as Action on data for both output components.
• Click on Sync columns to retrieve the schema set in the tMap.
• Then connect the first tAccessOutput component to the first tAccessInput component using an
OnComponentOk link.
• In each of the tAccessInput components' Basic settings view, select the Use an existing
connection check box to retrieve the distributed data flow. Then set the schema manually through
Edit schema dialog box.
• Then set the Table Name accordingly. In tAccessInput_1, this will be Name.
• Click on the Guess Query.
• Connect each tAccessInput component to tLogRow component with a Row > Main link. In each of
the tLogRow components' basic settings view, select Table in the Mode field.
• Add the tAccessCommit component below the tFileList component in the design workspace and
connect them together using an OnComponentOk link in order to terminate the Job with the
transaction commit.
• In the basic settings view of tAccessCommit component and from the Component list, select the
connection to be used, tAccessConnection_1 in this scenario.
• Save your Job and press F6 to execute it.
The parent table Table1 is reused to generate the Name table and Birthday table.
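Stripped of the graphical wiring, the scenario boils down to the following flow, sketched here with Python's sqlite3 standing in for the Access connection. The Name and Birthday tables come from the scenario; the sample rows and everything else are invented.

```python
# Rough sketch of the scenario: one input flow dispatched to two
# tables over a shared connection, with a single commit at the end.
import sqlite3

input_rows = [("Ashley", "1984-02-11"), ("Bruce", "1978-07-30")]

conn = sqlite3.connect(":memory:")               # tAccessConnection
conn.execute("CREATE TABLE Name (Name TEXT)")
conn.execute("CREATE TABLE Birthday (Birthday TEXT)")

for name, birthday in input_rows:                # tMap dispatches the flow
    conn.execute("INSERT INTO Name VALUES (?)", (name,))          # tAccessOutput 1
    conn.execute("INSERT INTO Birthday VALUES (?)", (birthday,))  # tAccessOutput 2

conn.commit()                                    # tAccessCommit ends the Job

names = [r[0] for r in conn.execute("SELECT Name FROM Name ORDER BY rowid")]
print(names)  # → ['Ashley', 'Bruce']
```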
tAccessInput
Reads a database and extracts fields based on a query.
tAccessInput executes a DB query with a strictly defined statement which must correspond to
the schema definition. Then it passes on the field list to the next component via a Row > Main
connection.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Query type and Query Enter your DB query, paying particular attention to
properly sequencing the fields in order to match the schema
definition.
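The rule above can be illustrated with a small sketch (sqlite3 stands in for Access; table and column names are invented): the SELECT lists its columns in exactly the order the schema declares them, so each extracted field lands in the matching schema column.

```python
# Sketch: the query's column order follows the schema's field order.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice')")

schema = ("id", "name")                      # the component's schema
query = "SELECT %s FROM person" % ", ".join(schema)
row = conn.execute(query).fetchone()
print(row)  # → (1, 'Alice')
```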
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
Global Variables
Usage
Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.
Related scenarios
For related topics, see:
• the related topic in the description of tContextLoad on page 496.
tAccessOutput
Writes, updates, makes changes to, or deletes entries in a database.
tAccessOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
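The insert-or-update action and the reject behavior obtained by clearing Die on error can be sketched as follows. sqlite3 stands in for Access, the existence check is one possible way to emulate the action, and the table, columns and sample rows are invented.

```python
# Sketch of "Insert or update" keyed on a primary key, plus the
# reject flow: rows in error are collected, error-free rows continue.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def insert_or_update(row):
    """Insert the row; if its key already exists, update it instead."""
    if conn.execute("SELECT 1 FROM person WHERE id = ?", (row[0],)).fetchone():
        conn.execute("UPDATE person SET name = ? WHERE id = ?", (row[1], row[0]))
    else:
        conn.execute("INSERT INTO person VALUES (?, ?)", row)

rejects = []
for row in [(1, "Alice"), (1, "Alicia"), (2, None)]:
    try:
        insert_or_update(row)
    except sqlite3.Error:
        rejects.append(row)   # the Row > Rejects flow; processing continues
conn.commit()

print(conn.execute("SELECT name FROM person WHERE id = 1").fetchone()[0])  # → Alicia
print(rejects)  # → [(2, None)]
```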
Advanced settings
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, which are not
insert, update, or delete actions, or actions that require
particular preprocessing.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.
Note:
Make sure the Nullable check box is selected for the
corresponding columns in the schema.
Global Variables
Usage
Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Access database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.
Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tAccessOutputBulk
Prepares the file which contains the data used to feed the Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two-step
process. These two steps are fused together in the tAccessOutputBulkExec component, detailed in a
separate section. The advantage of the two-step process is that it makes it possible to carry out
transformations on the data before loading it into the database.
tAccessOutputBulk writes a delimited file.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Create directory if not exists Select this check box to create the directory specified in
the File Name field if it does not yet exist.
Append Select this check box to add any new rows to the end of the
file.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Include header Select this check box to include the column header in the
file.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
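What the component produces can be sketched with the standard csv module: a delimited file with an optional header row, written in an explicit encoding, with Append adding rows to the end on a later run. The schema, delimiter and file name are invented.

```python
# Sketch of the delimited-file output: header, encoding, append.
import csv
import os
import tempfile

schema = ["id", "name"]
path = os.path.join(tempfile.mkdtemp(), "out.csv")

# First run: write the file with a header row, in an explicit encoding.
with open(path, "w", newline="", encoding="ISO-8859-15") as f:
    w = csv.writer(f, delimiter=";")
    w.writerow(schema)                      # Include header
    w.writerows([[1, "Alice"], [2, "Bob"]])

# Later run: Append adds new rows to the end of the same file.
with open(path, "a", newline="", encoding="ISO-8859-15") as f:
    csv.writer(f, delimiter=";").writerow([3, "Chloé"])

with open(path, newline="", encoding="ISO-8859-15") as f:
    lines = f.read().splitlines()
print(lines)  # → ['id;name', '1;Alice', '2;Bob', '3;Chloé']
```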
Global Variables
Usage
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.
Related scenarios
For use cases in relation with tAccessOutputBulk, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482
• Inserting data in bulk in MySQL database on page 2489
tAccessOutputBulkExec
Executes an Insert action on the data provided, in an Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two-step
process. These two steps are fused together in tAccessOutputBulkExec.
As a dedicated component, tAccessOutputBulkExec improves performance during Insert operations in
an Access database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
already exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.
Note:
Only one table can be written at a time, and the table
must already exist for the insert operation to succeed.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Create directory if not exists Select this check box to create the directory specified in
the File Name field if it does not yet exist.
Append Select this check box to append new rows to the end of the
file.
Advanced settings
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Include header Select this check box to include the column header in the
file.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Limitation If you are using an ODBC driver, make sure that your JVM
and ODBC versions match up: both 64-bit or 32-bit.
Related scenarios
For use cases in relation with tAccessOutputBulkExec, see the following scenarios:
• Inserting data in bulk in MySQL database on page 2489
• Inserting transformed data in MySQL database on page 2482
tAccessRollback
Cancels the transaction commit in the connected database, preventing you from involuntarily
committing part of a transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
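The rollback semantics can be illustrated with sqlite3 standing in for the Access connection (the table name is invented): everything executed since the last commit is cancelled.

```python
# Sketch of the rollback semantics: the uncommitted insert disappears.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")   # part of an open transaction
conn.rollback()                            # tAccessRollback cancels it

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # → 0
```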
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tAccess*
components, especially with the tAccessConnection and
tAccessCommit components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
Related scenarios
No scenario is available for the Standard version of this component yet.
tAccessRow
Executes the stated SQL query on the specified database.
Depending on the nature of the query and the database, tAccessRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements. tAccessRow is the specific component for this database query. The Row suffix means the
component implements a flow in the Job design although it does not provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
DB Version Select the Access database version that you are using.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Table Name Name of the source table where changes made to data
should be captured.
Query type The query can be Built-In for a particular Job or, for a
commonly used query, stored in the Repository to ease
reuse.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times, as performance is improved.
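The Set PreparedStatement Parameter table maps onto the standard prepared-statement pattern of database APIs: the query is prepared once with "?" placeholders, and only the bound values change between executions. The sketch below illustrates the idea with Python's sqlite3 module (for illustration only; Talend generates Java, and the person table and its values here are invented):

```python
import sqlite3

# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Peter"), (2, "James"), (3, "Thomas")])

# The query text stays constant; only the bound parameters change,
# which is why reusing a prepared statement for repeated queries is faster.
query = "SELECT name FROM person WHERE id = ?"
for param in (1, 3):           # parameter index 1 maps to the first "?"
    row = conn.execute(query, (param,)).fetchone()
    print(row[0])
```

In a Job, the Parameter Index, Parameter Type and Parameter Value columns play the role of the bound tuple above.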
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match:
both 64-bit or both 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.
Related scenarios
For related topics, see:
• Procedure on page 622
• Removing and regenerating a MySQL table index on page 2497.
tAddCRCRow
Provides a unique ID which helps improve the quality of processed data. CRC stands for Cyclic
Redundancy Check.
tAddCRCRow calculates a surrogate key based on one or several columns and adds it to the defined
schema.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
In this component, a new CRC column is automatically
added.
Implication Select the check box facing the relevant columns to be used
for the surrogate key checksum.
Advanced Settings
CRC type Select a CRC type in the list. The longer the CRC, the less
overlap (collision risk) you will have.
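The component computes the CRC over the values of the columns you checked in the Implication table. A minimal sketch of that idea (Python's zlib.crc32 shown purely for illustration; the ";" separator and the sample values are invented, and Talend's own Java implementation may differ):

```python
import zlib

def crc_key(*columns):
    """Concatenate the selected column values and return a CRC32 surrogate key."""
    payload = ";".join(str(c) for c in columns).encode("utf-8")
    return zlib.crc32(payload)

# Two rows differing in a single column get distinct keys.
k1 = crc_key(1, "Thomas", 28)
k2 = crc_key(1, "Thomas", 29)
print(k1, k2)
assert k1 != k2
```

A 32-bit CRC has about 4.3 billion possible values; a longer CRC type enlarges that space and further reduces the chance that two different rows collide.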
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
2. Create the schema through the Edit Schema button, if the schema is not already stored in the
Repository. Remember to set the data type of each column; for more information on the Date
pattern to be filled in, visit http://docs.oracle.com/javase/6/docs/api/index.html .
Notice that a CRC column (read-only) has been added at the end of the schema.
2. Select CRC32 as CRC Type to get a longer surrogate key.
3. In the Basic settings view of tLogRow, select the Print values in cells of a table option to display
the output data in a table on the Console.
Job execution
Then save your Job and press F6 to execute it.
An additional CRC column has been added to the schema, calculated on all previously selected
columns (in this case, all columns of the schema).
tAddLocationFromIP
Replaces IP addresses with geographical locations.
tAddLocationFromIP geolocates visitors through their IP addresses: this component identifies visitors'
geographical locations (country, region, city, latitude, longitude, ZIP code, etc.) using an IP address
lookup database file.
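IP lookup databases such as the one this component reads work by mapping numeric IP ranges to locations. The sketch below shows the underlying technique, converting the dotted address to an integer and searching sorted range starts (Python for illustration only; the three ranges and country codes are invented, not real GeoIP data, and each range is simplified to run until the next start):

```python
import bisect
import ipaddress

# Hypothetical range table: range start addresses and the matching country.
# Real lookup databases store many thousands of such ranges, sorted by start.
starts = [int(ipaddress.IPv4Address(a)) for a in
          ("1.0.0.0", "2.0.0.0", "3.0.0.0")]
countries = ["AU", "FR", "US"]

def country_of(ip):
    """Binary-search the sorted range starts for the given address."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(starts, n) - 1
    return countries[i] if i >= 0 else None

print(country_of("2.16.4.1"))   # falls inside the 2.0.0.0 range
```

This is why the component needs the whole database file locally: the lookup is a range search over the numeric address, not a per-address web query.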
Basic settings
Schema and Edit schema A schema is a row description; it defines the fields to be
processed and passed on to the next component. The
schema of this component is read-only.
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.
Input parameters Input column: Select the input column from which the input
values are to be taken.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
3. Click OK to close the dialog box, and accept propagating the changes when prompted by the
system. The defined column is displayed in the Values panel of the Basic settings view.
4. In the Number of rows field, enter the number of rows to be generated, then click in the Value cell
and set the value for the IP address.
5. In the design workspace, select tAddLocationFromIP and click the Component tab to define the
basic settings for tAddLocationFromIP.
6. Click the Sync columns button to synchronize the schema with the input schema set with
tFixedFlowInput.
7. Browse to the GeoIP.dat file to set its path in the Database filepath field.
Note:
Make sure you download the latest version of the IP address lookup database file from the
relevant site, as indicated in the Basic settings view of tAddLocationFromIP.
8. In the Input parameters panel, set your input parameters as needed. In this scenario, the input
column is the ip column defined earlier that holds an IP address.
9. In the Location type panel, set the location type as needed. In this scenario, we want to display the
country name.
10. In the design workspace, select tLogRow, click the Component tab, and define the basic
settings for tLogRow as needed. In this scenario, we want to display values in cells of a table.
Results
One row is generated to display the country name that is associated with the set IP address.
tAdvancedFileOutputXML
Writes an XML file with separated data values according to an XML tree structure.
tAdvancedFileOutputXML outputs data to an XML type of file and offers an interface to deal with loop
and group by elements if needed.
Basic settings
Use Output Stream Select this check box to process the data flow of interest. Once
you have selected it, the Output Stream field displays and
you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing errors, you can select the variable of
interest from the auto-completion list (Ctrl+Space) to fill
the current field, provided that this variable has been
properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.
File name Name or path to the output file and/or the variable to be
used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Configure XML tree Opens the dedicated interface to help you set the XML
mapping. For details about the interface, see Defining the
XML tree on page 125.
Schema and Edit Schema A schema is a row description; it defines the number of
fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.
Append the source xml file Select this check box to add the new lines at the end of
your source XML file.
Generate compact file Select this check box to generate a file without
any empty space or line separators. All elements are then
presented on a single line, which considerably reduces the
file size.
Include DTD or XSL Select this check box to add the DOCTYPE declaration,
indicating the root element, the access path and the DTD
file, or to add the processing instruction, indicating the
type of stylesheet used (such as XSL types), along with the
access path and file name.
Advanced settings
Split output in several files If the output XML file is big, you can split it into several
files, one for every specified number of rows.
Trim data This check box is activated when you are using the dom4j
generation mode. Select this check box to trim the leading
or trailing whitespace from the value of an XML element.
Create directory only if not exists This check box is selected by default. It creates a directory
to hold the output XML files if required.
Create empty element if needed This check box is selected by default. If no column is
associated with an XML node, this option creates an
open/close tag in place of the expected tag.
Create attribute even if its value is NULL Select this check box to generate XML tag attribute for the
associated input column whose value is null.
Create attribute even if it is unmapped Select this check box to generate XML tag attribute for the
associated input column that is unmapped.
Create associated XSD file If one of the XML elements is defined as a Namespace
element, this option will create the corresponding XSD file.
Note:
To use this option, you must select Dom4J as the
generation mode.
Add Document type as node Select this check box to add column(s) of the Document
type as node(s) instead of escaped string(s) in the output
XML file.
This check box appears only when the generation mode
is set to Slow and memory-consuming (Dom4j) in the
Advanced settings tab.
Advanced separator (for number) Select this check box to change the expected data
separators.
Thousands separator: define the thousands separator,
between inverted commas.
Decimal separator: define the decimal separator, between
inverted commas.
Note:
This option allows you to use dom4j to process highly
complex XML files.
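To see what the two separator settings control, here is a minimal sketch of the formatting effect (Python for illustration only; the component itself formats numbers in the generated Java code, and the helper below is a hypothetical stand-in, here rendering a French-style number with "." for thousands and "," for decimals):

```python
def format_number(value, thousands=".", decimal=","):
    """Render a number with configurable thousands and decimal separators."""
    # Format with the default "," / "." separators, then swap them in.
    whole, _, frac = f"{value:,.2f}".partition(".")
    return whole.replace(",", thousands) + decimal + frac

print(format_number(1234567.89))           # French-style rendering
print(format_number(1000, " ", "."))       # space-grouped, "." decimals
```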
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Don't generate empty file Select the check box to avoid the generation of an empty
file.
tStatCatcher Statistics Select the check box to collect the log data at a Job level as
well as at each component level.
Global Variables
Usage
Usage rule Use this component to write an XML file with data passed
on from other components using a Row link.
To the left of the mapping interface, under Schema List, all of the columns retrieved from the
incoming data flow are listed (only if an input flow is connected to the tAdvancedFileOutputXML
component).
To the right of the interface, define the XML structure you want to obtain as output.
You can easily import the XML structure or create it manually, then map the input schema columns
onto each corresponding element of the XML tree.
Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Import XML tree.
4. Browse to the file to import and click OK.
• You can import an XML tree from files in XML, XSD and DTD formats.
• When importing an XML tree structure from an XSD file, you can choose an element as the
root of your XML tree.
The XML Tree column is hence automatically filled out with the correct elements.
5. If you need to add or remove an element or sub-elements, right-click the relevant element of the
tree to display the contextual menu.
6. Select Delete to remove the selection from the tree or select the relevant option among: Add sub-
element, Add attribute, Add namespace to enrich the tree.
Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Add sub-element to create the first element of the structure.
4. If you need to add an attribute or a child element to any element or remove any element, right-
click the left of the corresponding element name to display the contextual menu.
5. Right-click to the left of the element name to display the contextual menu.
6. On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace
or Delete.
Procedure
1. Click one of the Schema column names.
2. Drag it onto the relevant sub-element to the right.
3. Release to implement the actual mapping.
4. If you need to disconnect any mapping on any element of the XML tree, select the element and
right-click to the left of the element name to display the contextual menu.
5. Select Disconnect linker.
Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Loop Element.
Results
The Node Status column shows the newly added status.
There can only be one loop element at a time.
Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Group Element.
Results
The Node Status column shows the newly added status, and any required group statuses are
defined automatically.
Click OK once the mapping is complete to validate the definition and continue the Job
configuration where needed.
5. Select the Property type, according to whether you stored the file description in the Repository or
not. If you dragged and dropped the component directly from the Metadata node, no changes to
the setting should be needed.
If you did not set up the file description in the Repository, select Built-in and manually fill out
the fields displayed on the Basic settings vertical tab.
The input file contains the following type of columns separated by semi-colons: id, name, category,
year, language, director and cast.
In this simple use case, the Cast field gathers different values and the id increments when the
movie changes.
6. If needed, define the tFileDelimitedInput schema according to the file structure.
7. Once you have checked that the schema of the input file meets your expectations, click OK to
validate.
2. In the File Name field, browse to the file to be written if it exists or type in the path and file name
that needs to be created for the output.
By default, the schema (file description) is automatically propagated from the input flow, but you
can edit it if needed.
3. Then click on the three-dot button or double-click on the tAdvancedFileOutputXML component
on the design workspace to open the dedicated mapping editor.
To the left of the interface, are listed the columns from the input file description.
4. To the right of the interface, set the XML tree panel to reflect the expected XML structure output.
You can create the structure node by node. For more information about the manual creation of an
XML tree, see Defining the XML tree on page 125.
In this example, an XML template is used to populate the XML tree automatically.
5. Right-click on the root tag displaying by default and select Import XML tree at the end of the
contextual menu options.
6. Browse to the XML file to be imported and click OK to validate the import operation.
Note:
You can import an XML tree from files in XML, XSD and DTD formats.
7. Then drag & drop each column name from the Schema List to the matching (or relevant) XML tree
elements as described in Mapping XML data on page 127.
The mapping is shown as blue links between the left and right panels.
Finally, define the node status where the loop should take place. In this use case, Cast is the
changing element on which the iteration operates, so it will be the loop element.
Right-click on the Cast element on the XML tree, and select Set as loop element.
8. To group by movie, this use case also needs a group element to be defined.
Right-click the Movie parent node of the XML tree, and select Set as group element.
The newly defined node statuses show on the corresponding element lines.
9. Click OK to validate the configuration.
10. Press F6 to execute the Job.
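The combination of a group element and a loop element produces output along these lines: one Movie element per id, with the looped Cast element repeated inside it. The sketch below reproduces the effect with Python's xml.etree (illustration only; the movie rows are invented, and the actual component builds the XML in generated Java code):

```python
import xml.etree.ElementTree as ET

# Invented rows: (movie id, title, cast member). The id repeats once
# per cast member, exactly as in the delimited input file.
rows = [(1, "Alien", "Weaver"), (1, "Alien", "Holm"),
        (2, "Solaris", "Banionis")]

root = ET.Element("movies")
current_id, movie = None, None
for movie_id, title, cast in rows:
    if movie_id != current_id:          # group element: one <movie> per id
        movie = ET.SubElement(root, "movie", id=str(movie_id))
        ET.SubElement(movie, "title").text = title
        current_id = movie_id
    ET.SubElement(movie, "cast").text = cast   # loop element: repeats

print(ET.tostring(root, encoding="unicode"))
```

Without the group element, each input row would open a fresh Movie element; the grouping is what merges the repeated cast rows under a single parent.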
tAggregateRow
Receives a flow and aggregates it based on one or more columns.
Each output line provides the aggregation key and the relevant results of set operations (min,
max, sum...).
tAggregateRow helps to provide a set of metrics based on values or calculations.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Operations Select the type of operation along with the value to use for
the calculation and the output field.
Advanced settings
Delimiter (only for list operation) Enter the delimiter you want to use to separate the different
operations.
Use financial precision, this is the max precision for "sum" and "avg" operations, checked option
heaps more memory and slower than unchecked
Select this check box to use financial precision. This is a max precision, but it consumes more
memory and slows down processing.
Warning:
We advise you to use the BigDecimal type for the output in
order to obtain precise results.
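The precision issue behind this option comes from binary floating point: repeated sums of decimal fractions accumulate rounding error. The effect can be reproduced as follows (Python shown for illustration; in a Job, the equivalent of Python's Decimal is Java's BigDecimal):

```python
from decimal import Decimal

values = ["0.1"] * 10

# Plain floats accumulate binary rounding error...
float_sum = sum(float(v) for v in values)
print(float_sum)            # not exactly 1.0

# ...while decimal arithmetic keeps the exact result, at the cost of
# slower, more memory-hungry processing.
exact_sum = sum(Decimal(v) for v in values)
print(exact_sum)
```

This is also why the option matters most for "sum" and "avg": those operations chain many additions and divisions where the error compounds.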
Check type overflow (slower) Checks the type of data to ensure that the Job doesn't
crash.
Check ULP (Unit in the Last Place), ensure that a value will be incremented or decremented
correctly, only float and double types. (slower)
Select this check box to ensure the most precise results possible for the Float and Double
types.
tStatCatcher Statistics Check this box to collect the log data at component level.
Note that this check box is not available in the Map/Reduce
version of the component.
Global Variables
Usage
Procedure
1. Create a new Job and add a tFixedFlowInput component, a tAggregateRow component, a
tSortRow component, and a tLogRow component by typing their names in the design workspace
or dropping them from the Palette.
2. Link the tFixedFlowInput component to the tAggregateRow component using a Row > Main
connection.
3. Do the same to link the tAggregateRow component to the tSortRow component, and the tSortRow
component to the tLogRow component.
Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view.
2. Click the button next to Edit schema to open the schema dialog box and define the schema by
adding two columns, name of String type and score of Double type. When done, click OK to save
the changes and close the schema dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and in the Content field displayed,
enter the following input data:
Peter;92
James;93
Thomas;91
Peter;94
James;96
Thomas;95
Peter;96
James;92
Thomas;98
Peter;95
James;96
Thomas;93
Peter;98
James;97
Thomas;95
5. Click the button next to Edit schema to open the schema dialog box and define the schema by
adding five columns, name of String type, and sum, average, max, and min of Double type.
When done, click OK to save the changes and close the schema dialog box.
6. Add one row in the Group by table by clicking the button below it, and select name from both
the Output column and Input column position column fields to group the input data by the name
column.
7. Add four rows in the Operations table and define the operations to be carried out. In this example,
the operations are sum, average, max, and min. Then select score from all four Input column
position column fields to aggregate the input data based on it.
8. Double-click the tSortRow component to open its Basic settings view.
9. Add one row in the Criteria table and specify the column based on which the sort operation is
performed. In this example, it is the name column. Then select alpha from the sort num or alpha?
column field and asc from the Order asc or desc? column field to sort the aggregated data in
ascending alphabetical order.
10. Double-click the tLogRow component to open its Basic settings view, and then select Table (print
values in cells of a table) in the Mode area for better readability of the result.
Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.
Results
As shown above, the students' comprehensive scores are aggregated and then sorted in ascending
alphabetical order based on the student names.
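The aggregation the Job performs can be sketched in a few lines of plain code (Python for illustration only; Talend generates Java). The group-by, the four operations and the final sort correspond directly to the tAggregateRow and tSortRow settings above:

```python
from collections import defaultdict

# Same scores as in the Job's inline content.
rows = [("Peter", 92), ("James", 93), ("Thomas", 91), ("Peter", 94),
        ("James", 96), ("Thomas", 95), ("Peter", 96), ("James", 92),
        ("Thomas", 98), ("Peter", 95), ("James", 96), ("Thomas", 93),
        ("Peter", 98), ("James", 97), ("Thomas", 95)]

groups = defaultdict(list)          # "Group by" on the name column
for name, score in rows:
    groups[name].append(score)

# One output row per group, with the sum/average/max/min operations.
for name in sorted(groups):         # tSortRow: ascending alphabetical order
    s = groups[name]
    print(name, sum(s), sum(s) / len(s), max(s), min(s))
```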
tAggregateSortedRow
Aggregates the sorted input data based on a set of operations. For each output column, you
configure the operation to be carried out and the input column from which the data will be
taken.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Input rows count Specify the number of rows that are sent to the
tAggregateSortedRow component.
Note:
If you specified a Limit for the number of rows to be
processed in the input component, you will have to use
that same limit in the Input rows count field.
Operations Select the type of operation along with the value to use for
the calculation and the output field.
Advanced settings
tStatCatcher Statistics Check this box to collect the log data at component level.
Global Variables
Usage
Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.
2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: Id and Age of Integer type, and Name and Team of String type.
Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
3. In the Mode area, select Use Inline Content (delimited file), and then in the Content field
displayed, enter the input data to be sorted and aggregated. In this example, the input data is as
follows:
1;Thomas;28;Component Team
2;Harry;32;Doc Team
3;John;26;Component Team
4;Nicolas;27;QA Team
5;George;24;Component Team
6;Peter;30;Doc Team
7;Teddy;23;QA Team
8;James;26;Component Team
5. Click the [+] button below the Criteria table to add as many rows as required and then specify
the sorting criteria in the table. In this example, two rows are added, and the input entries will be
sorted based on the column Team and then the column Age, both in ascending order.
6. Double-click the first tLogRow to open its Basic settings view.
7. In the Mode area, select Table (print values in cells of a table) for better readability of the sorting
result.
Procedure
1. Double-click tAggregateSortedRow to open its Basic settings view.
2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
five columns: AggTeam of String type, AggCount, MinAge, MaxAge, and AvgAge of Integer type.
Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
3. In the Input rows count field, enter the exact number of rows of the input data. In this example, it
is 8.
4. Click the [+] button below the Group by table to add as many rows as required and specify the
aggregation set in the table. In this example, the data will be aggregated based on the input
column Team.
5. Click the [+] button below the Operations table to add as many rows as required and specify the
operation to be carried out and the corresponding input column from which the data will be taken
for each output column. In this example, we want to calculate the number of the input entries, the
minimum age, the maximum age, and the average age for each team.
6. Double-click the second tLogRow to open its Basic settings view.
7. In the Mode area, select Table (print values in cells of a table) for better readability of the sorting
result.
As shown above, the input entries are sorted based on the column Team and then the column Age,
both in ascending order, and the sorted entries are then aggregated based on the column Team.
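Because the input is already sorted on the grouping column, this component can aggregate in a single streaming pass, emitting each group as soon as the key changes (which is why tAggregateSortedRow needs the Input rows count while tAggregateRow does not). The idea can be illustrated with itertools.groupby, which makes the same pre-sorted assumption (Python, for illustration only; the rows are the scenario's data, already sorted by Team then Age):

```python
from itertools import groupby

# Sorted output of tSortRow: (name, age, team), ordered by team then age.
rows = [("George", 24, "Component Team"), ("John", 26, "Component Team"),
        ("James", 26, "Component Team"), ("Thomas", 28, "Component Team"),
        ("Peter", 30, "Doc Team"), ("Harry", 32, "Doc Team"),
        ("Teddy", 23, "QA Team"), ("Nicolas", 27, "QA Team")]

results = []
for team, members in groupby(rows, key=lambda r: r[2]):
    ages = [age for _, age, _ in members]
    # count, min, max and average age per team, as in the Operations table
    results.append((team, len(ages), min(ages), max(ages),
                    sum(ages) // len(ages)))

for r in results:
    print(r)
```

If the input were not sorted on Team, members of the same team would be split across several output rows; this is exactly why the sort step must precede the component.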
tAmazonAuroraClose
Closes an active connection to an Amazon Aurora database instance to release the occupied
resources.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the tAmazonAuroraConnection
and tAmazonAuroraCommit components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.
tAmazonAuroraCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and
provides a gain in performance, using a unique connection.
tAmazonAuroraCommit validates the data processed through the Job into the connected Amazon
Aurora database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
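The performance gain comes from doing all the work inside one transaction and validating it with a single commit, rather than paying the commit cost per row. The sketch below shows the pattern with Python's sqlite3 standing in for Amazon Aurora (illustration only; Talend uses JDBC in generated Java, and table t is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # role of tAmazonAuroraConnection
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER)")

# All inserts happen inside one open transaction; nothing is validated yet.
for i in range(1000):
    cur.execute("INSERT INTO t VALUES (?)", (i,))

conn.commit()                        # role of tAmazonAuroraCommit: one commit
count = cur.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)
conn.close()                         # role of the Close Connection check box
```

Committing row by row (the Row > Main case described below) would instead run the commit a thousand times, which is precisely what this component lets you avoid.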
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default and it allows you
to close the database connection once the commit is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tAmazonAuroraCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
Connection check box or your connection will be closed
before the end of your first row commit.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the tAmazonAuroraConnection
and tAmazonAuroraRollback components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.
tAmazonAuroraConnection
Opens a connection to an Amazon Aurora database instance that can then be reused by other Amazon
Aurora components.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box disappears when the Use or register a shared
DB Connection check box is selected.
Data source alias Type in the alias of the data source created on the Talend
Runtime side.
This field appears only when the Specify a data source alias
check box is selected.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon each transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component
commits only once all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
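The contrast between auto commit and an explicit commit component can be sketched outside Talend with a plain database connection. This is a minimal illustration using SQLite in place of Aurora; the JDBC transaction semantics in a generated Job are analogous:

```python
import sqlite3

# Explicit-commit mode: nothing is permanent until commit() is called,
# which is what the dedicated commit component relies on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("INSERT INTO t VALUES (2)")
conn.commit()  # both statements committed together as one transaction
n_committed = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Anything not yet committed can still be rolled back.
conn.execute("INSERT INTO t VALUES (3)")
conn.rollback()  # discards the uncommitted row
n_after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
```

With auto commit enabled, each INSERT above would already be permanent as soon as it executed, so the rollback would have nothing to undo.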
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the tAmazonAuroraCommit
and tAmazonAuroraRollback components.
Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.
tAmazonAuroraInput
Reads an Amazon Aurora database and extracts fields based on a query.
tAmazonAuroraInput executes a database query with a strictly defined order of fields, which must
correspond to the schema definition. Then it passes on the field list to the next component via a
Row > Main link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Query Type and Query Enter the database query, paying particular attention to
the proper sequence of the fields in order to match the
schema definition.
Guess Query Click the button to generate the query which corresponds to
the table schema in the Query field.
Guess schema Click the button to retrieve the schema from the table.
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box disappears when the Use an existing
connection check box is selected.
Data source alias Type in the alias of the data source created on the Talend
Runtime side.
This field appears only when the Specify a data source alias
check box is selected.
Advanced settings
Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. When you need to handle
data of the time-stamp type 0000-00-00 00:00:00 using
this component, set the parameter to
noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.
This field disappears when the Use an existing connection
check box in the Basic settings view is selected.
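For reference, such additional parameters are appended to the JDBC URL in standard query-string form, along these lines (host, port, and database name are placeholders, not values from this guide):

```
jdbc:mysql://<host>:<port>/<database>?noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull
```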
Enable stream Select this check box to enable streaming instead of
buffering, which allows the code to read from a large table
without consuming a large amount of memory, in order to
optimize performance.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
Trim column Select the check box(es) in the Trim column to remove
leading and trailing whitespace from the corresponding
column(s).
This option disappears when the Trim all the String/Char
columns check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Procedure
1. Double-click tAmazonAuroraConnection to open its Basic settings view.
2. In the Host, Port, Database, Username and Password fields, enter the information required for the
connection to Amazon Aurora.
Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.
2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: id of Integer type, and name and city of String type.
Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and enter the following user
information in the Content field.
1;George;Bismarck
2;Abraham;Boise
3;Taylor;Nashville
4;William;Jefferson City
5;Alexander;Jackson
6;James;Boise
7;Gerald;Little Rock
8;Tony;Richmond
9;Thomas;Springfield
10;Andre;Nashville
5. Select the Use an existing connection check box and in the Component List that appears, select
the connection component you have configured.
6. In the Table field, enter or browse to the table into which you want to write the data. In this
example, it is TalendUser.
7. Select Drop table if exists and create from the Action on table drop-down list, and select Insert
from the Action on data drop-down list.
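In SQL terms, the table and data actions chosen in steps 6 and 7 amount to dropping and recreating the table, then inserting the incoming rows. A conceptual sketch, using SQLite in place of Aurora and the first rows of the scenario's inline content:

```python
import sqlite3

rows = [
    (1, "George", "Bismarck"),
    (2, "Abraham", "Boise"),
    (3, "Taylor", "Nashville"),
]  # first three rows of the data defined in tFixedFlowInput

conn = sqlite3.connect(":memory:")
# "Drop table if exists and create" from the Action on table list
conn.execute("DROP TABLE IF EXISTS TalendUser")
conn.execute("CREATE TABLE TalendUser (id INTEGER, name TEXT, city TEXT)")
# "Insert" from the Action on data list
conn.executemany("INSERT INTO TalendUser (id, name, city) VALUES (?, ?, ?)", rows)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM TalendUser").fetchone()[0]
conn.close()
```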
8. Double-click tAmazonAuroraCommit to open its Basic settings view.
Procedure
1. Double-click tAmazonAuroraInput to open its Basic settings view.
2. Select the Use an existing connection check box and in the Component List that appears, select
the connection component you have configured.
3. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: id of Integer type, and name and city of String type. The data structure is the same
as the structure you have defined for tFixedFlowInput.
4. In the Table Name field, enter or browse to the table into which you write the data. In this
example, it is TalendUser.
5. Click the Guess Query button to generate the query. The Query field will be filled with the
automatically generated query.
6. Double-click tLogRow to open its Basic settings view.
7. In the Mode area, select Table (print values in cells of a table) for better readability of the result.
Procedure
1. Double-click tAmazonAuroraClose to open its Basic settings view.
2. In the Component List, select the connection component you have configured.
As shown above, the user information is written into Amazon Aurora, and then the data is
retrieved from Amazon Aurora and displayed on the console.
tAmazonAuroraOutput
Writes, updates, modifies, or deletes entries in an Amazon Aurora database.
tAmazonAuroraOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Type in the name of the table to be written. Note that only
one table can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets
created.
• Create table if not exists: The table is created if it does
not exist.
• Drop table if exists and create: The table is removed if
it already exists and created again.
• Clear table: The table content is deleted.
• Truncate table: The table content is quickly deleted.
However, you will not be able to roll back the
operation.
Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new entries to the table. If duplicates are
found, the job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record. If the record with
the given reference already exists, an update would be
made.
• Update or insert: Update the record with the given
reference. If the record does not exist, a new record
would be inserted.
• Delete: Remove entries corresponding to the input
flow.
• Replace: Add new entries to the table. If an old row
in the table has the same value as a new row for a
PRIMARY KEY or a UNIQUE index, the old row is
deleted before the new row is inserted.
• Insert or update on duplicate key or unique index: Add
entries if the inserted value does not exist or update
entries if the inserted value already exists and there is
a risk of violating a unique index or primary key.
• Insert Ignore: Add only new rows to prevent duplicate
key errors.
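The difference between Insert, Insert Ignore, and the insert-or-update actions can be illustrated with plain SQL. On Aurora (MySQL-compatible), insert-or-update relies on INSERT ... ON DUPLICATE KEY UPDATE; the sketch below uses SQLite's equivalent ON CONFLICT syntax so the example is self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'George')")

# Insert: a duplicate key is an error (the Job would stop).
duplicate_failed = False
try:
    conn.execute("INSERT INTO t VALUES (1, 'Abraham')")
except sqlite3.IntegrityError:
    duplicate_failed = True

# Insert Ignore: the duplicate row is silently skipped.
conn.execute("INSERT OR IGNORE INTO t VALUES (1, 'Abraham')")
name_after_ignore = conn.execute("SELECT name FROM t WHERE id = 1").fetchone()[0]

# Insert or update: the existing row is updated instead
# (MySQL: INSERT ... ON DUPLICATE KEY UPDATE; SQLite: ON CONFLICT ... DO UPDATE).
conn.execute(
    "INSERT INTO t VALUES (1, 'Abraham') "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name"
)
name_after_upsert = conn.execute("SELECT name FROM t WHERE id = 1").fetchone()[0]
conn.close()
```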
Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You
can do that by clicking Edit schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box disappears when the Use an existing
connection check box is selected.
Data source alias Type in the alias of the data source created on the Talend
Runtime side.
This field appears only when the Specify a data source alias
check box is selected.
Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows. If needed, you can retrieve the rows in error via a Row
> Rejects link.
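Conceptually, clearing Die on error replaces a fatal failure with per-row reject handling, along these lines (a sketch only; the real reject flow also carries errorCode and errorMessage columns added by the component):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")

incoming = [(1, "George"), (1, "Duplicate"), (2, "Abraham")]
rejects = []
for row in incoming:
    try:
        conn.execute("INSERT INTO t VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # With Die on error cleared, the row in error is skipped and
        # routed to the Row > Rejects link instead of stopping the Job.
        rejects.append((row, str(exc)))
conn.commit()
inserted = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
```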
Advanced settings
Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.
This field disappears when the Use an existing connection
check box in the Basic settings view is selected.
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Extend Insert Select this check box to carry out a bulk insert of a defined
set of lines instead of inserting lines one by one. The gain in
system performance is considerable.
This check box appears only when the Insert option is
selected from the Action on data list in the Basic settings
view.
Note:
This option is not compatible with the Reject link. You
should therefore clear the check box if you are using a
Row > Rejects link with this component.
Number of rows per insert Enter the number of rows to be inserted per operation. Note
that the higher the value, the lower the performance due to
increased memory demands.
This field appears only when the Extend Insert check box is
selected.
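Extend Insert corresponds to building multi-row INSERT statements instead of one statement per row. The batching logic can be sketched as follows (SQLite stands in for Aurora; the function name is illustrative):

```python
import sqlite3

def batched_insert(conn, rows, rows_per_insert):
    """Insert rows in groups, mimicking Extend Insert / Number of rows per insert."""
    statements = 0
    for start in range(0, len(rows), rows_per_insert):
        batch = rows[start:start + rows_per_insert]
        # Build a multi-row VALUES list: one statement covers the whole batch.
        placeholders = ",".join(["(?, ?)"] * len(batch))
        flat = [value for row in batch for value in row]
        conn.execute(f"INSERT INTO t (id, name) VALUES {placeholders}", flat)
        statements += 1
    conn.commit()
    return statements

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
rows = [(i, f"name{i}") for i in range(10)]
n_statements = batched_insert(conn, rows, rows_per_insert=4)  # batches of 4, 4, 2
total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
```

Fewer, larger statements reduce round-trips, which is where the performance gain comes from; larger batches also hold more data in memory at once.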
Use Batch Select this check box to activate the batch mode for data
processing.
This check box is available only when the Update or Delete
option is selected from the Action on data list in the Basic
settings view.
Additional columns This option allows you to call SQL functions to perform
actions on columns, provided that these are not insert,
update, or delete actions, or actions that require
preprocessing.
Use field options Select the check box for the corresponding column to
customize a request, particularly if multiple actions are
being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which the data is
updated.
• Key in delete: Select the check box for the
corresponding column based on which the data is
deleted.
• Updatable: Select the check box if the data in the
corresponding column can be updated.
• Insertable: Select the check box if the data in the
corresponding column can be inserted.
Use Hint Options Select this check box to configure the hint(s) which can help
you optimize a query's execution.
Hint Options Click the [+] button under the table to add hint(s) and set
the following parameters for each hint. This table appears
only when the Use Hint Options check box is selected.
• HINT: Specify the hint you need, using the syntax /*+
*/.
• POSITION: Specify where you put the hint in an SQL
statement.
• SQL STMT*: Select the SQL statement (INSERT,
UPDATE, or DELETE) you need to use.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use duplicate key update mode insert Select this check box to activate the ON DUPLICATE KEY
UPDATE mode, and then click the [+] button under the
table displayed to add column(s) to be updated and specify
the update action to be performed on the corresponding
column.
• Column: Enter the name of the column to be updated.
• Value: Enter the action to be performed on the column.
This check box is available only when the Insert option is
selected from the Action on data list in the Basic settings
view.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.
tAmazonAuroraRollback
Rolls back any changes made in the Amazon Aurora database to prevent partial transaction commit if
an error occurs.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default and it allows you
to close the database connection once the rollback is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the tAmazonAuroraConnection
and tAmazonAuroraCommit components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
No scenario is available for the Standard version of this component yet.
tAmazonEMRListInstances
Lists the details about the instance groups in a cluster on Amazon EMR (Elastic MapReduce).
Basic settings
Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.
Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.
Filter master and core instances Select this check box to ignore the master and core instance
groups and list only the task instance groups.
Cluster id Enter the ID of the cluster for which you want to list the
instance groups.
Advanced settings
STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for the Standard version of this component yet.
tAmazonEMRManage
Launches or terminates a cluster on Amazon EMR (Elastic MapReduce).
Basic settings
Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. The credentials can be used on Amazon EC2
instances or AWS ECS, and are delivered through the
Amazon EC2 metadata service. To use this option, your Job
must be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.
Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.
Service role Enter the IAM (Identity and Access Management) role for the
Amazon EMR service. The default role is EMR_DefaultRole.
To use this default role, you must have already created it.
Job flow role Enter the IAM role for the EC2 instances that Amazon EMR
manages. The default role is EMR_EC2_DefaultRole. To use
this default role, you must have already created it.
Enable log Select this check box to enable logging and in the field
displayed specify the path to a folder in an S3 bucket where
you want Amazon EMR to write the log data.
Use EC2 key pair Select this check box to associate an Amazon EC2 (Elastic
Compute Cloud) key pair with the cluster and in the field
displayed enter the name of your EC2 key pair.
Master instance type Select the type of the master instance to initialize.
Slave instance type Select the type of the slave instance to initialize.
Advanced settings
STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.
Wait for cluster ready Select this check box to let your Job wait until the launch of
the cluster is completed.
Visible to all users Select this check box to make the cluster visible to all IAM
users.
Enable debug Select this check box to enable the debug mode.
Customize Version and Application Select this check box to customize the version of the cluster
and the applications to be installed on the cluster.
• Cluster version: enter the version of the cluster.
• Applications: click the [+] button below the table
to add as many rows as needed, each row for an
application, and specify the application by clicking
the right side of the cell and selecting the application
from the drop-down list displayed, or just entering the
application name in the cell if it is not in the list.
Availability Zone Specify the availability zone for your cluster's EC2 instances.
Master security group Specify the security group for the master instance.
Additional master security groups Specify additional security groups for the master instance
and separate them with a comma, for example, gname1,
gname2, gname3.
Slave security group Specify the security group for the slave instances.
Additional slave security groups Specify additional security groups for the slave instances
and separate them with a comma, for example, gname1,
gname2, gname3.
Service Access Security Group Specify the identifier of the Amazon EC2 security group for
the Amazon EMR service to access clusters in VPC private
subnet.
For how to create a private subnet to enable service access
security group on Amazon EMR, see Scenario 2: VPC with
Public and Private Subnets (NAT).
Keep alive after steps complete Select this check box to keep the job flow alive after
completing all steps.
Wait for steps to complete Select this check box to let your Job wait until the job flow
steps are completed.
This check box is available only when the Wait for cluster
ready check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Create a new Job and add a tAmazonEMRManage component, a tAmazonEMRResize component, a
tAmazonEMRListInstances component, and a tJava component by typing their names in the design
workspace or dropping them from the Palette.
2. Link the tAmazonEMRManage component to the tAmazonEMRResize component using a Trigger >
OnSubjobOk connection.
3. Link the tAmazonEMRResize component to the tAmazonEMRListInstances component using a
Trigger > OnSubjobOk connection.
4. Link the tAmazonEMRListInstances component to the tJava component using a Row > Iterate
connection.
Procedure
1. Double-click the tAmazonEMRManage component to open its Basic settings view.
2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. From the Action list, select Start to start a cluster.
4. Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
5. In the Cluster name field, enter the name of the cluster to be started. In this example, it is talend-
doc-emr-cluster.
6. From the Cluster version and Application drop-down list, select the version of the cluster and the
application to be installed on the cluster.
7. Select the Enable log check box and in the field displayed, specify the path to a folder in an S3
bucket where you want Amazon EMR to write the log data. In this example, it is s3://talend-doc-
emr-bucket.
Resizing the Amazon EMR cluster by adding a new task instance group
Configure the tAmazonEMRResize component to resize a running Amazon EMR cluster by adding a
new task instance group.
Procedure
1. Double-click the tAmazonEMRResize component to open its Basic settings view.
2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. From the Action drop-down list, select Add task instance group to resize the cluster by adding a
new task instance group.
4. In the Cluster id field, enter the ID of the cluster to be resized. In this example, the returned value
of the global variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used.
Note that you can retrieve the global variable by pressing Ctrl + Space and selecting the relevant
global variable from the list.
5. In the Group name field, enter the name of the task instance group to be added in the cluster. In
this example, it is talend-doc-instance-group.
6. In the Instance count field, specify the number of the instances to be created.
7. From the Task instance type drop-down list, select the type of the instances to be created.
Procedure
1. Double-click the tAmazonEMRListInstances component to open its Basic settings view.
2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
4. Clear the Filter master and core instances check box to list all instance groups, including the
Master, Core, and Task type instance groups.
5. In the Cluster id field, enter the ID of the cluster for which to list the instance groups. In
this example, the returned value of the global variable CLUSTER_FINAL_ID of the previous
tAmazonEMRManage component is used.
6. Double-click the tJava component to open its Basic settings view.
7. In the Code field, enter the following code to print the ID and Name information of each instance
group in the cluster.
Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.
As shown above, the Job starts and resizes the Amazon EMR cluster, and then lists all instance
groups in the cluster.
2. View the cluster details on the Amazon EMR Cluster List page to validate the Job execution result.
tAmazonEMRResize
Adds or resizes a task instance group in a cluster on Amazon EMR (Elastic MapReduce).
Basic settings
Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.
Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.
Group name Enter the name of the task instance group to be added.
This field is available only when Add task instance group is
selected from the Action drop-down list.
Instance count Enter the number of instances for the task instance group.
Task instance type Select an instance type for all instances in the task instance
group to be added from the drop-down list.
This list is available only when Add task instance group is
selected from the Action drop-down list.
Request spot Select this check box to launch Spot instances, and in the
Bid price($) field displayed, enter the maximum hourly rate
(in dollars) you are willing to pay per instance.
This check box is available only when Add task instance
group is selected from the Action drop-down list.
Advanced settings
STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint from which
session credentials are retrieved, for example,
sts.amazonaws.com.
This check box is available only when the Assume role
check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for the Standard version of this component yet.
tAmazonMysqlClose
Closes the transaction committed in the connected DB.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job.
Related scenarios
No scenario is available for the Standard version of this component yet.
tAmazonMysqlCommit
Commits a global transaction in one go, instead of committing on every row or every batch, which
improves performance by using a single connection.
tAmazonMysqlCommit validates the data processed through the Job into the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tAmazonMysqlCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For tAmazonMysqlCommit related scenario, see Inserting data in mother/daughter tables on page
2426.
tAmazonMysqlConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAmazonMysqlConnection opens a connection to the database for a current transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
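The contrast between Auto Commit and the explicit commit component can be sketched in a few lines. This is an illustrative stand-in only: it uses Python's built-in sqlite3 module rather than the component's actual Java/JDBC runtime against MySQL, but the transaction semantics it demonstrates are the same.

```python
import sqlite3

# Default mode mirrors the tAmazonMysqlConnection / commit-component
# pairing: statements join one open transaction, and nothing is final
# until commit() is called for the whole batch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (name) VALUES (?)", ("alice",))
conn.execute("INSERT INTO person (name) VALUES (?)", ("bob",))
conn.commit()  # the whole batch commits at once

# isolation_level=None mirrors Auto Commit: every statement is
# committed immediately, so a later rollback has no effect.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
auto.execute("INSERT INTO person (name) VALUES ('carol')")
auto.rollback()  # too late: the insert was already committed

count = auto.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(count)  # 1
```

This is why the documentation recommends the commit component when you need room to manage transactions: with auto commit enabled there is no window in which a failure can be rolled back.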
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenario
For a related scenario using this component, see Inserting data in mother/daughter tables on page
2426.
tAmazonMysqlInput
Reads a database and extracts fields based on a query.
tAmazonMysqlInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Row > Main link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
Advanced settings
Note:
When you need to handle data of the time-stamp type
0000-00-00 00:00:00 using this component, set the
parameter as:
noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.
Enable stream Select this check box to enable streaming over buffering,
which allows the code to read from a large table without
consuming a large amount of memory, in order to optimize
performance.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
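The Enable stream option above amounts to reading the result set incrementally instead of buffering it whole. A minimal sketch of the same idea, using Python's sqlite3 as a stand-in for the component's MySQL runtime: fetch the rows in fixed-size batches rather than calling fetchall() on a large table.

```python
import sqlite3

# Populate a table with enough rows that buffering it whole would
# be wasteful in a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])

# Stream the result: only `batch` (500 rows) is held in memory at
# any moment, yet every row is still processed.
cur = conn.execute("SELECT n FROM t")
total = 0
while True:
    batch = cur.fetchmany(500)
    if not batch:
        break
    total += len(batch)
print(total)  # 10000
```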
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for MySQL
databases.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenarios
For related scenarios, see tMysqlInput on page 2437.
tAmazonMysqlOutput
Writes, updates, modifies, or deletes entries in a database.
tAmazonMysqlOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is quickly deleted.
However, you will not be able to rollback the operation.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Replace: Add new entries to the table. If an old row in the
table has the same value as a new row for a PRIMARY KEY
or a UNIQUE index, the old row is deleted before the new
row is inserted.
Insert or update on duplicate key or unique index: Add
entries if the inserted value does not exist or update entries
if the inserted value already exists and there is a risk of
violating a unique index or primary key.
Insert Ignore: Add only new rows to prevent duplicate key
errors.
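The Action on data choices above correspond to different SQL statements. The following sketch illustrates three of them with Python's sqlite3 as a stand-in: SQLite's INSERT OR IGNORE and INSERT OR REPLACE mirror MySQL's INSERT IGNORE and REPLACE (the MySQL syntax differs slightly, and the component generates the MySQL form).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'alice')")

# Insert: a duplicate key raises an error (the Job would stop).
try:
    conn.execute("INSERT INTO person VALUES (1, 'bob')")
except sqlite3.IntegrityError:
    pass

# Insert Ignore: the duplicate row is silently skipped.
conn.execute("INSERT OR IGNORE INTO person VALUES (1, 'bob')")
name = conn.execute("SELECT name FROM person WHERE id = 1").fetchone()[0]
print(name)  # alice: the old row survived

# Replace: the old row is deleted, then the new row inserted.
conn.execute("INSERT OR REPLACE INTO person VALUES (1, 'bob')")
name = conn.execute("SELECT name FROM person WHERE id = 1").fetchone()[0]
print(name)  # bob
```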
Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.
Schema and Edit schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows. If needed, you can retrieve the rows in error via a Row
> Rejects link.
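The Die on error behavior can be pictured as a per-row try/except: with the box cleared, rows in error are diverted to the reject flow and the error-free rows still reach the table. A rough sketch, again using sqlite3 purely as a stand-in for the component's MySQL runtime:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")

# Process each incoming row; a failing row goes to `rejects`
# (the Row > Rejects link) instead of stopping the whole run.
rejects = []
for row in [(1, "alice"), (1, "dup"), (2, "bob")]:
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejects.append((row, str(exc)))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(count, len(rejects))  # 2 1
```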
Advanced settings
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Extend Insert Select this check box to carry out a bulk insert of a defined
set of lines instead of inserting lines one by one. The gain in
system performance is considerable.
Note:
This option is not compatible with the Reject link. You
should therefore clear the check box if you are using a
Row > Rejects link with this component.
Warning:
If you are using this component with tMysqlLastInsertID, en
sure that the Extend Insert check box in Advanced Settings
is not selected. Extend Insert allows for batch loading,
however, if the check box is selected, only the ID of the last
line of the last batch will be returned.
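The performance gain from Extend Insert comes from batching: one bulk call for a set of lines rather than one statement per row. A minimal sketch of the idea (sqlite3 stand-in; the component itself generates multi-row MySQL INSERT statements):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
rows = [(i,) for i in range(1000)]

# One bulk call inserts the whole defined set of lines, instead of
# a round trip per row.
conn.executemany("INSERT INTO t VALUES (?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 1000
```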
Use Batch Select this check box to activate the batch mode for data
processing.
Note:
This check box is available only when you have selected
the Update or the Delete option in the Action on data
field.
Additional Columns This option is not available if you have just created the DB
table (even if you delete it beforehand). This option allows
you to call SQL functions to perform actions on columns,
provided that these are not insert, update or delete actions,
or actions that require pre-processing.
Use field options Select this check box to customize a request, particularly if
multiple actions are being carried out on the data.
Use Hint Options Select this check box to activate the hint configuration area
which helps you optimize a query's execution. In this area,
parameters are:
HINT: specify the hint you need, using the syntax /*+ */.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use duplicate key update mode insert Updates the values of the columns specified, in the event of
duplicate primary keys:
Column: Between double quotation marks, enter the name
of the column to be updated.
Value: Enter the action you want to carry out on the column.
Note:
To use this option you must first of all select the Insert
mode in the Action on data list found in the Basic
Settings view.
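This mode corresponds to MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The sketch below illustrates it with SQLite's equivalent UPSERT clause (ON CONFLICT ... DO UPDATE) via Python's sqlite3; the table name and columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('A-1', 10)")

# The second insert hits the primary key, so instead of failing,
# only the specified column is updated; in the component's table this
# would be Column = qty with the desired update expression as Value.
conn.execute(
    "INSERT INTO stock VALUES ('A-1', 5) "
    "ON CONFLICT(sku) DO UPDATE SET qty = qty + excluded.qty"
)
qty = conn.execute("SELECT qty FROM stock WHERE sku = 'A-1'").fetchone()[0]
print(qty)  # 15
```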
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a MySQL database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tAmazonMysqlOutput in use, see .
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenarios
For related scenarios, see tMysqlSCD on page 2508.
tAmazonMysqlRollback
Cancels the transaction committed in the connected database, preventing part of a
transaction from being committed involuntarily.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For a related scenario, see Rollback from inserting data in mother/daughter tables on page 2429.
tAmazonMysqlRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAmazonMysqlRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily
write your SQL statements. tAmazonMysqlRow is the specific component for this database query. The Row
suffix means the component implements a flow in the Job design although it does not provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description, that is to say, it defines the
number of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Built-In: You create and store the schema locally for this
component only.
Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query in a
COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times, as performance levels are
increased.
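The mapping between the Set PreparedStatement Parameter table and the "?" placeholders can be sketched as follows. This uses Python's sqlite3 as a stand-in for the component's JDBC PreparedStatement; the table and query are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

# One prepared statement, executed several times with different
# parameter values: Parameter Index 1 binds to the single "?".
query = "SELECT name FROM person WHERE id = ?"
names = [conn.execute(query, (i,)).fetchone()[0] for i in (1, 2)]
print(names)  # ['alice', 'bob']
```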
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For a related scenario, see:
• Combining two flows for selective output on page 2503
tAmazonOracleClose
Closes the transaction committed in the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job.
Related scenario
This component is to be used with tAmazonOracleConnection and tAmazonOracleRollback
components. It is generally used with a tAmazonOracleConnection to close a connection for the
ongoing transaction.
For a related scenario, see tMysqlConnection on page 2425.
tAmazonOracleCommit
Commits a global transaction in one go, instead of committing on every row or every batch, which
improves performance by using a single connection.
tAmazonOracleCommit validates the data processed through the Job into the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tAmazonOracleCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For tAmazonOracleCommit related scenario, see Inserting data in mother/daughter tables on page
2426.
tAmazonOracleConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAmazonOracleConnection opens a connection to the database for a current transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use tns file Select this check box to use the metadata of a context
included in a tns file.
Note:
One tns file may have many contexts.
TNS File: Enter the path to the tns file manually or browse
to the file by clicking the three-dot button next to the field.
Select a DB Connection in Tns File: Click the three-dot
button to display all the contexts held in the tns file and
select the desired one.
Note:
You can set the encoding parameters through this field.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
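The difference in transaction scope can be illustrated outside Talend with Python's standard-library sqlite3 module (a sketch of the concept only, not the Job's generated code):

```python
import sqlite3

# Explicit-commit mode (auto commit off): DML statements group into one
# transaction, so a rollback undoes all of them together.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("INSERT INTO t VALUES (2)")
conn.rollback()  # both inserts are undone in one go
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

# Auto commit mode: every statement is committed immediately after it is
# executed, so a later rollback has nothing left to undo.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (id INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")
auto.rollback()  # no open transaction; the insert is already committed
rows_auto = auto.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(rows_after_rollback, rows_auto)
```

This is why a Job that needs to manage its transactions explicitly keeps auto commit disabled and uses the corresponding commit component instead.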
tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related scenario
For tAmazonOracleConnection related scenario, see tMysqlConnection on page 2425
tAmazonOracleInput
Reads a database and extracts fields based on a query.
tAmazonOracleInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Row > Main link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Use cursor When selected, helps to decide the row set to work with at a
time and thus optimize performance.
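The cursor option streams the result set in fixed-size row sets instead of loading every row at once. The idea can be sketched with Python's stdlib sqlite3 and fetchmany (an illustration of the concept, not the component's implementation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?)", [(i,) for i in range(10)])

# Consume the result set in row sets of at most 4 rows at a time,
# keeping memory bounded regardless of the table size.
cur = conn.execute("SELECT id FROM logs")
batches = []
while True:
    rows = cur.fetchmany(4)
    if not rows:
        break
    batches.append(len(rows))

print(batches)  # three row sets: 4 + 4 + 2
```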
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for Oracle
databases.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenarios
For related scenarios, see:
• Reading data from different MySQL databases using dynamically loaded connection parameters
on page 497.
tAmazonOracleOutput
Writes, updates, makes changes or suppresses entries in a database.
tAmazonOracleOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Warning:
If you select the Use an existing connection check box
and select an option other than None from the Action
on table list, a commit statement will be generated
automatically before the data update/insert/delete
operation.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
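Insert or update and Update or insert reach the same end state but differ in which attempt is made first. Both strategies can be sketched with Python's stdlib sqlite3 (an illustration only, not the component's generated code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ann')")

def insert_or_update(conn, pk, name):
    # Try the insert first; fall back to an update when the key already exists.
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", (pk, name))
    except sqlite3.IntegrityError:
        conn.execute("UPDATE person SET name = ? WHERE id = ?", (name, pk))

def update_or_insert(conn, pk, name):
    # Try the update first; insert when no row carries that key.
    cur = conn.execute("UPDATE person SET name = ? WHERE id = ?", (name, pk))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO person VALUES (?, ?)", (pk, name))

insert_or_update(conn, 1, "Anna")   # key 1 exists -> falls back to update
update_or_insert(conn, 2, "Bob")    # key 2 missing -> falls back to insert
print(sorted(conn.execute("SELECT id, name FROM person").fetchall()))
```

Which variant is cheaper depends on whether most incoming rows already exist (prefer Update or insert) or are mostly new (prefer Insert or update).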
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Override any existing NLS_LANG environment variable Select this check box to override variables already set for
an NLS language environment.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, provided these
are not insert, update or delete actions, or actions that
require particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Use Hint Options Select this check box to activate the hint configuration area
which helps you optimize a query's execution. In this area,
parameters are:
- HINT: specify the hint you need, using the syntax
/*+ */.
Convert columns and table to uppercase Select this check box to set the names of columns and table
in upper case.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use Batch Select this check box to activate the batch mode for data
processing.
Batch Size Specify the number of records to be processed in each
batch. This field appears only when the Use Batch check box
is selected.
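Batch mode groups many parameter sets into a single execution call instead of sending one statement per row. The effect can be sketched with Python's stdlib sqlite3 and executemany (an illustration of the batching concept, not the Job's generated code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")

rows = [(i, f"row{i}") for i in range(1000)]

# One executemany call binds and runs all 1000 parameter sets against a
# single prepared statement, rather than 1000 separate execute calls.
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)
```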
Support null in "SQL WHERE" statement Select this check box to validate null in "SQL WHERE"
statement.
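The reason this option exists is that a plain SQL equality comparison never matches NULL, so a null-safe WHERE clause needs a different operator. A minimal demonstration with Python's stdlib sqlite3 (SQLite's IS operator stands in for the null-safe comparison the option generates):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tag TEXT)")
conn.execute("INSERT INTO t VALUES (1, NULL)")

# A plain equality comparison against NULL matches nothing in SQL ...
eq = conn.execute("SELECT COUNT(*) FROM t WHERE tag = ?", (None,)).fetchone()[0]

# ... so a null-safe WHERE clause must use a NULL-aware comparison instead.
null_safe = conn.execute("SELECT COUNT(*) FROM t WHERE tag IS ?", (None,)).fetchone()[0]

print(eq, null_safe)
```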
Global Variables
Usage
Usage rule This component offers the flexibility of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Oracle database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For such an example, see Retrieving data in error with a
Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenarios
For tAmazonOracleOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tAmazonOracleRollback
Cancels the transaction commit in the connected database and avoids committing part of a
transaction involuntarily.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For tAmazonOracleRollback related scenario, see tMysqlRollback on page 2491.
tAmazonOracleRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAmazonOracleRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily write
your SQL statements. tAmazonOracleRow is the specific component for this database query. The row
suffix means the component implements a flow in the job design although it does not provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Use NB_LINE_ This option allows you to feed the variable with the number
of rows inserted/updated/deleted to the next component or
subJob. This field only applies if the query entered in the
Query field is an INSERT, UPDATE or DELETE query.
• NONE: does not feed the variable.
• INSERTED: feeds the variable with the number of rows
inserted.
• UPDATED: feeds the variable with the number of rows
updated.
• DELETED: feeds the variable with the number of rows
deleted.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times, as performance levels are
increased.
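Positional "?" parameters work the same way in any database API: the statement is prepared once and each placeholder is bound by position, mirroring the Parameter Index / Type / Value rows of the Set PreparedStatement Parameter table. A sketch with Python's stdlib sqlite3 (conceptual illustration, not the Job's generated JDBC code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.5), (3, 99.0)])

# Each "?" is bound by its position in the tuple: index 1 -> 20.0 (amount
# threshold), index 2 -> 3 (id bound). The query text never changes, so the
# database can reuse the prepared plan across executions.
query = "SELECT id FROM orders WHERE amount > ? AND id < ?"
ids = [r[0] for r in conn.execute(query, (20.0, 3))]
print(ids)
```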
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
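The NB_LINE_* After variables expose, once the component has run, the same row counts a database API reports after each DML statement. The underlying figures can be sketched with Python's stdlib sqlite3 and cursor.rowcount (an analogy for the concept, not Talend's globalMap mechanism):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "new"), (2, "new"), (3, "old")])

# rowcount after a DML statement is the figure that NB_LINE_UPDATED and
# NB_LINE_DELETED would carry as After variables once the component finishes.
nb_line_updated = conn.execute(
    "UPDATE t SET status = 'done' WHERE status = 'new'").rowcount
nb_line_deleted = conn.execute(
    "DELETE FROM t WHERE status = 'old'").rowcount
print(nb_line_updated, nb_line_deleted)
```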
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tAmazonRedshiftManage
Manages Amazon Redshift clusters and snapshots.
tAmazonRedshiftManage manages the work of creating a new Amazon Redshift cluster, creating a
snapshot of an Amazon Redshift cluster, resizing an existing Amazon Redshift cluster, and deleting an
existing cluster or snapshot.
Basic settings
Access Key and Secret Key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.
Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks.
Create snapshot Select this check box to create a final snapshot of the
Amazon Redshift cluster before it is deleted.
This check box is available only when Delete cluster is
selected from the Action list.
Database Enter the name of the first database to be created when the
cluster is created.
This field is available only when Create cluster is selected
from the Action list.
Master username and Master password The user name and the password associated with the master
user account for the cluster to be created.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
The two fields are available only when Create cluster is
selected from the Action list.
Advanced settings
STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, from which session credentials are
retrieved.
Wait for cluster ready Select this check box to let your Job wait until the launch of
the cluster is completed.
This check box is available when Create cluster or Restore
from snapshot is selected from the Action list.
Original cluster id of snapshot Enter the name of the cluster the source snapshot was
created from.
This field is available when Restore from snapshot or Delete
snapshot is selected from the Action list.
Parameter group name Enter the name of the parameter group to be associated
with the cluster.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.
Subnet group name Enter the name of the subnet group where you want the
cluster to be restored.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.
Publicly accessible Select this check box so that the cluster can be accessed
from a public network.
This check box is available when Create cluster or Restore
from snapshot is selected from the Action list.
Set public ip address Select this check box and in the field displayed enter the
Elastic IP (EIP) address for the cluster.
This check box is available only when the Publicly
accessible check box is selected.
Availability zone Enter the EC2 Availability Zone in which you want Amazon
Redshift to provision the cluster.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.
VPC security group ids Enter Virtual Private Cloud (VPC) security groups to be
associated with the cluster and separate them with a
comma, for example, gname1, gname2, gname3.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for the Standard version of this component yet.
tApacheLogInput
Reads the access-log file for an Apache HTTP server.
To effectively manage the Apache HTTP Server, it is necessary to get feedback about the activity and
performance of the server as well as any problems that may be occurring.
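What the component extracts from each access-log line can be sketched with a regular expression for the Apache "combined" log format (the sample line is hypothetical data, and the pattern below is a rough sketch, not the component's actual parser):

```python
import re

# One line in Apache "combined" access-log format (hypothetical sample).
line = ('192.168.0.10 - frank [10/Oct/2020:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')

# Rough pattern: host, identity, user, timestamp, request line, status, size.
pattern = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)
m = pattern.match(line)
print(m.group("host"), m.group("status"), m.group("request"))
```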
Basic settings
Schema and Edit Schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
In the context of tApacheLogInput usage, the schema is
read-only.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Drop a tApacheLogInput component and a tLogRow component from the Palette onto the design
workspace.
2. Right-click on the tApacheLogInput component and connect it to the tLogRow component using
a Main Row link.
4. Click the Component tab to define the basic settings for tApacheLogInput.
5. If desired, click the Edit schema button to see the read-only columns.
6. In the File Name field, enter the file path or browse to the access-log file you want to read.
7. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977.
8. Press F6 to execute the Job.
Results
The log lines of the defined file are displayed on the console.
tAS400Close
Closes the transaction committed in the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
No scenario is available for the Standard version of this component yet.
tAS400Commit
Commits a global transaction in one go, instead of committing on every row or every batch, which
provides a gain in performance, using a unique connection.
tAS400Commit validates the data processed through the Job into the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tAS400Commit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Connection and
tAS400Rollback components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.
tAS400Connection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAS400Connection opens a connection to the database for a current transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.
Usage
Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Commit and
tAS400Rollback components.
Related scenario
For similar scenarios using other databases, see tMysqlConnection on page 2425.
tAS400Input
Reads a database and extracts fields based on a query.
tAS400Input executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Row > Main link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Query type and Query Enter your DB query, paying particular attention to
properly sequencing the fields in order to match the schema
definition.
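One way to guarantee that the field order in the query matches the schema definition is to derive the SELECT list from the schema columns themselves. A hypothetical helper (the class and table name are illustrative, not a Talend API):

```java
// Hypothetical helper: build the SELECT list from the schema columns so the
// query's field order always matches the schema definition.
public class QueryBuilder {
    public static String selectFor(String table, String... schemaColumns) {
        return "SELECT " + String.join(", ", schemaColumns) + " FROM " + table;
    }
}
```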
Advanced settings
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.
2. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding three columns: id of Integer type, and name and city of String type.
Click OK to close the Schema dialog box and accept the propagation prompted by the pop-up
dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and enter the following user
information in the Content field.
1;George;Bismarck
2;Abraham;Boise
3;Taylor;Nashville
4;William;Jefferson City
5;Alexander;Jackson
6;James;Boise
7;Gerald;Little Rock
8;Tony;Richmond
9;Thomas;Springfield
10;Andre;Nashville
4. Double-click tAS400Output to open its Basic settings view.
5. In the Host, Database, Username and Password fields, enter the information required for the
connection to AS/400.
6. In the Table field, specify the table into which you want to write the data. In this example, it is
doct1018.
7. Select Drop table if exists and create from the Action on table drop-down list, and select Insert
from the Action on data drop-down list.
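Each inline row above is interpreted against the three-column schema by splitting on the semicolon field separator. A sketch of that mapping (the class is illustrative, not generated Talend code):

```java
// Illustrative mapping of one inline row onto the schema
// (id: Integer, name: String, city: String), split on ";".
public class InlineRow {
    public final int id;
    public final String name;
    public final String city;

    public InlineRow(String line) {
        String[] fields = line.split(";");
        this.id = Integer.parseInt(fields[0]);
        this.name = fields[1];
        this.city = fields[2];
    }
}
```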
Procedure
1. Double-click tAS400Input to open its Basic settings view.
2. In the Host, Database, Username and Password fields, enter the information required for the
connection to AS/400.
3. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding three columns: id of Integer type, and name and city of String type. The data structure is
the same as the structure you have defined for tFixedFlowInput.
4. In the Table Name field, enter or browse to the table into which you write the data. In this
example, it is doct1018.
5. In the Query field, enter the SQL query sentence to be used to retrieve the user data from AS/400.
In this example, it is SELECT * FROM doct1018.
6. Double-click tLogRow to open its Basic settings view.
7. In the Mode area, select Table (print values in cells of a table) for better readability of the result.
As shown above, the user information is written into AS/400, and then the data is retrieved from
AS/400 and displayed on the console.
Related scenarios
For similar scenarios using other databases, see:
• Reading data from different MySQL databases using dynamically loaded connection parameters
on page 497 (related topic in tContextLoad).
tAS400LastInsertId
Obtains the primary key value of the record that was last inserted in an AS/400 table.
tAS400LastInsertId fetches the last inserted ID from a selected AS/400 Connection.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenario
For a similar scenario using another database, see Getting the ID for the last inserted record with
tMysqlLastInsertId on page 2455.
tAS400Output
Writes, updates, makes changes or suppresses entries in a database.
tAS400Output executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Action on data
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Use commit control Select this check box to have access to the Commit every
field where you can define the commit operation.
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, other than the
insert, update, or delete actions, or actions that require
particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use Batch Select this check box to activate the batch mode for data
processing.
Note:
This check box is available only when you have selected
the Insert, Update or Delete option in the Action on data
field.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an AS/400 database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenarios
For related scenario, see Handling data with AS/400 on page 245.
For similar scenarios using other databases, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tAS400Rollback
Cancels the transaction commit in the connected database and avoids committing part of a transaction
involuntarily.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Connection and
tAS400Commit components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
Related scenarios
For a similar scenario using another database, see Rollback from inserting data in mother/daughter
tables on page 2429.
tAS400Row
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAS400Row acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements. tAS400Row is the specific component for this database query. The row suffix means the
component implements a flow in the Job design although it does not provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via
a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
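As in JDBC, the Parameter Index in the Set PreparedStatement Parameter table refers to the 1-based position of each "?" placeholder in the query. The helper below is purely illustrative (it is not part of Talend and naively ignores "?" inside string literals):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: list the 1-based parameter indexes of the "?"
// placeholders in an SQL string (naive: ignores "?" inside literals).
public class PlaceholderIndex {
    public static List<Integer> indexesOf(String sql) {
        List<Integer> indexes = new ArrayList<>();
        int n = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?') {
                indexes.add(++n);
            }
        }
        return indexes;
    }
}
```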
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenarios
For similar scenarios using other databases, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tAssert
Generates a boolean evaluation of the Job execution status and provides the Job
status messages to tAssertCatcher.
The status includes:
• Ok: the Job execution succeeds.
• Fail: the Job execution fails.
The tested Job's result does not match the expectation or an execution error occurred at runtime.
The tAssert component works alongside tAssertCatcher to evaluate the status of a Job execution. It
concludes with a boolean result based on an assertive statement related to the execution and feeds
the result to tAssertCatcher for proper Job status presentation.
Basic settings
Expression Type in the assertive statement you base the evaluation on.
Global Variables
Usage
Usage rule This component follows the action the assertive condition
is directly related to. It can be the intermediate or end
component of the main Job, or the start, intermediate or end
component of the secondary Job.
Note that the orders listed are just for illustration of how tAssert functions and the number here is
less than 20.
2. Click the Edit schema button to open the schema editor.
3. Click the [+] button to add four columns, namely product_id, product_name, date and price, of the
String, String, Date and Float types respectively.
Click OK to validate the setup and close the editor.
4. Double-click tMysqlOutput to display the Basic settings view.
5. In the Host, Port, Database, Username and Password fields, enter the connection details and the
authentication credentials.
6. In the Table field, enter the name of the table, for example order.
7. In the Action on table list, select the option Drop table if exists and create.
8. In the Action on data list, select the option Insert.
9. Double-click tAssert to display the Basic settings view.
10. In the description field, enter the descriptive information for the purpose of tAssert in this case.
11. In the expression field, enter the expression allowing you to compare the data to a fixed number:
((Integer)globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED"))>=20
12. Double-click tLogRow to display the Basic settings view.
13. In the Mode area, select Table (print values in cells of a table) for a better display.
As shown above, the orders status indicates Failed as the number of orders is less than 20.
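The expression in step 11 is ordinary Java evaluated against the Job's globalMap, which stores component return values as Objects — hence the cast back to Integer before the comparison. A minimal sketch of that evaluation (only the wrapper class is hypothetical):

```java
import java.util.Map;

// Sketch: the cast and comparison from the assertion expression above;
// only the wrapper class is hypothetical.
public class AssertExpression {
    public static boolean evaluate(Map<String, Object> globalMap) {
        return ((Integer) globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED")) >= 20;
    }
}
```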
up in its settings. For more detailed information on tFileCompare, see tFileCompare on page
984.
• tAssertCatcher. It captures the evaluation generated by tAssert. For more information on
tAssertCatcher, see tAssertCatcher on page 273.
• tLogRow. It allows you to read the captured evaluation. For more information on tLogRow, see
tLogRow on page 1977.
First proceed as follows to design the main Job:
• Prepare a delimited .csv file as the source file read by your main Job.
• Edit two rows in the delimited file. The contents you edit are not important, so feel free to
simplify them.
• Name it source.csv.
• In Talend Studio, create a new Job JobAssertion.
• Place tFileInputDelimited and tFileOutputDelimited on the workspace.
• Connect them with a Row Main link to create the main Job.
• Still in the Component view, set Property Type to Built-In and click the [...] button next to Edit
schema to define the data to pass on to tFileOutputDelimited. In this scenario, define the data
presented in source.csv you created.
For more information about schema types, see Talend Studio User Guide.
• Define the other parameters in the corresponding fields according to source.csv you created.
• Double-click tFileOutputDelimited to open its Component view.
• In the File Name field of the Component view, fill in or browse to specify the path to the output
file, leaving the other fields as they are by default.
• Press F6 to execute the main Job. It reads source.csv, passes the data to tFileOutputDelimited and
outputs a delimited file, out.csv.
Then continue to edit the Job to see how tAssert evaluates the execution status of the main Job.
• Rename out.csv as reference.csv. This file is used as the expected result the main Job should output.
• Place tFileCompare, tAssert and tLogRow on the workspace.
• Connect them with Row Main link.
• Connect tFileInputDelimited to tFileCompare with OnSubjobOk link.
For more information on the tFileCompare component, see tFileCompare on page 984.
• Then click tAssert and click the Component tab on the lower side of the workspace.
• In the Component view, enter the assertion row2.differ==0 in the expression field and the
descriptive message of the assertion in the description field.
In the expression field, row2 is the data flow coming from tFileCompare to tAssert, differ
is one of the columns of the tFileCompare schema and indicates whether the compared files
are identical, and 0 means no difference is detected between out.csv and reference.csv by
tFileCompare. Hence, when the compared files are identical, the assertive condition is fulfilled
and tAssert concludes that the main Job succeeds; otherwise, it concludes failure.
Note:
The differ column is in the read-only tFileCompare schema. For more information on its schema, see
tFileCompare on page 984.
The console shows the comparison result of tFileCompare: Files are identical. But you find
nowhere the evaluation result of tAssert.
So you need tAssertCatcher to capture the evaluation.
• Place tAssertCatcher and tLogRow on the workspace.
• Connect them with Row Main link.
2010-01-29 15:37:33|fAvAzH|TASSERT|JobAssertion|java|tAssert_1|Ok|--|
The output file should be identical with the reference file
Then you will perform operations to make the main Job fail to generate the expected file. To do so,
proceed as follows in the same Job you have executed:
• Delete a row in reference.csv.
• Press F6 to execute the Job again.
• Check the result presented in Run view.
2010-02-01 19:47:43|GeHJNO|TASSERT|JobAssertion|tAssert_1|Failed|Test
logically failed|The output file should be identical with the reference
file
The console shows that the execution status of the main Job is Failed. The detailed explanation for
this status is closely behind it, reading Test logically failed.
You can thus get a basic idea about your present Job status: it fails to generate the expected file
because of a logical failure. This logical failure could come from a logical mistake during the Job
design.
The status and its explanatory information are presented respectively in the status and the substatus
columns of the tAssertCatcher schema. For more information on the columns, see tAssertCatcher on
page 273.
tAssertCatcher
Generates a data flow consolidating the status information of a Job execution and transfers the data
into defined output files.
Based on its pre-defined schema, tAssertCatcher fetches the execution status information from
repository, Job execution and tAssert.
Basic settings
Schema and Edit schema A schema is a row description; it defines the fields to be
processed and passed on to the next component. In this
particular case, the schema is read-only, as this component
gathers standard log information including:
Catch Java Exception Select this check box to capture Java exception errors
and show the message in the Description column (Get
original exception not selected) or in the Exception column
(Get original exception selected).
Get original exception Select this check box to show the original exception object
in the Exception column.
Available when Catch Java Exception is selected.
Catch tAssert Select this check box to capture the evaluations of tAssert.
Global Variables
Usage
Related scenarios
For a use case in relation with tAssertCatcher, see the tAssert scenario:
• Setting up the assertive condition for a Job execution on page 267
tAzureAdlsGen2Input
Retrieves data from an ADLS Gen2 file system of an Azure storage account and passes the data to the
subsequent component connected to it through a Row > Main link.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Guess schema Click this button to retrieve the schema from the data object
specified.
Authentication method Select one of the following authentication methods from the
drop-down list.
• Shared key, which requires an account access key. See
Manage a storage account for related information.
• Shared access signature, which requires a shared
access signature. See Constructing the Account SAS
URI for related information.
Account name Enter the name of the Data Lake Storage account you need
to access. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
account.
Shared key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access. To know
how to get your key, read Manage a storage account.
This field is available if you select Shared key from the
Authentication method drop-down list.
SAS token Enter your account SAS token. You can get the SAS token
for each allowed service on the Microsoft Azure portal
after generating SAS. The SAS token format is https://
<$storagename>.<$service>.core.windows.net/
<$sastoken>, where <$storagename> is the storage
account name, <$service> is the allowed service name
(blob, file, queue or table), and <$sastoken> is the SAS
token value. For more information, read Constructing the
Account SAS URI.
This field is available if you select Shared access signature
from the Authentication method drop-down list.
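Assuming the service endpoint takes the usual <account>.<service>.core.windows.net form, the URI pattern above can be assembled as in this illustrative helper; all names and token values in the example are placeholders, not real credentials:

```java
// Illustrative assembly of the SAS URI pattern described above; all inputs
// are placeholders (this is not a Talend or Azure SDK API).
public class SasUri {
    public static String build(String storageName, String service, String sasToken) {
        return "https://" + storageName + "." + service
                + ".core.windows.net/" + sasToken;
    }
}
```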
Format Set the format for the incoming data. Currently, the
following formats are supported: CSV, AVRO, JSON, and
Parquet.
Field Delimiter Set the field delimiter. You can select Semicolon, Comma,
Tabulation, and Space from the drop-down list; you can
also select Other and enter your own in the Custom field
delimiter field.
Record Separator Set the record separator. You can select LF, CR, and CRLF
from the drop-down list; you can also select Other and enter
your own in the Custom Record Separator field.
Header Select this check box if the data to be retrieved contains
a header row.
Note:
• Select this option if the data to be retrieved has a
header row. In this case, you need also to make sure
that the column names in the schema are consistent
with the column headers of the data.
• Clear this option if the data to be retrieved does not
have a header row. In this case, you need to name
the columns in the schema as field0, field1,
field2, and so on.
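The default naming the note describes (field0, field1, field2, and so on) can be generated mechanically; an illustrative helper, not a Talend API:

```java
// Illustrative generator for the default column names expected when the
// retrieved data has no header row: field0, field1, field2, ...
public class DefaultColumns {
    public static String[] names(int count) {
        String[] columns = new String[count];
        for (int i = 0; i < count; i++) {
            columns[i] = "field" + i;
        }
        return columns;
    }
}
```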
File Encoding Select the file encoding from the drop-down list.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Accessing Azure ADLS Gen2 storage on page 280.
tAzureAdlsGen2Output
Uploads incoming data to an ADLS Gen2 file system of an Azure storage account in the specified
format.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Sync columns Click this button to retrieve the schema from the previous
component connected in the Job.
Authentication method Select one of the following authentication methods from the
drop-down list.
• Shared key, which requires an account access key. See
Manage a storage account for related information.
• Shared access signature, which requires a shared
access signature. See Constructing the Account SAS
URI for related information.
Account name Enter the name of the Data Lake Storage account you need
to access. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
account.
Shared key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access. To know
how to get your key, read Manage a storage account.
This field is available if you select Shared key from the
Authentication method drop-down list.
SAS token Enter your account SAS token. You can get the SAS token
for each allowed service on the Microsoft Azure portal
after generating SAS. The SAS token format is https://
<$storagename>.<$service>.core.windows.net/
<$sastoken>, where <$storagename> is the storage
account name, <$service> is the allowed service name
(blob, file, queue or table), and <$sastoken> is the SAS
token value. For more information, read Constructing the
Account SAS URI.
This field is available if you select Shared access signature
from the Authentication method drop-down list.
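As a sketch of how the pieces of the format described above fit together (the account name, service, and token values below are hypothetical, for illustration only):

```java
public class SasUrlSketch {
    // Assembles the SAS token format described above:
    // https://<$storagename>.<$service>.core.windows.net/<$sastoken>
    static String buildSasUrl(String storageName, String service, String sasToken) {
        return "https://" + storageName + "." + service
                + ".core.windows.net/" + sasToken;
    }

    public static void main(String[] args) {
        // Hypothetical account and token, for illustration only.
        System.out.println(buildSasUrl("mystorage", "blob", "sv=2019-02-02&sig=..."));
    }
}
```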
Format Set the format for the incoming data. Currently, the
following formats are supported: CSV, AVRO, JSON, and
Parquet.
Field Delimiter Set the field delimiter. You can select Semicolon, Comma,
Tabulation, and Space from the drop-down list; you can
also select Other and enter your own in the Custom field
delimiter field.
Record Separator Set the record separator. You can select LF, CR, and CRLF
from the drop-down list; you can also select Other and enter
your own in the Custom Record Separator field.
Header Select this check box to insert a header row to the data. The
schema column names will be used as column headers.
File Encoding Select the file encoding from the drop-down list.
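How the Format, Field Delimiter, Record Separator and Header settings above combine can be sketched as follows; this illustrates the semantics only and is not the component's actual writer code:

```java
import java.util.List;

public class CsvAssembly {
    // Joins each row with the field delimiter, terminates each record with
    // the record separator, and optionally prepends a header row built
    // from the schema column names.
    static String toCsv(List<String> columns, List<List<String>> rows,
                        String fieldDelim, String recordSep, boolean header) {
        StringBuilder sb = new StringBuilder();
        if (header) {
            sb.append(String.join(fieldDelim, columns)).append(recordSep);
        }
        for (List<String> row : rows) {
            sb.append(String.join(fieldDelim, row)).append(recordSep);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Semicolon delimiter, CRLF record separator, with header row.
        System.out.print(toCsv(List.of("id", "name"),
                List.of(List.of("1", "James"), List.of("2", "Josephine")),
                ";", "\r\n", true));
    }
}
```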
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Max batch size Set the maximum number of lines allowed in each batch.
Do not change the default value unless you are facing
performance issues. Increasing the batch size can improve
the performance but a value too high could cause Job
failures.
Blob Template Name Enter a string as the name prefix for the Blob files
generated. The name of a Blob file generated will be the
name prefix followed by another string.
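A rough sketch of the two settings above: incoming rows are grouped into batches of at most Max batch size lines, and each generated blob name starts with the template prefix. The numeric suffix used here is an assumption for illustration; the component only guarantees that the prefix is followed by another string.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Splits the incoming rows into batches of at most maxBatchSize lines.
    static List<List<String>> split(List<String> rows, int maxBatchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += maxBatchSize) {
            batches.add(rows.subList(i, Math.min(i + maxBatchSize, rows.size())));
        }
        return batches;
    }

    // Names a generated blob with the template prefix; the counter suffix
    // is illustrative only.
    static String blobName(String template, int index) {
        return template + index;
    }

    public static void main(String[] args) {
        List<List<String>> batches = split(List.of("r1", "r2", "r3", "r4", "r5"), 2);
        for (int i = 0; i < batches.size(); i++) {
            System.out.println(blobName("data-", i) + " -> " + batches.get(i));
        }
    }
}
```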
Global Variables
Usage
1;James
2;Josephine
3;Donette
4;Simona
5;Mitsue
6;Leota
This scenario requires an Azure storage user account with permissions for reading and writing files.
Optionally, you can monitor the data using Microsoft Azure Storage Explorer, a utility for managing
your Azure storage resources. See Azure Storage Explorer for related information.
• Enter the name of an existing Blob container in Filesystem. You can also click ... to the right of
this field and select the Blob container from the list in the dialog box.
• In Blobs Path, enter the name of the directory where you want to put the data.
• Select CSV for Format; Semicolon for Field Delimiter; and CRLF for Record Separator. Select
the Header option.
• Leave other options as they are.
3. In the Advanced settings view of tAzureAdlsGen2Output, enter the prefix for the Blob files
generated in the Blob Template Name field (data- in this example).
4. Do exactly the same as described in step 2 for the tAzureAdlsGen2Input component. Be sure to
propagate the schema to the subsequent component when prompted.
5. In the Basic settings view of tLogRow:
• Select Table (print values in cells of a table).
• Leave other options as they are.
6. (Optional) Check the Blob files generated using Microsoft Azure Storage Explorer. See Get started
with Storage Explorer for related information.
tAzureStorageConnection
Uses authentication and the protocol information to create a connection to the Microsoft Azure
Storage system that can then be reused by other Azure Storage components.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that a SAS has a validity period. When generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
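One way to act on this is to check the expiry before running the Job: an account SAS token carries its expiry time in the se query parameter as an ISO-8601 timestamp. The check below is an illustrative sketch, not part of the component.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;

public class SasExpiryCheck {
    // Extracts the "se" (signed expiry) parameter from a SAS token.
    static Instant expiryOf(String sasToken) {
        for (String part : sasToken.split("&")) {
            if (part.startsWith("se=")) {
                // The timestamp may be URL-encoded (e.g. %3A for ':').
                return Instant.parse(
                        URLDecoder.decode(part.substring(3), StandardCharsets.UTF_8));
            }
        }
        throw new IllegalArgumentException("no se= parameter in SAS token");
    }

    static boolean stillValid(String sasToken, Instant now) {
        return now.isBefore(expiryOf(sasToken));
    }

    public static void main(String[] args) {
        // Hypothetical SAS token, for illustration only.
        String sas = "sv=2019-02-02&ss=b&se=2030-01-01T00%3A00%3A00Z&sig=abc";
        System.out.println(stillValid(sas, Instant.now()));
    }
}
```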
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is generally used with other Azure Storage
components.
Knowledge about Microsoft Azure Storage is required.
Related scenario
For related scenarios, see:
• Retrieving files from an Azure Storage container on page 303
• Creating a container in Azure Storage on page 286
• Handling data with Microsoft Azure Table storage on page 313
tAzureStorageContainerCreate
Creates a new storage container used to hold Azure blobs (Binary Large Object) for a given Azure
storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that a SAS has a validity period. When generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Container name Enter the name of the blob container you need to create.
Access control Select the access restriction level you need to apply on the
container to be created.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Before replicating this scenario, you must have appropriate rights and permissions to read and write
files in the Azure storage account to be used. For further information, see Microsoft's documentation
for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.
2. In the Account name field, enter the name of the storage account to be connected to. In this
example, it is talendstorage, an account that has been created for demonstration purposes.
3. In the Account key field, paste the primary or the secondary key associated with the storage
account to be used. These keys can be found in the Manage Access Key dashboard in the Azure
Storage system to be connected to.
4. From the Protocol list, select the protocol for the endpoint of the storage account to be used. In
this example, it is HTTPS.
Creating a container
Procedure
1. Double-click tAzureStorageContainerCreate to open its Component view.
2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container you need to create. If a container
using the same name exists, that container will be overwritten at runtime.
4. From the Access control list, select the access restriction level for the container to be created. In
this example, select Private.
2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container you need to check whether it exists.
4. Double-click tJava to open its Component view.
7. From the Outline panel, drop the CONTAINER_EXIST global variable into the parentheses
in the code in the Component view in order to make the code read: System.out.println(((Boolean)globalMap.get("tAzureStorageContainerExist_1_CONTAINER_EXIST")));
You can read that the Job returns true as the verification result, that is to say, the
talendcontainer container has been created in the storage account being used.
3. Double-check the result in the web console of the Azure storage account.
You can read as well that the talendcontainer container has been created.
tAzureStorageContainerDelete
Automates the removal of a given blob container from the space of a specific storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that a SAS has a validity period. When generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tAzureStorageContainerExist
Automates the verification of whether a given blob container exists or not within a storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that a SAS has a validity period. When generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Container name Enter the name of the blob container whose existence you
need to verify.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
CONTAINER_EXIST The result of whether the given container exists or not. This
is an After variable and it returns a boolean.
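In a tJava component placed after this one, the variable is read from globalMap. The sketch below simulates globalMap, which in a real Job is provided by the generated code:

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalVarSketch {
    // Reads the After variable the way a tJava component would.
    static boolean containerExists(Map<String, Object> globalMap) {
        return (Boolean) globalMap.get("tAzureStorageContainerExist_1_CONTAINER_EXIST");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Simulates the component setting the variable after execution.
        globalMap.put("tAzureStorageContainerExist_1_CONTAINER_EXIST", Boolean.TRUE);
        System.out.println(containerExists(globalMap));
    }
}
```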
Usage
Related scenario
For a related scenario, see Creating a container in Azure Storage on page 286.
tAzureStorageContainerList
Lists all containers in a given Azure storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that a SAS has a validity period. When generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with a single
column ContainerName of String type, which indicates
the name of each container to be listed.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tAzureStorageDelete
Deletes blobs from a given container for an Azure storage account according to the specified blob
filters.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that a SAS has a validity period. When generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Container name Enter the name of the container from which you need to
delete blobs.
Blob filter Complete this table to select the blobs to be deleted. The
parameters to be provided are:
• Blob prefix: enter the common prefix of the names of
the blobs you need to delete. This prefix allows you to
filter the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
• Include subdirectories: select this check box to select
all of the sub-folders and the blobs in those folders
beneath the designated directory level. If you leave
this check box clear, tAzureStorageDelete deletes only
the blobs directly beneath that directory level.
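The filter semantics described above can be sketched as follows; this is an illustration of the prefix and Include subdirectories behaviour, not the component's code:

```java
public class BlobFilterSketch {
    // A blob matches when its name starts with the prefix; without
    // includeSubdirs, blobs in deeper virtual folders (a '/' occurring
    // after the prefix) are excluded.
    static boolean matches(String blobName, String prefix, boolean includeSubdirs) {
        if (!blobName.startsWith(prefix)) {
            return false;
        }
        if (includeSubdirs) {
            return true;
        }
        return blobName.indexOf('/', prefix.length()) < 0;
    }

    public static void main(String[] args) {
        // Using the example blob name from the text above.
        System.out.println(matches("2014/US/Oakland/Talend.jpg", "2014/US/", true));
    }
}
```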
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tAzureStorageGet
Retrieves blobs from a given container for an Azure storage account according to the specified filters
applied on the virtual hierarchy of the blobs, and then writes the selected blobs to a local folder.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that a SAS has a validity period. When generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Container Enter the name of the container you need to retrieve blobs
from.
Local folder Enter the path, or browse to the folder in which you need to
store the retrieved blobs.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Before replicating this scenario, you must have appropriate rights and permissions to read and write
files in the Azure storage account to be used. For further information, see Microsoft's documentation
for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.
The talendcontainer container used in this scenario was created using tAzureStorageContainerCreate
in the scenario Creating a container in Azure Storage on page 286.
2. In the Account name field, enter the name of the storage account to be connected to. In this
example, it is talendstorage, an account that has been created for demonstration purposes.
3. In the Account key field, paste the primary or the secondary key associated with the storage
account to be used. These keys can be found in the Manage Access Key dashboard in the Azure
Storage system to be connected to.
4. From the Protocol list, select the protocol for the endpoint of the storage account to be used. In
this example, it is HTTPS.
2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container you need to write files in. In this
example, it is talendcontainer, a container created in the scenario Creating a container in
Azure Storage on page 286.
4. In the Local folder field, enter the path, or browse, to the directory where the files to be used are
stored. In this scenario, they are pictures showing a technical process, stored locally in
E:/photos. Therefore, enter E:/photos; this allows tAzureStoragePut to upload all the files of
this folder and its sub-folders into the talendcontainer container.
For demonstration purposes, the example photos are organized as follows in the E:/photos
folder.
• Directly beneath the E:/photos level:
components-use_case_triakinput_1.png
components-use_case_triakinput_2.png
components-use_case_triakinput_3.png
components-use_case_triakinput_4.png
components-use_case_tmongodbbulkload_1.png
components-use_case_tmongodbbulkload_2.png
components-use_case_tmongodbbulkload_3.png
components-use_case_tmongodbbulkload_4.png
components-use_case_tmongodbbulkload_5.png
components-use_case_tmongodbbulkload_6.png
components-use_case_tmongodbbulkload_7.png
components-use_case_tmongodbbulkload_8.png
5. In the Azure Storage folder field, enter the directory where you want to write files. This directory
will be created in the container to be used if it does not exist. In this example, enter photos.
Procedure
1. Double-click tAzureStorageList to open its Component view.
2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container in which you need to check whether
the given files exist. In this scenario, it is talendcontainer.
4. Under the Blob filter table, click the [+] button to add one row in the table.
5. In the Prefix column, enter the common prefix of the names of the files (blobs) to be checked.
This prefix represents a virtual directory level you designate as the starting point down from
which files (blobs) are checked. In this example, it is photos/.
For further information about blob names, see http://msdn.microsoft.com/en-us/library/dd135715.aspx.
6. In the Include sub-directories column, select the check box in the newly added row. This allows
tAzureStorageList to check all the files at any hierarchical level beneath the designated starting
point.
Configuring tJava
Procedure
1. Double-click tJava to open its Component view.
4. From the Outline panel, drop the CURRENT_BLOB global variable into the parentheses in the
code in the Component view so as to make the code read: System.out.println(((String)globalMap.get("tAzureStorageList_1_CURRENT_BLOB")));
2. Select the component whose connection details will be used to set up the Azure storage c
onnection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container from which you need to retrieve files.
In this scenario, it is talendcontainer.
4. In the Local folder field, enter the path, or browse, to the directory where you want to put the retri
eved files. In this example, it is E:/screenshots.
5. Under the Blob table, click the [+] button to add one row in the table.
6. In the Prefix column, enter the common name prefix of the files (blobs) to be retrieved. In this
example, it is photos/mongodb/.
7. In the Include sub-directories column, select the check box in the newly added row. This allows
tAzureStorageGet to retrieve all the files (blobs) beneath the photos/mongodb/ level.
8. In the Create parent directories column, select the check box in the newly added row to create the
same directory in the specified local folder as the retrieved blobs have in the container.
Note that having this same directory is necessary for successfully retrieving blobs. If you leave
this check box clear, then you need to create the same directory yourself in the target local folder.
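The effect of the Create parent directories option can be sketched in plain Java: mirroring a blob's virtual directory under the local folder. The helper name and sample blob name below are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlobLocalPath {
    // Mirror a blob's virtual directory under a local folder, creating the
    // parent directories the way the "Create parent directories" option does.
    static Path localPathFor(Path localFolder, String blobName) throws IOException {
        // photos/mongodb/shot1.png -> <localFolder>/photos/mongodb/shot1.png
        Path target = localFolder.resolve(blobName);
        Files.createDirectories(target.getParent()); // create photos/mongodb/ locally
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("screenshots"); // stand-in for E:/screenshots
        Path p = localPathFor(folder, "photos/mongodb/shot1.png");
        System.out.println(Files.isDirectory(p.getParent())); // true: parent dirs exist
    }
}
```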
You can see that the Job returns the list of the blobs with the photos prefix in the container.
3. Double-check the result in the web console of the Azure storage account.
You can see the blobs with the photos/mongodb/ prefix have been retrieved and their prefix transformed to directories.
tAzureStorageInputTable
Retrieves a set of entities that satisfy the specified filter criteria from an Azure storage table.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection component is selected from the Connection Component drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Table name Specify the name of the table from which the entities will
be retrieved.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns that describe the three system properties
of each entity:
• PartitionKey: the partition key for the partition that the
entity belongs to.
• RowKey: the row key for the entity within the partition.
PartitionKey and RowKey are string type values that
uniquely identify every entity in a table, and the user
must include them in every insert, update, and delete
operation.
• Timestamp: the time that the entity was last modified.
This DateTime value is maintained by the Azure server
and it cannot be modified by the user.
For more information about these system properties, see
Understanding the Table Service Data Model.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
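The rule that PartitionKey and RowKey together uniquely identify an entity can be sketched in plain Java. This is a minimal stand-in for the table data model, not the Azure service itself:

```java
import java.util.HashMap;
import java.util.Map;

public class EntityKeyDemo {
    // A table groups entities by PartitionKey; within a partition, RowKey is
    // unique, so the (PartitionKey, RowKey) pair identifies exactly one entity.
    static final Map<String, Map<String, String>> TABLE = new HashMap<>();

    static void insertOrReplace(String partitionKey, String rowKey, String value) {
        TABLE.computeIfAbsent(partitionKey, k -> new HashMap<>()).put(rowKey, value);
    }

    public static void main(String[] args) {
        insertOrReplace("Paris", "001", "alice");
        insertOrReplace("Paris", "002", "bob");
        // Writing the same (PartitionKey, RowKey) pair again replaces the
        // existing entity instead of creating a duplicate.
        insertOrReplace("Paris", "001", "alice-updated");
        System.out.println(TABLE.get("Paris").size());     // 2
        System.out.println(TABLE.get("Paris").get("001")); // alice-updated
    }
}
```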
Use filter expression Select this check box and complete the Filter expressions
table displayed to specify the conditions used to filter the
entities to be retrieved, by clicking the [+] button to add as many rows as needed.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
Name mappings Complete this table to map the column name of the
component schema with the property name of the Azure
table entity if they are different.
• Schema column name: enter the column name of the
component schema between double quotation marks.
• Entity property name: enter the property name of the
Azure table entity between double quotation marks.
For example, if there are three schema columns
CompanyID, EmployeeID, and EmployeeName that
are used to feed the values for the PartitionKey,
RowKey, and Name entity properties respectively, since
the PartitionKey and RowKey columns have already
been added to the schema automatically and you do not
need to specify the mapping relationship for them, you
only need to add one row and set the value of the Schema
column name cell with "EmployeeName" and the value of
the Entity property name cell with "Name" to specify the
mapping relationship for the EmployeeName column when
retrieving data from the Azure table.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global variables
Usage
Procedure
1. Create a new Job and add a tAzureStorageConnection component, a tFixedFlowInput component,
a tAzureStorageOutputTable component, a tAzureStorageInputTable component, and a tLogRow
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFixedFlowInput component to the tAzureStorageOutputTable component using a Row >
Main connection.
3. Do the same to link the tAzureStorageInputTable component to the tLogRow component.
4. Link the tAzureStorageConnection component to the tFixedFlowInput component using a Trigger
> OnSubjobOk connection.
5. Do the same to link the tFixedFlowInput component to the tAzureStorageInputTable component.
Procedure
1. Double-click the tAzureStorageConnection component to open its Basic settings view on the
Component tab.
2. In the Account Name field, specify the name of the storage account you need to access.
3. In the Account Key field, specify the key associated with the storage account you need to access.
Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view on the Component
tab.
2. Click the [...] button next to Edit schema to open the schema dialog box and define the schema by adding
six columns: Id, Name, Site, and Job of String type, Date of Date type, and Salary of Double
type. Then click OK to save the changes and accept the propagation prompted by the pop-up
dialog box.
Note that in this example, the Site and Id columns are used to feed the values of the
PartitionKey and RowKey system properties of each entity and they should be of String type,
and the Name column is used to feed the value of the EmployeeName property of each entity.
3. In the Mode area, select Use Inline Content(delimited file) and in the Content field displayed,
enter the employee data that will be written into the Azure Storage table.
4. Double-click the tAzureStorageOutputTable component to open its Basic settings view on the Component tab.
5. From the connection component drop-down list, select the component whose connection details will be used to set up the connection to the Azure Storage service, tAzureStorageConnection_1 in this example.
6. In the Table name field, enter the name of the table into which the employee data will be written,
employee in this example.
7. From the Action on table drop-down list, select the operation to be performed on the specified
table, Drop table if exist and create in this example.
8. Click Advanced settings to open its view.
9. Click the [+] button under the Name mappings table to add three rows and map the schema column name with the property name of each entity in the Azure table. In this example:
• Because the Site column is used to feed the value of the PartitionKey system property, in the first row you need to set the Schema column name cell with the value "Site" and the Entity property name cell with the value "PartitionKey".
• Because the Id column is used to feed the value of the RowKey system property, in the second row you need to set the Schema column name cell with the value "Id" and the Entity property name cell with the value "RowKey".
• Because the Name column is used to feed the value of the EmployeeName property, in the third row you need to set the Schema column name cell with the value "Name" and the Entity property name cell with the value "EmployeeName".
Procedure
1. Double-click the tAzureStorageInputTable component to open its Basic settings view.
2. From the connection component drop-down list, select the component whose connection details will be used to set up the connection to the Azure Storage service, tAzureStorageConnection_1 in this example.
3. In the Table name field, enter the name of the table from which the employee data will be
retrieved, employee in this example.
4. Click the [...] button next to Edit schema to open the schema dialog box.
Note that the schema has already been predefined with two read-only columns RowKey and
PartitionKey of String type, and another column Timestamp of Date type. The RowKey
and PartitionKey columns correspond to the Id and Site columns of the tAzureStorageOutputTable schema.
5. Define the schema by adding another four columns that hold other employee data, Name and Job
of String type, Date of Date type, and Salary of Double type. Then click OK to save the changes
and accept the propagation prompted by the pop-up dialog box.
6. Click Advanced settings to open its view.
7. Click the [+] button under the Name mappings table to add one row and set the Schema column name cell with the value "Name" and the Entity property name cell with the value "EmployeeName" to map the schema column name with the property name of each entity in the Azure table.
Note that for the tAzureStorageInputTable component, the PartitionKey and RowKey
columns have already been added automatically to the schema and you do not need to specify the
mapping relationship for them.
8. Double-click the tLogRow component to open its Basic settings view and in the Mode area, select
Table (print values in cells of a table) for a better display of the result.
Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.
As shown above, the Job is executed successfully and the employee data is displayed on the
console, with the timestamp value that indicates when each entity was inserted.
3. Double-check the employee data that has been written into the Azure Storage table employee
using Microsoft Azure Storage Explorer if you want.
tAzureStorageList
Lists blobs in a given container according to the specified blob filters.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection component is selected from the Connection Component drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Container name Enter the name of the container from which you need to
select blobs to be listed.
Blob filter Complete this table to select the blobs to be listed. The parameters to be provided are:
• Prefix: enter the common prefix of the names of the
blobs you need to list. This prefix allows you to filter
the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
If you want to select the blobs stored directly beneath
the container level, that is to say, the blobs without
virtual path in their names, remove quotation marks
and enter null.
• Include sub-directories: select this check box to select
all of the sub-folders and the blobs in those folders
beneath the designated directory level. If you leave
this check box clear, tAzureStorageList returns only
the blobs, if any, directly beneath that directory level.
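The virtual hierarchy carried by a blob name can be illustrated in plain Java: each '/'-terminated prefix of the name is a directory level usable as a Blob filter prefix. The helper below is a made-up sketch, not part of any Azure API:

```java
import java.util.ArrayList;
import java.util.List;

public class VirtualDirs {
    // A blob name carries its whole virtual path; each '/'-terminated prefix
    // is a virtual directory level you can designate as the blob filter.
    static List<String> directoryPrefixes(String blobName) {
        List<String> prefixes = new ArrayList<>();
        for (int i = blobName.indexOf('/'); i >= 0; i = blobName.indexOf('/', i + 1)) {
            prefixes.add(blobName.substring(0, i + 1));
        }
        return prefixes;
    }

    public static void main(String[] args) {
        System.out.println(directoryPrefixes("2014/US/Oakland/Talend.jpg"));
        // [2014/, 2014/US/, 2014/US/Oakland/]
    }
}
```

A blob whose name contains no '/' has no virtual directory at all, which is why selecting such blobs requires the null prefix described above.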
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with a single column BlobName of String type, which indicates the name of each blob to be listed.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Retrieving files from an Azure Storage container on page 303.
tAzureStorageOutputTable
Performs the defined action on a given Azure storage table and inserts, replaces, merges or deletes
entities in the table based on the incoming data from the preceding component.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection component is selected from the Connection Component drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Table name Specify the name of the table into which the entities will be
written.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Partition Key Select the schema column that holds the partition key value
from the drop-down list.
Row Key Select the schema column that holds the row key value
from the drop-down list.
324
tAzureStorageOutputTable
Process in batch Select this check box to process the input entities in batch.
Note that the entities to be processed in batch should
belong to the same partition group, which means they should have the same partition key value.
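The single-partition constraint on batches can be sketched by grouping incoming rows by partition key before submitting each group. The Row class and sample data below are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BatchByPartition {
    // Minimal stand-in for an incoming row: a partition key plus a payload.
    static class Row {
        final String partitionKey;
        final String payload;
        Row(String partitionKey, String payload) {
            this.partitionKey = partitionKey;
            this.payload = payload;
        }
    }

    // Group rows so each batch contains entities of a single partition,
    // as a batch operation on an Azure table requires.
    static Map<String, List<Row>> batches(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(r -> r.partitionKey));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("Paris", "alice"), new Row("Beijing", "li"), new Row("Paris", "bob"));
        Map<String, List<Row>> byPartition = batches(rows);
        System.out.println(byPartition.get("Paris").size());   // 2
        System.out.println(byPartition.get("Beijing").size()); // 1
    }
}
```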
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
Name mappings Complete this table to map the column name of the
component schema with the property name of the Azure
table entity if they are different.
• Schema column name: enter the column name of the
component schema between double quotation marks.
• Entity property name: enter the property name of the
Azure table entity between double quotation marks.
For example, if there are three schema columns
CompanyID, EmployeeID, and EmployeeName that are
used to feed the values for the PartitionKey, RowKey,
and Name entity properties respectively, then you need to
add the following rows for the mapping when writing data
into the Azure table.
• the Schema column name cell with the value
"CompanyID" and the Entity property name cell with
the value "PartitionKey".
• the Schema column name cell with the value
"EmployeeID" and the Entity property name cell
with the value "RowKey".
• the Schema column name cell with the value
"EmployeeName" and the Entity property name cell
with the value "Name".
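The renaming described by the Name mappings table can be sketched as a simple key-rename over a row, using the CompanyID/EmployeeID/EmployeeName example above. The helper method and mapping table are illustrative only:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class NameMapping {
    // Illustrative mapping: schema column name -> entity property name,
    // like the rows of the Name mappings table in Advanced settings.
    static final Map<String, String> MAPPINGS = new HashMap<>();
    static {
        MAPPINGS.put("CompanyID", "PartitionKey");
        MAPPINGS.put("EmployeeID", "RowKey");
        MAPPINGS.put("EmployeeName", "Name");
    }

    // Rename the keys of a schema row into entity property names; columns
    // without a mapping keep their original name.
    static Map<String, Object> toEntityProperties(Map<String, Object> row) {
        Map<String, Object> entity = new LinkedHashMap<>();
        row.forEach((column, value) ->
                entity.put(MAPPINGS.getOrDefault(column, column), value));
        return entity;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("CompanyID", "Talend");
        row.put("EmployeeID", "042");
        row.put("EmployeeName", "Ann");
        System.out.println(toEntityProperties(row));
        // {PartitionKey=Talend, RowKey=042, Name=Ann}
    }
}
```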
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global variables
Usage
Related scenario
For a related scenario, see Handling data with Microsoft Azure Table storage on page 313.
tAzureStoragePut
Uploads local files into a given container for an Azure storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection component is selected from the Connection Component drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Container name Enter the name of the container you need to write files in.
This container must exist in the Azure Storage system you
are using.
Local folder Enter the path, or browse to the folder from which you need
to upload files.
Azure storage folder Enter the path to the virtual blob folder in the remote Azure
storage system you want to upload files into.
If you leave this field as it is, with only its default quotation marks, tAzureStoragePut writes files directly beneath the container level.
Use file list Select this check box to define file filtering conditions. Once it is selected, the Files table is displayed.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Retrieving files from an Azure Storage container on page 303.
tAzureStorageQueueCreate
Creates a new queue under a given Azure storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection component is selected from the Connection Component drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it, you can set the start time at which the SAS becomes valid and the expiry time after which it is no longer valid, and you need to make sure your SAS is still valid when running your Job.
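The documented SAS URL shape can be taken apart with standard Java URI handling. The sample URL and the parts method below are made up for illustration; note that in a real account SAS the token is typically carried as a query string:

```java
import java.net.URI;

public class SasUrlParts {
    // Split an account SAS URL of the documented form
    // https://<$storagename>.<$service>.core.windows.net/<$sastoken>
    // into "<storagename> / <service> / <sastoken>".
    static String parts(String sasUrl) {
        URI sas = URI.create(sasUrl);
        String[] host = sas.getHost().split("\\.");
        // host[0] is the storage account name; host[1] is the allowed
        // service name (blob, file, queue or table).
        return host[0] + " / " + host[1] + " / " + sas.getRawQuery();
    }

    public static void main(String[] args) {
        System.out.println(parts(
                "https://mystorage.queue.core.windows.net/?sv=2019-02-02&sig=abc123"));
        // mystorage / queue / sv=2019-02-02&sig=abc123
    }
}
```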
Queue name Specify the name of the Azure queue to be created. For
more information about the queue naming rules, see
Naming Queues and Metadata.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.
Usage
Related scenario
No scenario is available for this component yet.
tAzureStorageQueueDelete
Deletes a specified queue permanently under a given Azure storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection component is selected from the Connection Component drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it, you can set the start time at which the SAS becomes valid and the expiry time after which it is no longer valid, and you need to make sure your SAS is still valid when running your Job.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.
Usage
Related scenario
No scenario is available for this component yet.
tAzureStorageQueueInput
Retrieves one or more messages from the front of an Azure queue.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection component is selected from the Connection Component drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
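The account SAS URL format described above can be sketched as a small helper; the storage name and token below are placeholders, not real credentials:

```python
# Sketch: assembling an account SAS URL from its parts, following the
# format https://<$storagename>.<$service>.core.windows.net/<$sastoken>.
def build_sas_url(storage_name: str, service: str, sas_token: str) -> str:
    # The allowed service names are blob, file, queue, and table.
    if service not in ("blob", "file", "queue", "table"):
        raise ValueError("service must be blob, file, queue or table")
    return f"https://{storage_name}.{service}.core.windows.net/{sas_token}"

# Illustrative values only.
print(build_sas_url("mystorage", "queue", "sv=2020-02-10&sig=abc123"))
```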
Queue name Specify the name of the Azure queue from which the
messages will be retrieved.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns:
• MessageId: the ID of the message.
• MessageContent: the body of the message.
• InsertionTime: the time when the message was added
to the queue.
• ExpirationTime: the time when the message expires.
• NextVisibleTime: the time when the message will next
become visible.
• DequeueCount: the number of times that the message
has been dequeued. This value is incremented each
time the message is dequeued, but it will not be
incremented when the message is peeked.
• PopReceipt: the pop receipt value that is required to
delete the message.
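The dequeue-versus-peek behavior of DequeueCount described above can be illustrated with a small local model (plain Python standing in for the queue service, not the Azure SDK):

```python
# Toy model of the message metadata columns: retrieving (dequeuing) a
# message increments its DequeueCount; peeking does not.
from dataclasses import dataclass

@dataclass
class Message:
    message_id: str
    message_content: str
    dequeue_count: int = 0

class ToyQueue:
    def __init__(self):
        self._messages = []

    def put(self, msg: Message):
        self._messages.append(msg)

    def peek(self) -> Message:
        # Peek returns the front message without changing DequeueCount.
        return self._messages[0]

    def dequeue(self) -> Message:
        # Retrieval increments DequeueCount; the message stays in the
        # queue here to model its later reappearance after the
        # visibility timeout.
        msg = self._messages[0]
        msg.dequeue_count += 1
        return msg

q = ToyQueue()
q.put(Message("1", "hello"))
q.peek()
q.dequeue()
print(q.peek().dequeue_count)  # 1: one dequeue; peeks are not counted
```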
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Delete the message while streaming Select this check box to delete the messages while
retrieving them from the queue.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
Visibility timeout in seconds Enter the visibility timeout value (in seconds) relative
to the server time. This timeout value is added to the
time at which the message is retrieved to determine its
NextVisibleTime value. The message will not be visible
to other consumers for this time interval after it has been
retrieved.
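The NextVisibleTime derivation described above is simple addition; this sketch models the server-side rule locally, with illustrative times:

```python
# Sketch: NextVisibleTime = retrieval time + visibility timeout.
# During this interval the retrieved message is hidden from other consumers.
from datetime import datetime, timedelta

def next_visible_time(retrieved_at: datetime,
                      visibility_timeout_s: int) -> datetime:
    return retrieved_at + timedelta(seconds=visibility_timeout_s)

t = datetime(2020, 2, 23, 12, 0, 0)
print(next_visible_time(t, 30))  # 2020-02-23 12:00:30
```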
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.
Usage
Related scenario
No scenario is available for this component yet.
tAzureStorageQueueInputLoop
Runs an endless loop to retrieve messages from the front of an Azure queue.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Queue name Specify the name of the Azure queue from which the
messages will be retrieved.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns:
• MessageId: the ID of the message.
• MessageContent: the body of the message.
• InsertionTime: the time when the message was added
to the queue.
• ExpirationTime: the time when the message expires.
• NextVisibleTime: the time when the message will next
become visible.
• DequeueCount: the number of times that the message
has been dequeued. This value is incremented each
time the message is dequeued, but it will not be
incremented when the message is peeked.
• PopReceipt: the pop receipt value that is required to
delete the message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Loop wait time Specify the duration (in seconds) for which the loop will
wait for the message to arrive in the queue before returning.
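The loop-and-wait behavior can be sketched as follows; `receive` is a placeholder for the real queue call, and `max_iterations` exists only to make the otherwise endless loop testable:

```python
# Sketch of a polling loop with a wait time: when no message is available,
# sleep for the configured duration before trying again.
import time

def poll_queue(receive, wait_seconds: float, max_iterations=None):
    """Repeatedly call receive(); sleep wait_seconds when the queue is empty."""
    results, i = [], 0
    while max_iterations is None or i < max_iterations:
        msg = receive()
        if msg is not None:
            results.append(msg)
        else:
            time.sleep(wait_seconds)
        i += 1
    return results

# Fake queue that yields two messages and then runs dry.
backlog = ["m1", "m2"]
received = poll_queue(lambda: backlog.pop(0) if backlog else None,
                      wait_seconds=0.01, max_iterations=3)
print(received)  # ['m1', 'm2']
```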
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.
Usage
Related scenario
No scenario is available for this component yet.
tAzureStorageQueueList
Returns all queues associated with the given Azure storage account.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with one single
column QueueName that stores the name of each queue to
be returned.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tAzureStorageQueueOutput
Adds messages to the back of an Azure queue.
Note that this component can only be used with Java 8.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Queue name Specify the name of the Azure queue to which the messages
will be added.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with one single
column MessageContent that stores the body of each
message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tAzureStorageQueuePurge
Purges messages in an Azure queue.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.
Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.
Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid. Make
sure your SAS is still valid when running your Job.
Queue name Specify the name of the Azure queue in which the messages
will be purged.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tBarChart
Generates a bar chart from the input data to ease technical analysis.
tBarChart reads data from an input flow and transforms the data into a bar chart in a PNG image file.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Note:
The schema of tBarChart contains three read-only
columns named series (string), category (string), and
value (integer), in a fixed order. The data in any extra
columns will only be passed on to the next component,
if any, without being presented in the bar chart.
Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.
Generated image path Name and path of the output image file.
Include legend Select this check box if you want the bar chart to include a
legend, indicating all series in different colors.
Image width and Image height Enter the width and height of the image file, in pixels.
Category axis name and Value axis name Enter the category axis name and value axis name.
Plot orientation Select the plot orientation of the bar chart: VERTICAL or
HORIZONTAL.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
City;Population(x1000);LandArea(km2);PopulationDensity(people/km2)
Beijing;10233;1418;7620
Moscow;10452;1081;9644
Seoul;10422;605;17215
Tokyo;8731;617;14151
Jakarta;8490;664;12738
New York;8310;789;10452
Because the input file has a different structure than the one required by the tBarChart component,
this use case uses the tMap component to adapt the source data to the three-column schema of
tBarChart so that a temporary CSV file can be created as the input to the tBarChart component.
Note:
You will usually use the tMap component to adjust the input schema in accordance with the
schema structure of the tBarChart component. For more information about how to use the tMap
component, see Talend Studio User Guide and tMap on page 1983.
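The reshaping done by tMap in this use case can be sketched in plain Python: each city row of the semicolon-separated data above becomes one (series, category, value) row per numeric column. The series names reuse the source column headers; this stands in for the tMap component, it is not Talend-generated code:

```python
# Reshape the 4-column city data into the three-column schema
# (series, category, value) that tBarChart expects.
raw = """Beijing;10233;1418;7620
Moscow;10452;1081;9644
Seoul;10422;605;17215"""

series_names = ["Population(x1000)", "LandArea(km2)",
                "PopulationDensity(people/km2)"]

rows = []
for line in raw.splitlines():
    city, *values = line.split(";")
    for series, value in zip(series_names, values):
        rows.append((series, city, int(value)))  # (series, category, value)

print(rows[0])  # ('Population(x1000)', 'Beijing', 10233)
```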
To ensure correct generation of the temporary input file, a pre-treatment subJob is used to delete the
temporary file in case it already exists before the main Job is executed; as this temporary file serves
this specific Job only, a post-treatment subJob is used to delete it after the main Job is executed.
Results
8. Click OK to save the mappings, close the Map Editor, and propagate the output schemas to the
output components.
2. In the File Name field, define a temporary CSV file to send the mapped data flows to. In this use
case, we name this file Temp.csv. This file will be used as the input to the tBarChart component.
3. Select the Append check box.
4. Repeat the steps above to define the properties of the other two tFileOutputDelimited
components, using exactly the same settings as in the first tFileOutputDelimited component.
Note:
Note that the order of output flows from the tMap component is not necessarily the actual
order in which data is written to the target file. To ensure the target file is correctly generated,
delete any existing file of the same name before Job execution and select the Append check
box in all the tFileOutputDelimited components in this step.
2. Fill in the File name field with the path to the temporary input file generated by the
tFileOutputDelimited components. In this use case, the temporary input file to the tBarChart is
Temp.csv.
3. Double-click the tBarChart component to display its Basic settings view.
4. In the Generated image path field, define the file path of the image file to be generated.
5. In the Chart title field, define a title for the bar chart.
6. Define the category and series axis names.
7. Define the size and transparency degree of the image if needed. In this use case, we simply use
the default settings.
8. Click Edit schema to open the schema dialog box.
9. Copy all the columns from the output schema to the input schema by clicking the left-pointing
double arrow button. Then, click OK to close the schema dialog box.
Procedure
1. Double-click the first tFileDelete component to display its Basic settings view.
2. Fill in the File name field with the path to the temporary input file.
If the Fail on error check box is selected and the pre-treatment subJob fails because of an error,
such as the file to delete not existing, the failure will prevent the main subJob from being
launched. In this situation, you can clear the Fail on error check box to avoid this interruption.
tBigQueryBulkExec
Transfers given data to Google BigQuery.
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as parts
of a two-step process. In the first step, an output file is generated. In the second step, this file is used
to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed
in a separate section. The advantage of using two separate components is that the data can be
transformed before it is loaded into the dataset.
This component transfers a given file from Google Cloud Storage to Google BigQuery, or uploads a
given file into Google Cloud Storage and then transfers it to Google BigQuery.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored on the machine
where your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.
Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.
Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using
this component; when the Job pauses execution and prints
out a URL, navigate to that address to copy the
authorization code displayed.
Dataset Enter the name of the dataset you need to transfer data to.
Table Enter the name of the table you need to transfer data to.
If this table does not exist, select the Create the table if it
doesn't exist check box.
Action on data Select the action to be performed from the drop-down list
when transferring data to the target table. The action may
be:
• Truncate: it empties the contents of the table and
repopulates it with the transferred data.
• Append: it adds rows to the existing data in the table.
• Empty: it populates the empty table.
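The three Action on data behaviors listed above can be illustrated with an in-memory table standing in for the real BigQuery table; this is a toy model of the semantics, not the BigQuery API:

```python
# Toy illustration of the Action on data choices: Truncate empties the
# table and repopulates it, Append adds rows, Empty populates an empty table.
def apply_action(table: list, incoming: list, action: str) -> list:
    if action == "Truncate":
        return list(incoming)
    if action == "Append":
        return table + list(incoming)
    if action == "Empty":
        if table:
            raise ValueError("Empty expects an empty target table")
        return list(incoming)
    raise ValueError(f"unknown action: {action}")

print(apply_action([1, 2], [3], "Append"))    # [1, 2, 3]
print(apply_action([1, 2], [3], "Truncate"))  # [3]
```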
Bulk file already exists in Google storage Select this check box to reuse the authentication
information for the Google Cloud Storage connection, and
then complete the File and Header fields.
Access key and Secret key Paste the authentication information obtained from Google
for making requests to Google Cloud Storage.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project.
Bucket Enter the name of the bucket, the Google Cloud Storage
container, which holds the data to be transferred to Google
BigQuery.
Header Enter the number of header rows to ignore in the
transferred data. For example, enter 0 to ignore no rows for
data without a header.
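The Header value described above acts as a count of leading rows to skip, which can be sketched as:

```python
# Sketch: the Header value is the number of leading rows to skip.
# A value of 0 keeps all rows (data with no header row).
def skip_header(rows, header: int):
    return rows[header:]

data = ["id;name", "1;Alice", "2;Bob"]
print(skip_header(data, 1))  # ['1;Alice', '2;Bob']
print(skip_header(data, 0))  # all rows kept
```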
Die on error This check box is cleared by default, meaning that rows
on error are skipped and the process is completed for
error-free rows.
Advanced settings
token properties File Name Enter the path to, or browse to, the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
If you enter only the token file name, Talend Studio
assumes the token file is located in the root of the Studio
folder.
For further information about the refresh token, see the
manual of Google BigQuery.
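The path rule described above (a bare file name resolved against the Studio root) can be sketched as follows; the paths shown are hypothetical:

```python
# Sketch: resolve the token file location. A relative name is resolved
# against the Studio root folder; an absolute path is kept as-is.
from pathlib import Path

def resolve_token_file(name: str, studio_root: str) -> Path:
    p = Path(name)
    return p if p.is_absolute() else Path(studio_root) / p

# Hypothetical file name and Studio root.
print(resolve_token_file("bigquery.token", "/opt/TalendStudio"))
```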
Set the field delimiter Enter a character, string, or regular expression to separate
fields in the transferred data.
Drop table if exists Select the Drop table if exists check box to remove the
table specified in the Table field, if this table already exists.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
Related Scenario
For related topic, see Writing data in Google BigQuery on page 371
tBigQueryInput
Performs the queries supported by Google BigQuery.
This component connects to Google BigQuery and performs queries in it.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored on the machine
where your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.
Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.
Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using
this component; when the Job pauses execution and prints
out a URL, navigate to that address to copy the
authorization code displayed.
Use legacy SQL and Query Enter the query you need to use.
If the query to be used is the legacy SQL of BigQuery, select
this Use legacy SQL check box. For further information
about this legacy SQL, see Legacy SQL query reference from
the Google BigQuery documentation.
Result size Select the option depending on the volume of the query
result.
By default, the Small option is used, but when the query
result is larger than the maximum response size, you need
to select the Large option.
If the volume of the result is not certain, select Auto.
Advanced settings
token properties File Name Enter the path to, or browse to, the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
If you enter only the token file name, Talend Studio
assumes the token file is located in the root of the Studio
folder.
For further information about the refresh token, see the
manual of Google BigQuery.
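The path resolution described above can be made concrete with a short sketch. The function name and the Studio-root value are illustrative only; Talend does not expose such a function, and this merely restates the rule that a bare file name is resolved against the Studio folder.

```python
import os

def resolve_token_file(value, studio_root):
    """Resolve the refresh-token file path as described above: a bare
    file name is placed under the Studio root folder, while a value
    that already contains a directory part is used as given.
    (Illustrative sketch, not Talend code.)"""
    head, tail = os.path.split(value)
    if head:
        # A directory was specified: use the path as-is.
        return value
    # Only a file name was entered: resolve against the Studio root.
    return os.path.join(studio_root, tail)
```

For example, resolve_token_file("token.secret", "/opt/studio") yields a path under /opt/studio, whereas a value such as "conf/token.secret" is kept unchanged.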
Advanced Separator (for number) Select this check box to change the separator used for the
numbers.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Use custom temporary Dataset name Select this check box to use an existing dataset to which
you have access, instead of creating one, and in the field
that is displayed, enter the name of this dataset. This way,
you avoid rights and permissions issues related to dataset
creation.
This check box is available only when you have selected
Large from the Result size drop-down list in the Basic
settings tab.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
The following figure shows the schema of the table, UScustomer, we use as example to perform the
SELECT query in.
We will select the State records and count the occurrence of each State among those records.
Procedure
1. Double-click tBigQueryInput to open its Component view.
3. Click the button twice to add two rows and enter the names of your choice for each of them in
the Column column. In this scenario, they are: States and Count.
4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
5. In the Authentication area, add the authentication information. In most cases, the Service account
mode is the most straightforward and easiest to handle.
Authentication mode Description
Procedure
In the Query field, enter select States, count(*) as Count from documentation.UScustomer group by States.
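The query groups the UScustomer rows by State and counts each group. Its effect can be illustrated with a small self-contained sketch; the sample State values below are made up for illustration and do not come from the scenario's actual data.

```python
from collections import Counter

# Hypothetical State values, standing in for the States column of UScustomer.
rows = ["Ohio", "Texas", "Ohio", "Utah", "Texas", "Ohio"]

# Equivalent of: select States, count(*) as Count from ... group by States
result = sorted(Counter(rows).items())
print(result)  # [('Ohio', 3), ('Texas', 2), ('Utah', 1)]
```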
Procedure
To execute this Job, press F6.
Results
Once done, the Run view is opened automatically, where you can check the execution result.
tBigQueryOutput
Transfers the data provided by its preceding component to Google BigQuery.
This component writes the data it receives in a user-specified directory and transfers the data to
Google BigQuery via Google Cloud Storage.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Property type Built-In: You create and store the schema locally for this
component only.
Local filename Browse to, or enter the path to the file you want to write the
received data in.
Append Select this check box to add rows to the existing data in the
file specified in Local filename.
Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.
Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.
Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job that uses
this component; when the Job pauses and prints out a URL,
navigate to that URL and copy the authorization code
displayed there.
Dataset Enter the name of the dataset you need to transfer data to.
Table Enter the name of the table you need to transfer data to.
If this table does not exist, select the Create the table if it
doesn't exist check box.
Action on data Select the action to be performed from the drop-down list
when transferring data to the target table. The action may
be:
• Truncate: it empties the contents of the table and
repopulates it with the transferred data.
• Append: it adds rows to the existing data in the table.
• Empty: it populates the empty table.
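The three actions differ only in how the existing content of the target table is treated. The following is an illustrative sketch only; the function and the list-of-rows representation are not part of Talend or BigQuery.

```python
def apply_action(existing, incoming, action):
    """Mimic the Truncate/Append/Empty choices on a table
    represented as a list of rows. (Illustrative sketch.)"""
    if action == "Truncate":
        # Drop the existing rows and keep only the transferred data.
        return list(incoming)
    if action == "Append":
        # Keep the existing rows and add the transferred data after them.
        return list(existing) + list(incoming)
    if action == "Empty":
        # Only meaningful when the target table holds no rows yet.
        if existing:
            raise ValueError("Empty expects an empty target table")
        return list(incoming)
    raise ValueError("unknown action: " + action)
```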
Access key and Secret key Paste the authentication information obtained from Google
for making requests to Google Cloud Storage.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project.
Bucket Enter the name of the bucket, the Google Cloud Storage
container, which holds the data to be transferred to Google
BigQuery.
Header Set the number of header rows to be ignored in the
transferred data. For example, enter 0 to ignore no rows
when the data has no header, or 1 when the first row of the
data is a header.
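The Header value simply tells the load how many leading rows to skip; the following one-line sketch (illustrative only, not Talend code) states the rule:

```python
def skip_header(rows, header):
    """Return the data rows, ignoring the first `header` rows
    (0 = no header, 1 = the first row is a header)."""
    return rows[header:]
```

For example, skip_header(["name;state", "Anna;OH"], 1) keeps only the data row.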
Die on error This check box is cleared by default, meaning that rows
on error are skipped and the process completes for error-free rows.
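The two behaviours of Die on error can be sketched as follows; the helper function is hypothetical and only restates the skip-versus-stop semantics described above.

```python
def process_rows(rows, handle, die_on_error=False):
    """Apply `handle` to each row. With die_on_error, the first failure
    stops the run; otherwise failing rows are skipped and the
    error-free rows complete. (Illustrative sketch, not Talend code.)"""
    done, skipped = [], []
    for row in rows:
        try:
            done.append(handle(row))
        except Exception:
            if die_on_error:
                raise
            skipped.append(row)
    return done, skipped
```

With the check box cleared (die_on_error=False), a bad row such as "x" among numeric strings is collected in skipped while the rest are processed.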
Advanced settings
token properties File Name Enter the path to, or browse to, the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and name of the refresh token file to be
created and used; if that token file has already been created
and you need to reuse it, specify its directory and file name
in this field.
If only the token file name is entered, Talend Studio uses
the root of the Studio folder as the directory of that token
file.
For further information about the refresh token, see the
Google BigQuery documentation.
Drop table if exists Select the Drop table if exists check box to remove the
table specified in the Table field, if this table already exists.
Create directory if not exists Select this check box to create the directory you defined in
the File field for Google Cloud Storage, if it does not exist.
Custom the flush buffer size Enter the number of rows to be processed before the
memory is freed.
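Flushing every N rows keeps memory bounded during a long write. The idea can be sketched with a plain file; the function and its default buffer size are illustrative only, not Talend internals.

```python
def write_rows(path, rows, flush_every=1000):
    """Write rows to `path`, flushing the buffer every `flush_every`
    rows so that buffered data is released to disk instead of
    accumulating in memory. (Illustrative sketch.)"""
    with open(path, "w", encoding="utf-8") as out:
        for i, row in enumerate(rows, start=1):
            out.write(row + "\n")
            if i % flush_every == 0:
                out.flush()
```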
Check disk space Select this check box to throw an exception during
execution if the disk is full.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
Procedure
1. Double-click tBigQueryOutput to open its Component view.
2. Click Sync columns to retrieve the schema from its preceding component.
3. In the Local filename field, enter the directory where you need to create the file to be transferred
to BigQuery.
4. Navigate to the Google APIs Console in your web browser to access the Google project hosting
the BigQuery and the Cloud Storage services you need to use.
5. Click Google Cloud Storage > Interoperable Access to open its view.
6. In the Google storage configuration area of the Component view, paste the Access key and Access
secret from the Interoperable Access tab view into the corresponding fields.
7. In the Bucket field, enter the path to the bucket you want to store the transferred data in. In this
example, it is talend/documentation.
This bucket must already exist in Cloud Storage.
8. In the File field, enter the directory in Google Cloud Storage where the file to be transferred to
BigQuery is received and created. In this example, it is gs://talend/documentation/biquery_UScustomer.csv.
The file name must be the same as the one you defined in the Local filename field.
Troubleshooting: if you encounter issues such as Unable to read source URI of the file stored in
Google Cloud Storage, check whether you have entered the same file name in these two fields.
9. Enter 0 in the Header field to ignore no rows in the transferred data.
Procedure
1. In the Dataset field of the Component view, enter the dataset you need to transfer data in. In this
scenario, it is documentation.
This dataset must exist in BigQuery. The following figure shows the dataset used by this scenario.
2. In the Table field, enter the name of the table you need to write data in, for example, UScustomer.
3. In the Action on data field, select the action. In this example, select Truncate to empty the
contents of the target table, if there are any, and to repopulate it with the transferred data.
4. In the Authentication area, add the authentication information. In most cases, the Service account
mode is the most straightforward and easiest to handle.
Authentication mode Description
5. If you are using the OAuth 2.0 authentication mode, in the Action on data field, select the action
to be performed on your data. In this example, select Truncate to empty the contents of the target
table, if there are any, and to repopulate it with the transferred data. If you are using Service
account, ignore this step.
If the table to be used does not exist in BigQuery, select Create the table if it doesn't exist.
Results
Once done, the Run view is opened automatically, where you can check the execution result.
tBigQueryOutputBulk
Creates a .txt or .csv file for large volumes of data so that you can process the data according to your
needs before transferring it to Google BigQuery.
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as parts
of a two-step process. In the first step, an output file is generated. In the second step, this file is used
to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed
in a separate section. The advantage of using two separate components is that the data can be
transformed before it is loaded into the dataset.
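The two-step pattern (generate a file, then feed it to the loader) can be sketched independently of BigQuery. Both functions below are stand-ins for illustration only; the first plays the role of tBigQueryOutputBulk and the second that of tBigQueryBulkExec.

```python
import csv
import io

def step_one_generate(records):
    """Step 1 (role of tBigQueryOutputBulk): write the records to CSV
    text, where they could be transformed before loading."""
    buf = io.StringIO()
    csv.writer(buf).writerows(records)
    return buf.getvalue()

def step_two_load(csv_text):
    """Step 2 (stand-in for tBigQueryBulkExec): parse the file content
    back into rows, as the real component would load it into a dataset."""
    return list(csv.reader(io.StringIO(csv_text)))
```

The benefit of keeping the steps separate is that any transformation can be slotted in between them, which is exactly the advantage described above.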
This component writes given data into a .txt or .csv file, ready to be transferred to Google
BigQuery.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
File name Browse to, or enter the path to the .txt or .csv file you need
to generate.
Append Select the check box to write new data at the end of
the existing data. Otherwise, the existing data will be
overwritten.
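The Append check box maps onto the usual append-versus-overwrite file modes; a minimal sketch (illustrative helper, not Talend code):

```python
def write_file(path, text, append=False):
    """Write `text` to `path`. With append=True, new data is added
    after the existing content; otherwise the file is overwritten."""
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8") as f:
        f.write(text)
```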
Advanced settings
Create directory if not exists Select this check box to create the directory you defined in
the File field for Google Cloud Storage, if it does not exist.
Custom the flush buffer size Enter the number of rows to be processed before the
memory is freed.
Check disk space Select this check box to throw an exception during
execution if the disk is full.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
Usage rule This is an output component which needs the data provided
by its preceding component.
Related Scenario
For a related topic, see Writing data in Google BigQuery on page 371.
tBigQuerySQLRow
Connects to Google BigQuery and performs queries to select data from tables row by row or create or
delete tables in Google BigQuery.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.
Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job that uses
this component; when the Job pauses and prints out a URL,
navigate to that URL and copy the authorization code
displayed there.
Use legacy SQL and Query Enter the query you need to use.
If the query to be used is the legacy SQL of BigQuery, select
this Use legacy SQL check box. For further information
about this legacy SQL, see Legacy SQL query reference from
the Google BigQuery documentation.
Advanced settings
token properties File Name Enter the path to, or browse to, the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and name of the refresh token file to be
created and used; if that token file has already been created
and you need to reuse it, specify its directory and file name
in this field.
If only the token file name is entered, Talend Studio uses
the root of the Studio folder as the directory of that token
file.
For further information about the refresh token, see the
Google BigQuery documentation.
Advanced Separator (for number) Select this check box to change the separator used for the
numbers.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Result size Select the option depending on the volume of the query
result.
By default, the Small option is used, but when the query
result is larger than the maximum response size, you need
to select the Large option.
If the volume of the result is not certain, select Auto.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
tBonitaDeploy
Deploys a specific Bonita process to a Bonita Runtime.
This component configures a Bonita Runtime engine and deploys a specific Bonita process (a .bar
file exported from the Bonita solution) to this engine.
Basic settings
Bonita version Select a version number for the Bonita Runtime engine.
Bonita Runtime Environment File Browse to, or enter the path to the Bonita Runtime
environment file.
Note:
This field is displayed only when you select Bonita
version 5.3.1 from the Bonita version list.
Bonita Runtime Home Browse to, or enter the path to the Bonita Runtime
environment directory.
Note:
This field is displayed only when you select Bonita
version 5.6.1 from the Bonita version list.
Bonita Runtime Jaas File Browse to, or enter the path to the Bonita Runtime jaas file.
Bonita Runtime logging file Browse to, or enter the path to the Bonita Runtime logging
file.
Login Module Type in the name of the login module used to log in to the
Bonita Runtime engine, as defined in the Bonita Runtime
jaas file.
Business Archive Browse to, or enter the path to the Bonita process .bar file
you want to use.
User name Type in the user name used to log in to Bonita Studio.
Die on error This check box is cleared by default, meaning that rows
on error are skipped and the process completes for error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related Scenario
For a related topic, see Executing a Bonita process via a Talend Job on page 390.
tBonitaInstantiateProcess
Starts an instance for a specific process deployed in a Bonita Runtime engine.
This component instantiates a process already deployed in a Bonita Runtime engine.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
In this component the schema is related to the Module
selected.
Note:
The ProcessInstanceUUID column is pre-defined in the
schema of this component, reserved for the identifier
number of the process instance being created.
Bonita Client Mode Select the client mode you want to use to instantiate a
Bonita process.
For more information about all the Bonita client modes, see
Bonita's manuals.
URL Enter the URL of the Bonita Web application server you
need to access for the process instantiation.
This field is available only in the HTTP client mode.
Auth Username and Auth Password Enter the authentication details used to connect to the
Bonita Web application server as technical user.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
The default authentication information is provided in these
fields. For further information about them, see Bonita's
manuals.
These fields are available only in the HTTP client mode.
Bonita version Select the version number of the Bonita Runtime engine to
be used.
Bonita Runtime Environment File Browse to, or enter the path to the Bonita Runtime
environment file.
This field is available only in the Java client mode.
Note:
This field is displayed only when you select Bonita
version 5.3.1 from the Bonita version list.
Bonita Runtime Home Browse to, or enter the path to the Bonita Runtime
environment directory.
Note:
This field is displayed only when you select Bonita
version 5.6.1 from the Bonita version list.
Bonita Runtime Jaas File Browse to, or enter the path to the Bonita Runtime jaas file.
This field is available only in the Java client mode.
Bonita Runtime logging file Browse to, or enter the path to the Bonita Runtime logging
file.
This field is available only in the Java client mode.
Note:
The process definition ID is created when the process is
deployed into the Bonita Runtime engine.
Process Name and Process Version Enter the ID information of a specific process you want
to instantiate. This information is used to automatically
generate the ID of this process.
This field is available in both the Java client mode and the
HTTP client mode.
User name Type in your user name used to instantiate this process.
This field is available in both the Java client mode and the
HTTP client mode.
Die on error This check box is cleared by default, meaning that rows
on error are skipped and the process completes for error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Double-click tBonitaDeploy to open its Basic settings view.
2. Select Bonita version 5.3.1 from the Bonita version list. The version you select should be in sync
with the version number of the Bonita Runtime engine you are using.
3. In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files. In the
Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita
Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File
field, browse to the logging.properties file.
For users based on Bonita version 5.2.3, only the Bonita Runtime Jaas File field and the Bonita
Runtime Logging File field need to be filled.
For users based on Bonita version 5.6.1, in the Bonita Runtime Home field, browse to the Bonita
Runtime environment directory.
4. In the Business Archive field, browse to the Bonita .bar file that is the process exported from your
Bonita system and will be deployed into the Bonita Runtime engine.
5. In the Username and the Password fields, type in your authentication information to connect to
your Bonita.
2. Click the three-dot button next to Edit schema to open the schema editor.
3. Click the plus button to add one row and rename it as Name.
This name is identical to the parameter set in Bonita to execute the same process. This way,
Bonita can recognize this column as a valid parameter and read its value to instantiate this process.
4. Click OK.
5. In the Mode area of the Basic settings view, select the Use inline table option and click the plus
button to add one row in the table.
6. In the inline table, click the added row and, between the quotation marks, type in the name of the
person from your personnel whose request will be treated by this deployed process: ychen.
2. Select Bonita version 5.3.1 from the Bonita version list. The version you select should be in sync
with the version number of the Bonita Runtime engine you are using.
3. In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files. In the
Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita
Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File
field, browse to the logging.properties file.
For users based on Bonita version 5.2.3, only the Bonita Runtime Jaas File field and the Bonita
Runtime Logging File field need to be filled.
For users based on Bonita version 5.6.1, in the Bonita Runtime Home field, browse to the Bonita
Runtime environment directory.
4. Select the Use Process ID check box to activate the Process Definition Id field.
5. In the Process Definition Id field, click between the quotation marks and press Ctrl+space to open
the auto-completion drop-down list containing the available global variables for this Job.
6. Double-click the variable you need to use to add it between the quotation marks. In this scenario,
double-click tBonitaDeploy_1_ProcessDefinitionUUID, which retrieves the process definition ID of
the process being deployed by tBonitaDeploy.
Note:
You can also clear the Use Process ID check box to activate the Process name and the
Process version fields and enter the corresponding information in the two fields.
tBonitaInstantiateProcess concatenates the process name and the process version you type in
to construct the process definition ID.
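The note above says the process definition ID is built by concatenating the process name and the process version. The sketch below assumes the "--" separator commonly seen in Bonita 5.x ProcessDefinitionUUID values (for example, MyProcess--1.0); the separator is an assumption, not taken from this guide, so verify it against your engine.

```python
def process_definition_id(name, version, separator="--"):
    """Concatenate the process name and version into a definition ID.
    The '--' separator reflects the usual Bonita 5.x UUID form, but it
    is an assumption here, not documented Talend behaviour."""
    return name + separator + version
```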
7. In the Username and Password fields, enter the username and password to connect to your Bonita.
Results
This process is deployed into the Bonita Runtime and an instance is created for the personnel
requests.
Outputting the process instance UUID over the Row > Main
link
This scenario deploys a Bonita process into the Bonita Runtime, starts an instance and outputs the
process instance UUID via the Row > Main link.
2. In the Bonita Runtime Jaas File field, specify the path and name of the jaas file.
In the Bonita Runtime Logging File field, specify the path and name of the logging file.
In the Business Archive field, specify the path and name of the Bonita process.
3. In the Username and Password fields, enter the user authentication credentials.
4. Double-click tBonitaInstantiateProcess to open its Basic settings view.
5. In the Bonita Runtime Jaas File field, specify the path and name of the jaas file.
In the Bonita Runtime Logging File field, specify the path and name of the logging file.
6. In the Process Name and Process Version fields, enter the process information.
7. In the Username and Password fields, enter the user authentication credentials.
8. Double-click tLogRow to open its Basic settings view.
9. In the Mode area, select Table (print values in cells of a table) for better display.
tBoxConnection
Creates a Box connection that the other Box components can reuse.
This component creates the connection to a given Box account.
Basic settings
Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.
Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.
Access token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.
Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.
Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenario
For a related scenario, see Uploading and downloading files from Box on page 411.
tBoxCopy
Copies or moves a given folder or file from Box.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.
Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.
Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.
Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.
Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
File Name Enter the name, with its path, of the file in Box that you want to copy.
Source Directory This option appears when the Move Directory or Copy
Directory check box is selected. Enter the source directory
in Box to be moved or copied.
Destination Directory Enter the destination directory in Box where the specified
file or directory will be copied or moved.
Remove Source File Select this check box to remove the source file during the
copy action.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only, with
four columns named destinationFilePath, destinationFileName,
sourceDirectory, and destinationDirectory.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tBoxDelete
Removes a given folder or file from Box.
This component connects to a given Box account and removes a specified file or folder.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.
Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.
Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.
Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.
Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Path Enter the path on Box pointing to the folder or the file you
need to remove.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
one column named filepath.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tBoxGet
Downloads a selected file from a Box account.
This component connects to a given Box account and downloads files to a specified local directory.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.
Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.
Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.
Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.
Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Path Enter the path on Box pointing to the file you need to
download.
Save as file Select this check box to display the Save To field and
browse to, or enter, the local directory where you want to store
the downloaded file. The existing file, if any, is replaced.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component can be used alone or along with other
components via the Iterate link or a trigger link such as
OnSubjobOk.
Related scenario
For a related scenario, see Uploading and downloading files from Box on page 411.
tBoxList
Lists the files stored in a specified directory in Box.
This component reads the file(s) in Box held in the directory you specify and lists the metadata and
the contents of that file or those files.
Basic settings
Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.
Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.
Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.
Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.
Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Path Enter the path pointing to the folder you need to list the
files from, or enter the path pointing to the exact file you
need to read.
List type Select the type of data you need to list from the specified
path, Files, Folders, or Both.
Include subdirectories Select this check box to list files from any existing sub-
folders in addition to the files in the directory defined in
the Path field.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tBoxPut
Uploads files to a Box account.
This component uploads data to Box from either a local file or a given data flow.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.
Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.
Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.
Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.
Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Remote Path Enter the path pointing to the file you need to write
contents in. This file will be created on the fly if it does not
exist.
Replace if Existing Select this check box to use the uploaded file to replace the
existing one.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
a single column named content and it receives data from
the content column of its input schema only. This means
that you must use a content column in the input data flow
to carry the data to be uploaded. This type of column is typically
provided by the tFileInputRaw component. For further
information, see tFileInputRaw on page 1085.
The Schema field is not available when you have selected
the Expose as OutputStream or the Upload local file upload
mode.
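As a rough sketch of what the content column carries, the bytes of a local file can be read as shown below. This is an illustrative Java snippet, not part of the component itself; the temporary file is a placeholder standing in for the file to upload.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentColumn {
    // Reads a whole file into a byte array, similar to what tFileInputRaw
    // produces in its content column for tBoxPut to upload.
    static byte[] readContent(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file, created here only so the sketch is runnable.
        Path tmp = Files.createTempFile("box_upload", ".txt");
        Files.write(tmp, "sample content".getBytes());
        System.out.println(readContent(tmp).length + " bytes to upload");
    }
}
```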
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Before replicating this scenario, you need to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used. For more information about Box App, see
https://app.box.com/developers/services/edit/. The client key and client secret can be obtained from
the account application settings. For how to get the access token and refresh token, check the Box
documentation you can access from https://developers.box.com/.
2. Enter the client key, client secret, access token and refresh token in double quotation marks in
the relevant fields for accessing the Box account.
3. Double-click tBoxPut to open its Component view.
4. Select the Use Existing Connection check box to reuse the connection created by tBoxConnection.
In the Remote Path field, enter the destination path where you want to upload the file.
In the Upload mode area, select Upload Local File. In the File field, enter the file path or browse to
the file you want to upload.
5. Double-click tBoxGet to open its Component view.
6. Select the Use Existing Connection check box to reuse the connection created by tBoxConnection.
In the Path field, enter the path of the file that you want to download.
Select the Save As File check box. In the Save To field, enter the file path where to save the file
on the local file system.
7. Save the Job.
The file box.txt from Box is downloaded to the local file system.
tBufferInput
Retrieves data bufferized via a tBufferOutput component, for example, to process it in another
subJob.
This component retrieves bufferized data in order to process it in a second subJob.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
In the case of tBufferInput, the column position is more
important than the column label, as it is the position that is
taken into account.
Built-in: You create the schema and store it locally for this
component only. Related topic: see Talend Studio User
Guide.
Global Variables
Usage
• Drop the following components from the Palette onto the design workspace: tFileInputDelimited
and tBufferOutput.
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the
access parameters to the input file.
• In the File Name field, browse to the delimited file holding the data to be bufferized.
• Define the Row and Field separators, as well as the Header.
• Click [...] next to the Edit schema field to describe the structure of the file.
Note:
Generally speaking, the schema is propagated from the input component and automatically fed into
the tBufferOutput schema. But you can also set part of the schema to be bufferized if you want to.
• Drop the tBufferInput and tLogRow components from the Palette onto the design workspace
below the subJob you just created.
• Connect tFileInputDelimited and tBufferInput via a Trigger > OnSubjobOk link and connect
tBufferInput and tLogRow via a Row > Main link.
• Double-click tBufferInput to set its Basic settings in the Component view.
• In the Basic settings view, click [...] next to the Edit Schema field to describe the structure of the
file.
• Use the schema defined for the tFileInputDelimited component and click OK.
• The schema of the tBufferInput component is automatically propagated to the tLogRow.
Otherwise, double-click tLogRow to display the Component view and click Sync column.
• Save your Job and press F6 to execute it.
The standard console returns the data retrieved from the buffer memory.
tBufferOutput
Collects data in a buffer in order to access it later, via a webservice for example.
tBufferOutput has been designed to be exported as Webservice in order to access data on the web
application server directly. For more information, see Talend Studio User Guide.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
In the case of tBufferOutput, the column position is more
important than the column label, as it is the position that is
taken into account.
Global Variables
Usage
Buffering data
This scenario describes an intentionally basic Job that bufferizes data in a child job while a parent Job
simply displays the bufferized data onto the standard output console. For an example of how to use
tBufferOutput to access output data directly on the Web application server, see Buffering output data
on the webapp server on page 421.
• Create two Jobs: a first Job (BufferFatherJob) runs the second Job and displays its content onto the
Run console. The second Job (BufferChildJob) stores the defined data into a buffer memory.
• On the first Job, drop the following components: tRunJob and tLogRow from the Palette to the
design workspace.
• On the second Job, drop the following components: tFileInputDelimited and tBufferOutput the
same way.
Let's set the parameters of the second Job first:
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the
access parameters to the input file.
• In File Name, browse to the delimited file whose data are to be bufferized.
• Define the Row and Field separators, as well as the Header.
• Generally the schema is propagated from the input component and automatically fed into the
tBufferOutput schema. But you could also set part of the schema to be bufferized if you want to.
• Now on the other Job (BufferFatherJob) Design, define the parameters of the tRunJob component.
• Edit the Schema if relevant and select the column to be displayed. The schema can be identical to
the bufferized schema or different.
• You could also define context parameters to be used for this particular execution. To keep it
simple, the default context with no particular setting is used for this use case.
Press F6 to execute the parent Job. The tRunJob looks after executing the child Job and returns the
data onto the standard console:
If you cannot find the Contexts view, go to Window > Show view > Talend, and select Contexts.
For more information about how to define context variables, see Talend Studio User Guide.
You can search for further information about how to define context variables on Talend Help
Center (https://help.talend.com).
2. Double-click the tJava component to open its Component view, and in the Code area, enter the
code according to your needs.
In this example, enter System.out.println("#############################"+context.xmlInput);.
3. Double-click the tFixedFlowInput component to open its Component view.
4. Click the [...] button next to Edit schema to open the dialog box and define the schema for the
data to be used by the source system.
In this example, add one new column col0 of the type String.
5. After the schema is defined, click Yes in the Propagate dialog box to propagate the schema
changes to the following component tBufferOutput.
6. In the Number of rows field, enter 1.
7. In the Mode area, select Use Single Table and enter "Paris" in the Value column that
corresponds to the column col0 you have defined.
In this example, the value of the col0 provides the agent region information to be retrieved by
MDM.
8. Double-click the tBufferOutput component to open its Component view, and then make sure its
schema is synchronized with the previous component tFixedFlowInput.
9. Run the Job and make sure the execution succeeds.
Creating a Job
Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput and
tBufferOutput.
2. Connect tFixedFlowInput to tBufferOutput using a Row Main link.
Procedure
1. Select the Contexts tab view of your Job, and click the [+] button at the bottom of the view to add
two variables, respectively nb_lines of type Integer and lastname of type String.
2. In the Value field for the variables, set the last name to be displayed and the number of lines to
be generated, respectively Ford and 3 in this example.
4. Click OK to close the dialog box and accept propagating the changes when prompted by the
system. The three defined columns display in the Values panel of the Basic settings view of
tFixedFlowInput.
5. Click in the Value cell of each of the first two defined columns and press Ctrl+Space to access the
global variable list.
6. From the global variable list, select Talend Date.getCurrentDate() and talendDatagenerator.getFirst
Name, for the now and firstname columns respectively.
7. Click in the Value cell of lastname column and press Ctrl+Space to access the global variable list.
8. From the global variable list, select context.lastname, the context variable you created for the last
name column.
Procedure
1. In the Repository tree view, right-click on the above created Job and select Build Job. The Build
Job dialog box appears.
2. Click the Browse... button to select a directory to archive your Job in.
3. In the Build type panel, select the build type you want to use in the Tomcat webapp directory
(WAR in this example) and click Finish. The Build Job dialog box disappears.
4. Copy the War folder and paste it in a Tomcat webapp directory.
The Job uses the default values of the context variables: nb_lines and lastname, that is it generates
three lines with the current date, first name and Ford as a last name.
You can modify the values of the context variables directly from your browser. To call the Job from
your browser and modify the values of the two context variables, type the following URL:
http://localhost:8080/export_job/services/export_job3?method=runJob&arg1=--context_param%20lastname=MASSY&arg2=--context_param%20nb_lines=2.
%20 stands for a blank space in URL encoding. In the first argument "arg1", you set the value
of the context variable to display "MASSY" as the last name. In the second argument "arg2", you set the
value of the context variable to "2" to generate only two lines.
Press Enter to execute your Job from your browser.
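As an illustration of how the arguments above are formed, the following Java sketch assembles the same URL. The arg() helper is hypothetical (not part of any Talend API); the host, the webapp name (export_job) and the Job name (export_job3) are the values from this scenario.

```java
public class RunJobUrl {
    // Each argN value is "--context_param name=value" with the blank
    // space encoded as %20, as described above.
    static String arg(int n, String pair) {
        return "arg" + n + "=" + ("--context_param " + pair).replace(" ", "%20");
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/export_job/services/export_job3"
                + "?method=runJob"
                + "&" + arg(1, "lastname=MASSY")
                + "&" + arg(2, "nb_lines=2");
        System.out.println(url);
    }
}
```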
• Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to
describe the data structure you want to call from the exported Job. In this scenario, the schema is
made of three columns, now, firstname, and lastname.
• Click the plus button to add the three parameter lines and define your variables. Click OK to close
the dialog box.
• In the WSDL field of the Basic settings view of tWebServiceInput, enter the URL http://localhost:8080/export_job/services/export_job3?WSDL where "export_job" is the name of the webapp
directory where the Job to call is stored and "export_job3" is the name of the Job itself.
The system generates three columns with the current date, first name, and last name and displays
them onto the log console in a tabular mode.
tCassandraBulkExec
Improves performance during Insert operations to a Cassandra column family.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts
of a two-step process. In the first step, an SSTable is generated. In the second step, this SSTable
is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that
the data can be transformed before it is loaded into Cassandra.
tCassandraBulkExec writes data from an SSTable into Cassandra.
Basic settings
Required authentication Select this check box to provide credentials for the
Cassandra authentication.
Username Fill in this field with the username for the Cassandra
authentication.
Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Use configuration file Select this check box and in the field that is displayed,
enter the path, or browse to cassandra.yaml, the main
configuration file for Cassandra.
This way, this component can import and directly use the
configuration from cassandra.yaml, which can contain many
advanced Cassandra properties, such as the properties for
SSL encryption.
When you need to run your Job in different Cassandra
environments, this feature allows your Job to easily switch
between the configurations.
Keyspace Type in the name of the keyspace into which you want to
write the SSTable.
Column family Type in the name of the column family into which you want
to write the SSTable.
SSTable directory Specify the local directory of the SSTable to be loaded into
Cassandra. Note that the complete path to the SSTable will
be the local directory appended by the specified keyspace
name and column family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.
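The path rule above can be sketched in Java as follows; sstablePath() is a hypothetical helper used only for illustration, with the sample directory, keyspace and column family names from the text.

```java
public class SstablePath {
    // The complete SSTable location is the local directory with the
    // keyspace name and the column family name appended.
    static String sstablePath(String localDir, String keyspace, String columnFamily) {
        return localDir + "/" + keyspace + "/" + columnFamily + "/";
    }

    public static void main(String[] args) {
        // Prints /home/talend/sstable/testk/testc/
        System.out.println(sstablePath("/home/talend/sstable", "testk", "testc"));
    }
}
```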
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tCassandraClose
Disconnects a connection to a Cassandra server so as to release occupied resources.
Basic settings
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a scenario in which tCassandraClose is used, see Handling data with Cassandra on page 439.
tCassandraConnection
Enables the reuse of the connection it creates to a Cassandra server.
tCassandraConnection opens a connection to a Cassandra server.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Required authentication Select this check box to enable the database authentication.
Username Fill in this field with the username for the Cassandra
authentication.
Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Use SSL connection Select this check box to enable the SSL or TLS encrypted
connection.
Then you need to use the tSetKeystore component in the
same Job to specify the encryption information.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a scenario in which tCassandraConnection is used, see Handling data with Cassandra on page
439.
tCassandraInput
Extracts the desired data from a standard or super column family of a Cassandra keyspace so as to
apply changes to the data.
tCassandraInput allows you to read data from a Cassandra keyspace and send the data into the Talend flow.
The following tables map Cassandra types to the corresponding Talend data types.

Cassandra type    Talend data type
BigInt            Long
Blob              Byte[]
Boolean           Boolean
Counter           Long
Inet              Object
List              List
Map               Object
Set               Object
Timestamp         Date
UUID              String
TimeUUID          String
VarInt            Object
Float             Float
Double            Double
Decimal           BigDecimal

Cassandra type    Talend data type
BytesType         byte[]
AsciiType         String
UTF8Type          String
IntegerType       Object
Int32Type         Integer
LongType          Long
UUIDType          String
TimeUUIDType      String
DateType          Date
BooleanType       Boolean
FloatType         Float
DoubleType        Double
DecimalType       BigDecimal
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
API type This drop-down list is displayed only when you have
selected the 2.0 version (deprecated) of Cassandra from the
DB version list. From this API type list, you can either select
Datastax to use CQL 3 (Cassandra Query Language) with
Cassandra, or select Hector (deprecated) to use CQL 2.
Note that the Hector API is deprecated along with the
support for Cassandra V2.0.
Along with the evolution of the CQL commands, the
parameters to be set in the Basic settings view vary.
Required authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.
Username Fill in this field with the username for the Cassandra
authentication.
Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Keyspace Type in the name of the keyspace from which you want to
read data.
Column family Type in the name of the column family from which you want
to read data.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Query Enter the query statements to be used to read data from the
Cassandra database.
By default, the query is not case-sensitive. This means that
at runtime, the column names you put in the query are
always taken in lower case. If you need to make the query
case-sensitive, put the column names in double quotation
marks.
The [...] button next to this field allows you to generate the
sample code that shows what the pre-defined variables are
for the data to be read and how these variables can be used.
This feature is available only for the Datastax API of
Cassandra 2.0 (deprecated) or a later version.
Include key in output columns Select this check box to include the key of the column
family in output columns.
• Key column: select the key column from the list.
Row key type Select the appropriate Talend data type for the row key
from the list.
Row key Cassandra type Select the corresponding Cassandra type for the row key
from the list.
Warning:
The value of the Default option varies with the selected
row key type. For example, if you select String from the
Row key type list, the value of the Default option will be
UTF8.
Include super key in output columns Select this check box to include the super key of the column
family in output columns.
• Super key column: select the desired super key column
from the list.
This check box appears only if you select Super from the
Column family type drop-down list.
Super column type Select the type of the super column from the list.
Super column Cassandra type Select the corresponding Cassandra type for the super
column from the list.
For more information about the mapping table between
Cassandra type and Talend data type, see Mapping tables
between Cassandra type and Talend data type on page
434.
Specify row keys Select this check box to specify the row keys of the column
family directly.
Row Keys Type in the specific row keys of the column family in the
correct format depending on the row key type.
This field appears only if you select the Specify row keys
check box.
Key start Type in the start row key of the correct data type.
Key end Type in the end row key of the correct data type.
Key limit Type in the number of rows to be read between the start
row key and the end row key.
Specify columns Select this check box to specify the column names of the
column family directly.
Columns range start Type in the start column name of the correct data type.
Columns range end Type in the end column name of the correct data type.
Columns range limit Type in the number of columns to be read between the start
column and the end column.
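The case-sensitivity rule described for the Query field above can be illustrated with the Java sketch below; quote() is a hypothetical helper (not part of the component API), and the column and column family names are sample values.

```java
public class CqlCaseSensitivity {
    // Wraps a column name in double quotation marks so that Cassandra
    // treats it as case-sensitive; unquoted names are taken in lower
    // case at runtime.
    static String quote(String column) {
        return "\"" + column + "\"";
    }

    public static void main(String[] args) {
        // Read as "select managerid from employee_info" at runtime:
        String caseInsensitive = "SELECT ManagerID FROM Employee_Info";
        // Keeps the exact case of the column name:
        String caseSensitive = "SELECT " + quote("ManagerID") + " FROM Employee_Info";
        System.out.println(caseSensitive);
    }
}
```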
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Handling data with Cassandra
Procedure
1. Double-click the tCassandraConnection component to open its Basic settings view in
the Component tab.
2. Select the Cassandra version that you are using from the DB Version list. In this example, it is
Cassandra 1.1.2.
3. In the Server field, type in the hostname or IP address of the Cassandra server. In this example, it
is localhost.
4. In the Port field, type in the listening port number of the Cassandra server.
5. If required, type in the authentication information for the Cassandra connection: Username and
Password.
Procedure
1. Double-click the tFileInputDelimited component to open its Component view.
2. Click the [...] button next to the File Name/Stream field to browse to the file that you want to
read data from. In this scenario, the directory is D:/Input/Employees.csv. The CSV file contains four
columns: id, age, name and ManagerID.

id;age;name;ManagerID
1;20;Alex;1
2;40;Peter;1
3;25;Mark;1
4;26;Michael;1
5;30;Christophe;2
6;26;Stephane;3
7;37;Cedric;3
8;52;Bill;4
9;43;Jack;2
10;28;Andrews;4
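As a rough sketch of what tFileInputDelimited does with these settings (Field separator ";", Header set to 1), the following Java snippet skips the header row and splits each remaining row into fields. parse() is a hypothetical helper, not the component's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimitedRead {
    // Skips the first `header` rows, then splits each remaining row on
    // the field separator, mimicking the component's basic behavior.
    static List<String[]> parse(String content, String fieldSep, int header) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = content.split("\n");
        for (int i = header; i < lines.length; i++) {
            rows.add(lines[i].split(fieldSep));
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "id;age;name;ManagerID\n1;20;Alex;1\n2;40;Peter;1";
        List<String[]> rows = parse(data, ";", 1);
        System.out.println(rows.size() + " rows, first name: " + rows.get(0)[2]);
    }
}
```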
3. In the Header field, enter 1 so that the first row in the CSV file will be skipped.
4. Click Edit schema to define the data to pass on to the tCassandraOutput component.
Procedure
1. Double-click the tCassandraOutput component to open its Basic settings view in the Component
tab.
2. Type in required information for the connection or use the existing connection you have
configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example,
and select Drop keyspace if exists and create from the Action on keyspace list.
4. In the Column family configuration area, type in the name of the column family: Employee_Info in
this example, and select Drop column family if exists and create from the Action on column family
list.
The Define column family structure check box appears. In this example, clear this check box.
5. In the Action on data list, select the action you want to carry out, Upsert in this example.
6. Click Sync columns to retrieve the schema from the preceding component.
7. Select the key column of the column family from the Key column list. In this example, it is id.
If needed, select the Include key in columns check box.
Procedure
1. Double-click the tCassandraInput component to open its Component view.
2. Type in the required information for the connection or use the existing connection you have
configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example.
4. In the Column family configuration area, type in the name of the column family: Employee_Info in
this example.
5. Select Edit schema to define the data structure to be read from the Cassandra keyspace. In this
example, three columns id, name and age are defined.
6. If needed, select the Include key in output columns check box, and then select the key column of
the column family you want to include from the Key column list.
7. From the Row key type list, select Integer because id is of integer type in this example.
Keep the Default option for the row key's Cassandra type, because its value will automatically
map to the corresponding Cassandra type, Int32.
8. In the Query configuration area, select the Specify row keys check box and specify the row keys
directly. In this example, three rows will be read. Next, select the Specify columns check box and
specify the column names of the column family directly. This scenario will read three columns
from the keyspace: id, name and age.
9. If needed, the Key start and the Key end fields allow you to define the range of rows, and the
Key limit field allows you to specify the number of rows within the range of rows to be read.
Similarly, the Columns range start and the Columns range end fields allow you to define the range
of columns of the column family, and the Columns range limit field allows you to specify the
number of columns within the range of columns to be read.
Procedure
1. Double-click the tLogRow component to open its Component view.
2. In the Mode area, select Table (print values in cells of a table).
Procedure
1. Double-click the tCassandraClose component to open its Component view.
tCassandraOutput
Writes data into or deletes data from a column family of a Cassandra keyspace.
tCassandraOutput receives data from the preceding component, and writes data into Cassandra.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
API type This drop-down list is displayed only when you have
selected the 2.0 version (deprecated) of Cassandra from the
DB version list. From this API type list, you can either select
Datastax to use CQL 3 (Cassandra Query Language) with
Cassandra, or select Hector (deprecated) to use CQL 2.
Note that the Hector API is deprecated along with the
support for Cassandra V2.0.
As the CQL commands evolve, the parameters to be set
in the Basic settings view vary.
Required authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.
Username Fill in this field with the username for the Cassandra
authentication.
Password Fill in this field with the password for the Cassandra
authentication.
Use SSL Select this check box to enable the SSL or TLS encrypted
connection.
Then you need to use the tSetKeystore component in the
same Job to specify the encryption information.
Keyspace Type in the name of the keyspace into which you want to
write data.
Action on keyspace Select the operation you want to perform on the keyspace
to be used:
• None: No operation is carried out.
• Drop and create keyspace: The keyspace is removed
and created again.
• Create keyspace: The keyspace does not exist and gets
created.
• Create keyspace if not exists: A keyspace gets created if
it does not exist.
• Drop keyspace if exists and create: The keyspace is
removed if it already exists and created again.
Column family Type in the name of the column family into which you
want to write data.
Action on column family Select the operation you want to perform on the column
family to be used:
• None: no operation is carried out.
• Drop and create column family: the column family is
removed and created again.
• Create column family: the column family does not exist
and gets created.
• Create column family if not exists: a column family gets
created if it does not exist.
• Drop column family if exists and create: the column
family is removed if it already exists and created again.
Action on data On the data of the table defined, you can perform:
• Upsert: insert the columns if they do not exist or
update the existing columns.
• Insert: insert the columns if they do not exist. This
action also updates the existing ones.
• Update: update the existing columns or add the
columns that do not exist. This action does not support
the Counter Cassandra data type.
• Delete: remove columns corresponding to the input
flow.
Note that the action list varies depending on the Hector
(deprecated) or Datastax API you are using. When the API is
Datastax, more actions become available.
For more advanced actions, use the Advanced settings view.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
Built-In: You create and store the schema locally for this
component only.
Sync columns Click this button to retrieve schema from the previous
component connected in the Job.
Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Row key column Select the row key column from the list.
Include row key in columns Select this check box to include the row key in the columns.
Include super columns in standard columns Select this check box to include the super columns in
standard columns.
Delete super columns Select this check box to delete super columns.
This check box appears only if you select the Delete Row
check box.
Advanced settings
Use unlogged batch Select this check box to handle data in batch but with
Cassandra's UNLOGGED approach. This feature is available
to the following three actions: Insert, Update and Delete.
Then you need to configure how the batch mode works:
• Batch size: enter the number of lines in each batch to
be processed.
• Group batch method: select how to group rows into
batches:
1. Partition: rows sharing the same partition keys are
grouped.
2. Replica: rows to be written to the same replica are
grouped.
3. None: rows are grouped randomly. This option is
suitable for a single node Cassandra.
• Cache batch group: select this check box to load rows
into memory before grouping them. This way, grouping
is not impacted by the order of the rows.
If you leave this check box clear, only successive rows
that meet the same criteria are grouped.
• Async execute: select this check box if you want
tCassandraOutput to send batches in parallel. If you
leave it clear, tCassandraOutput waits for the result of
a batch before sending another batch to Cassandra.
• Maximum number of batches executed in parallel: once
you have selected Async execute, enter the number of
batches to be sent in parallel to Cassandra.
This number must be a positive integer, and it is
recommended not to use too large a value.
The ideal situation to use batches with Cassandra is when
a small number of tables must synchronize the data to be
inserted or updated.
In this UNLOGGED approach, the Job does not write batches
into Cassandra's batchlog system and thus avoids the
performance issue incurred by this writing. For further
information about Cassandra BATCH statement and
UNLOGGED approach, see Batches.
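In CQL terms, such a batch corresponds to statements of the following shape (an illustrative sketch; the keyspace, table, columns and values are hypothetical):

```sql
BEGIN UNLOGGED BATCH
    INSERT INTO ks.tb (id, name) VALUES (1, 'Alex');
    INSERT INTO ks.tb (id, name) VALUES (2, 'Peter');
APPLY BATCH;
```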
Insert if not exists Select this check box to insert rows only when they do not
already exist in the target table.
This feature is available to the Insert action only.
Delete if exists Select this check box to remove from the target table only
the rows that have the same records in the incoming flow.
Use TTL Select this check box to write the TTL data in the target
table. In the column list that is displayed, you need to select
the column to be used as the TTL column. The DB type of
this column must be Int.
This feature is available to the Insert action and the Update
action only.
Use Timestamp Select this check box to write the timestamp data in the
target table. In the column list that is displayed, you need to
select the column to be used to store the timestamp data.
The DB type of this column must be BigInt.
This feature is available to the following actions: Insert,
Update and Delete.
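In CQL, both of these options translate into a USING clause on the write statement, for example (an illustrative sketch; the keyspace, table, columns and values are hypothetical):

```sql
INSERT INTO ks.tb (id, name) VALUES (1, 'Alex')
USING TTL 86400 AND TIMESTAMP 1580000000000000;
```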
IF condition Add the condition to be met for the Update or the Delete
action to take place. This condition allows you to be more
precise about the columns to be updated or deleted.
Special assignment operation Complete this table to construct advanced SET commands
of Cassandra to make the Update action more specific.
For example, add a record to the beginning or a particular
position of a given column.
In the Update column column of this table, you need
to select the column to be updated and then select the
operations to be used from the Operation column. The
following operations are available:
• Append: it adds incoming records to the end of the
column to be updated. The Cassandra data types it can
handle are Counter, List, Set and Map.
• Prepend: it adds incoming records to the beginning of
the column to be updated. The only Cassandra data
type it can handle is List.
• Remove: it removes records from the target table
when the same records exist in the incoming flow. The
Cassandra data types it can handle are Counter, List,
Set and Map.
• Assign based on position/key: it adds records to a
particular position of the column to be updated. The
Cassandra data types it can handle are List and Map.
Once you select this operation, the Map key/list
position column becomes editable. From this column,
you need to select the column to be used as reference
to locate the position to be updated.
For more details about these operations, see Datastax's
related documentation in http://docs.datastax.com/en/
cql/3.1/cql/cql_reference/update_r.html?scroll=reference
_ds_g4h_qzq_xj__description_unique_34.
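These operations correspond to CQL collection assignments such as the following (an illustrative sketch; the table, the List column mylist and the Map column mymap are hypothetical):

```sql
UPDATE ks.tb SET mylist = mylist + ['x'] WHERE id = 1;   -- Append
UPDATE ks.tb SET mylist = ['x'] + mylist WHERE id = 1;   -- Prepend
UPDATE ks.tb SET mylist = mylist - ['x'] WHERE id = 1;   -- Remove
UPDATE ks.tb SET mymap['k'] = 'v' WHERE id = 1;          -- Assign based on key
```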
Row key in the List type Select the column to be used to construct the WHERE clause
of Cassandra to perform the Update or the Delete action on
only selected rows. The column(s) to be used in this table
should be from the set of the Primary key columns of the
Cassandra table.
Delete collection column based on position/key Select the column to be used as reference to locate the
particular row(s) to be removed.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related Scenario
For a scenario in which tCassandraOutput is used, see Handling data with Cassandra on page 439.
tCassandraOutputBulk
Prepares an SSTable of large size and processes it according to your needs before loading this
SSTable into a column family of a Cassandra keyspace.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts
of a two-step process. In the first step, an SSTable is generated. In the second step, this SSTable
is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that
the data can be transformed before it is loaded into Cassandra.
tCassandraOutputBulk receives data from the preceding component, and creates an SSTable locally.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).
Table type Select the type of the data model to be used for the table
to be created. It can be CQL (actually CQL3) or non-CQL (the
legacy thrift-based API of Cassandra before CQL3).
This drop-down list is available only when the DB version
you are using is Cassandra 2.0.0 (deprecated). For the
Cassandra versions later than 2.0.0, CQL becomes the only
model used by this component and so this list is no longer
available.
Required authentication Select this check box to provide credentials for the
Cassandra authentication.
Username Fill in this field with the username for the Cassandra
authentication.
Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Use configuration file Select this check box and in the field that is displayed,
enter the path, or browse to cassandra.yaml, the main
configuration file for Cassandra.
This way, this component can import and directly use the
configuration from cassandra.yaml, which can contain many
advanced Cassandra properties, such as the properties for
SSL encryption.
When you need to run your Job in different Cassandra
environments, this feature allows your Job to easily switch
between the configurations.
For further information about this cassandra.yaml file, see
Cassandra configuration.
Keyspace Type in the name of the keyspace into which you want to
write the SSTable.
Column family Type in the name of the column family into which you want
to write the SSTable.
Schema statement Enter the statement to define the schema of the column
family to be used or to be created on the fly.
• This statement is a Cassandra prepared statement,
which stores query results locally in the SSTable
directory you define with this component before
sending them to the server. For further information
about the prepared statements, see Prepared
statements.
• A Cassandra column family is a container for a
collection of rows of records that have a similar kind.
Its schema must contain strictly the same columns as
the component schema you have defined, that is to
say, the column names and the order of the columns in
both the schemas must be identical.
An example of this schema statement is provided in the
Schema statement field:
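Such a statement could look like the following sketch, which assumes the keyspace ks, the column family tb and the columns id, name and birthday used elsewhere in this section; the column types are illustrative:

```sql
-- Hypothetical schema statement; the column types are assumed for illustration.
CREATE TABLE ks.tb (
    id int PRIMARY KEY,
    name text,
    birthday timestamp
);
```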
Insert statement Enter the statement to instruct how to write the data from
the input flow into the columns of the column family to be
used.
This statement is a Cassandra prepared statement, which
stores query results locally in the SSTable directory you
define with this component before sending them to
the server. For further information about the prepared
statements, see Prepared statements.
An example of this insert statement is provided in the Insert
statement field:
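Going by the description that follows, the statement takes this general form (a sketch; the exact wording of the default example may differ):

```sql
-- Prepared insert statement; the question marks are bind variable markers.
INSERT INTO ks.tb (id, name, birthday) VALUES (?, ?, ?);
```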
It will write data into the id, the name and the birthday
columns, respectively, of a column family called tb in the
keyspace ks. The question marks in the statement are the
bind variable markers for the three columns. For further
information about bind variables and their usage, see Bound
parameters.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.
Column name comparator Select the data type for the column names, which is used to
sort columns. This list is not available when the data model
to be used is CQL3.
For more information about the comparators, see http://
www.datastax.com/docs/1.1/ddl/column_family#about-
data-types-comparators-and-validators.
SSTable directory Specify the local directory for the SSTable. Note that the
complete path to the SSTable will be the local directory
appended by the specified keyspace name and column
family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.
Buffer size Specify what size the SSTable must reach before it is
written into Cassandra.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tCassandraOutputBulkExec
Improves performance during Insert operations to a column family of a Cassandra keyspace.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together to
output data to an SSTable and then to write the SSTable into Cassandra, in a two-step process. These
two steps are fused together in the tCassandraOutputBulkExec component.
tCassandraOutputBulkExec receives data from the preceding component, creates an SSTable and then
writes the SSTable into Cassandra.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).
Table type Select the type of the data model to be used for the table
to be created. It can be CQL (actually CQL3) or non-CQL (the
legacy thrift-based API of Cassandra before CQL3).
This drop-down list is available only when the DB version
you are using is Cassandra 2.0.0 (deprecated). For the
Cassandra versions later than 2.0.0, CQL becomes the only
model used by this component and so this list is no longer
available.
Warning:
• Cassandra 2.0.0 (deprecated) only works with
JVM1.7.
Required authentication Select this check box to provide credentials for the
Cassandra authentication.
Username Fill in this field with the username for the Cassandra
authentication.
Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Keyspace Type in the name of the keyspace into which you want to
write the SSTable.
Column family Type in the name of the column family into which you want
to write the SSTable.
Schema statement Enter the statement to define the schema of the column
family to be used or to be created on the fly.
• This statement is a Cassandra prepared statement,
which stores query results locally in the SSTable
directory you define with this component before
sending them to the server. For further information
about the prepared statements, see Prepared statements.
Insert statement Enter the statement to instruct how to write the data from
the input flow into the columns of the column family to be
used.
This statement is a Cassandra prepared statement, which
stores query results locally in the SSTable directory you
define with this component before sending them to
the server. For further information about the prepared
statements, see Prepared statements.
An example of this insert statement is provided in the Insert
statement field:
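As described below, the statement takes this general form (a sketch; the exact wording of the default example may differ):

```sql
-- Prepared insert statement; the question marks are bind variable markers.
INSERT INTO ks.tb (id, name, birthday) VALUES (?, ?, ?);
```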
It will write data into the id, the name and the birthday
columns, respectively, of a column family called tb in the
keyspace ks. The question marks in the statement are the
bind variable markers for the three columns. For further
information about bind variables and their usage, see Bound
parameters.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.
Column name comparator Select the data type for the column names, which is used to
sort columns.
For more information about the comparators, see http://
www.datastax.com/docs/1.1/ddl/column_family#about-
data-types-comparators-and-validators.
SSTable directory Specify the local directory for the SSTable. Note that the
complete path to the SSTable will be the local directory
appended by the specified keyspace name and column
family name.
Buffer size Specify what size the SSTable must reach before it is
written into Cassandra.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tCassandraRow
Acts on the actual DB structure or on the data, depending on the nature of the query and the
database.
tCassandraRow is the specific component for this database query. It executes the Cassandra Query
Language (CQL) query stated in the specified database. The row suffix means the component
implements a flow in the Job design although it does not provide output.
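A statement run through this component could be, for instance, a DDL query like the following (an illustrative sketch; the keyspace name and replication settings are hypothetical):

```sql
CREATE KEYSPACE IF NOT EXISTS ks
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
```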
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Required Authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.
Username Fill in this field with the username for the Cassandra
authentication.
Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Die on error This check box is cleared by default, meaning that rows on
error are skipped and the process completes for error-free
rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Related scenario
For related topics, see
tChangeFileEncoding
Transforms the character encoding of a given file and generates a new file with the transformed
character encoding.
tChangeFileEncoding changes the encoding of a given file.
Basic settings
Use Custom Input Encoding Select this check box to customize the input encoding type.
When it is selected, a list of input encoding types appears,
from which you can select a type or specify one of your own
by selecting CUSTOM.
Encoding From this list of character encoding types, you can select
one of the offered options or customize the character
encoding by selecting CUSTOM and specifying a character
encoding type.
Advanced settings
Create directory if does not exist This check box is selected by default. It creates a directory to hold the output file if required.
tStatCatcher Statistics Select this check box to collect log data at the component level.
Global Variables
Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
Usage
Procedure
Procedure
1. Drop a tChangeFileEncoding component onto the design workspace.
3. Select the Use Custom Input Encoding check box. Set the Encoding type to GB2312.
4. In the Input File Name field, enter the file path or browse to the input file.
5. In the Output File Name field, enter the file path or browse to the output file.
6. Select CUSTOM from the second Encoding list and enter UTF-16 in the text field.
Results
The encoding type of the file in.txt is transformed and out.txt is generated with the UTF-16 encoding
type.
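Outside the Studio, the same transformation can be sketched in a few lines of Python; this is a hypothetical stand-in for the component, not the code the Job generates, and the file names and sample text are illustrative:

```python
# Create a sample input file encoded in GB2312 (name and content are illustrative).
with open("in.txt", "w", encoding="gb2312") as f:
    f.write("你好, Talend")

# Re-encode the file to UTF-16, mirroring what tChangeFileEncoding does.
with open("in.txt", encoding="gb2312") as src:
    content = src.read()
with open("out.txt", "w", encoding="utf-16") as dst:
    dst.write(content)
```

Reading out.txt back with the UTF-16 codec yields the original text unchanged.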
tChronometerStart
Operates as a chronometer device that starts calculating the processing time of one or more subJobs
in the main Job, or that starts calculating the processing time of part of your subJob.
Starts measuring the time a subJob takes to be executed.
Global Variables
Global Variables STARTTIME: the start time to calculate the processing time
of subjob(s). This is a Flow variable and it returns a long.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Related scenario
For related scenario, see Measuring the processing time of a subJob and part of a subJob on page
467.
tChronometerStop
Operates as a chronometer device that stops calculating the processing time of one or more subJobs
in the main Job, or that stops calculating the processing time of part of your subJob.
tChronometerStop displays the total execution time.
Measures the time a subJob takes to be executed.
Basic settings
Display component name When selected, it displays the name of the component on
the console.
Display human readable duration When selected, it displays subJob execution information in
readable time units.
Global Variables
Global Variables STOPTIME: the stop time to calculate the processing time of
subjob(s). This is a Flow variable and it returns a long.
DURATION: the processing time of subjob(s). This is a Flow
variable and it returns a long.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill in a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
Usage
Note: When connecting tMap to tFileOutputDelimited, you will be prompted to name the output
table. The name used in this example is "new_order".
• Click Edit schema to define the schema of the tRowGenerator. For this Job, the schema is
composed of two columns: First_Name and Last_Name, so click the [+] button twice to add two
columns and rename them.
• Click the RowGenerator Editor three-dot button to open the editor and define the data to be
generated.
• In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows
for RowGenerator field and click OK. The RowGenerator Editor closes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Double-click on the tMap component to open the Map editor. The Map editor opens displaying the
input metadata of the tRowGenerator component.
• In the Schema editor panel of the Map editor, click the plus button of the output table to add two
rows and define them.
• In the Map editor, drag the First_Name row from the input table to the Last_Name row in the
output table and drag the Last_Name row from the input table to the First_Name row in the output
table.
• Click Apply to save changes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Click OK to close the editor.
• Select tFileOutputDelimited and click the Component tab to display the component view.
• In the Basic settings view, set tFileOutputDelimited properties as needed.
• Select tChronometerStop and click the Component tab to display the component view.
• In the Since options panel of the Basic settings view, select Since the beginning option to measure
the duration of the subJob as a whole.
• Select/clear the other check boxes as needed. In this scenario, we want to display the subJob
duration on the console preceded by the component name.
• If needed, enter text in the Caption field.
• Save your Job and press F6 to execute it.
Note: You can measure the duration of the subJob the same way by placing tChronometerStop
below tRowGenerator, and connecting the latter to tChronometerStop using an OnSubjobOk link.
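Conceptually, the chronometer components simply record two points in the generated Job code and report the elapsed time between them. The following sketch is illustrative only (class and method names are not Talend's):

```java
public class ChronometerSketch {

    // Convert a nanosecond interval to milliseconds, as displayed on the console.
    public static long elapsedMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();   // where tChronometerStart would sit
        Thread.sleep(50);                 // the subJob whose duration is measured
        long end = System.nanoTime();     // where tChronometerStop would sit
        System.out.println("tChronometerStop: " + elapsedMillis(start, end) + " ms");
    }
}
```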
tCloudStart
Starts instances on Amazon EC2 (Amazon Elastic Compute Cloud).
This component accesses the cloud provider to be used (Amazon EC2) and launches instances, which
are virtual servers in that cloud. If an instance to be launched does not exist, tCloudStart creates it.
Basic settings
Access key and Secret key Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services.
These access credentials are generated from the Security
Credential tab of your Amazon account page.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Region and Zone Enter the region and the zone to be used as the geographic
location where you want to launch an instance.
The syntax used to express a location is predefined by
Amazon, for example, us-east-1 representing the US East
(Northern Virginia) region and us-east-1a representing
one of the Availability Zones within that region. For
further information about available regions for Amazon, see
Amazon's documentation about regions and endpoints as
well as Amazon's FAQ about regions and Availability Zones.
Instance name Enter the name of the instance to be launched. For example,
you can enter Talend.
Note that uppercase letters will be converted to lowercase.
Instance type Select the type of the instance(s) to be launched. Each type
is predefined by Amazon and defines the performance of
every instance you want to launch.
Proceed with a Key pair Select this check box to use an Amazon Key Pair for your login
to Amazon EC2. Once you select it, a drop-down list appears,
allowing you to select:
• Use an existing Key Pair to enter the name of that Key
Pair in the field next to the drop-down list. If required,
Amazon will prompt you at runtime to find and use
that Key Pair.
• Create a Key Pair to enter the name of the new Key
Pair in the field next to the drop-down list and define
the location where you want to store this Key Pair in
the Advanced settings tab view.
Security group Add rows to this table and enter the names of the security
groups to which you need to assign the instance(s) to be
launched. The security groups set in this table must exist on
your Amazon EC2.
A security group applies specific rules on inbound traffic
to instances assigned to the group, such as the ports to be
used. For further information about security groups, see
Amazon's documentation about security groups.
Note that an instance can be assigned to a group by setting
its security group name or key pair name to
jclouds#<$group_name>, where <$group_name> identifies
the group to which the instance belongs. In this way, you can
change the status of all instances or running instances
in one group at the same time using the tCloudStop
component.
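The jclouds#<$group_name> convention can be sketched as a small helper; the prefix is literal, while the group name (talend_demo here) is an illustrative placeholder:

```java
public class GroupNaming {

    // Build the security group / key pair name that associates an instance
    // with a group, per the jclouds#<$group_name> convention described above.
    public static String jcloudsGroup(String groupName) {
        return "jclouds#" + groupName;
    }

    public static void main(String[] args) {
        System.out.println(jcloudsGroup("talend_demo")); // jclouds#talend_demo
    }
}
```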
Advanced settings
Key Pair folder Browse to, or enter the path to the folder you use to store
the created Key Pair file.
This field appears when you select Create a Key Pair in
the Basic settings tab view.
Volumes Add rows and define the volume(s) to be created for the
instances to be launched in addition to the volumes
predefined and allocated by the given Amazon EC2.
The parameters to be set in this table are the same
parameters used by Amazon for describing a volume.
If you need an additional volume to be removed
automatically when its related instance terminates, select
the check box in the Delete on termination column.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tCloudStop
Changes the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud).
This component accesses the cloud provider to be used (Amazon EC2) and suspends, resumes or
terminates given instance(s).
Basic settings
Access key and Secret key Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services.
These access credentials are generated from the Security
Credential view of your Amazon account page.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Group name Enter the name of the group in which you want to change
the status of given instances whose security group name or
key pair name is set to jclouds#<$group_name> in the
tCloudStart component, where <$group_name> identifies
the group to which the instance belongs.
This field is available only when Instances in a specific
group or Running instances in a specific group is selected
from the Predicate list.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tCombinedSQLAggregate
Provides a set of metrics based on values or calculations.
tCombinedSQLAggregate collects data values from one or more columns of a table for statistical
purposes. This component has real-time capabilities since it runs the data transformation on the
DBMS itself.
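Because the transformation runs on the DBMS, the component effectively assembles a GROUP BY statement and sends it to the database. A hedged sketch of the kind of statement produced (table and column names are illustrative, and the exact SQL Talend generates may differ):

```java
public class AggregateSqlSketch {

    // Assemble a simple aggregation query of the kind tCombinedSQLAggregate
    // pushes down to the DBMS. Illustrative only, not Talend's generated code.
    public static String groupBySql(String table, String groupCol) {
        return "SELECT " + groupCol + ", COUNT(id), AVG(salary), "
                + "MIN(salary), MAX(salary) FROM " + table
                + " GROUP BY " + groupCol;
    }

    public static void main(String[] args) {
        System.out.println(groupBySql("employees", "state"));
    }
}
```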
Basic settings
Schema and Edit schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.
Operations Select the type of operation along with the value to use for
the calculation and the output field.
Input column: Select the input column from which you want
to collect the values to be aggregated.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Launch MySQL Workbench and start a local connection on port 3306.
2. Create a new schema and name it test.
3. Back in the design workspace, select tMysqlConnection and click the Component tab to define its
basic settings.
4. In the Basic settings view, set the database connection details manually or select Repository from
the Property Type list and select your DB connection if it has already been defined and stored in
the Metadata area of the Repository tree view.
For more information on centralizing DB connection details in the Repository, see Talend Studio
User Guide.
Procedure
1. In the design workspace, select tFixedFlowInput and click the Component tab to define its basic
settings.
2. In the Basic settings view, in the Number of rows field, enter 500.
3. In this scenario, the source database table has seven columns: id, first_name, last_name, city, state,
date_of_birth, and salary.
Click the [...] button next to Edit schema to define the following data structure.
4. Click the floppy disk icon to save the schema as a generic schema for later reuse.
5. In the Select folder window, select default and click OK.
6. Choose a name for your generic schema and click Finish.
7. Click OK.
8. The first column of the Values table automatically reflects the data structure you entered
previously.
9. In the Values table, enter a value for each column.
10. In the design workspace, select tMysqlOutput and click the Component tab to define its basic
settings.
The output schema will automatically be the same as the previous component, in this case
tFixedFlowInput.
Procedure
1. In the design workspace, select tCreateTable and click the Component tab to define its basic
settings.
2. Click the [...] button next to Edit schema to define the following data structure.
The schema you enter at this step must reflect the different aggregation operations you
want to perform on the input data.
Procedure
1. In the design workspace, select tCombinedSQLInput and click the Component tab to access the
configuration panel.
2. Enter the source table name, in this case employees in the Table field.
3. In the Schema field, select Repository from the list and click the [...] button next to the empty
field to load the schema you saved while configuring the settings for tFixedFlowInput.
4. In the Repository Content window, expand Generic schemas and select your schema.
Procedure
1. In the design workspace, select tCombinedSQLFilter and click the Component tab to access the
configuration panel.
2. Click the Sync columns button to retrieve the schema from the previous component, or configure
the schema manually by selecting Built-in from the Schema list and clicking the [...] button next
to Edit schema.
When you define the data structure for tCombinedSQLFilter, column names automatically appear
in the Input column list in the Conditions table.
In this scenario, the tCombinedSQLFilter component instantiates four columns: id, state,
date_of_birth, and salary.
3. In the Conditions table, set input parameters, operators and expected values in order to only
extract the records that fulfill these criteria.
Click the [+] button twice under the Conditions table, and in the Input column fields, select state
and date_of_birth from the drop-down lists.
In this scenario, the tCombinedSQLFilter component filters the state and date_of_birth columns in
the source table to extract the employees who were born after Oct. 19, 1960 and who live in the
states Utah, Ohio and Iowa.
4. For the column state, select IN as operator from the drop-down list, and enter ('Utah','Ohio','Iowa')
as value.
5. For the column date_of_birth, select > as operator from the drop-down list, and enter ('1960-10-19')
as value.
6. Select And in the Logical operator between conditions list to apply the two conditions at the same
time. You can also customize the conditions by selecting the Use custom SQL box and editing the
conditions in the code box.
7. In the design workspace, select tCombinedSQLAggregate and click the Component tab to access
the configuration panel.
8. Click the [...] button next to Edit schema to enter the following configuration:
The tCombinedSQLAggregate component instantiates four columns: id, state, date_of_birth, and
salary, coming from the previous component.
9. The Group by table helps you define the data sets to be processed based on a defined column. In
this example: State.
In the Group by table, click the [+] button to add one line.
10. In the Output column drop-down list, select State. This column will be used to hold the data
filtered on State.
11. The Operations table helps you define the type of aggregation operations to be performed.
The Output column list available depends on the schema you want to output (through the
Procedure
1. In the design workspace, select tCombinedSQLOutput and click the Component tab to access the
configuration panel.
Procedure
1. In the design workspace, select tCombinedSQLCommit and click the Component tab to access the
configuration panel.
2. On the Component list, select the relevant database connection component if more than one
connection is used.
3. Clear the check box Close Connection.
Procedure
1. In the design workspace, select tMysqlInput and click the Component tab to define its basic
settings.
2. Select the Use an existing connection check box and choose tMysqlConnection_1 from the list.
3. Click the [...] button next to Edit schema to enter the following schema:
4. In the Table Name field, enter empl_by_state and in the Query field, enter select * from empl_by_state.
5. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
6. Click the Sync columns button to retrieve the schema from the previous component and select the
Table (print values in cells of a table) mode.
Results
Rows are inserted into a seven-column table empl_by_state in the database. The table shows, per
defined state, the number of employees, the average salary, the lowest and highest salaries as well as
the oldest and youngest employees.
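Taken together, the tCombinedSQLInput, tCombinedSQLFilter, tCombinedSQLAggregate, and tCombinedSQLOutput chain in this scenario amounts to a single INSERT ... SELECT statement executed on the DBMS. The sketch below reconstructs that statement from the scenario's settings; the exact SQL Talend generates may differ:

```java
public class CombinedSqlSketch {

    // Reconstruction of the scenario as one pushed-down SQL statement:
    // filter on state and date_of_birth, aggregate per state, and insert
    // the seven result columns into empl_by_state.
    public static String insertSelect() {
        return "INSERT INTO empl_by_state "
                + "SELECT state, COUNT(id), AVG(salary), MIN(salary), MAX(salary), "
                + "MIN(date_of_birth), MAX(date_of_birth) "
                + "FROM employees "
                + "WHERE state IN ('Utah','Ohio','Iowa') "
                + "AND date_of_birth > '1960-10-19' "
                + "GROUP BY state";
    }

    public static void main(String[] args) {
        System.out.println(insertSelect());
    }
}
```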
tCombinedSQLFilter
Filters data by reorganizing, deleting, or adding columns based on the source table, and filters the
given data source using the defined filter conditions.
tCombinedSQLFilter allows you to alter the schema of a source table through column name mapping
and to define a row filter on that table. Therefore, it can be used to filter columns and rows at the
same time. This component has real-time capabilities since it runs the data filtering on the DBMS
itself.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.
Logical operator between conditions Select the logical operator between the filter conditions
defined in the Conditions panel.
Two operators are available: Or, And.
Conditions Select the type of WHERE clause along with the values and
the columns to use for row filtering.
Operator: Select the type of the WHERE clause: =, < >, >, <,
>=, <=, LIKE, IN, NOT IN, and EXIST IN.
Use custom SQL Customize a WHERE clause by selecting this check box and
editing in the SQL Condition field.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related Scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.
tCombinedSQLInput
Extracts fields from a database table based on its schema definition.
Then it passes on the field list to the next component via a Combine row link. The schema of
tCombinedSQLInput can be different from that of the source database table but must correspond to it
in terms of the column order.
tCombinedSQLInput extracts fields from a database table based on its schema. This component also
has column filtering capabilities since its schema can be different from that of the database table.
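Since the component reads only the columns present in its schema, it behaves like a projection. A minimal sketch of the SELECT this corresponds to (table and column names are illustrative):

```java
public class InputSqlSketch {

    // Build a SELECT restricted to the schema's columns, which is what
    // defining a narrower schema on tCombinedSQLInput amounts to.
    public static String selectSql(String table, String... columns) {
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(selectSql("employees", "id", "state", "salary"));
        // SELECT id, state, salary FROM employees
    }
}
```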
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.
Add additional columns This option allows you to call SQL functions to perform
actions on columns, provided that these are not insert,
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.
tCombinedSQLOutput
Inserts records from the incoming flow to an existing database table.
Basic settings
Schema Name of the target database table's schema. This field has
to be filled if the database is Oracle.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.
Action on data Select INSERT from the list to insert the records from the
incoming flow to the target database table.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.
tContextDump
Copies the context setup of the current Job to a flat file, a database table, etc., which can then be used
by tContextLoad.
Together with tContextLoad, this component makes it simple to apply the context setup of one Job to
another.
tContextDump dumps the context setup of the current Job to the subsequent component.
Basic settings
Schema and Edit schema A schema is a row description; it defines the fields that will
be processed and passed on to the next component.
Note:
The schema of tContextDump is read only and made
up of two columns, Key and Value, corresponding to the
parameter name and the parameter value of the Job
context.
Hide Password Select this check box to hide password values, that is, to
display the value of any context parameter whose Type is
Password as *.
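The dump itself is just a walk over the context parameters, emitting one Key/Value row per parameter and masking Password-typed values when Hide Password is selected. An illustrative sketch, not Talend's generated code:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ContextDumpSketch {

    // Return the value to dump: "*" for Password-typed parameters when the
    // Hide Password behavior is on, the raw value otherwise.
    public static String maskValue(String key, String value, Set<String> passwordKeys) {
        return passwordKeys.contains(key) ? "*" : value;
    }

    public static void main(String[] args) {
        Map<String, String> context = new LinkedHashMap<>();
        context.put("host", "localhost");
        context.put("password", "talend");
        Set<String> passwordKeys = Set.of("password");
        for (Map.Entry<String, String> e : context.entrySet()) {
            // One Key/Value row per context parameter, as tContextDump emits.
            System.out.println(e.getKey() + ";" + maskValue(e.getKey(), e.getValue(), passwordKeys));
        }
    }
}
```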
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tContextLoad
Loads a context from a flow.
This component also performs two checks: it warns when a parameter defined in the incoming
flow is not defined in the context, and, the other way around, when a context value is not
initialized in the incoming flow. Note that neither warning blocks the processing.
tContextLoad modifies dynamically the values of the active context.
Basic settings
Schema and Edit schema A schema is a row description; it defines the fields that will
be processed and passed on to the next component.
In tContextLoad, the schema must be made of two columns,
including the parameter name and the parameter value to
be loaded.
If a variable loaded, but not in the context If a variable is loaded but does not appear in the context,
select how the notification must be displayed: as an error, a
warning, or an information message (info).
If a variable in the context, but not loaded If a variable appears in the context but is not loaded, select
how the notification must be displayed: as an error, a
warning, or an information message (info).
Print operations Select this check box to display the context parameters set
in the Run view.
Disable errors Select this check box to prevent the error from displaying.
Disable warnings Select this check box to prevent the warning from
displaying.
Disable infos Select this check box to prevent the information from
displaying.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.
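The loading and the two checks described above can be sketched as follows; the name;value row format matches the two-column schema, while class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ContextLoadSketch {

    // Apply "name;value" rows to an existing context map and collect the two
    // kinds of warnings tContextLoad reports; neither blocks processing.
    public static List<String> load(Map<String, String> context, List<String> rows) {
        List<String> warnings = new ArrayList<>();
        Set<String> loaded = new HashSet<>();
        for (String row : rows) {
            String[] kv = row.split(";", 2);
            if (context.containsKey(kv[0])) {
                context.put(kv[0], kv[1]);
                loaded.add(kv[0]);
            } else {
                warnings.add("loaded but not in context: " + kv[0]);
            }
        }
        for (String key : context.keySet()) {
            if (!loaded.contains(key)) {
                warnings.add("in context but not loaded: " + key);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> ctx = new HashMap<>(Map.of("host", "", "port", ""));
        System.out.println(load(ctx, List.of("host;localhost", "extra;1")));
    }
}
```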
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule This component relies on the data flow to load the context
values to be used, therefore it requires a preceding input
component and thus cannot be a start component.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Print
operations option dynamically at runtime.
When a dynamic parameter is defined, the corresponding
Print operations option in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
of them. With the context settings in the Job, we can decide which database to connect to and choose
whether to display the set context parameters on the console dynamically at runtime.
host;localhost
port;3306
database;test
username;root
password;talend
2. Select the Contexts view of the Job, and click the [+] button at the bottom of the view to add
seven rows in the table to define the following parameters:
• host, String type
• port, String type
• database, String type
• username, String type
• password, Password type
• filename, File type
• printOperations, Boolean type
Note that the host, port, database, username and password parameters correspond to the parameter
names in the delimited files and are used to set up the desired database connection; the filename
parameter defines the delimited file to read at Job execution; and the printOperations parameter
decides whether to print the context parameters set by the tContextLoad component on the
console.
3. Click the Contexts tab and click the [+] button at the upper right corner of the panel to open the
Configure Contexts dialog box.
4. Select the default context, click the Edit button and rename the context to Test.
5. Click New to add a new context named Production. Then click OK to close the dialog box.
6. Back in the Contexts tab view, define the value of the filename variable under each context by
clicking in the respective Value field and browse to the corresponding delimited file.
7. Select the Prompt check box next to the Value field of the filename variable for both contexts to
show the Prompt fields and enter the prompt message to be displayed at the execution time.
8. For the printOperations variable, click in the Value field under the Production context and select
false from the list; click in the Value field under the Test context and select true from the list.
Then select the Prompt check box under both contexts and enter the prompt message to be
displayed at the execution time.
2. Define the file schema manually (Built-in). It contains two columns: key and value.
3. Accept the defined schema to be propagated to the next component (tContextLoad).
4. In the Dynamic settings view of the tContextLoad component, click the [+] button to add a row
in the table, and fill the Code field with context.printOperations to use context variable
printOperations we just defined. Note that the Print operations check box in the Basic settings
view now becomes highlighted and unusable.
7. Then fill in the Schema information. If you stored the schema in the Repository Metadata, then
you can retrieve it by selecting Repository and the relevant entry in the list.
In this example, the schema of both database tables is made of four columns: id (INT, 2 characters
long), firstName (VARCHAR, 15 characters long), lastName (VARCHAR, 15 characters long), and city
(VARCHAR, 15 characters long).
8. In the Query field, type in the SQL query to be executed on the DB table specified. In this
example, simply click Guess Query to retrieve all the columns of the table, which will be displayed
on the Run tab, through the tLogRow component.
9. In the Basic settings view of the tLogRow component, select the Table option to display data
records in the form of a table.
You can specify a file other than the default one if needed, and clear the Show loaded variables
check box if you do not want to see the set context variables on the console. To run the Job using
the default settings, click OK.
The context parameters and content of the database table in the Test context are all displayed on
the Run console.
2. Now select the Production context and press F6 to launch the Job again. When the prompt dialog
box appears, simply click OK to run the Job using the default settings.
The content of the database table in the Production context is displayed on the Run console.
Because the printOperations variable is set to false, the set context parameters are not displayed
on the console this time.
tConvertType
Converts one Talend Java type to another automatically, thus avoiding compilation errors.
tConvertType allows specific conversions at runtime from one Talend Java type to another.
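In generated-Java terms, such a conversion boils down to a parse guarded by the options below. This sketch is illustrative, not Talend's actual generated code:

```java
public class ConvertTypeSketch {

    // String-to-Integer conversion honoring two options described below:
    // emptyToNull maps "" to null before converting, and dieOnError decides
    // whether a bad value kills the Job or the value is dropped (here: null).
    public static Integer toInteger(String value, boolean emptyToNull, boolean dieOnError) {
        if (value == null || (emptyToNull && value.isEmpty())) {
            return null;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            if (dieOnError) {
                throw e; // kill the Job
            }
            return null; // skip the value on error
        }
    }

    public static void main(String[] args) {
        System.out.println(toInteger("42", true, true));   // 42
        System.out.println(toInteger("", true, true));     // null
        System.out.println(toInteger("abc", true, false)); // null
    }
}
```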
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Manual Cast This mode is not visible if the Auto Cast check box is
selected. It allows you to manually specify the columns
where a Java type conversion is needed.
Set empty values to Null before converting This check box is selected to set the empty values of String
or Object type to null for the input data.
Die on error This check box is selected to kill the Job when an error
occurs.
Note:
Not available for Map/Reduce Jobs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
In this scenario, the schema for the input delimited file is stored in the repository; you can
simply drag and drop the relevant file node from Repository - Metadata - File delimited onto the
design workspace to automatically retrieve the tFileInputDelimited component's settings. For more
information, see Talend Studio User Guide.
The input file used in this scenario is called input. It is a text file that holds string, integer, and
float Java types.
Fill in all other fields as needed. For more information, see tFileInputDelimited on page 1015.
In this scenario, the header and the footer are not set and there is no limit for the number of
processed rows.
3. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is
made of three columns, StringtoInteger, IntegerField, and FloatToInteger.
6. Set Schema Type to Built in, and click Sync columns to automatically retrieve the columns from
the tFileInputDelimited component.
7. Click Edit schema to describe manually the data structure of this processing component.
In this scenario, we want to convert a string type data into an integer type and a float type data
into an integer type.
Click OK to close the Schema of tConvertType dialog box.
8. Double-click tMap to open the Map editor.
The Map editor displays the input metadata of the tFileInputDelimited component.
9. In the Schema editor panel of the Map editor, click the plus button of the output table to add two
rows and name them StringToInteger and Sum.
10. In the Map editor, drag the StringToInteger row from the input table to the StringToInteger row in
the output table.
11. In the Map editor, drag each of the IntegerField and the FloatToInteger rows from the input table to
the Sum row in the output table and click OK to close the Map editor.
12. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977.
The string type data is converted into an integer type and displayed in the StringToInteger column
on the console. The float type data is converted into an integer and added to the IntegerField
value to give the addition result in the Sum column on the console.
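The Sum column therefore combines an unconverted integer with a float converted to an integer. A minimal sketch with illustrative values (this example truncates the float; the rounding behavior of the actual conversion may differ):

```java
public class ConvertScenarioSketch {

    // FloatToInteger is cast to int (truncated here), then added to IntegerField,
    // mirroring the Sum output column of the scenario.
    public static int sum(int integerField, float floatToInteger) {
        return integerField + (int) floatToInteger;
    }

    public static void main(String[] args) {
        System.out.println(sum(10, 3.7f)); // 13
    }
}
```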
tCosmosDBBulkLoad
Imports data files in different formats (CSV, TSV or JSON) into the specified Cosmos database so that
the data can be further processed.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
MongoDB directory Fill in this field with the MongoDB home directory.
Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.
Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.
Drop collection if exist Select this check box to remove the collection if it already
exists.
Data file Type in the full path of the file from which the data will be
imported or click the [...] button to browse to the desired
data file.
Make sure that the data file is in standard format. For
example, the fields in CSV files should be separated with
commas.
File type Select the proper file type from the list. CSV, TSV and JSON
are supported.
The JSON file starts with an array Select this check box to allow tCosmosDBBulkLoad to read
the JSON files starting with an array.
This check box appears when the File type you have
selected is JSON.
Action on data Select the action that you want to perform on the data.
• Insert: Insert the data into the database.
Note that when inserting data from CSV or TSV files
into the MongoDB database, you need to specify fields
either by selecting the First line is header check box or
defining them in the schema.
• Upsert: Insert the data if they do not exist or update
the existing data.
Note that when upserting data into the MongoDB
database, you need to specify a list of fields for the
query portion of the upsert operation.
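The difference between Insert and Upsert can be sketched as follows (a hypothetical in-memory simulation, not driver code; the query fields play the role of the match key described above):

```python
# Sketch of upsert semantics: update the document matching the query
# fields if one exists, otherwise insert the new document.

def upsert(collection, doc, query_fields):
    """Return 'updated' if a match on query_fields was found, else 'inserted'."""
    key = {f: doc[f] for f in query_fields}
    for existing in collection:
        if all(existing.get(f) == v for f, v in key.items()):
            existing.update(doc)      # match found: update in place
            return "updated"
    collection.append(dict(doc))      # no match: insert a copy
    return "inserted"

coll = []
print(upsert(coll, {"id": 1, "name": "Joe"}, ["id"]))   # inserted
print(upsert(coll, {"id": 1, "name": "Jane"}, ["id"]))  # updated
print(coll)  # [{'id': 1, 'name': 'Jane'}]
```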
Upsert fields Customize the fields that you want to upsert as needed.
This table is available when you select Upsert from the
Action on data list.
First line is header Select this check box to use the first line in CSV or TSV files
as a header.
This check box is available only when you select CSV or TSV
from the File type list.
Ignore blanks Select this check box to ignore the empty fields in CSV or
TSV files.
This check box is available only when you select CSV or TSV
from the File type list.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Usage
tCosmosDBConnection
Creates a connection to a CosmosDB database and reuses that connection in other components.
Basic settings
Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.
Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors after 10 minutes of inactivity. In this
situation, an idle cursor will stay open until either its results
are exhausted or you manually close it using the
cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box being
clear, a MongoDB server automatically stops idle cursors
after a given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.
Usage
tCosmosDBInput
Retrieves certain documents from a Cosmos database collection by supplying a query document
containing the fields the desired documents should match.
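The "query document" idea can be sketched in simplified form (a hypothetical illustration of plain field-equality matching only; real MongoDB/Cosmos queries also support operators):

```python
# Sketch of find semantics: a query document selects the documents
# whose fields all equal the query's values.

def find(collection, query):
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

docs = [
    {"_id": 1, "type": "airline", "country": "France"},
    {"_id": 2, "type": "airport", "country": "France"},
    {"_id": 3, "type": "airline", "country": "Spain"},
]
print(find(docs, {"type": "airline"}))  # keeps documents with _id 1 and 3
```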
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.
Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.
Set read preference Select this check box and from the Read preference drop-
down list that is displayed, select the member to which you
need to direct the read operations.
If you leave this check box clear, the Job uses the default
Read preference, that is to say, uses the primary member in
a replica set.
For further information, see MongoDB's documentation
about Replication and its Read preferences.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
If a column in the database is a JSON document and you
need to read the entire document, put an asterisk (*) in the
DB column column, without quotation marks around.
{
  _id: ObjectId("5099803df3f4948bd2f98391"),
  person: { first: "Joe", last: "Walker" }
}
The first and the last fields have person as their parent node
but the _id field does not have any parent node. So once
completed, this Mapping table should read as follows:
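The Mapping table itself is not reproduced here; the parent-node relationship it captures can be sketched as follows (a hypothetical illustration, with the ObjectId simplified to its string value):

```python
# Sketch of the parent-node idea: the parent node in the Mapping table
# is the path prefix under which a schema column's field is nested.

def get_by_path(doc, parent, field):
    """Return the value of `field`, optionally nested under `parent`."""
    node = doc[parent] if parent else doc
    return node[field]

doc = {"_id": "5099803df3f4948bd2f98391",
       "person": {"first": "Joe", "last": "Walker"}}

print(get_by_path(doc, None, "_id"))        # 5099803df3f4948bd2f98391
print(get_by_path(doc, "person", "first"))  # Joe
print(get_by_path(doc, "person", "last"))   # Walker
```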
Sort by Specify the column and choose the order for the sort
operation.
This field is available only when you have selected Find
query from the Query type drop-down list.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors after 10 minutes of inactivity. In this
situation, an idle cursor will stay open until either its results
are exhausted or you manually close it using the
cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box being
clear, a MongoDB server automatically stops idle cursors
after a given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.
Enable external sort Since the aggregation pipeline stages have a maximum
memory use limit (100 megabytes) and a stage exceeding
this limit will produce errors, when handling large datasets,
select this check box to avoid aggregation stages exceeding
this limit.
For further information about this external sort, see Large
sort operation with external sort.
Usage
tCosmosDBOutput
Inserts, updates, upserts or deletes documents in a Cosmos database collection based on the incoming
flow from the preceding component in the Job.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.
Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.
Set write concern Select this check box to set the level of acknowledgement
requested from MongoDB for write operations. Then you
need to select the level of this operation.
For further information, see the related MongoDB
documentation on http://docs.mongodb.org/manual/core/
write-concern/.
Bulk write Select this check box to insert, update or remove data in
bulk. Note this feature is available only when the version of
MongoDB you are using is 2.6+.
Then you need to select Ordered or Unordered to define
how the MongoDB database processes the data sent by the
Studio.
Drop collection if exist Select this check box to drop the collection if it already
exists.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Built-In: You create and store the schema locally for this
component only.
{
  _id: ObjectId("5099803df3f4948bd2f98391"),
  person: { first: "Joe", last: "Walker" }
}
The first and the last fields have person as their parent node
but the _id field does not have any parent node. So once
completed, this Mapping table should read as follows:
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
Generate JSON Document Select this check box for JSON configuration:
Configure JSON Tree: click the [...] button to open the
interface for JSON tree configuration. For more information,
see Configuring a JSON Tree on page 3897.
Group by: click the [+] button to add lines and choose the
input columns for grouping the records.
Remove root node: select this check box to remove the root
node.
Data node and Query node (available for update and upsert
actions): type in the name of data node and query node
configured on the JSON tree.
These nodes are mandatory for update and upsert actions.
They are intended to enable the update and upsert actions
though will not be stored in the database.
No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors after 10 minutes of inactivity. In this
situation, an idle cursor will stay open until either its results
are exhausted or you manually close it using the
cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box being
clear, a MongoDB server automatically stops idle cursors
after a given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Usage
tCosmosDBRow
Executes commands on the specified Cosmos database.
Basic settings
Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.
Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Execute command Select this check box to enter MongoDB commands in the
Command field for execution.
• Command: in this field, enter the command to be
executed, if this command contains one single
variable.
For example, if you need to construct the command
{"isMaster": 1}
{ renameCollection: "<source_namespace>", to: "<target_namespace>", dropTarget: <true|false> }
"renameCollection": "old_name"
"to": "new_name"
"dropTarget": "false"
"{createIndexes: 'restaurants', indexes: [{key: {restaurant_id: 1}, name: 'id_index_2', unique: true}]}"
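The renameCollection command above can be assembled programmatically, sketched here as a plain JSON document builder (a hypothetical helper, not Talend or driver code; "old_name"/"new_name" are the placeholder values from the example):

```python
# Sketch of building the renameCollection command document shown above.
import json

def rename_collection_command(source, target, drop_target=False):
    """Build the command document for renaming a collection."""
    return {"renameCollection": source, "to": target, "dropTarget": drop_target}

cmd = rename_collection_command("db.old_name", "db.new_name")
print(json.dumps(cmd))
# {"renameCollection": "db.old_name", "to": "db.new_name", "dropTarget": false}
```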
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Usage
tCouchbaseDCPInput
Queries the documents from the Couchbase database, under the Database Change Protocol (DCP), a
streaming protocol.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. The content
column stores the documents to be used, the key column
the IDs of these documents and the other columns the
Couchbase technical information.
Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Advanced settings
Connect Timeout Define the timeout interval (in seconds) for the connection
to be aborted.
Global Variables
Usage
tCouchbaseDCPOutput
Upserts documents in the Couchbase database based on the incoming flat data from preceding
components, under the Database Change Protocol (DCP), a streaming protocol.
This means that it adds a new document or replaces its value if it already exists.
Basic settings
Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Built-In: You create and store the schema locally for this
component only.
Field to use as ID Enter, without double quotation marks, the name of the
column from the schema to provide IDs for the documents
to be written to Couchbase.
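What "Field to use as ID" does can be sketched as follows (a hypothetical simulation; whether the ID column is also kept in the document body is an implementation detail not specified here, and this sketch removes it):

```python
# Sketch: split each incoming row into a document ID (taken from the
# chosen schema column) and a document body.

def rows_to_documents(rows, id_field):
    return {row[id_field]: {k: v for k, v in row.items() if k != id_field}
            for row in rows}

rows = [{"code": "A1", "name": "Atifly"}, {"code": "Q5", "name": "40-Mile Air"}]
print(rows_to_documents(rows, "code"))
# {'A1': {'name': 'Atifly'}, 'Q5': {'name': '40-Mile Air'}}
```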
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
tCouchbaseInput
Queries the documents from the Couchbase database.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
When using non-JSON documents, define an id column of
the String type, then define a content column. The type
of this content column should be String for the string
documents and byte[] for the binary documents.
When it comes to JSON documents, define the fields that
are present in your JSON documents.
Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.
Query Type Select the type of queries to be used from the following
options:
• Select All: select all the contents of a given bucket.
• N1QL: use a N1QL statement to perform fine-tuned
queries.
• Document ID: use the document IDs to select
documents. You need to enter the ID to be used in
the Document ID field that is displayed. Only one
document ID is allowed per component.
Use N1QL query Select this check box and in the Query field that is
displayed, enter a N1QL query statement to perform
complex actions.
Only one statement is allowed, and you should not put
quotation marks around your statement.
• When you use wildcards in your query such as SELECT
*, the returned result of this query is wrapped in the
bucket name used in this query. In this situation, define
only one column for the result in the schema of this
component.
For example, when performing this query
[
{
"travel_sample": {
"callsign": "MILE-AIR",
"country": "United States",
"iata": "Q5",
"icao": "MLA",
"id": 10,
"name": "40-Mile Air",
"type": "airline"
}
},
{
"travel_sample": {
"callsign": "TXW",
"country": "United States",
"iata": "TQ",
"icao": "TXW",
"id": 10123,
"name": "Texas Wings",
"type": "airline"
}
},
{
"travel_sample": {
"callsign": "atifly",
"country": "United States",
"iata": "A1",
"icao": "A1F",
"id": 10226,
"name": "Atifly",
"type": "airline"
}
}
]
[
{
"callsign": "MILE-AIR",
"country": "United States",
"iata": "Q5",
"icao": "MLA",
"id": 10,
"name": "40-Mile Air",
"type": "airline"
},
{
"callsign": "TXW",
"country": "United States",
"iata": "TQ",
"icao": "TXW",
"id": 10123,
"name": "Texas Wings",
"type": "airline"
},
{
"callsign": "atifly",
"country": "United States",
"iata": "A1",
"icao": "A1F",
"id": 10226,
"name": "Atifly",
"type": "airline"
}
]
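The transformation from the first (wrapped) result to the second can be sketched as follows (a hypothetical post-processing illustration of the bucket-name wrapping described above, not tCouchbaseInput's actual code):

```python
# Sketch: SELECT * results come back wrapped in the bucket name, so
# reading them into a single schema column means stripping that
# one-key wrapper from each row.

def unwrap(rows, bucket):
    return [row[bucket] for row in rows]

wrapped = [
    {"travel_sample": {"callsign": "MILE-AIR", "iata": "Q5"}},
    {"travel_sample": {"callsign": "TXW", "iata": "TQ"}},
]
print(unwrap(wrapped, "travel_sample"))
# [{'callsign': 'MILE-AIR', 'iata': 'Q5'}, {'callsign': 'TXW', 'iata': 'TQ'}]
```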
Advanced settings
Connect Timeout Enter, without quotation marks, the timeout interval (in
seconds) for the connection to be aborted.
Limit rows Enter the maximum number of rows to be read. This field is
not available when you use a N1QL query.
Global Variables
Usage
tCouchbaseOutput
Upserts documents in the Couchbase database based on the incoming flat data from preceding
components.
This means that it adds a new document or replaces its value if it already exists.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
When using non-JSON documents, define an id column of
the String type, then define a content column. The type
of this content column should be String for the string
documents and byte[] for the binary documents.
Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.
Field to use as ID Enter, without double quotation marks, the name of the
column from the schema to provide IDs for the documents
to be written to Couchbase.
Use N1QL Query with parameters Select this check box to apply variables in your N1QL
queries. Once you select it, the Query field and the Query
Parameters table are displayed for you to enter your query
and define the variables to be used in your query.
Only one query is allowed per tCouchbaseOutput.
For example, enter this query in the Query field:
Advanced settings
Connect Timeout Enter, without quotation marks, the timeout interval (in
seconds) for the connection to be aborted.
Global Variables
Usage
This component wraps flat data into documents for storage in the Couchbase database.
tCreateTable
Creates a table for a specific type of database.
Basic settings
Database Type Select the type of the database. The connection properties
may differ slightly according to the database type selected.
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to
be shared in the Basic settings view of the connection
component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Running Mode Select the Server Mode that corresponds to your database
setup.
This property is available only for the HSQLDb database
type.
Use TLS/SSL Sockets Select this check box to enable the security mode if
required.
This property is available only for the HSQLDb database
type.
Temporary Table Select this check box to create a temporary table during
an operation, which is automatically dropped at the end
of the operation. Since temporary tables exist in a special
schema, you cannot specify a schema name when creating a
temporary table, and the name of the temporary table must
be distinct from the name of any other table, sequence,
index, and view in the same schema.
Note that once you choose to create a temporary table, you
should clear the default values when you edit the schema.
This field is available only when Postgresql is selected from
the Database Type drop-down list.
Unlogged Table Select this check box to create an unlogged table during an
operation. This way, data is loaded considerably faster than
an ordinary table where the data is logged and then written.
However, the data in an unlogged table is not crash-safe.
This field is available only when Postgresql is selected from
the Database Type drop-down list and Temporary Table is
not selected.
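The two options above imply different DDL variants, sketched here as a hypothetical statement builder (PostgreSQL syntax; the modifiers are treated as mutually exclusive, mirroring how the Unlogged Table option is hidden when Temporary Table is selected):

```python
# Sketch of the CREATE TABLE variants implied by the Temporary Table
# and Unlogged Table options for PostgreSQL.

def create_table_ddl(name, columns, temporary=False, unlogged=False):
    if temporary and unlogged:
        raise ValueError("a table cannot be both temporary and unlogged")
    modifier = "TEMPORARY " if temporary else "UNLOGGED " if unlogged else ""
    cols = ", ".join(f"{c} {t}" for c, t in columns)
    return f"CREATE {modifier}TABLE {name} ({cols})"

print(create_table_ddl("t1", [("id", "integer")], temporary=True))
# CREATE TEMPORARY TABLE t1 (id integer)
print(create_table_ddl("t1", [("id", "integer")], unlogged=True))
# CREATE UNLOGGED TABLE t1 (id integer)
```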
Case Sensitive Select this check box to make the table/column name case
sensitive.
This property is available only for the HSQLDb database
type.
Temporary Table Select this check box if you want to save the created table
temporarily.
This property is available only for the MySQL database type.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
Enforce database delimited identifiers Select this check box to enable delimited identifiers.
This property is available only for the Snowflake database
type.
For more information on delimited identifiers, see
https://docs.intersystems.com/latest/csp/docbook/
DocBook.UI.Page.cls?KEY=GSQL_identifiers.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Drop a tCreateTable component from the Databases family in the Palette to the design
workspace.
2. In the Basic settings view, and from the Database Type list, select Mysql for this scenario.
8. In either case (Built-in or Repository), click Edit Schema to check the data type mapping and
define the data structure.
9. Click the Reset DB Types button in case the DB type column is empty or shows discrepancies
(marked in orange). This allows you to map any data type to the relevant DB data type. Then, click
OK to validate your changes and close the dialog box.
10. Save your Job and press F6 to execute it.
Results
The table is created empty but with all columns defined in the Schema.
tCreateTemporaryFile
Creates a temporary file in a specified directory. This component allows you to either keep the
temporary file or delete it after the Job execution.
Basic settings
Remove file when execution is over Select this check box to delete the temporary file after the
Job execution.
Use default temporary system directory Select this check box to create the file in the default system
temporary directory.
Directory Specify the directory under which the temporary file will be
created.
This field is available only when the Use default temporary
system directory check box is cleared.
Use Prefix Select this check box to use a string as the prefix of the
temporary file name.
A file name prefix helps prevent existing files from
being overwritten.
Prefix Specify the file name prefix string for the temporary file.
The prefix string needs to be at least three characters in
length.
To prevent existing files from being overwritten, it is
suggested to use a prefix string that is different from those
of any existing file names in the directory.
This option is available only when the Use Prefix check box
is selected.
Template Enter the temporary file name which should contain the
characters XXXX, such as talend_XXXX.
This option is unavailable when the Use Prefix check box is
selected.
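How a template such as talend_XXXX might be expanded can be sketched as follows (a hypothetical illustration: the exact replacement algorithm the component uses is not documented here, and random letters are assumed):

```python
# Sketch: expand a temp-file template by replacing the XXXX placeholder
# with random characters to produce a (likely) unique file name.
import random
import string

def expand_template(template, placeholder="XXXX"):
    rand = "".join(random.choices(string.ascii_letters, k=len(placeholder)))
    return template.replace(placeholder, rand, 1)

name = expand_template("talend_XXXX.dat")
print(name)  # e.g. talend_MHTI.dat
```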
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component level.
Global Variables
FILEPATH: the path where the file was created. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
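The globalMap pattern used by the After variables above can be sketched as follows (a hypothetical Python simulation of the Java globalMap; the key name matches the FILEPATH variable described above):

```python
# Sketch: a component publishes its After variable into a shared map,
# and later components read it back by key.

global_map = {}

def run_create_temporary_file(component_name, path):
    # the component stores its After variable under "<component>_FILEPATH"
    global_map[component_name + "_FILEPATH"] = path

run_create_temporary_file("tCreateTemporaryFile_1", "/tmp/talend_MHTI.dat")
print(global_map.get("tCreateTemporaryFile_1_FILEPATH"))  # /tmp/talend_MHTI.dat
```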
Usage
Procedure
1. Double-click tCreateTemporaryFile to open its Basic settings view.
2. Select the Remove file when execution is over check box to delete the created temporary file after
the Job execution.
3. Select the Use default temporary system directory check box to create the file in the default
system temporary directory.
4. In the Template field, enter the temporary file name which should contain the characters XXXX. In
this example, it is talend_XXXX.
5. In the Suffix field, enter the filename extension of the temporary file. In this example, it is dat.
6. Double-click tJava to open its Basic settings view.
7. In the Code field, enter the following code to display the default system temporary directory and
the path to the temporary file that will be created on the console:
Procedure
1. Double-click tRowGenerator to open its RowGenerator Editor.
2. Click the [+] button to add two columns: id of Integer type and name of String type. Then in the
Functions column, select the predefined function Numeric.sequence(String,int,int) for id and
TalendDataGenerator.getFirstName() for name.
3. In the Number of Rows for RowGenerator field, enter 5 to generate five rows.
4. Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
5. Double-click tFileOutputDelimited to open its Basic settings view.
6. In the File Name field, press Ctrl+Space and from the global variable list displayed select
((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")).
Procedure
1. Double-click tFileInputDelimited to open its Basic settings view.
2. In the File name/Stream field, press Ctrl+Space and from the global variable list displayed select
((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")).
3. Click the [...] button next to Edit schema and in the dialog box displayed define the schema by
adding two columns: id of Integer type and name of String type.
4. Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
5. Double-click tLogRow to open its Basic settings view.
6. In the Mode area, select Table (print values in cells of a table) to display the output data in a
better way.
The file talend_MHTI.dat is created under the default system temporary directory C:\Users\lena_li\AppData\Local\Temp\ during the Job execution, the five generated rows of data are written into it, and the file is deleted after the Job execution.
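The behavior of this Job (create a temporary file from the template talend_XXXX with suffix dat in the system temporary directory, write five delimited rows, and delete the file afterwards) can be approximated in plain Java. This is a sketch, not the code Talend generates, and the row values are illustrative:

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

public class TempFileJobSketch {
    // Approximates the Job settings: template talend_XXXX, suffix dat,
    // default system temporary directory, file removed when execution is over.
    public static String run() throws IOException {
        File tmp = File.createTempFile("talend_", ".dat"); // XXXX -> random characters
        try (PrintWriter out = new PrintWriter(tmp)) {
            for (int id = 1; id <= 5; id++) {              // five generated rows
                out.println(id + ";Name_" + id);           // id;name, delimited
            }
        }
        String path = tmp.getAbsolutePath();
        tmp.delete();                                      // Remove file when execution is over
        return path;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```
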
tDB2BulkExec
Executes the Insert action on the provided data, improving performance of Insert operations on a DB2 database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-in: You create the schema and store it locally for this
component only. Related topic: see Talend Studio User
Guide.
Use Ingest Command Select this check box to populate data into DB2 using the
INGEST command. For more information about the INGEST
command, see http://www.ibm.com/developerworks/data/library/techarticle/dm-1304ingestcmd and https://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0057198.html?cp=SSEPGG_10.1.0%2F3-5-2-4-59.
Warning:
This file is located on the machine specified by the URI
in the Host field so it should be on the same machine as
the database server.
Folder Specify the path to the folder holding the files to be loaded.
This field is available only when FOLDER is selected from
the Load From drop-down list.
Action on Data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new records to the table. If duplicates are found, the Job stops.
• Replace: Add new records to the table. If an old record
in the table has the same value as a new record for
a PRIMARY KEY or a UNIQUE index, the old record is
deleted before the new record is inserted.
• Update: Make changes to existing records.
• Delete: Remove the records that match the input data.
• Merge: Merge the input data to the table.
Delete and Merge are available only when the Use Ingest
Command check box is selected.
File Glob Pattern Specify the glob expression for the files to be loaded.
This field is available only when FOLDER is selected from
the Load From drop-down list.
Where Clause Enter the WHERE clause to filter the data to be processed.
This field is available only when update or delete is selected from the Action on Data drop-down list.
Custom Insert Values Clause Select this check box and in the Insert Values Clause field
displayed enter the VALUES clause for the insert operation.
This check box is available only when the Use Ingest
Command check box is selected and insert is selected from
the Action on Data drop-down list.
Custom Update Set Clause Select this check box and specify the SET clause for the
update operation by completing the Set Mapping table.
This check box is available only when the Use Ingest
Command check box is selected and update is selected from
the Action on Data drop-down list.
Set Mapping Complete this table to specify the SET clause for the update
operation.
• Column: the name of the column. By default, the fields in the Column column are the same as those defined in the schema.
• Expression: the expression for the corresponding
column.
This table is available only when the Custom Update Set
Clause check box is selected.
Merge Clause Specify the MERGE clause for the merge operation.
This table is available only when the Use Ingest Command
check box is selected and merge is selected from the Action
on Data drop-down list.
Content Format Select the format of the input file, either Delimited or
Positional.
This list is available only when the Use Ingest Command
check box is selected.
Optionally Enclosed By Enter the character that encloses the string in the delimited
file.
This field is available only when Delimited is selected from
the Content Format drop-down list.
Fixed Length Enter the length (in bytes) of the record in the positional
file.
This field is available only when Positional is selected from
the Content Format drop-down list.
Script Generated Folder Specify the directory under which the script file will be created.
This field is available only when the Use Ingest Command
check box is selected.
Advanced settings
Note:
You can set the encoding parameters through this field.
Date Format Use this field to define the way months and days are ordered.
Time Format Use this field to define the way hours, minutes and seconds
are ordered.
Timestamp Format Use this field to define the way date and time are ordered.
Remove load pending Select this check box to unblock tables left blocked in "pending" status after a bulk load.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For tDB2BulkExec related topics, see:
• Inserting transformed data in MySQL database on page 2482.
• Truncating and inserting file data into an Oracle database on page 2681.
tDB2Close
Closes a transaction committed in the connected DB.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
No scenario is available for the Standard version of this component yet.
tDB2Commit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus improves performance.
tDB2Commit validates the data processed through the Job into the connected DB.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tDB2Commit to your Job, your data will be committed row
by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tDB2*
components, especially with the tDB2Connection and
tDB2Rollback components.
Dynamic settings Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Related scenario
For a tDB2Commit related scenario, see Inserting data in mother/daughter tables on page 2426.
562
tDB2Connection
tDB2Connection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box is not visible when the Use or register a
shared DB Connection check box is selected.
Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.
Advanced settings
Note:
You can set the encoding parameters through this field.
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.
Usage
Related scenarios
For a tDB2Connection related scenario, see tMysqlConnection on page 2425.
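The Auto Commit behavior described above can be contrasted with the commit component using plain JDBC calls. In the sketch below, a reflective proxy that merely records calls stands in for a live DB2 Connection (an assumption made so the example runs without a database):

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

public class AutoCommitSketch {
    // Records the JDBC calls made on a stub Connection instead of a live
    // DB2 server, to show when changes are committed.
    public static List<String> run() throws Exception {
        List<String> calls = new ArrayList<>();
        Connection conn = (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                (proxy, method, margs) -> {
                    calls.add(method.getName());
                    return null;
                });
        conn.setAutoCommit(false); // the default: changes are held back...
        conn.createStatement();    // ...while the Job executes its statements...
        conn.createStatement();
        conn.commit();             // ...until the commit component issues one commit for all
        return calls;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

With Auto Commit selected, each statement would instead be committed immediately as its own transaction, leaving no room for an explicit rollback.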
tDB2Input
Executes a DB query with a strictly defined order which must correspond to the schema definition.
Then tDB2Input passes on the field list to the next component via a Row > Main link.
If the column names of a table contain double quotation marks, the quotation marks cannot be retrieved when the columns are retrieved. Therefore, it is recommended not to use double quotation marks in column names in a DB2 database table.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Table name Select the source table in which to capture any changes made to the data.
Query type and Query Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box is not available when the Use an existing
connection check box is selected.
Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.
Advanced settings
Note:
You can set the encoding parameters through this field.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for DB2 databases.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For a related topic, see Reading data from different MySQL databases using dynamically loaded connection parameters on page 497.
tDB2Output
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tDB2Output writes, updates, makes changes or suppresses entries in a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
Default: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. You cannot roll back the operation.
Truncate table with reuse storage: The table content is deleted. You cannot roll back the operation; however, you can reuse the existing storage allocated to the table, even if the storage is considered empty.
Warning:
If you select the Use an existing connection check
box, and then select Truncate table or Truncate table
with reuse storage from the Action on table list, a
commit statement will be invoked before the truncate
operation because the truncate statement must be the
first statement in a transaction.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.
Note:
You can set the encoding parameters through this field.
Additional Columns This option is not available if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update, or delete actions, or actions that require particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Convert columns and table names to uppercase Select this check box to uppercase the names of the
columns and the name of the table.
Debug query mode Select this check box to display each step during processing
entries in a database.
Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.
Note:
Make sure the Nullable check box is selected for the corresponding columns in the schema.
Use Batch Select this check box to activate the batch mode for data
processing.
Note:
This check box is available only when you have selected
the Insert, the Update or the Delete option in the Action
on data field.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in a DB2 database. It also allows you to create a reject
flow using a Row > Rejects link to filter data in error. For
an example of tMySqlOutput in use, see Retrieving data in
error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For tDB2Output related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tDB2Rollback
Avoids committing part of a transaction involuntarily.
tDB2Rollback cancels the transaction committed in the connected DB.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tDB2*
components, especially with the tDB2Connection and
tDB2Commit components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading data from databases through context-based dynamic connections on page 2446 and Reading data from different MySQL databases using dynamically loaded connection parameters on page 497. For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Related scenarios
For a tDB2Rollback related scenario, see Rollback from inserting data in mother/daughter tables on page 2429 of tMysqlRollback.
tDB2Row
Acts on the actual DB structure or on the data (although without handling data), depending on the nature of the query and the database. The SQLBuilder tool helps you easily write your SQL statements.
tDB2Row is the specific component for this database query. It executes the stated SQL query on the specified database. The Row suffix means the component implements a flow in the Job design although it does not provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box is not available when the Use an existing
connection check box is selected.
Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Note:
You can set the encoding parameters through this field.
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
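The "?" placeholder mechanism described above is standard JDBC PreparedStatement behavior. As an illustrative sketch (not Talend-generated code), Python's built-in sqlite3 module uses the same "?" syntax, so it can model how one statement is prepared once and then executed with several parameter values; the table and data are invented for the example:

```python
import sqlite3

# In-memory SQLite database standing in for DB2; the "?" placeholders
# mirror the parameters configured in the Set PreparedStatement
# Parameter table (index, type, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ada"), (2, "Grace")])

# The same statement is reused with different parameter values --
# the repeated execution the Note above refers to.
query = "SELECT name FROM customers WHERE id = ?"
results = [conn.execute(query, (pid,)).fetchone()[0] for pid in (1, 2)]
print(results)  # ['Ada', 'Grace']
```

In a Job, each "?" corresponds to one row of the Set PreparedStatement Parameter table, matched by Parameter Index.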
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
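In the Java code that Talend Studio generates, these variables are stored in a shared map and retrieved with keys of the form component_id_VARIABLE (for example, tDB2Row_1_QUERY). A minimal Python sketch of that pattern; the component name and values are illustrative:

```python
# In generated Job code, component variables live in a shared map keyed
# as "<component>_<id>_<VARIABLE>"; a plain dict models that here.
global_map = {}

# Flow variable: available while the component executes.
global_map["tDB2Row_1_QUERY"] = "SELECT id, name FROM customers"
# After variable: available once the component has finished
# (None here because no error occurred).
global_map["tDB2Row_1_ERROR_MESSAGE"] = None

# A downstream component reads the value back, e.g. for logging.
query = global_map.get("tDB2Row_1_QUERY")
print(query)
```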
Usage
Usage rule This component offers the flexibility of free-form
database queries and covers all possible SQL statements.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
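Conceptually, the Dynamic settings mechanism resolves a context variable to one of several pre-configured connections at runtime. A hypothetical sketch of that resolution step; the connection names and settings below are invented for illustration:

```python
# Hypothetical registry of connection components defined in a Job; the
# Code field of the Dynamic settings table holds a context variable
# whose value names the connection to use at runtime.
connections = {
    "tDB2Connection_1": {"host": "db-prod", "database": "SALES"},
    "tDB2Connection_2": {"host": "db-test", "database": "SALES"},
}

# context_connection stands in for the context variable; switching the
# context (e.g. per environment) re-routes the Job without editing it.
context_connection = "tDB2Connection_2"
active = connections[context_connection]
print(active["host"])  # db-test
```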
Related scenarios
For tDB2Row related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622
• Removing and regenerating a MySQL table index on page 2497.
tDB2SCD
Addresses Slowly Changing Dimension needs by regularly reading a source of data and logging the
changes into a dedicated SCD table.
tDB2SCD reflects and tracks changes in a dedicated DB2 SCD table.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.
Use memory saving Mode Select this check box to maximize system performance.
Die on error This check box is cleared by default, meaning that rows
on error are skipped and the process is completed for
error-free rows.
Advanced settings
Note:
You can set the encoding parameters through this field.
End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.
Debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
Limitation This component does not support using SCD type 0 together
with other SCD types.
Related scenarios
For related topics, see tMysqlSCD on page 2508.
tDB2SCDELT
Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and
logs the changes into a dedicated DB2 SCD table.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Surrogate Key Select the surrogate key column from the list.
Source fields value include Null Select this check box to allow the source columns to have
Null values.
Note:
The source columns here refer to the fields defined in
the SCD type 1 fields and SCD type 2 fields tables.
Use SCD Type 1 fields Use type 1 if tracking changes is not necessary. SCD Type
1 should be used, for example, to correct typos. Select
the columns of the schema that will be checked for changes.
Use SCD Type 2 fields Use type 2 if changes need to be tracked. SCD Type
2 should be used, for example, to trace updates. Select the
columns of the schema that will be checked for changes.
SCD type 2 fields Click the [+] button to add as many rows as needed, each
row for a column. Click the arrow on the right side of
the cell and select the column whose value changes will
be tracked using Type 2 SCD from the drop-down list
displayed.
This table is available only when the Use SCD type 2 fields
option is selected.
Start date Specify the column that holds the start date for type 2 SCD.
This list is available only when the Use SCD type 2 fields
option is selected.
End date Specify the column that holds the end date for type 2 SCD.
This list is available only when the Use SCD type 2 fields
option is selected.
Log active status Select this check box and from the Active field drop-down
list displayed, select the column that holds the true or false
status value, which helps to spot the active record for type 2
SCD.
This option is available only when the Use SCD type 2 fields
option is selected.
Log versions Select this check box and from the Version field drop-down
list displayed, select the column that holds the version
number of the record for type 2 SCD.
This option is available only when the Use SCD type 2 fields
option is selected.
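Taken together, the Start date, End date, Log active status and Log versions options implement classic SCD Type 2 behavior: a change closes the current record and inserts a new active version. A minimal Python sketch of that logic, with illustrative field names and data (this is not the code the component generates):

```python
from datetime import date

# One dimension record per version; field names mirror the Start date /
# End date / Active / Version options described above.
dimension = [
    {"id": 42, "city": "Paris", "start": date(2019, 1, 1),
     "end": None, "active": True, "version": 1},
]

def apply_scd2(table, key, tracked, new_value, change_date):
    """Close the current record and append a new version on change."""
    for rec in table:
        if rec["id"] == key and rec["active"] and rec[tracked] != new_value:
            rec["end"] = change_date       # close the old version
            rec["active"] = False
            table.append({"id": key, tracked: new_value,
                          "start": change_date, "end": None,
                          "active": True, "version": rec["version"] + 1})
            return

apply_scd2(dimension, 42, "city", "Lyon", date(2020, 2, 1))
print(len(dimension), dimension[-1]["version"])  # 2 2
```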
Advanced settings
Note:
You can set the encoding parameters through this field.
Debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related Scenarios
For related scenarios, see:
• Tracking data changes in a Snowflake table using the tJDBCSCDELT component on page 1879.
• Tracking data changes in a PostgreSQL table using the tPostgreSQLSCDELT component on page
2948.
tDB2SP
Offers a convenient way to call the database stored procedures.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Parameters Click the Plus button and select the various Schema
Columns that will be required by the procedures. Note
that the SP schema can hold more columns than there are
parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter
OUT: Output parameter/return value
IN OUT: Input parameter that is to be returned as a value,
likely after modification through the procedure (function).
RECORDSET: Input parameter that is to be returned as a set
of values, rather than a single value.
Note:
Check Inserting data in mother/daughter tables on page
2426 if you want to analyze a set of records from a
database table or DB query and return single records.
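To illustrate the parameter directions only (Python's standard library has no stored-procedure support, and the procedure name and data below are invented), a plain function can model how an IN value goes in while OUT and RECORDSET values come back:

```python
# Models the parameter directions of a stored procedure call: IN feeds
# the procedure, OUT carries a single return value back, RECORDSET
# returns a set of rows. This is an illustration, not a Talend API.
def get_customer_count(country_in):            # IN parameter
    rows = [("Ada", "FR"), ("Grace", "US"), ("Linus", "FR")]
    matches = [name for name, country in rows if country == country_in]
    count_out = len(matches)                   # OUT parameter
    return count_out, matches                  # OUT value + RECORDSET

count, names = get_customer_count("FR")
print(count, names)  # 2 ['Ada', 'Linus']
```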
Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box is not available when the Use an existing
connection check box is selected.
Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.
Advanced settings
Note:
You can set the encoding parameters through this field.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related scenarios, see:
Dynamic database components
Talend provides a number of database components that allow you to dynamically change the type of
database you want to work on. These components are available in the Database Common group under
the Databases family of the Palette for standard data integration Jobs.
Each of these components has only one property, the Database list, on its Basic settings view for you
to select the type of database of your interest.
For more information on these dynamic database components, see:
• tDBBulkExec on page 596
• tDBClose on page 597
• tDBColumnList on page 598
• tDBCommit on page 599
• tDBConnection on page 600
• tDBInput on page 601
• tDBLastInsertId on page 603
• tDBOutput on page 604
• tDBOutputBulk on page 606
• tDBOutputBulkExec on page 607
• tDBRollback on page 608
• tDBRow on page 609
• tDBSCD on page 610
• tDBSCDELT on page 611
• tDBSP on page 612
• tDBTableList on page 613
tDBBulkExec
Offers gains in performance while executing the Insert operations on a database.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds the database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.
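The two-step process can be sketched with Python's standard library, using sqlite3 as a stand-in for the selected database's bulk loader (the table, delimiter and data are illustrative):

```python
import csv
import io
import sqlite3

# Step 1 (tDBOutputBulk): write a delimited file; the data could be
# transformed here before it ever reaches the database.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")
for row in [(1, "Ada"), (2, "Grace")]:
    writer.writerow(row)

# Step 2 (tDBBulkExec): feed the whole file to the database in one
# load; sqlite3 stands in for the database-specific bulk loader.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
buffer.seek(0)
rows = [(int(i), name) for i, name in csv.reader(buffer, delimiter=";")]
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
loaded = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(loaded)  # 2
```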
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessBulkExec on page 79)
• Amazon (tRedshiftBulkExec on page 2964)
• Greenplum (tGreenplumBulkExec on page 1311)
• IBM DB2 (tDB2BulkExec on page 553)
• Informix (tInformixBulkExec on page 1706)
• Ingres (tIngresBulkExec on page 1747)
• Microsoft SQL Server (tMSSqlBulkExec on page 2348)
• MySQL (tMysqlBulkExec on page 2412)
• Netezza (tNetezzaBulkExec on page 2616)
• Oracle (tOracleBulkExec on page 2676)
• ParAccel (tParAccelBulkExec on page 2803)
• PostgreSQL (tPostgresqlBulkExec on page 2906)
• PostgresPlus (tPostgresPlusBulkExec on page 2865)
• Snowflake (tSnowflakeBulkExec on page 3384)
• Sybase (ASE and IQ) (tSybaseBulkExec on page 3658)
• Sybase IQ (tSybaseIQBulkExec on page 3673)
• Vertica (tVerticaBulkExec on page 3822)
tDBClose
Closes the transaction committed in a connected database.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessClose on page 82)
• Amazon Aurora (tAmazonAuroraClose on page 146)
• Amazon Mysql (tAmazonMysqlClose on page 185)
• Amazon Oracle (tAmazonOracleClose on page 207)
• Amazon Redshift (tRedshiftClose on page 2980)
• AS400 (tAS400Close on page 237)
• FireBird (tFirebirdClose on page 1179)
• Greenplum (tGreenplumClose on page 1315)
• IBM DB2 (tDB2Close on page 559)
• Exasol (tEXAClose on page 895)
• Informix (tInformixClose on page 1711)
• Ingres (tIngresClose on page 1751)
• Interbase (tInterbaseClose on page 1784)
• JDBC (tJDBCClose on page 1850)
• MemSQL (tMemSQLClose (deprecated))
• Microsoft SQL Server (tMSSqlClose on page 2353)
• MySQL (tMysqlClose on page 2416)
• Netezza (tNetezzaClose on page 2620)
• Oracle (tOracleClose on page 2684)
• ParAccel (tParAccelClose on page 2807)
• PostgreSQL (tPostgresqlClose on page 2910)
• PostgresPlus (tPostgresPlusClose on page 2869)
• SAPHana (tSAPHanaClose on page 3303)
• SQLite (tSQLiteClose on page 3504)
• Snowflake (tSnowflakeClose on page 3398)
• Sybase (ASE and IQ) (tSybaseClose on page 3663)
• Teradata (tTeradataClose on page 3726)
• Vertica (tVerticaClose on page 3828)
tDBColumnList
Iterates on all columns of a given database table and lists column names.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Microsoft SQL Server (tMSSqlColumnList on page 2355)
• MySQL (tMysqlColumnList on page 2418)
tDBCommit
Validates the data processed through the Job into the connected database.
This component works with a variety of databases depending on your selection.
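The commit/rollback contract this component relies on can be sketched with Python's built-in sqlite3 module; any transactional database behaves the same way (table and data are illustrative):

```python
import sqlite3

# Work done in the Job is only validated in the database when the
# commit runs; a rollback discards the pending changes instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (msg TEXT)")
conn.commit()

conn.execute("INSERT INTO audit VALUES ('step 1 ok')")
conn.commit()       # tDBCommit equivalent: the row is validated

conn.execute("INSERT INTO audit VALUES ('step 2 failed half-way')")
conn.rollback()     # tDBRollback equivalent: the row is discarded

committed = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
print(committed)  # 1
```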
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessCommit on page 84)
• Amazon Aurora (tAmazonAuroraCommit on page 148)
• Amazon Mysql (tAmazonMysqlCommit on page 187)
• Amazon Oracle (tAmazonOracleCommit on page 209)
• AS400 (tAS400Commit on page 239)
• Amazon Redshift (tRedshiftCommit on page 2982)
• FireBird (tFirebirdCommit on page 1181)
• Greenplum (tGreenplumCommit on page 1317)
• IBM DB2 (tDB2Commit on page 561)
• Exasol (tEXACommit on page 897)
• Informix (tInformixCommit on page 1713)
• Ingres (tIngresCommit on page 1753)
• Interbase (tInterbaseCommit on page 1786)
• JDBC (tJDBCCommit on page 1854)
• Microsoft SQL Server (tMSSqlCommit on page 2358)
• MySQL (tMysqlCommit on page 2423)
• Netezza (tNetezzaCommit on page 2622)
• Oracle (tOracleCommit on page 2686)
• ParAccel (tParAccelCommit on page 2809)
• PostgreSQL (tPostgresqlCommit on page 2912)
• PostgresPlus (tPostgresPlusCommit on page 2871)
• SAPHana (tSAPHanaCommit on page 3304)
• SQLite (tSQLiteCommit on page 3506)
• Sybase (ASE and IQ) (tSybaseCommit on page 3665)
• Teradata (tTeradataCommit on page 3728)
• VectorWise (tVectorWiseCommit on page 3803)
• Vertica (tVerticaCommit on page 3830)
tDBConnection
Opens a connection to a database to be reused in the subsequent subJob or subJobs.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessConnection on page 86)
• Amazon Aurora (tAmazonAuroraConnection on page 150)
• Amazon Mysql (tAmazonMysqlConnection on page 189)
• Amazon Oracle (tAmazonOracleConnection on page 211)
• Amazon Redshift (tRedshiftConnection on page 2984)
• AS400 (tAS400Connection on page 241)
• Exasol (tEXAConnection on page 899)
• FireBird (tFirebirdConnection on page 1183)
• Greenplum (tGreenplumConnection on page 1319)
• IBM DB2 (tDB2Connection on page 563)
• Informix (tInformixConnection on page 1715)
• Ingres (tIngresConnection on page 1755)
• Interbase (tInterbaseConnection on page 1788)
• JDBC (tJDBCConnection on page 1856)
• MemSQL (tMemSQLConnection (deprecated))
• Microsoft SQL Server (tMSSqlConnection on page 2360)
• MySQL (tMysqlConnection on page 2425)
• Netezza (tNetezzaConnection on page 2624)
• Oracle (tOracleConnection on page 2688)
• ParAccel (tParAccelConnection on page 2811)
• PostgreSQL (tPostgresqlConnection on page 2914)
• PostgresPlus (tPostgresPlusConnection on page 2873)
• SAPHana (tSAPHanaConnection on page 3306)
• SQLite (tSQLiteConnection on page 3508)
• Snowflake (tSnowflakeConnection on page 3401)
• Sybase (ASE and IQ) (tSybaseConnection on page 3667)
• Teradata (tTeradataConnection on page 3730)
• VectorWise (tVectorWiseConnection on page 3805)
• Vertica (tVerticaConnection on page 3832)
tDBInput
Extracts data from a database.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessInput on page 91)
• Amazon Aurora (tAmazonAuroraInput on page 153)
• Amazon Mysql (tAmazonMysqlInput on page 192)
• Amazon Oracle (tAmazonOracleInput on page 214)
• Amazon Redshift (tRedshiftInput on page 2987)
• AS400 (tAS400Input on page 243)
• Exasol (tEXAInput on page 902)
• FireBird (tFirebirdInput on page 1185)
• Greenplum (tGreenplumInput on page 1327)
• IBM DB2 (tDB2Input on page 566)
• Informix (tInformixInput on page 1717)
• Ingres (tIngresInput on page 1757)
• Interbase (tInterbaseInput on page 1790)
• JDBC (tJDBCInput on page 1861)
• MemSQL (tMemSQLInput (deprecated))
• Microsoft SQL Server (tMSSqlInput on page 2368)
• MySQL (tMysqlInput on page 2437)
• Netezza (tNetezzaInput on page 2626)
• Oracle (tOracleInput on page 2692)
• ParAccel (tParAccelInput on page 2813)
• PostgreSQL (tPostgresqlInput on page 2916)
• PostgresPlus (tPostgresPlusInput on page 2875)
• SAPHana (tSAPHanaInput on page 3308)
• SAS (tSasInput (deprecated))
• SQLite (tSQLiteInput on page 3510)
• Snowflake (tSnowflakeInput on page 3404)
• Sybase (ASE and IQ) (tSybaseInput on page 3669)
• Teradata (tTeradataInput on page 3742)
• VectorWise (tVectorWiseInput on page 3807)
tDBLastInsertId
Obtains the primary key value of the record that was last inserted in a database table by a user.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• AS400 (tAS400LastInsertId on page 250)
• Microsoft SQL Server (tMSSqlLastInsertId on page 2372)
• MySQL (tMysqlLastInsertId on page 2453)
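The underlying idea can be sketched with Python's built-in sqlite3 module, where the cursor exposes the generated key of the last inserted row (table and data are illustrative):

```python
import sqlite3

# After an INSERT, the cursor exposes the generated primary key of that
# row -- the value tDBLastInsertId captures for the supported databases.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
cur = conn.execute("INSERT INTO orders (item) VALUES ('book')")
last_id = cur.lastrowid
print(last_id)  # 1
```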
tDBOutput
Writes, updates, modifies or deletes entries in a database.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutput on page 95)
• Amazon Aurora (tAmazonAuroraOutput on page 163)
• Amazon Mysql (tAmazonMysqlOutput on page 195)
• Amazon Oracle (tAmazonOracleOutput on page 218)
• Amazon Redshift (tRedshiftOutput on page 2996)
• AS400 (tAS400Output on page 252)
• Exasol (tEXAOutput on page 906)
• FireBird (tFirebirdOutput on page 1189)
• Greenplum (tGreenplumOutput on page 1330)
• IBM DB2 (tDB2Output on page 570)
• Informix (tInformixOutput on page 1720)
• Ingres (tIngresOutput on page 1761)
• Interbase (tInterbaseOutput on page 1794)
• JDBC (tJDBCOutput on page 1865)
• MemSQL (tMemSQLOutput (deprecated))
• Microsoft SQL Server (tMSSqlOutput on page 2375)
• MySQL (tMysqlOutput on page 2460)
• Netezza (tNetezzaOutput on page 2637)
• Oracle (tOracleOutput on page 2699)
• ParAccel (tParAccelOutput on page 2817)
• PostgreSQL (tPostgresqlOutput on page 2920)
• PostgresPlus (tPostgresPlusOutput on page 2879)
• SAPHana (tSAPHanaOutput on page 3312)
• SAS (tSasOutput (deprecated))
• SQLite (tSQLiteOutput on page 3515)
• Snowflake (tSnowflakeOutput on page 3412)
• Sybase (ASE and IQ) (tSybaseOutput on page 3689)
• Teradata (tTeradataOutput on page 3749)
• VectorWise (tVectorWiseOutput on page 3811)
tDBOutputBulk
Writes a file with columns based on the defined delimiter and the standards of the selected database
type.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds the database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutputBulk on page 101)
• Amazon Redshift (tRedshiftOutputBulk on page 3002)
• Greenplum (tGreenplumOutputBulk on page 1336)
• Informix (tInformixOutputBulk on page 1726)
• Ingres (tIngresOutputBulk on page 1766)
• Microsoft SQL Server (tMSSqlOutputBulk on page 2382)
• MySQL (tMysqlOutputBulk on page 2480)
• Oracle (tOracleOutputBulk on page 2706)
• ParAccel (tParAccelOutputBulk on page 2823)
• PostgreSQL (tPostgresqlOutputBulk on page 2927)
• PostgresPlus (tPostgresPlusOutputBulk on page 2885)
• Snowflake (tSnowflakeOutputBulk on page 3416)
• Sybase (ASE and IQ) (tSybaseOutputBulk on page 3695)
• Vertica (tVerticaOutputBulk on page 3844)
tDBOutputBulkExec
Executes the Insert action in a database.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds the database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutputBulk on page 101)
• Amazon Redshift (tRedshiftOutputBulk on page 3002)
• Greenplum (tGreenplumOutputBulk on page 1336)
• Informix (tInformixOutputBulk on page 1726)
• Ingres (tIngresOutputBulk on page 1766)
• Microsoft SQL Server (tMSSqlOutputBulk on page 2382)
• MySQL (tMysqlOutputBulk on page 2480)
• Oracle (tOracleOutputBulk on page 2706)
• ParAccel (tParAccelOutputBulk on page 2823)
• PostgreSQL (tPostgresqlOutputBulk on page 2927)
• PostgresPlus (tPostgresPlusOutputBulk on page 2885)
• Snowflake (tSnowflakeOutputBulkExec on page 3423)
• Sybase (ASE and IQ) (tSybaseOutputBulk on page 3695)
• Vertica (tVerticaOutputBulk on page 3844)
tDBRollback
Cancels the transaction commit in a connected database to avoid committing part of a transaction
involuntarily.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessRollback on page 108)
• Amazon Aurora (tAmazonAuroraRollback on page 170)
• Amazon Mysql (tAmazonMysqlRollback on page 201)
• Amazon Oracle (tAmazonOracleRollback on page 224)
• Amazon Redshift (tRedshiftRollback on page 3014)
• AS400 (tAS400Rollback on page 257)
• Exasol (tEXARollback on page 912)
• FireBird (tFirebirdRollback on page 1194)
• Greenplum (tGreenplumRollback on page 1342)
• IBM DB2 (tDB2Rollback on page 576)
• Informix (tInformixRollback on page 1733)
• Ingres (tIngresRollback on page 1775)
• Interbase (tInterbaseRollback on page 1800)
• JDBC (tJDBCRollback on page 1870)
• Microsoft SQL Server (tMSSqlRollback on page 2390)
• MySQL (tMysqlRollback on page 2491)
• Netezza (tNetezzaRollback on page 2643)
• Oracle (tOracleRollback on page 2715)
• ParAccel (tParAccelRollback on page 2830)
• PostgreSQL (tPostgresqlRollback on page 2934)
• PostgresPlus (tPostgresPlusRollback on page 2891)
• SAPHana (tSAPHanaRollback on page 3318)
• SQLite (tSQLiteRollback on page 3520)
• Sybase (ASE and IQ) (tSybaseRollback on page 3703)
• Teradata (tTeradataRollback on page 3755)
• VectorWise (tVectorWiseRollback on page 3816)
• Vertica (tVerticaRollback on page 3852)
tDBRow
Executes the stated SQL query on a database.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessRow on page 110)
• Amazon Mysql (tAmazonMysqlRow on page 203)
• Amazon Oracle (tAmazonOracleRow on page 226)
• Amazon Redshift (tRedshiftRow on page 3016)
• AS400 (tAS400Row on page 259)
• Exasol (tEXARow on page 914)
• FireBird (tFirebirdRow on page 1196)
• Greenplum (tGreenplumRow on page 1344)
• IBM DB2 (tDB2Row on page 578)
• Informix (tInformixRow on page 1735)
• Ingres (tIngresRow on page 1777)
• Interbase (tInterbaseRow on page 1802)
• JDBC (tJDBCRow on page 1872)
• MemSQL (tMemSQLRow (deprecated))
• Microsoft SQL Server (tMSSqlRow on page 2392)
• MySQL (tMysqlRow on page 2493)
• Netezza (tNetezzaRow on page 2645)
• Oracle (tOracleRow on page 2717)
• ParAccel (tParAccelRow on page 2832)
• PostgreSQL (tPostgresqlRow on page 2936)
• PostgresPlus (tPostgresPlusRow on page 2893)
• SAPHana (tSAPHanaRow on page 3319)
• SQLite (tSQLiteRow on page 3522)
• Snowflake (tSnowflakeRow on page 3440)
• Sybase (ASE and IQ) (tSybaseRow on page 3705)
• Teradata (tTeradataRow on page 3757)
• VectorWise (tVectorWiseRow on page 3818)
• Vertica (tVerticaRow on page 3854)
tDBSCD
Reflects and tracks changes in a dedicated database SCD table.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Greenplum (tGreenplumSCD on page 1348)
• IBM DB2 (tDB2SCD on page 582)
• Informix (tInformixSCD on page 1739)
• Ingres (tIngresSCD on page 1781)
• Microsoft SQL Server (tMSSqlSCD on page 2397)
• MySQL (tMysqlSCD on page 2508)
• Netezza (tNetezzaSCD on page 2649)
• Oracle (tOracleSCD on page 2722)
• ParAccel (tParAccelSCD on page 2836)
• PostgreSQL (tPostgresqlSCD on page 2940)
• PostgresPlus (tPostgresPlusSCD on page 2897)
• Sybase (ASE and IQ) (tSybaseSCD on page 3709)
• Teradata (tTeradataSCD on page 3762)
• Vertica (tVerticaSCD on page 3858)
tDBSCDELT
Reflects and tracks changes in a dedicated SCD table through SQL queries.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• IBM DB2 (tDB2SCDELT on page 586)
• MySQL (tMysqlSCDELT on page 2522)
• Oracle (tOracleSCDELT on page 2726)
• PostgreSQL (tPostgresqlSCDELT on page 2944)
• PostgresPlus (tPostgresPlusSCDELT on page 2901)
• Sybase (ASE and IQ) (tSybaseSCDELT on page 3713)
• Teradata (tTeradataSCDELT on page 3766)
tDBSP
Calls a database stored procedure.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• IBM DB2 (tDB2SP on page 591)
• Informix (tInformixSP on page 1743)
• JDBC (tJDBCSP on page 1889)
• Microsoft SQL Server (tMSSqlSP on page 2401)
• MySQL (tMysqlSP on page 2526)
• Oracle (tOracleSP on page 2731)
• Sybase (ASE and IQ) (tSybaseSP on page 3718)
tDBTableList
Lists the names of specified database tables using a SELECT statement based on a WHERE clause.
This component works with a variety of databases depending on your selection.
Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Microsoft SQL Server (tMSSqlTableList on page 2410)
• MySQL (tMysqlTableList on page 2532)
• Oracle (tOracleTableList on page 2739)
tDBFSConnection
Connects to a given DBFS (Databricks Filesystem) system so that the other DBFS components can
reuse the connection it creates to communicate with this DBFS.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.
Basic settings
Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.
Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Usage
tDBFSGet
Copies files from a given DBFS (Databricks Filesystem) system, pastes them in a user-defined directory
and, if need be, renames them.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.
Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.
DBFS directory In the DBFS directory field, enter the path pointing to the
data to be used in the DBFS file system.
Local directory Browse to, or enter the local directory to store the files
copied from DBFS.
Overwrite file Options that determine whether to overwrite the existing
file with the new one.
Include subdirectories Select this check box if the selected input source type
includes sub-directories.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Usage
tDBFSPut
Connects to a given DBFS (Databricks Filesystem) system, copies files from a user-defined directory,
pastes them in this system and, if need be, renames these files.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.
Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.
DBFS directory In the DBFS directory field, enter the path pointing to the
data to be used in the DBFS file system.
Local directory Local directory where the files to be loaded into DBFS
are stored.
Overwrite file Options that determine whether to overwrite the existing
file with the new one.
Include subdirectories Select this check box if the selected input source type
includes sub-directories.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Usage
tDBSQLRow
Acts on the actual DB structure or on the data (although without handling data) depending on
the nature of the query and the database. The SQLBuilder tool helps you easily write your SQL
statements.
tDBSQLRow is the generic component for database queries. It executes the stated SQL query on
the specified database. The Row suffix means the component implements a flow in the Job design
although it does not provide output. For performance reasons, a specific DB component should always
be preferred to the generic component.
To use this component, the relevant DBMSs' ODBC drivers must be installed and the corresponding
ODBC connections must be configured via the database connection configuration wizard.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Table Name Name of the source table where changes made to data
should be captured.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.
Note:
You can set the encoding parameters through this field.
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times, as performance levels are
increased.
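The placeholder mechanism described in the table above can be illustrated with a small sketch. This is not the JDBC implementation itself, just a conceptual substitution of the "?" markers by indexed parameter values, the way the Set PreparedStatement Parameter table pairs Parameter Index with Parameter Value; the query and values are made up for the example:

```java
import java.util.List;

public class PlaceholderSketch {
    // Replace each "?" in the SQL instruction with the parameter
    // whose index matches its position in the statement.
    static String bind(String sql, List<String> params) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?' && next < params.size()) {
                out.append(params.get(next++));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT name FROM employees WHERE id = ? AND dept = ?";
        System.out.println(bind(sql, List.of("42", "'sales'")));
    }
}
```

A real prepared statement keeps the parsed query on the database side and only re-sends the parameter values, which is why repeating the same query is faster.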
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Note that the relevant DBRow component should be
preferred according to your DBMS. Most DBMSs have
their own specific DBRow components.
Resetting a DB auto-increment
This scenario describes a single-component Job which aims at re-initializing the DB auto-increment to
1. This Job has no output and is generally to be used before running a script.
Warning:
As a prerequisite of this Job, the relevant DBMS's ODBC driver must have been installed and the
corresponding ODBC connection must have been configured.
Procedure
1. Drag and drop a tDBSQLRow component from the Palette to the design workspace.
3. Select Repository in the Property Type list as the ODBC connection has been configured and
saved in the Repository. The follow-up fields are filled in automatically.
For more information on storing DB connections in the Repository, see Talend Studio User Guide.
4. The Schema is built-in for this Job and it does not really matter in this example as the action is
made on the table auto-increment and not on data.
5. The Query type is also built-in. Click the [...] button next to the Query statement box to launch
the SQLbuilder editor, or type the statement directly in the box:
Alter table <TableName> auto_increment = 1
6. Press Ctrl+S to save the Job and F6 to run.
The database auto-increment is reset to 1.
tDenormalize
Denormalizes the input flow based on one column.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at component
level. Note that this check box is not available in the Map/
Reduce version of the component.
Global Variables
Usage
Limitation Note that this component may change the order in the
incoming Java flow.
6. In the Basic settings of tDenormalize, define the column that contains multiple values to be
grouped.
7. In this use case, the column to denormalize is Children.
8. Set the Delimiter to separate the grouped values. Beware as only one column can be
denormalized.
9. Select the Merge same value check box, if you know that some values to be grouped are strictly
identical.
10. Save your Job and press F6 to execute it.
Results
All values from the column Children (set as column to denormalize) are grouped by their Fathers
column. Values are separated by a comma.
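The grouping performed in this use case can be sketched as follows. This is not the component's generated code; the sample names and the comma delimiter are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DenormalizeChildren {
    // Group the values of the denormalized column (Children) by the other
    // column (Fathers), joining grouped values with the chosen delimiter.
    static Map<String, String> denormalize(String[][] rows, String delimiter) {
        Map<String, StringBuilder> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            StringBuilder sb = grouped.computeIfAbsent(row[0], k -> new StringBuilder());
            if (sb.length() > 0) {
                sb.append(delimiter);
            }
            sb.append(row[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        grouped.forEach((k, v) -> result.put(k, v.toString()));
        return result;
    }

    public static void main(String[] args) {
        String[][] rows = {{"Paul", "Emma"}, {"Paul", "Lucas"}, {"John", "Mia"}};
        denormalize(rows, ",").forEach((f, c) -> System.out.println(f + "|" + c));
    }
}
```

Selecting Merge same value would additionally drop values that are strictly identical within a group before joining them.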
4. Define the Row and Field separators, the Header and other information if required.
5. The file schema is made of four columns including: Name, FirstName, HomeTown, WorkTown.
6. In the tDenormalize component Basic settings, select the columns that contain the repetition.
These are the columns which are meant to occur multiple times in the document. In this use
case, FirstName, HomeCity and WorkCity are the columns against which the denormalization is
performed.
7. Add as many lines to the table as you need using the plus button. Then select the relevant columns
in the drop-down list.
8. In the Delimiter column, define the separator between double quotes, to split concatenated values.
For the FirstName column, type in "#", for HomeCity, type in "§", and for WorkCity, type in "¤".
9. Save your Job and press F6 to execute it.
Results
This time, the console shows the results with no duplicate instances.
tDenormalizeSortedRow
Synthesizes sorted input flow to save memory.
tDenormalizeSortedRow combines all sorted input rows in groups. Distinct values of the
denormalized sorted row are joined with item separators.
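The memory saving comes from the sort order: because rows arrive already sorted on the grouping column, each group can be emitted as soon as the key changes, so only the current group is ever held in memory. A conceptual sketch (not the component's generated code; names and separators are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SortedDenormalizeSketch {
    // Rows arrive sorted on the grouping column (row[0]), so a finished
    // group can be flushed immediately when the key changes; only the
    // current group is buffered, unlike a full hash-based grouping.
    static List<String> denormalizeSorted(String[][] sortedRows, String separator) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        StringBuilder current = new StringBuilder();
        for (String[] row : sortedRows) {
            if (!row[0].equals(currentKey)) {
                if (currentKey != null) {
                    out.add(currentKey + ";" + current);
                }
                currentKey = row[0];
                current = new StringBuilder(row[1]);
            } else {
                current.append(separator).append(row[1]);
            }
        }
        if (currentKey != null) {
            out.add(currentKey + ";" + current);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {{"1", "Anna"}, {"1", "Ben"}, {"2", "Carl"}};
        denormalizeSorted(rows, ",").forEach(System.out::println);
    }
}
```

This is also why the component asks for the number of input rows (or the NB_LINE variable) in its Basic settings: it needs to know when the last group is complete.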
Basic settings
Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component in the Job.
Built-in: You create the schema and store it locally for the
relevant component. Related topic: see Talend Studio User
Guide.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at component
level.
Global Variables
Usage
• If needed, define row and field separators, header and footer, and the number of processed rows.
• Set Schema to Built in and click the three-dot button next to Edit Schema to define the data to
pass on to the next component. The schema in this example consists of two columns, id and name.
• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the
tFileInputDelimited component.
• In the Criteria panel, use the plus button to add a line and set the sorting parameters for the
schema column to be processed. In this example we want to sort the id columns in ascending
order.
• In the design workspace, select tDenormalizeSortedRow.
• Click the Component tab to define the basic settings for tDenormalizeSortedRow.
• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow
component.
• In the Input rows count field, enter the number of input rows to be processed or press
Ctrl+Space to access the context variable list and select the variable: tFileInputDelimited_1_NB_LINE.
• In the To denormalize panel, use the plus button to add a line and set the parameters for the
column to be denormalized. In this example we want to denormalize the name column.
• In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information about tLogRow, see tLogRow on page 1977.
• Save your Job and press F6 to execute it.
The result displayed on the console shows how the name column was denormalized.
tDie
Triggers the tLogCatcher component for exhaustive log before killing the Job.
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally
make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated
and passed on to the output defined.
This component throws an error and kills the job. If you simply want to throw a warning, see the
tWarn documentation.
Basic settings
Die message Enter the message to be displayed before the Job is killed.
Note:
Note that any value greater than 255 can not be used as
an error code on Linux.
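The 255 limit comes from the way Linux reports process exit statuses: only the low 8 bits survive, so a larger code is effectively taken modulo 256. A quick illustration of that assumed POSIX behavior (not Talend code):

```java
public class ExitCodeSketch {
    public static void main(String[] args) {
        // On Linux, a process exit status is truncated to 8 bits,
        // so a die code of 300 would be observed by the shell as 300 % 256.
        int dieCode = 300;
        System.out.println(dieCode % 256);
    }
}
```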
Global Variables
Usage
Related scenarios
For use cases in relation with tDie, see tLogCatcher scenarios:
• Catching messages triggered by a tWarn component on page 1971
• Catching the message triggered by a tDie component on page 1973
tDotNETInstantiate
Invokes the constructor of a .NET object that is intended for later reuse.
tDotNETInstantiate instantiates an object in .NET for later reuse.
Basic settings
Dll to load Type in the path, or browse to the DLL library containing
the class(es) of interest or enter the assembly's name
to be used. For example, System.Data, Version=2.0.0.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089 for an
OleDb assembly.
Fully qualified class name (i.e. ClassLibrary1.NameSpace2.Class1) Enter a fully qualified name for the class of interest.
Value(s) to pass to the constructor Click the plus button to add one or more values to be
passed to the constructor for the object. Or, leave this table
empty to call a default constructor for the object.
The valid value(s) should be the parameters required by the
class to be used.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenario
For a related scenario, see Utilizing .NET in Talend on page 643.
tDotNETRow
Facilitates data transform by utilizing custom or built-in .NET classes.
tDotNETRow sends data to and from libraries and classes within .NET or other custom DLL files.
Basic settings
Schema and Edit schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Use a static method Select this check box to invoke a static method in .NET and
this will disable Use an existing instance check box.
Propagate a data to output Select this check box to propagate a transformed data to
output.
Use an existing instance Select this check box to reuse an existing instance of a .NET
object from the Existing instance to use list.
Existing instance to use: Select an existing instance of .NET
objects created by the other .NET components from the list.
Dll to load Type in the path, or browse to the DLL library containing
the class(es) of interest or enter the assembly's name
to be used. For example, System.Data, Version=2.0.0.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089 for an
OleDb assembly.
Fully qualified class name (i.e. ClassLibrary1.NameSpace2.Class1) Enter a fully qualified name for the class of interest.
Method name Fill this field with the name of the method to be invoked
in .NET.
Value(s) to pass to the constructor Click the plus button to add one or more lines for values to
be passed to the constructor for the object. Or, leave this
table empty to call a default constructor for the object.
The valid value(s) should be the parameters required by the
class to be used.
Method Parameters Click the plus button to add one or more lines for
parameters to be passed to the method.
Output value target column Select a column in the output row from the list to put value
into it.
Advanced settings
Create a new instance at each row Select this check box to create a new instance at each row
that passes through the component.
Method doesn't return a value Select this check box to invoke a method without returning
a value as a result of the processing.
Returns an instance of a .NET Object Select this check box to return an instance of a .NET object
as a result of an invoked method.
Store the returned value for later use Select this check box to store the returned value of a
method for later reuse in another tDotNETRow component.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Note:
The required DLLs can be installed in the System32
folder or in the bin folder of the Java runtime to be used.
If you need to export a Job using this component to run
it outside the Studio, you have to specify the runtime
container of interest by setting the -Djava.library.path
argument accordingly. For users of Talend solutions
with ESB, to run a Job using this component in ESB
Runtime, you need to copy the runtime DLLs to the
%KARAF_HOME%/lib/wrapper/ directory.
Note: For information about configuring the tDotNetInstantiate and tDotNetRow components, see
Talend Components Reference Guide.
This article shows the way to invoke dll methods in a Talend Studio Job, which uses the two DotNet
family components.
• Place the file in a directory that the system variable Path points to (for example, %JAVA_HOME%
\bin, C:\Windows\System32, etc). You can also place it in another directory. In this case, you
need to add the directory as a library path using -Djava.library.path=path_to_direct
ory_containing_the_dll.
• The system assembly or the dll to integrate already exists.
Configuring tDotNetInstantiate
Procedure
1. Specify the DLL to load in the Dll to load field. The DLL can be a system assembly or a custom
DLL.
For system assemblies, you can specify the name of the desired system assembly (for example,
“System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken
=b77a5c561934e089”); for custom dlls, you need to provide the absolute path to the dll (for
example, "C:\\WINDOWS\\system32\\ClassLibrary1.dll)".
2. Specify the class name and the name space in the Fully qualified class name field.
3. Set parameter values for the constructor in the Value(s) to pass to the constructor field.
Configuring tDotNetRow
Procedure
1. Add columns in the schema by clicking the Edit schema button or using the schema propagated
to this component. You need to specify one of the columns of the schema for holding the output
value (if any) using the Output value target column drop-down list.
2. Select Propagate data to output to pass the data from input to output.
3. Take either of the following two options.
• If you have deployed a tDotNetInstantiate component for creating the .Net object, select Use
an existing instance and select the component from the Existing instance to use drop-down
list to refer the corresponding .Net object.
• You can also create a new .Net object for use. To achieve this, make sure Use an existing
instance is not selected, and set the Dll to load, Fully qualified class name, Method Name, and
Value(s) to pass to the constructor options as needed.
4. Provide the name of the method to invoke in the Method Name field.
5. Provide the parameter values for the method in rows of the Method Parameters field. As
prompted, you can use input row values as parameter values (for example, input_row.colu
mn_name).
Note:
• For information about other options of this component, refer to Talend Components
Reference Guide.
• See Utilizing .NET in Talend section in Talend Components Reference Guide for an example of
this article.
Prerequisites
Before replicating this scenario, you need first to build up your runtime environment.
• Create the DLL to be loaded by tDotNETInstantiate
This example class built into .NET reads as follows:
using System;
using System.Collections.Generic;
using System.Text;
namespace Test1
{
public class Class1
{
    string s = null;

    public Class1(string s)
    {
        this.s = s;
    }

    // Returns the stored value prefixed with "Return Value from Class1: "
    public override string ToString()
    {
        return "Return Value from Class1: " + s;
    }
  }
}
This class reads the input value and adds the text Return Value from Class1: in front of this value. It
is compiled using the latest .NET.
• Install the runtime DLL from the latest .NET. In this scenario, we use janet-win32.dll on Windows
32-bit version and place it in the System32 folder.
Thus the runtime DLL is compatible with the DLL to be loaded.
Connecting components
Procedure
1. Drop the following components from the Palette to the design workspace: tDotNETInstantiate,
tDotNETRow and tLogRow.
2. Connect tDotNETInstantiate to tDotNETRow using a Trigger On Subjob OK connection.
3. Connect tDotNETRow to tLogRow using a Row Main connection.
Configuring tDotNETInstantiate
Procedure
1. Double-click tDotNETInstantiate to display its Basic settings view and define the component
properties.
2. Click the three-dot button next to the Dll to load field and browse to the DLL file to be loaded.
Alternatively, you can fill the field with an assembly. In this example, we use :
"C:/Program Files/ClassLibrary1/bin/Debug/ClassLibrary1.dll""
3. Fill the Fully qualified class name field with a valid class name to be used. In this example, we
use:
"Test1.Class1"
4. Click the plus button beneath the Value(s) to pass to the constructor table to add a new line for
the value to be passed to the constructor.
In this example, we use:
"Hello world"
Configuring tDotNETRow
Procedure
1. Double-click tDotNETRow to display its Basic settings view and define the component properties.
Click the plus button beneath the table to add a new column to the schema and click OK to save
the setting.
6. Select newColumn from the Output value target column list.
Configuring tLogRow
Procedure
1. Double-click tLogRow to display its Basic settings view and define the component properties.
2. Click Sync columns button to retrieve the schema defined in the preceding component.
3. Select Table in the Mode area.
Results
Save your Job and press F6 to execute it.
From the result, you can read that the text Return Value from Class1 is added in front of the
retrieved value Hello world.
tDropboxConnection
Creates a Dropbox connection to a given account that the other Dropbox components can reuse.
Basic settings
Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.
Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
See Uploading files to Dropbox on page 655
tDropboxDelete
Removes a given folder or file from Dropbox.
Basic settings
Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.
Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Path Enter the path on Dropbox pointing to the folder or the file
you need to remove.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.
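The leading-slash requirement above can be enforced defensively before a path reaches the component. This is a minimal sketch; the normalize helper is hypothetical and not part of any Talend component.

```java
class PathSketch {
    // Ensures a Dropbox path starts with the required leading slash,
    // relative to the root folder of the Dropbox App.
    public static String normalize(String path) {
        return path.startsWith("/") ? path : "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(normalize("calling_code.csv")); // prints: /calling_code.csv
    }
}
```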
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tDropboxGet
Downloads a selected file from a Dropbox account to a specified local directory.
Basic settings
Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.
Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Path Enter the path on Dropbox pointing to the file you need to
download.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.
Save As File Select this check box to display the File field and browse
to, or enter the local directory where you want to store the
downloaded file. The existing file, if any, is replaced.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
The schema of this component is read-only. You can click
the button next to Edit schema to view the predefined
schema that contains the following two columns:
• fileName: the name of the downloaded file.
• content: the content of the downloaded file.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component can be used alone or along with other
components via the Iterate link or a trigger link such as On
Subjob OK.
Related scenarios
No scenario is available for the Standard version of this component yet.
tDropboxList
Lists the files stored in a specified directory on Dropbox.
Basic settings
Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.
Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Path Enter the path pointing to the folder you need to list the
files from, or enter the path pointing to the exact file you
need to read.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.
List Type Select the type of data you need to list from the specified
path.
Include subdirectories Select this check box to list files from any existing sub-
folders in addition to the files in the directory defined in
the Path field.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
NAME The name of the remote file being processed. This is a Flow
variable and it returns a string.
IS_FILE The boolean result of the file listing. This is a Flow variable
and it returns a boolean. The result Yes indicates that the
listed data is of the type File; otherwise, the type is Folder.
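In generated Job code, these globals are read from the globalMap with a cast. The sketch below assumes the component is labeled tDropboxList_1; the describe helper is illustrative only, not generated by the Studio.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalVarsSketch {
    // Formats the two tDropboxList globals the way a downstream component might.
    public static String describe(String name, boolean isFile) {
        return name + " is a " + (isFile ? "file" : "folder");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Values put here by tDropboxList at run time (keys assume label tDropboxList_1):
        globalMap.put("tDropboxList_1_NAME", "report.csv");
        globalMap.put("tDropboxList_1_IS_FILE", Boolean.TRUE);
        String name = (String) globalMap.get("tDropboxList_1_NAME");
        boolean isFile = (Boolean) globalMap.get("tDropboxList_1_IS_FILE");
        System.out.println(describe(name, isFile)); // prints: report.csv is a file
    }
}
```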
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tDropboxPut
Uploads data to Dropbox from either a local file or a given data flow.
Basic settings
Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.
Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
Path (File Only) Enter the path pointing to the file you need to write
contents in. This file will be created on the fly if it does not
exist.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.
Upload Incoming content as File Select this radio button to read data directly from the input
flow of the preceding component and write the data into
the file specified in the Path field.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
a single column named content and it receives data from
the content column of its input schema only. This means
that you must use a content column in the input data flow
to carry the data to be uploaded. This type of column is
typically provided by the tFileInputRaw component. For further
information, see tFileInputRaw on page 1085.
The Schema field is not available when you have selected
the Expose as OutputStream or the Upload local file radio
button.
Upload local file Select this radio button to upload a locally stored file to
Dropbox. In the File field that is displayed, you need to enter
the path or browse to this file.
Expose as OutputStream Select this check box to expose the output stream of this
component as a variable named OUTPUTSTREAM so that
the other components can reuse this variable to write the
contents to be uploaded into the exposed output stream.
For example, you can use the Use output stream feature
of the tFileOutputDelimited component to feed a given
tDropboxPut's exposed output stream. For further
information, see tFileOutputDelimited on page 1113.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Before replicating this scenario, you need to create a Dropbox App under the Dropbox account to be
used. In this scenario, the Dropbox App to be used is named talenddrop and thus the root folder
in which files are uploaded is talenddrop, too. In addition, the access token to this folder has been
generated from the App console provided by Dropbox.
For further information about a Dropbox App, see https://www.dropbox.com/developers/apps/.
Connecting to Dropbox
Procedure
1. Double-click tDropboxConnection to open its Component view.
2. In the Access token field, paste the token that you have generated via the App console of Dropbox
for accessing the Dropbox App folder to be used.
Procedure
1. Double-click tFixedFlowInput to open its Component view.
In this scenario, only three rows of sample data are created to indicate three countries and their
calling codes.
33;France
86;China
81;Japan
2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the [+] button twice to add two rows and in the Column column, rename them to code and
country, respectively.
4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
5. In the Mode area, select the Use Inline Table radio button. The code and country columns have
been automatically created in this table.
6. Enter the sample data mentioned above in this table.
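Downstream, each inline row splits on the semicolon into the code and country fields. A minimal sketch of that split; the parse helper is illustrative only, not code generated by Talend.

```java
class SampleRowSketch {
    // Splits one inline row of the sample data into its code and country fields.
    public static String[] parse(String row) {
        return row.split(";");
    }

    public static void main(String[] args) {
        String[] fields = parse("33;France");
        System.out.println(fields[0] + " -> " + fields[1]); // prints: 33 -> France
    }
}
```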
Procedure
1. Double-click tFileOutputDelimited to open its Component view.
2. Select the Use output stream check box to write the data to be outputted into a given output
stream.
3. In the Output stream field, enter the code to define the output stream you need to write data
in. In this scenario, it is the output stream of the tDropboxPut_1 component linked with the
current component. Thus the code used to write the data reads as follows:
((java.io.OutputStream)globalMap.get("tDropboxPut_1_OUTPUTSTREAM"))
Note that in this example code, the tDropboxPut component has the number 1 as its suffix,
which represents its component ID distributed automatically within this Job. If the tDropboxPut
component you are using has a different ID, you need to adapt the code to that ID number.
4. Click Edit schema to verify that the schema of this component is identical with that of the
preceding tFixedFlowInput component. If not so, click the Sync columns button to make both of
the schemas identical.
5. Navigate to the Advanced settings tab.
6. Select the Custom the flush buffer size check box. This automatically adds 1 in the Row number
field.
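The stream sharing described in step 3 can be sketched in plain Java: tDropboxPut registers an OutputStream in the globalMap, and the writer component retrieves it with a cast. In this hedged sketch a ByteArrayOutputStream stands in for the real Dropbox upload stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

class StreamShareSketch {
    // Simulates tDropboxPut exposing its upload stream through globalMap,
    // and tFileOutputDelimited writing into it.
    public static String demo() {
        try {
            Map<String, Object> globalMap = new HashMap<>();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            globalMap.put("tDropboxPut_1_OUTPUTSTREAM", buffer); // done by tDropboxPut
            // This cast is what the Output stream field expression resolves to:
            OutputStream target = (OutputStream) globalMap.get("tDropboxPut_1_OUTPUTSTREAM");
            target.write("33;France\n".getBytes("UTF-8"));
            return buffer.toString("UTF-8");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo()); // prints: 33;France
    }
}
```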
2. Select the Use existing connection check box to reuse the connection created by
tDropboxConnection.
3. In the Path field, enter the path pointing to the file you need to write data in, with a slash (/) at
the beginning of the path. For example, enter /calling_code.csv.
4. In the Upload mode area, select the Rename if Existing radio button.
5. Select the Expose As OutputStream radio button to expose the output stream of this component
so that the other component, tFileOutputDelimited in this scenario, can write data in the stream.
This component is used to read a picture named esb_architecture.png into the data flow. In the
real-world practice, this file can be of many other formats, such as pdf, xls, ppt or mp3.
2. In the Filename field, enter the path or browse to the file you need to upload.
3. In the Mode area, select the Read the file as a bytes array radio button.
2. Select the Use existing connection check box to reuse the connection created by
tDropboxConnection.
3. In the Path field, enter the path pointing to the file you need to write data in, with a slash (/) at
the beginning of the path. For example, enter /architecture.png.
4. In the Upload mode area, select Rename if existing.
5. Select the Upload incoming content as file radio button. This displays the Edit schema button to
allow you to view the read-only schema of this component.
tDTDValidator
Helps control the data and structure quality of the file to be processed.
Validates the XML input file against a DTD file and sends the validation log to the defined output.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component.
The schema of this component is read-only. It contains
standard information regarding the file validation.
If XML is valid, display /
If XML is invalid, display
Type in a message to be displayed in the Run console based
on the result of the validation.
Print to console Select this check box to display the validation message.
Advanced settings
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Click the plus button to add a filemask line and enter the filemask: "*.xml". Remember that Java
code requires double quotes.
Set the path of the XML files to be verified.
Select No from the Case Sensitive drop-down list.
4. In the tDTDValidator Component view, the schema is read-only as it contains standard log
information related to the validation process.
In the Dtd file field, browse to the DTD file to be used as reference.
5. Click in the XML file field, press Ctrl+Space bar to access the variable list, and double-click the
current filepath global variable: tFileList.CURRENT_FILEPATH.
6. In the various messages to display in the Run tab console, use the jobName variable to recall
the Job name tag. Recall the filename using the relevant global variable:
((String)globalMap.get("tFileList_1_CURRENT_FILE")). Remember that Java code requires double quotes.
Select the Print to Console check box.
7. In the tMap component, drag and drop the information data from the standard schema that you
want to pass on to the output file.
8. Once the Output schema is defined as required, add a filter condition to only select the log
information data when the XML file is invalid.
Follow the best practice of typing the expected value first, then the operator suited to the type of
data filtered, then the variable that should meet the requirement. In this case: 0 ==
row1.validate.
9. Then connect (if not already done) the tMap to the tFileOutputDelimited component using a Row
> Main connection. Give it a relevant name, in this example: log_errorsOnly.
10. In the tFileOutputDelimited Basic settings, define the destination filepath, the field delimiters and
the encoding.
11. Save your Job and press F6 to run it.
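The constant-first filter condition from step 8 behaves like the plain Java predicate below. The keepInvalidOnly helper is illustrative only; it assumes, as in the scenario, that a validate value of 0 marks an invalid file.

```java
class FilterSketch {
    // The tMap filter 0 == row1.validate keeps only rows where validation failed.
    // Writing the constant first is the best practice the scenario recommends.
    public static boolean keepInvalidOnly(int validate) {
        return 0 == validate;
    }

    public static void main(String[] args) {
        System.out.println(keepInvalidOnly(0)); // invalid file: kept (true)
        System.out.println(keepInvalidOnly(1)); // valid file: filtered out (false)
    }
}
```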
On the Run console, the defined messages are displayed for each of the files. At the same time, the
output file is filled with the log data for invalid files.
tDynamoDBInput
Retrieves data from an Amazon DynamoDB table and sends them to the component that follows for
transformation.
Basic settings
Access Key Enter the access key ID that uniquely identifies an AWS
Account. For further information about how to get your
Access Key and Secret Key, see Getting Your AWS Access
Keys.
Secret Key Enter the secret access key, constituting the security
credentials in combination with the access Key.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.
Use End Point Select this check box and in the Server Url field displayed,
specify the Web service URL of the DynamoDB database
service.
Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
AWS Region, see Regions and Endpoints.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Use advanced key condition expression Select this check box and in the Advanced key condition
expression field displayed, specify the key condition
expressions used to determine the items to be read from the
table or index.
Key condition expression Specify the key condition expressions used to determine the
items to be read. Click the [+] button to add as many rows
as needed, each row for a key condition expression, and set
the following attributes for each expression:
• Key Column: Enter the name of the key column.
• Function: Select the function for the key condition
expression.
• Value1: Specify the value used in the key condition
expression.
• Value2: Specify the second value used in the key
condition expression if needed, depending on the
function you selected.
Note that only the items that meet all the key conditions
defined in this table can be returned.
This table is not available when the Use advanced key
condition expression check box is selected.
Use filter expression Select this check box to use the filter expression for the
query or scan operation.
Use advanced filter expression Select this check box and in the Advanced filter expression
field displayed, specify the filter expressions used to refine
the data after it is queried or scanned and before it is
returned to you.
Filter expression Specify the filter expressions used to refine the results
returned to you. Click the [+] button to add as many rows
as needed, each row for a filter expression, and set the
following attributes for each expression:
• Column: Enter the name of the column used to refine
the results.
• Function: Select the function for the filter expression.
• Value1: Specify the value used in the filter expression.
• Value2: Specify the second value used in the filter
expression if needed, depending on the function you
selected.
Note that only the items that meet all the filter conditions
defined in this table can be returned.
This table is available when the Use filter expression check
box is selected and the Use advanced filter expression
check box is cleared.
Value mapping Specify the placeholders for the expression attribute values.
• value: Enter the expression attribute value.
• placeholder: Specify the placeholder for the
corresponding value.
For more information, see Expression Attribute Values.
Name mapping Specify the placeholders for the attribute names that
conflict with the DynamoDB reserved words.
• name: Enter the name of the attribute that conflicts
with a DynamoDB reserved word.
• placeholder: Specify the placeholder for the
corresponding attribute name.
For more information, see Expression Attribute Names.
Advanced settings
STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
This data has two columns, DeliveryId and EventPayload, separated by a semicolon (;). The JSON
document itself is stored in the EventPayload column.
Procedure
1. In the Integration perspective of the Studio, create an empty Standard Job from the Job Designs
node in the Repository tree view.
2. In the workspace, enter the name of the component to be used and select this component from
the list that appears. In this scenario, the components are tFixedFlowInput, tDynamoDBOutput,
tDynamoDBInput and tLogRow.
The tFixedFlowInput component is used to load the sample data into the data flow. In the real-
world practice, use the input component specific to the data format or the source system to be
used instead of tFixedFlowInput.
3. Connect tFixedFlowInput to tDynamoDBOutput and connect tDynamoDBInput to tLogRow using
the Row > Main link.
4. Connect tFixedFlowInput to tDynamoDBInput using the Trigger > On Subjob Ok link.
Procedure
1. Double-click tFixedFlowInput in its Component view.
Example
2. Click the ... button next to Edit schema to open the schema editor.
Example
3. Click the + button twice to add two rows, each representing a column of the sample data, and in
the Column column, name these columns DeliveryId and EventPayload, respectively.
4. On the row for the DeliveryId column, select the check box in the Key column to use
this DeliveryId column as the partition key column of the DynamoDB table to be used. A
DynamoDB table requires a partition key column.
5. Click OK to validate these changes and once prompted, accept the propagation of the schema to
the connected component, tDynamoDBOutput.
6. In the Mode area, select the Use Inline Content radio button and enter the sample data in the
field that is displayed:
Example
Example
7. Double-click tDynamoDBOutput to open its Component view.
8. Click the ... button next to Edit schema to open the schema editor. This component should have
retrieved the schema from tFixedFlowInput.
Example
9. In the DB Type column, select JSON for the EventPayload column, as this is the column in
which the JSON documents are stored.
10. In the Access key and Secret key fields, enter the credentials of the AWS account to be used to
access your DynamoDB database.
11. From the Region drop-down list, select the AWS region to be used. If you do not know which
region to select, ask the administrator of your AWS system for more information.
12. From the Action on table drop-down list, select Drop table if exist and create.
13. From the Action on data drop-down list, select Insert.
14. In the Table name field, enter the name to be used for the DynamoDB table to be created.
15. In the Partition Key field, enter the name of the column to be used to provide partition keys. In
this example, it is DeliveryId.
Procedure
1. Double-click tDynamoDBInput to open its Component view.
Example
2. Click the ... button next to Edit schema to open the schema editor.
Example
3. Click the + button twice to add two rows, each representing a column of the sample data, and in
the Column column, name these columns DeliveryId and EventPayload, respectively.
4. On the row for the DeliveryId column, select the check box in the Key column to use
this DeliveryId column as the partition key column of the DynamoDB table to be used. A
DynamoDB table requires a partition key column.
5. In the DB Type column, select JSON for the EventPayload column, as this is the column in
which the JSON documents are stored.
6. In the Access key and Secret key fields, enter the credentials of the AWS account to be used to
access your DynamoDB database.
7. From the Region drop-down list, select the same region as you selected in the previous steps for
tDynamoDBOutput.
8. From the Action drop-down list, select Scan.
9. In the Table Name field, enter the name of the DynamoDB table to be created by
tDynamoDBOutput.
10. Select the Use filter expression check box and then the Use advanced filter expression check box.
11. In the Advanced filter expression field, enter the filter to be used to select JSON documents.
Example
"EventPayload.customerOrderNumber.deliveryCode = :value"
The part on the left of the equals sign reflects the structure within a JSON document of the
sample data, in the EventPayload column. The purpose is to use the value of the deliveryCode
element to filter the document to be read.
You need to define the :value placeholder in the Value mapping table.
12. Under the Value mapping table, click the + button to add one row and do the following:
a) In the value column, enter the value of the JSON element to be used as a filter.
Example
In this example, this element is deliveryCode and you need to extract the JSON document
in which the value of the deliveryCode element is 261. As this value is a string, enter 261
within double quotation marks.
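Conceptually, DynamoDB substitutes each placeholder from the Value mapping table into the expression on the server side. The sketch below reproduces that substitution locally for illustration only; the real resolution happens inside DynamoDB, not in your Job code.

```java
import java.util.Map;

class ExpressionSketch {
    // Substitutes Value mapping placeholders into a filter expression,
    // mimicking what DynamoDB does with expression attribute values.
    public static String resolve(String expression, Map<String, String> values) {
        String resolved = expression;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            resolved = resolved.replace(entry.getKey(), entry.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        String expr = "EventPayload.customerOrderNumber.deliveryCode = :value";
        System.out.println(resolve(expr, Map.of(":value", "\"261\"")));
    }
}
```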
Results
Once done, the retrieved JSON document is displayed in the console of the Run view of the Studio.
In the created DynamoDB table, you can see both of the sample JSON documents.
tDynamoDBOutput
Creates, updates or deletes data in an Amazon DynamoDB table.
Basic settings
Access Key Enter the access key ID that uniquely identifies an AWS
Account. For further information about how to get your
Access Key and Secret Key, see Getting Your AWS Access
Keys.
Secret Key Enter the secret access key, constituting the security
credentials in combination with the access Key.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.
Use End Point Select this check box and in the Server Url field displayed,
specify the Web service URL of the DynamoDB database
service.
Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
AWS Region, see Regions and Endpoints.
Action on table On the table defined, you can perform one of the following
operations:
• Create table: The table does not exist and gets created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exist and create: The table is removed if it
already exists and created again.
Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Insert new items from the input flow.
• Update: Update existing items according to the input
flow.
• Delete: Remove existing items according to the input
flow.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Advanced settings
STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.
Read Capacity Unit Specify the number of read capacity units. For more
information, see Amazon DynamoDB Provisioned
Throughput.
Write Capacity Unit Specify the number of write capacity units. For more
information, see Amazon DynamoDB Provisioned
Throughput.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tEDIFACTtoXML
Transforms an EDIFACT message file into the XML format for better readability to users and
compatibility with processing tools.
This component reads a United Nations/Electronic Data Interchange For Administration, Commerce
and Transport (UN/EDIFACT) message and transforms it into the XML format according to the
EDIFACT version and the EDIFACT family.
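For orientation, a UN/EDIFACT message is a sequence of segments terminated by an apostrophe, with data elements separated by plus signs. A minimal sketch of splitting one segment follows; it is illustrative only and ignores release characters and composite-element handling, which the component deals with for real.

```java
class EdifactSketch {
    // Splits one EDIFACT segment into data elements: "'" terminates the segment,
    // "+" separates the elements.
    public static String[] elements(String segment) {
        return segment.replace("'", "").split("\\+");
    }

    public static void main(String[] args) {
        String[] parts = elements("UNH+1+CUSCAR:D:99A:UN'");
        System.out.println(parts[0]); // segment tag: UNH
        System.out.println(parts[2]); // message identifier: CUSCAR:D:99A:UN
    }
}
```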
Basic settings
Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component.
The schema of this component is fixed and read-only, with
only one column: document.
Ignore new line Select this check box to skip carriage returns in the input
file.
Die on error Select this check box to stop Job execution when an error
is encountered. By default, this check box is cleared, and
therefore illegal rows are skipped and the process is
completed for the error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Double-click the tEDIFACTtoXML component to show its Basic settings view.
2. Fill the EDI filename field with the full path to the input EDIFACT message file.
In this use case, the input file is 99a_cuscar.edi.
3. From EDI version list, select the EDIFACT version of the input file, D99A in this use case.
4. Select the Ignore new line check box to skip the carriage return characters in the input file during
the transformation.
5. Leave the other parameters as they are.
6. Double-click the tFileOutputXML component to show its Basic settings view.
7. Fill the File Name field with the full path to the output XML file you want to generate.
In this use case, the output XML is 99a_cuscar.xml.
8. Leave the other parameters as they are.
Results
The input EDIFACT CUSCAR message file is transformed into the XML format and the output XML file
is generated as defined.
tELTGreenplumInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Provides the table schema to be used for the SQL statement to execute.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTGreenplumMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Greenplum Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
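The idea behind Dynamic settings can be sketched as follows: a context value chooses among pre-registered connections at runtime instead of the Job being bound to a single one. All names below are hypothetical.

```python
# Hypothetical sketch: connection parameters resolved from a context value.
connections = {
    "dev":  {"host": "dev-db.example.com",  "port": 5432},
    "prod": {"host": "prod-db.example.com", "port": 5432},
}

def resolve_connection(context_value):
    """Pick the connection parameters named by a context variable."""
    try:
        return connections[context_value]
    except KeyError:
        raise ValueError("no connection registered for %r" % context_value)
```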
Dropping components
Procedure
1. Add the following components from the Palette to the workspace:
• tGreenplumConnection
• two tELTGreenplumInput
• tELTGreenplumMap
• tELTGreenplumOutput
• tGreenplumCommit
• tGreenplumInput
• tLogRow
2. Rename the following components:
• tGreenplumConnection to connect_to_greenplum_host
• two tELTGreenplumInput to employee+statecode and statecode
• tELTGreenplumMap to match+map
• tELTGreenplumOutput to map_data_output
• tGreenplumCommit to commit_to_host
• tGreenplumInput to read_map_output_table
• tLogRow to show_map_data
3. Connect the components in the Job:
• link tGreenplumConnection to tELTGreenplumMap using an OnSubjobOk trigger
• link tELTGreenplumMap to tGreenplumCommit using an OnSubjobOk trigger
• link tGreenplumCommit to tGreenplumInput using an OnSubjobOk trigger
• link tGreenplumInput to tLogRow using a Row > Main connection
The two tELTGreenplumInput components and tELTGreenplumOutput will be linked to
tELTGreenplumMap later once the relevant tables have been defined.
a) In the Host and Port fields, enter the context variables for the Greenplum server.
b) In the Database field, enter the context variable for the Greenplum database.
c) In the Username and Password fields, enter the context variables for the authentication
credentials.
For more information on context variables, see Talend Studio User Guide.
2. Double-click employee+statecode to open its Basic settings view in the Component tab.
a) In the Default table name field, enter the name of the source table, namely employee_by_statecode.
b) Click the [...] button next to the Edit schema field to open the schema editor.
c) Click the [+] button to add three columns, namely id, name and statecode, with the data type as
INT4, VARCHAR, and INT4 respectively.
d) Click OK to close the schema editor.
a) In the Default table name field, enter the name of the lookup table, namely statecode.
4. Click the [...] button next to the Edit schema field to open the schema editor.
a) Click the [+] button to add two columns, namely state and statecode, with the data type as
VARCHAR and INT4 respectively.
b) Click OK to close the schema editor.
c) Link statecode to tELTGreenplumMap using the output statecode.
5. Click tELTGreenplumMap to open its Basic settings view in the Component tab.
7. Click the [+] button on the upper left corner to open the table selection box.
a) Select tables employee_by_statecode and statecode in sequence and click Ok. The tables appear
on the left panel of the editor.
8. On the upper right corner, click the [+] button to add an output table, namely employee_by_state.
a) Click Ok to close the map editor.
9. Double-click tELTGreenplumOutput to open its Basic settings view in the Component tab.
a) In the Default table name field, enter the name of the output table, namely employee_by_state.
10. Click the [...] button next to the Edit schema field to open the schema editor.
a) Click the [+] button to add three columns, namely id, name and state, with the data type as
INT4, VARCHAR, and VARCHAR respectively.
b) Click OK to close the schema editor.
c) Link tELTGreenplumMap to tELTGreenplumOutput using the table output employee_by_state.
d) Click OK on the pop-up window below to retrieve the schema of tELTGreenplumOutput.
Now the map editor's output table employee_by_state shares the same schema as that of
tELTGreenplumOutput.
11. Double-click tELTGreenplumMap to open the map editor.
Drop the columns id and name from table employee_by_statecode as well as the column statecode
from table statecode to their counterparts in the output table employee_by_state.
Click Ok to close the map editor.
a) Drop the column statecode from table employee_by_statecode to its counterpart of the table
statecode, looking for the records in the two tables that have the same statecode values.
12. Double-click tGreenplumInput to open its Basic settings view in the Component tab.
a) In the Mode area, select Table (print values in cells of a table) for a better display.
As shown above, the desired employee records have been written to the table employee_by_state,
presenting clearer geographical information about the employees.
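The statement that match+map builds here is essentially an Insert combined with a Select over the inner join on the statecode columns. The following is a minimal sketch of that statement shape; sqlite3 stands in for Greenplum, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # sqlite stands in for Greenplum here
cur = conn.cursor()
cur.execute("CREATE TABLE employee_by_statecode (id INTEGER, name TEXT, statecode INTEGER)")
cur.execute("CREATE TABLE statecode (state TEXT, statecode INTEGER)")
cur.execute("CREATE TABLE employee_by_state (id INTEGER, name TEXT, state TEXT)")
cur.executemany("INSERT INTO employee_by_statecode VALUES (?, ?, ?)",
                [(1, "Ann", 10), (2, "Bob", 20)])
cur.executemany("INSERT INTO statecode VALUES (?, ?)",
                [("California", 10), ("Texas", 20)])
# Same shape as the statement the ELT components generate:
# an INSERT ... SELECT with an inner join on the statecode columns.
cur.execute("""
    INSERT INTO employee_by_state (id, name, state)
    SELECT e.id, e.name, s.state
    FROM employee_by_statecode e
    JOIN statecode s ON e.statecode = s.statecode
""")
rows = cur.execute("SELECT id, name, state FROM employee_by_state ORDER BY id").fetchall()
```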
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTGreenplumOutput
Executes SQL Insert, Update, and Delete statements on the Greenplum database.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Basic settings
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter the default table name, between double quotation
marks.
Default Schema Name Enter the default schema name, between double quotation
marks.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.
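The Insert, Update, and Delete actions combined with a Where clause behave like their SQL counterparts: only rows matching the clause are touched. The following is a minimal sketch; sqlite3 stands in for Greenplum, and the table and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # sqlite stands in for Greenplum here
cur = conn.cursor()
cur.execute("CREATE TABLE employee_by_state (id INTEGER, name TEXT, state TEXT)")
cur.executemany("INSERT INTO employee_by_state VALUES (?, ?, ?)",
                [(1, "Ann", "California"), (2, "Bob", "Texas")])
# Update action with a Where clause: only matching rows change.
cur.execute("UPDATE employee_by_state SET state = 'CA' WHERE state = 'California'")
# Delete action with a Where clause: only matching rows are removed.
cur.execute("DELETE FROM employee_by_state WHERE state = 'Texas'")
remaining = cur.execute("SELECT id, name, state FROM employee_by_state").fetchall()
```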
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTHiveInput
Replicates the schema of the input Hive table, which the tELTHiveMap component that follows
will use.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component provides, for the tELTHiveMap component that follows, the input schema of the Hive
table to be used.
Basic settings
Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.
Edit schema Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Default table name Enter the name of the input table to be used.
Default schema name Enter the name of the database schema to which the input
table to be used is related.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.
Related scenarios
• Joining table columns and writing them into Hive on page 710
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTHiveMap
Builds graphically the Hive QL statement in order to transform data.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component uses the tables provided as input, to feed the parameter in the built statement. The
statement can include inner or outer joins to be implemented between tables or between one table
and its aliases.
Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:
API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.
Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.
Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.
If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.
Region From this drop-down list, select the Google Cloud region
to be used.
Google Storage staging bucket Since a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
them at execution time.
The directory to be entered must end with a slash (/). If
the directory does not exist, it is created on the fly, but the
bucket to be used must already exist.
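The trailing-slash requirement can be enforced up front with a tiny helper. This is a hypothetical convenience for scripts that assemble the value, not part of the component itself.

```python
def normalize_staging_dir(path):
    """Ensure a staging directory value ends with a slash (/),
    as the Google Storage staging bucket field requires."""
    return path if path.endswith("/") else path + "/"

# normalize_staging_dir("gs://my-bucket/staging") -> "gs://my-bucket/staging/"
```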
Provide Google Credentials in file Leave this check box clear when you launch your Job
from a machine on which the Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.
HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.
Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.
Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used such as HortonWorks Data
Platform V1.2.0 (Bimota) supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections from multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.
Note:
This field is not available when you select Embedded
from the Connection mode list.
Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.
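The read-permission requirement on the keytab file can be checked before launching the Job. This is a minimal sketch; the path is whatever you configured in the Keytab field.

```python
import os

def can_read_keytab(path):
    """Return True if the current OS user can read the keytab file.
    The executing user need not match the Kerberos principal, but
    must be able to read the keytab itself."""
    return os.path.isfile(path) and os.access(path, os.R_OK)

# can_read_keytab("/etc/security/keytabs/guest.keytab")  # hypothetical path
```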
Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration for this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the
dfs.client.use.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Hive Map editor The ELT Map editor helps you to define the output schema
as well as build graphically the Hive QL statement to be
executed. The column names of schema can be different
from the column names in the database.
If you use context variables in the Expression column in the
Map editor to map the input and the output schemas, put
single quotation marks around these context variables, for
example, 'context.v_erpName'.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.
Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.
Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.
Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, by default,
registered along with your Hive installation.
Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
Advanced settings
Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce computations
by selecting the Set memory check box to set proper
memory allocations for the computations to be performed
by the Hadoop system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000, which is normally appropriate for running the
computations.
Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable;
in other words, unless that separator is not a colon (:). In
that situation, you must change this value to the one you
are using in that host.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
2. Create two input Hive tables containing the columns you want to join and aggregate into the
output Hive table, agg_result. The statements to be used are: create table
customer (id int, name string, address string, idState int, id2 int,
regTime string, registerTime string, sum1 string, sum2 string) row
format delimited fields terminated by ';' location '/user/ychen/
hive/table/customer' and create table state_city (id int, postal
string, state string, capital int, mostpopulouscity string) row format
delimited fields terminated by ';' location '/user/ychen/hive/table/
state_city'
3. Use tHiveRow to load data into the two input tables, customer and state_city. The statements
to be used are:

"LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"

and

"LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"
The two files, customer.csv and State_City.csv, are two local files we created for this scenario. You
need to create your own files to provide data to the input Hive tables. The data schema of each
file should be identical to that of its corresponding table.
You can use tRowGenerator and tFileOutputDelimited to create these two files easily. For
further information about these two components, see tRowGenerator on page 3134 and
tFileOutputDelimited on page 1113.
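If you prefer to create the two delimited files outside the Studio, a short script can produce them. This is an illustrative sketch only; the sample values (Ava, California, and so on) are invented, not taken from the scenario:

```python
import csv

# Invented sample rows matching the column layout of the customer and
# state_city Hive tables created above.
customers = [
    [1, "Ava", "12 High St", 1, 1, "2010", "2010", "100", "200"],
]
states = [
    [1, "94000", "California", 1, "Los Angeles"],
]

# The Hive tables declare ';' as the field delimiter, so write the
# files with the same separator.
for path, rows in (("customer.csv", customers), ("State_City.csv", states)):
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=";").writerows(rows)
```

Within the Studio itself, tRowGenerator and tFileOutputDelimited achieve the same result, as noted above.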
For further information about the Hive query language, see https://cwiki.apache.org/confluence/
display/Hive/LanguageManual.
2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the [+] button as many times as required to add columns and rename them to replicate the
schema of the customer table we created earlier in Hive.
4. In the Default table name field, enter the name of the input table, customer, to be processed by
this component.
5. Double-click the other tELTHiveInput component using the state_city link to open its Component
view.
6. Click the [...] button next to Edit schema to open the schema editor.
7. Click the [+] button as many times as required to add columns and rename them to replicate the
schema of the state_city table we created earlier in Hive.
8. In the Default table name field, enter the name of the input table, state_city, to be processed by
this component.
Procedure
1. Click tELTHiveMap, then, click Component to open its Component view.
2. In the Version area, select the Hadoop distribution you are using and the Hive version.
3. In the Connection mode list, select the connection mode you want to use. If your distribution is
HortonWorks, this mode is Embedded only.
4. In the Host field and the Port field, enter the authentication information for the component to
connect to Hive. For example, the host is talend-hdp-all and the port is 9083.
5. Select the Set Jobtracker URI check box and enter the location of the Jobtracker. For example,
talend-hdp-all:50300.
6. Select the Set NameNode URI check box and enter the location of the NameNode. For example,
hdfs://talend-hdp-all:8020. If you are using WebHDFS, the location should be
webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.
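To illustrate the accepted location formats, the following hypothetical helper distinguishes plain HDFS from WebHDFS locations and rejects WebHDFS over SSL, which is not supported; the helper is not part of the Studio, and the host names in the checks are placeholders:

```python
from urllib.parse import urlparse

def namenode_scheme(uri: str) -> str:
    """Return the scheme of a NameNode location.

    Accepts hdfs:// and webhdfs:// locations; rejects swebhdfs://
    because WebHDFS with SSL is not supported.
    """
    scheme = urlparse(uri).scheme
    if scheme == "swebhdfs":
        raise ValueError("WebHDFS with SSL is not supported")
    if scheme not in ("hdfs", "webhdfs"):
        raise ValueError(f"unexpected scheme: {scheme!r}")
    return scheme

print(namenode_scheme("hdfs://talend-hdp-all:8020"))  # hdfs
```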
Procedure
1. Click ELT Hive Map Editor to map the schemas.
2. On the input side (left in the figure), click the Add alias button to add the table to be used.
3. In the pop-up window, select the customer table, then click OK.
4. Repeat the operations to select the state_city table.
5. Drag and drop the idstate column from the customer table onto the id column of the state_city
table. Thus an inner join is created automatically.
6. On the output side (the right side in the figure), the agg_result table is empty at first. Click
the [+] button at the bottom of this side to add as many columns as required and rename them to
replicate the schema of the agg_result table you created earlier in Hive.
Note:
The type column is the partition column of the agg_result table and should not be replicated in
this schema. For further information about the partition column of the Hive table, see the Hive
manual.
7. From the customer table, drop id, name, address, and sum1 to the corresponding columns in the
agg_result table.
8. From the state_city table, drop postal, state, capital and mostpopulouscity to the corresponding
columns in the agg_result table.
In this scenario, context variables are not used in the Expression column in the Map editor. If you
use context variables, put them in single quotation marks.
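The quoting rule can be sketched as follows: when the value of a context variable is placed in a Map expression, it must end up between single quotation marks in the generated HiveQL. The helper and the variable below are invented for illustration:

```python
def quote_for_hiveql(value: str) -> str:
    """Wrap a value in single quotes for use in a HiveQL expression,
    escaping any embedded single quotes."""
    return "'" + value.replace("'", "\\'") + "'"

# Hypothetical context variable holding a customer type.
customer_type = "prospective"
print(quote_for_hiveql(customer_type))  # 'prospective'
```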
2. If this component does not have the same schema of the preceding component, a warning icon
appears. In this case, click the Sync columns button to retrieve the schema from the preceding one
and once done, the warning icon disappears.
3. In the Default table name field, enter the output table you want to write data in. In this example,
it is agg_result.
4. In the Field partition table, click the [+] button to add one row. This allows you to write data in
the partition column of the agg_result table.
This partition column was defined the moment we created the agg_result table using
partitioned by (type string) in the Create statement presented earlier. This partition
column is type, which describes the type of a customer.
5. In Partition column, enter type without any quotation marks and in Partition value, enter
prospective in single quotation marks.
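The effect of these two fields can be pictured as the PARTITION clause of the INSERT statement the component generates. This is a simplified sketch of the idea, not the component's actual code generator:

```python
def partition_clause(column: str, quoted_value: str) -> str:
    """Build a Hive PARTITION clause: the partition column is given
    without quotation marks, the value already single-quoted."""
    return f"PARTITION ({column}={quoted_value})"

# type / 'prospective', as entered in the Field partition table above.
print(partition_clause("type", "'prospective'"))  # PARTITION (type='prospective')
```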
Results
Once done, verify agg_result in Hive, for example, by querying the table.
This figure presents only a part of the table. You can see that the selected input columns are
aggregated and written into the agg_result table and that the partition column is filled with the
value prospective.
Related scenarios
• Joining table columns and writing them into Hive on page 710
• Mapping data using a subquery on page 800, a related scenario using a subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTHiveOutput
Works alongside tELTHiveMap to write data into the Hive table.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component executes the query built by the preceding tELTHiveMap component to write data into
the specified Hive table.
Basic settings
Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.
Edit schema Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Default table name Enter the default name of the output table you want to
write data in.
Default schema name Enter the name of the default database schema to which the
output table to be used is related.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field that appears.
If this table is related to a different database schema from
the default one, you also need to enter the name of that
database schema. The syntax is schema_name.table_name.
The target table uses the Parquet format If the table in which you need to write data is a PARQUET
table, select this check box.
Then from the Compression list that appears, select the
compression mode you need to use to handle the PARQUET
file. The default mode is Uncompressed.
Field Partition In Partition Column, enter the name, without any quotation
marks, of the partition column of the Hive table you want to
write data in.
In Partition Value, enter the value you want to use, in single
quotation marks, for its corresponding partition column.
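The schema_name.table_name syntax mentioned for the Use different table name option can be sketched with a tiny helper; this is illustrative only, and the schema name in the example is invented:

```python
def qualified_name(table: str, schema: str = "") -> str:
    """Return the table reference for the generated statement: the bare
    table name, or schema_name.table_name when a schema is given."""
    return f"{schema}.{table}" if schema else table

print(qualified_name("agg_result"))               # agg_result
print(qualified_name("agg_result", "analytics"))  # analytics.agg_result (hypothetical schema)
```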
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.
Related scenarios
• Joining table columns and writing them into Hive on page 710.
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTInput
Adds as many Input tables as required for the SQL statement to be executed.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of
database (if any) instead of the generic ELT components. For example, for Teradata, it is
recommended to use the tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.
Basic settings
Schema and Edit schema A schema is a row description; it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow
but only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTMap
Uses the tables provided as input to feed the parameter in the built SQL statement. The statement
can include inner or outer joins to be implemented between tables or between one table and its
aliases.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of
database (if any) instead of the generic ELT components. For example, for Teradata, it is
recommended to use the tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.
JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.
Driver JAR Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver JAR
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
Class name Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.
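The jdbc:subprotocol://endpoint:port/database shape of the JDBC URL can be illustrated by assembling it from its parts. The sketch below is generic; the endpoint and database values are placeholders, not a real cluster:

```python
def jdbc_url(subprotocol: str, endpoint: str, port: int, database: str) -> str:
    """Compose a JDBC URL of the form jdbc:subprotocol://endpoint:port/database."""
    return f"jdbc:{subprotocol}://{endpoint}:{port}/{database}"

# Redshift-style URL with placeholder endpoint and database names.
print(jdbc_url("redshift", "endpoint", 5439, "database"))
# jdbc:redshift://endpoint:5439/database
```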
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow
but only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
#SID;SNAME;TID
11;Alex;22
12;Mark;23
13;Stephane;21
14;Cedric;22
15;Bill;21
16;Jack;23
17;John;22
18;Andrew;23
• The source table TEACHER with three columns, TID of NUMBER(38,0) type and TNAME and
TPHONE of VARCHAR(50) type, has been created in Snowflake, and the following data has been
written into the table.
#TID;TNAME;TPHONE
21;Peter;+86 15812343456
22;Michael;+86 13178964532
23;Candice;+86 13923187456
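The inner join that the Job will build on the TID column can be previewed outside Snowflake with plain Python. This sketch only mirrors the sample data above and the expected eight output rows; it is not what the ELT components execute:

```python
# Sample rows from the STUDENT and TEACHER tables shown above.
students = [
    (11, "Alex", 22), (12, "Mark", 23), (13, "Stephane", 21),
    (14, "Cedric", 22), (15, "Bill", 21), (16, "Jack", 23),
    (17, "John", 22), (18, "Andrew", 23),
]
teachers = {
    21: ("Peter", "+86 15812343456"),
    22: ("Michael", "+86 13178964532"),
    23: ("Candice", "+86 13923187456"),
}

# Inner join on STUDENT.TID = TEACHER.TID, keeping SID, SNAME, TID,
# TNAME and TPHONE.
joined = [(sid, sname, tid, *teachers[tid])
          for sid, sname, tid in students if tid in teachers]
print(len(joined))  # 8 rows end up in the target table
print(joined[0])    # (11, 'Alex', 22, 'Michael', '+86 13178964532')
```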
Procedure
1. Add a tSnowflakeConnection component, a tSnowflakeClose component, two tELTInput
components, a tELTMap component, and a tELTOutput component to your Job.
2. On the Basic setting view of the first tELTInput component, enter the name of the first
source table in the Default Table Name field. In this example, it is the context variable
context.SourceTableS.
3. Repeat step 2 to set the value of the default table name for the second tELTInput component
and the tELTOutput component to context.SourceTableT and context.TargetTable
respectively.
4. Link the first tELTInput component to the tELTMap component using the Link >
context.SourceTableS (Table) connection.
5. Link the second tELTInput component to the tELTMap component using the Link >
context.SourceTableT (Table) connection.
6. Link the tELTMap component to the tELTOutput component using the Link > *New Output*
(Table) connection. The link will be renamed automatically to context.TargetTable
(Table).
7. Link the tSnowflakeConnection component to the tELTMap component using a Trigger > On
Subjob Ok connection.
8. Link the tELTMap component to the tSnowflakeClose component.
Connecting to Snowflake
Configure the tSnowflakeConnection component to connect to Snowflake.
Procedure
1. Double-click the tSnowflakeConnection component to open its Basic settings view.
2. In the Account field, enter the account name assigned by Snowflake.
3. In the Snowflake Region field, select the region where the Snowflake database locates.
4. In the User Id and the Password fields, enter the authentication information accordingly.
Note that this user ID is your user login name. If you do not know your user login name yet, ask
the administrator of your Snowflake system for details.
5. In the Warehouse field, enter the name of the data warehouse to be used in Snowflake.
6. In the Schema field, enter the name of the database schema to be used.
7. In the Database field, enter the name of the database to be used.
Procedure
1. Double-click the first tELTInput component to open its Basic settings view.
2. Click the [...] button next to Edit schema and in the schema dialog box displayed, define the
schema by adding three columns, SID and TID of INT type and SNAME of VARCHAR type.
3. Select Mapping Snowflake from the Mapping drop-down list.
4. Repeat the previous steps to configure the second tELTInput component, and define its schema by
adding three columns, TID of INT type and TNAME and TPHONE of VARCHAR type.
Procedure
1. Double-click the tELTOutput component to open the Basic settings view.
2. Select Create table from the Action on table drop-down list to create the target table.
3. Select the Table name from connection name is variable check box.
4. Select Mapping Snowflake from the Mapping drop-down list.
Procedure
1. Click the tELTMap component to open its Basic settings view.
2. Select the Use an existing connection check box and from the Component List displayed, select
the connection component you have configured to open the Snowflake connection.
3. Select Mapping Snowflake from the Mapping drop-down list.
4. Click the [...] button next to ELT Map Editor to open its map editor.
5. Add the first input table context.SourceTableS by clicking the [+] button in the upper left
corner of the map editor and then selecting the relevant table name from the drop-down list in
the pop-up dialog box.
6. Do the same to add the second input table context.SourceTableT.
7. Drag the column TID from the first input table context.SourceTableS and drop it onto the
corresponding column TID in the second input table context.SourceTableT.
8. Drag all columns from the input table context.SourceTableS and drop them onto the output
table context.TargetTable in the upper right panel.
9. Do the same to drag two columns TNAME and TPHONE from the input table
context.SourceTableT and drop them onto the bottom of the output table. When done, click OK
to close the map editor.
10. Click the Sync columns button on the Basic settings view of the tELTOutput component to set its
schema.
Procedure
1. Double-click the tSnowflakeClose component to open the Component tab.
2. From the Connection Component drop-down list, select the component that opens the connection
you need to close, tSnowflakeConnection_1 in this example.
Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.
As shown above, Talend Studio executes the Job successfully and inserts eight rows into the
target table.
You can then create and run another Job to retrieve data from the target table by using the
tSnowflakeInput component and the tLogRow component. You will find that the aggregated data
is displayed on the console as shown in the screenshot below.
For more information about how to retrieve data from Snowflake, see Writing data into and
reading data from a Snowflake table on page 3407.
Related scenarios
• Aggregating table columns and filtering on page 745.
• Mapping data using an Alias table on page 749.
• Mapping data using a subquery on page 800, a related scenario using a subquery
tELTOutput
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of
database (if any) instead of the generic ELT components. For example, for Teradata, it is
recommended to use the tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.
Basic settings
Action on data On the data of the table defined, you can perform the
following operation:
• Insert: Adds new entries to the table. If duplicates are
found, the Job stops.
• Update: Updates entries in the table.
• Delete: Deletes the entries which correspond to the
entry flow.
Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter the default table name, between double quotation
marks.
Default Schema Name Enter the default schema name, between double quotation
marks.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.
Advanced settings
Use update statement without subqueries Select this option to generate an UPDATE statement for the
database.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view.
Clause SET Select the column names that will be used to generate the
SET clauses.
SET clauses will not be generated for the columns that are
not selected.
This field appears when Update is selected from the Action
on data drop-down list in the Basic settings view.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
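The combined effect of Use update statement without subqueries, Clause SET and the Where clauses field can be sketched as the flat UPDATE statement the component would emit. The table, column and value below are illustrative; this is not the component's real generator:

```python
def build_update(table: str, set_columns: dict, where: str) -> str:
    """Generate a flat UPDATE statement (no subqueries), emitting SET
    assignments only for the columns selected in Clause SET."""
    assignments = ", ".join(f"{col} = {val}" for col, val in set_columns.items())
    return f"UPDATE {table} SET {assignments} WHERE {where}"

# Only sum1 selected in Clause SET; unselected columns get no assignment.
print(build_update("agg_result", {"sum1": "'0'"}, "type = 'prospective'"))
# UPDATE agg_result SET sum1 = '0' WHERE type = 'prospective'
```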
Global Variables
Usage
Usage rule tELTOutput is to be used along with the tELTMap. Note that
the Output link to be used with these components must
correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data
flow but only schema information.
Limitation Avoid using any keyword for the database as the table/
column name or using any special character in the table/
column name. If you want to, you can enclose the table/
column name in a pair of \" to see whether it works. For
example, when you want to use the keyword number as an
Oracle database column name, you can have the Db Column
value in the schema editor set to \"number\". But note
that this solution does not always work.
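The \" workaround can be sketched as follows: the Db Column value \"number\" typed in the schema editor reaches the generated SQL as a double-quoted identifier. The helper and the table name are invented for illustration, and as noted above the trick does not work for every database:

```python
def escape_identifier(name: str) -> str:
    """Enclose a reserved word in double quotes so it can be used as a
    column name, mirroring the \\" syntax in the schema editor."""
    return f'"{name}"'

# The Oracle keyword number used as a column name; payments is invented.
print(f"SELECT {escape_identifier('number')} FROM payments")
# SELECT "number" FROM payments
```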
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTMSSqlInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Provides the table schema to be used for the SQL statement to execute.
Basic settings
Schema and Edit schema A schema is a row description; it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow
but only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTMSSqlMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT MSSql Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow
but only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTMSSqlOutput
Executes SQL Insert, Update and Delete statements on the MSSql database.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Basic settings
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter the default table name, between double quotation
marks.
Default Schema Name Enter the default schema name, between double quotation
marks.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.
Advanced settings
Use update statement without subqueries Select this option to generate an UPDATE statement for the
MSSql database.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow
but only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTMysqlInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlInput provides the table schema to be used for the SQL statement to execute.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Default Table Name Enter the default table name, between double quotation
marks.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTMysqlMap
Uses the tables provided as input to feed the parameters of the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlMap helps to graphically build the SQL statement using the table provided as input.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Mysql Map editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of the schema can be different
from the column names in the database.
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database connection created.
This property is not available when the Use an existing connection check box in the Basic settings
view is selected.
tStatCatcher Statistics Select this check box to collect log data at the component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Building a Job
Procedure
1. Add the following components from the Palette onto the design workspace. Label these
components to best describe their functionality.
• three tELTMysqlInput components
• a tELTMysqlMap
• a tELTMysqlOutput
2. Double-click the first tELTMysqlInput component to display its Basic settings view.
3. Select Repository from the Schema list, click the three-dot button preceding Edit schema, and
select your DB connection and the desired schema from the Repository Content dialog box.
The selected schema name appears in the Default Table Name field automatically.
In this use case, the DB connection is Talend_MySQL and the schema for the first input component
is owners.
4. Set the second and third tELTMysqlInput components in the same way but select cars and resellers
respectively as their schema names.
Note: In this use case, all the involved schemas are stored in the Metadata node of the
Repository tree view for easy retrieval. For further information concerning metadata, see
Talend Studio User Guide.
You can also select the three input components by dropping the relevant schemas from
the Metadata area onto the design workspace and double-clicking tELTMysqlInput from
the Components dialog box. Doing so allows you to skip the steps of labeling the input
components and defining their schemas manually.
5. Connect the three tELTMysqlInput components to the tELTMysqlMap component using links
named strictly after the actual DB table names: owners, cars, and resellers.
6. Connect the tELTMysqlMap component to the tELTMysqlOutput component and name the link
agg_result, which is the name of the database table you will save the aggregation result to.
7. Click the tELTMysqlMap component to display its Basic settings view.
8. Select Repository from the Property Type list, and select the same DB connection that you use for
the input components.
All the database details are automatically retrieved.
9. Double-click the tELTMysqlMap component to launch the ELT Map editor to set up joins between
the input tables and define the output flow.
10. Add the input tables by clicking the green plus button at the upper left corner of the ELT Map
editor and selecting the relevant table names in the Add a new alias dialog box.
11. Drop the ID_Owner column from the owners table to the corresponding column of the cars table.
12. In the cars table, select the Explicit join check box in front of the ID_Owner column.
As the default join type, INNER JOIN is displayed on the Join list.
13. Drop the ID_Reseller column from the cars table to the corresponding column of the resellers
table to set up the second join, and define the join as an inner join in the same way.
14. Select the columns to be aggregated into the output table, agg_result.
15. Drop the ID_Owner, Name, and ID_Insurance columns from the owners table to the output table.
16. Drop the Registration, Make, and Color columns from the cars table to the output table.
17. Drop the Name_Reseller and City columns from the resellers table to the output table.
With the relevant columns selected, the mappings are displayed in yellow and the joins are
displayed in dark violet.
18. Set up a filter in the output table. Click the Add filter row button on top of the output table to
display the Additional clauses expression field, drop the City column from the resellers table to the
expression field, and complete a WHERE clause that reads resellers.City ='Augusta'.
19. Click the Generated SQL Select query tab to display the corresponding SQL statement.
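For this scenario, the statement shown on that tab could resemble the following sketch. Explicit JOIN syntax is used here for readability; the ELT Mapper may render the joins differently, and the exact column list depends on your mappings.

```sql
INSERT INTO agg_result (ID_Owner, Name, ID_Insurance,
                        Registration, Make, Color,
                        Name_Reseller, City)
SELECT owners.ID_Owner, owners.Name, owners.ID_Insurance,
       cars.Registration, cars.Make, cars.Color,
       resellers.Name_Reseller, resellers.City
FROM owners
INNER JOIN cars ON cars.ID_Owner = owners.ID_Owner          -- first explicit join
INNER JOIN resellers ON resellers.ID_Reseller = cars.ID_Reseller  -- second explicit join
WHERE resellers.City = 'Augusta';                            -- the filter row
```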
Note: You can also use a built-in output schema and retrieve the schema structure from the
preceding component; however, make sure that you specify an existing target table having the
same data structure in your database.
Procedure
1. Save your Job.
2. Press F6 to launch it.
All selected data is inserted in the agg_result table as specified in the SQL statement.
Building a Job
Procedure
1. Drop two tELTMysqlInput components, a tELTMysqlMap component, and a tELTMysqlOutput
component to the design workspace, and label them to best describe their functionality.
2. Double-click the first tELTMysqlInput component to display its Basic settings view.
3. Select Repository from the Schema list, and define the DB connection and schema by clicking the
three-dot button preceding Edit schema.
The DB connection is Talend_MySQL and the schema for the first input component is employees.
Note:
In this use case, all the involved schemas are stored in the Metadata node of the Repository
tree view for easy retrieval. For further information concerning metadata, see Talend Studio
User Guide.
4. Set the second tELTMysqlInput component in the same way but select dept as its schema.
5. Double-click the tELTMysqlOutput component to display its Basic settings view.
6. Select an action from the Action on data list as needed, Insert in this use case.
7. Select Repository as the schema type, and define the output schema in the same way as you
defined the input schemas. In this use case, select result as the output schema, which is the name
of the database table used to store the mapping result.
The output schema contains all the columns of the input schemas plus a ManagerName column.
Procedure
1. Connect the two tELTMysqlInput components to the tELTMysqlMap component using Link
connections named strictly after the actual input table names, employees and dept in this use case.
2. Connect the tELTMysqlMap component to the tELTMysqlOutput component using a Link
connection. When prompted, click Yes to allow the ELT Mapper to retrieve the output table
structure from the output schema.
3. Click the tELTMysqlMap component and select the Component tab to display its Basic settings
view.
4. Select Repository from the Property Type list, and select the same DB connection that you use for
the input components.
All the DB connection details are automatically retrieved.
Procedure
1. Click the three-dot button next to ELT Mysql Map Editor or double-click the tELTMysqlMap
component on the design workspace to launch the ELT Map editor.
With the tELTMysqlMap component connected to the output component, the output table is
displayed in the output area.
2. Add the input tables, employees and dept, in the input area by clicking the green plus button and
selecting the relevant table names in the Add a new alias dialog box.
3. Create an alias table based on the employees table by selecting employees from the Select the
table to use list and typing in Managers in the Type in a valid alias field in the Add a new alias
dialog box.
4. Drop the DeptNo column from the employees table to the dept table.
5. Select the Explicit join check box in front of the DeptNo column of the dept table to set up an
inner join.
6. Drop the ManagerID column from the employees table to the ID column of the Managers table.
7. Select the Explicit join check box in front of the ID column of the Managers table and select LEFT
OUTER JOIN from the Join list to allow the output rows to contain Null values.
8. Drop all the columns from the employees table to the corresponding columns of the output table.
9. Drop the DeptName and Location columns from the dept table to the corresponding columns of
the output table.
10. Drop the Name column from the Managers table to the ManagerName column of the output table.
11. Click on the Generated SQL Select query tab to display the SQL query statement to be executed.
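The query on that tab could resemble the sketch below. Column names for the employees table beyond those mentioned in the steps are hypothetical; note how the Managers alias is simply the employees table joined a second time.

```sql
INSERT INTO result (ID, Name, DeptNo, ManagerID,
                    DeptName, Location, ManagerName)
SELECT employees.ID, employees.Name, employees.DeptNo, employees.ManagerID,
       dept.DeptName, dept.Location,
       Managers.Name                                   -- manager's name via the alias
FROM employees
INNER JOIN dept ON dept.DeptNo = employees.DeptNo      -- explicit inner join
LEFT OUTER JOIN employees Managers                     -- the "Managers" alias table
  ON Managers.ID = employees.ManagerID;                -- allows NULL for employees with no manager
```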
Procedure
1. Save your Job.
2. Press F6 to run it.
The output database table result contains all the information about the employees, including the
names of their respective managers.
Related scenarios
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTMysqlOutput
tELTMysqlOutput executes SQL Insert, Update, and Delete statements against the Mysql database.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlOutput carries out the action on the table specified and inserts the data according to the
output schema defined in the ELT Mapper.
Basic settings
Action on data On the data of the table defined, you can perform the
following operations:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter the default table name, between double quotation marks.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTNetezzaInput
Allows you to add as many Input tables as required for the most complicated Insert statement.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Provides the table schema to be used for the SQL statement to execute.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTNetezzaMap
Uses the tables provided as input to feed the parameters of the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Netezza Map Editor The ELT Map editor allows you to define the output schema
and graphically build the SQL statement to be
executed. The column names of the schema can be different
from the column names in the database.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTNetezzaOutput
Performs the action (insert, update or delete) on data in the specified Netezza table through the SQL
statement generated by the tELTNetezzaMap component.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic settings
Action on data On the data of the table defined, you can perform the
following operations:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
Schema and Edit Schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter the default table name, between double quotation
marks.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field that appears.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTOracleInput
Provides the Oracle table schema that will be used by the tELTOracleMap component to generate the
SQL SELECT statement.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic settings
Schema and Edit schema A schema is a row description; it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Default Table Name Enter the default table name, between double quotation
marks.
Default Schema Name Enter the default schema name, between double quotation
marks.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Updating Oracle database entries on page 769
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTOracleMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTOracleInput
components.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Oracle Map Editor The ELT Map editor allows you to define the output schema
and graphically build the SQL statement to be
executed. The column names of the schema can be different
from the column names in the database.
Style link Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curves.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.
Advanced settings
Use Hint Options Select this check box to activate the hint configuration
area to help you optimize a query's execution. In this area,
the parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.
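For example, a hint placed immediately after the SELECT keyword could look like the following sketch (the FULL hint and the table and column names are illustrative only):

```sql
-- /*+ FULL(e) */ asks the optimizer for a full scan of the aliased table.
SELECT /*+ FULL(e) */ e.ID, e.Name
FROM employees e
WHERE e.DeptNo = 10;
```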
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data
flow but only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Adding components
As described in Aggregating table columns and filtering on page 745, configure a Job for data
aggregation using the corresponding ELT components for the Oracle database: tELTOracleInput,
tELTOracleMap, and tELTOracleOutput. Execute the Job to save the aggregation result in a database
table named Agg_Result.
Note:
When defining filters in the ELT Map editor, note that strings are case sensitive in Oracle databases.
Procedure
1. Launch the ELT Map editor and add a new output table named update_data.
2. Add a filter row to the update_data table to set up a relationship between input and output tables:
owners.ID_OWNER = agg_result.ID_OWNER.
3. Drop the MAKE column from the cars table to the update_data table.
4. Drop the NAME_RESELLER column from the resellers table to the update_data table.
5. Add a model enclosed in single quotation marks, 'A8' in this use case, to the MAKE column from
the cars table, preceded by a double pipe.
6. Add Sold by enclosed in single quotation marks in front of the NAME_RESELLER column from
the resellers table, with a double pipe in between.
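In Oracle SQL, the intent of these mappings can be sketched as the correlated update below. This is a sketch only, assuming one matching row per ID_OWNER; the statement the ELT Mapper actually generates may differ in form.

```sql
UPDATE agg_result
SET (MAKE, NAME_RESELLER) = (
  SELECT cars.MAKE || 'A8',                    -- model appended with a double pipe
         'Sold by ' || resellers.NAME_RESELLER -- prefix added with a double pipe
  FROM owners, cars, resellers
  WHERE owners.ID_OWNER = agg_result.ID_OWNER  -- the filter row relating input and output
    AND cars.ID_OWNER = owners.ID_OWNER
    AND resellers.ID_RESELLER = cars.ID_RESELLER
)
WHERE EXISTS (
  SELECT 1 FROM owners
  WHERE owners.ID_OWNER = agg_result.ID_OWNER
);
```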
Procedure
1. Save your Job.
Related scenarios
• Updating Oracle database entries on page 769
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTOracleOutput
Performs the action (insert, update, delete, or merge) on data in the specified Oracle table through the
SQL statement generated by the tELTOracleMap component.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic Settings
Action on data On the data of the table defined, you can perform the
following operations:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
MERGE: Updates and/or adds data to the table. Note
that the options available for the MERGE operation are
different from those available for the Insert, Update, or
Delete operations.
Note:
The following global variables are available:
• NB_LINE_INSERTED: Number of lines inserted
during the Insert operation.
• NB_LINE_UPDATED: Number of lines updated during
the Update operation.
• NB_LINE_DELETED: Number of lines deleted during
the Delete operation.
• NB_LINE_MERGED: Number of lines inserted and/or
updated during the MERGE operation.
Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Use Merge Update (for MERGE) Select this check box to update the data in the output table.
Column: Lists the columns in the entry flow.
Update: Select the check box which corresponds to the
name of the column you want to update.
Use Merge Update Where Clause: Select this check box and
enter the WHERE clause required to filter the data to be
updated, if necessary.
Use Merge Update Delete Clause: Select this check box and
enter the WHERE clause required to filter the data to be
deleted and updated, if necessary.
Use Merge Insert (for MERGE) Select this check box to insert the data in the table.
Column: Lists the entry flow columns.
Insert: Select the check box corresponding to the name
of the column you want to insert.
Use Merge Insert Where Clause: Select this check box and
enter the WHERE clause required to filter the data to be
inserted.
Default Table Name Enter a default name for the table, between double
quotation marks.
Default Schema Name Enter a name for the default Oracle schema, between
double quotation marks.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.
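The MERGE action combines the Use Merge Update and Use Merge Insert behaviors: matched rows are updated and unmatched rows are inserted. A rough sketch of that semantics, using SQLite's upsert syntax as a stand-in for Oracle's MERGE and a hypothetical customers table:

```python
import sqlite3

# Hypothetical target table approximating the Oracle output table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO customers VALUES (1, 'old name')")

# Incoming rows: id 1 exists (will be updated), id 2 is new (will be inserted).
rows = [(1, "new name"), (2, "second customer")]

# SQLite's upsert clause plays the role of Oracle's MERGE here:
# matched rows are updated, unmatched rows are inserted.
con.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
    rows,
)

print(con.execute("SELECT id, name FROM customers ORDER BY id").fetchall())
# [(1, 'new name'), (2, 'second customer')]
```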
Advanced settings
Use Hint Options Select this check box to activate the hint configuration
area when you want to use a hint to optimize a query's
execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL
statement.
- SQL STMT: select the SQL statement you need to use.
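As an illustration of the POSITION option, a hint is simply spliced into the statement text at the chosen spot. A minimal, hypothetical sketch (the helper name is made up) that places a hint right after the SELECT keyword:

```python
# Hypothetical helper: insert an Oracle optimizer hint right after the first
# SELECT keyword, one of the positions the POSITION option covers.
def add_hint(sql: str, hint: str) -> str:
    keyword = "SELECT"
    idx = sql.upper().index(keyword)
    return sql[: idx + len(keyword)] + " " + hint + sql[idx + len(keyword):]

stmt = add_hint("SELECT name FROM customers", "/*+ FULL(customers) */")
print(stmt)
# SELECT /*+ FULL(customers) */ name FROM customers
```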
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
2. Select Repository from the Schema list and click the [...] button preceding Edit schema.
3. Select your database connection and the desired schema from the Repository Content dialog box.
The selected schema name appears in the Default Table Name field automatically.
• In this use case, the database connection is Talend_Oracle and the schema is
new_customers.
• In this use case, the input schema is stored in the Metadata node of the Repository tree view
for easy retrieval. For further information concerning metadata, see Talend Studio User Guide.
• You can also select the input component by dropping the relevant schema from the Metadata
area onto the design workspace and double-clicking tELTOracleInput from the Components
dialog box. Doing so allows you to skip the steps of labeling the input component and
defining its schema manually.
4. Click the tELTOracleMap component to display its Basic settings view.
5. Select Repository from the Property Type list, and select the same database connection that you
use for the input components.
Remember: All the database details are automatically retrieved. Leave the other settings as
they are.
6. Double-click the tELTOracleMap component to launch the ELT Map editor for setting up the data
transformation flow.
Display the input table by clicking the green plus button at the upper left corner of the ELT Map
editor and selecting the relevant table name in the Add a new alias dialog box.
In this use case, the only input table is new_customers.
7. Select all the columns in the input table and drop them to the output table.
8. Click the Generated SQL Select query tab to display the query statement to be executed.
Click OK to validate the ELT Map settings and close the ELT Map editor.
9. Double-click the tELTOracleOutput component to display its Basic settings view.
a) From the Action on data list, select MERGE.
b) Click the Sync columns button to retrieve the schema from the preceding component.
c) Select the Use Merge Update check box to update the data using Oracle's MERGE function.
10. In the table that appears, select the check boxes for the columns you want to update.
In this use case, you update all the data according to the customer ID. Therefore, select all the
check boxes except the one for the ID column.
Warning: The columns defined as the primary key cannot and must not be made subject to
updates.
11. Select the Use Merge Insert check box to insert new data while updating the existing data by
leveraging the Oracle MERGE function.
12. In the table that appears, select the check boxes for the columns into which you want to insert
new data.
In this use case, insert all the new customer data. Therefore, select all the check boxes by clicking
the Check All check box.
13. Fill the Default Table Name field with the name of the target table already existing in your
database. In this example, fill in customers_merge.
14. Leave the other parameters as they are.
tELTPostgresqlInput
Provides the Postgresql table schema that will be used by the tELTPostgresqlMap component to
generate the SQL SELECT statement.
The three ELT Postgresql components are closely related in terms of their operating conditions. Use these components to handle Postgresql database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Schema and Edit schema A schema is a row description; it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
to the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Default Table Name Enter the default table name, between double quotation
marks.
Default Schema Name Enter the default schema name, between double quotation
marks.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTPostgresqlMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTPostgresqlInput components.
The three ELT Postgresql components are closely related in terms of their operating conditions. Use these components to handle Postgresql database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Postgresql Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.
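When schema column names differ from the database column names, the generated SELECT can alias each database column to its schema name. A small sketch with made-up column names:

```python
# Sketch: schema column names (keys) mapped to database column names (values);
# the generated SELECT aliases each database column to its schema name.
mapping = {"customer_name": "NAME", "customer_id": "ID"}  # hypothetical columns

select_list = ", ".join(f"{db} AS {schema}" for schema, db in mapping.items())
query = f"SELECT {select_list} FROM customers"
print(query)
# SELECT NAME AS customer_name, ID AS customer_id FROM customers
```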
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to
access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTPostgresqlOutput
Performs the action (insert, update or delete) on data in the specified Postgresql table through the
SQL statement generated by the tELTPostgresqlMap component.
The three ELT Postgresql components are closely related in terms of their operating conditions. Use these components to handle Postgresql database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
Schema and Edit schema A schema is a row description, that is to say, it defines the
number of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter the default table name between double quotation
marks.
Default Schema Name Enter the default schema name between double quotation
marks.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.
Advanced settings
Use update statement without subqueries Select this option to generate an UPDATE statement for the
database.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view.
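To illustrate the difference, the subquery form of an UPDATE looks up each new value with a correlated subquery, while this option generates PostgreSQL's join-based UPDATE ... FROM form instead. A sketch of the subquery form, run against SQLite with hypothetical source and target tables (the join-based variant is shown only as a comment because SQLite support for it varies by version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE source (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO target VALUES (1, 'stale'), (2, 'keep');
    INSERT INTO source VALUES (1, 'fresh');
    """
)

# Subquery form: each target row looks up its new value in the source table.
con.execute(
    "UPDATE target SET val = (SELECT val FROM source WHERE source.id = target.id) "
    "WHERE id IN (SELECT id FROM source)"
)
# PostgreSQL also accepts a join-based form without subqueries, roughly:
#   UPDATE target SET val = source.val FROM source WHERE source.id = target.id;

print(con.execute("SELECT id, val FROM target ORDER BY id").fetchall())
# [(1, 'fresh'), (2, 'keep')]
```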
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTSybaseInput
Provides the Sybase table schema that will be used by the tELTSybaseMap component to generate the
SQL SELECT statement.
The three ELT Sybase components are closely related in terms of their operating conditions. Use these components to handle Sybase database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Schema and Edit schema A schema is a row description; it defines the number and
nature of the fields to be processed. The schema is either
built-in (local) or stored remotely in the Repository. The
Schema defined is then passed on to the ELT Mapper for
inclusion in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Default Table Name Enter a default name for the table, between double
quotation marks.
Default Schema Name Enter a default name for the Sybase schema, between
double quotation marks.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Note:
ELT components only handle schema information. They
do not handle actual data flow.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTSybaseMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTSybaseInput components.
The three ELT Sybase components are closely related in terms of their operating conditions. Use these components to handle Sybase database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Sybase Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.
Global Variables
Usage
Note:
The ELT components only handle schema information.
They do not handle actual data flow.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to
access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTSybaseOutput
Performs the action (insert, update or delete) on data in the specified Sybase table through the SQL
statement generated by the tELTSybaseMap component.
The three ELT Sybase components are closely related in terms of their operating conditions. Use these components to handle Sybase database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
Schema and Edit schema A schema is a row description, that is to say, it defines the
number and nature of the fields to be processed and passed
on to the next component. The schema is either Built-in
(local) or stored remotely in the Repository. The Schema
defined is then passed on to the ELT Mapper for inclusion in
the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Where clauses for (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter a default name for the table, between double
quotation marks.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.
Default Schema Name Enter a default name for the Sybase schema, between
double quotation marks.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.
Global Variables
Usage
Note:
ELT components only handle schema information. They
do not handle actual data flow.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTTeradataInput
Provides the Teradata table schema that will be used by the tELTTeradataMap component to generate
the SQL SELECT statement.
The three ELT Teradata components are closely related in terms of their operating conditions. Use these components to handle Teradata database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Schema and Edit schema A schema is a row description, that is to say, it defines the
nature and number of fields to be processed. The schema
is either built-in or remotely stored in the Repository. The
Schema defined is then passed on to the ELT Mapper to be
included to the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Default Table Name Enter a default name for the table, between double
quotation marks.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTTeradataMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTTeradataInput components.
The three ELT Teradata components are closely related in terms of their operating conditions. Use these components to handle Teradata database table schemas and generate SQL statements, including clauses, that are executed on the defined database output table.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Teradata Map editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.
Advanced settings
Query band Select this check box to use the Teradata Query Banding
feature to add metadata to the query to be processed,
such as the user running the query. This can help you, for
example, identify the origin of this query.
Once you select the check box, the Query Band parameters
table is displayed, in which you need to enter the metadata
information to be added. This information takes the form of
key/value pairs, for example, DpID in the Key column and
Finance in the Value column.
This check box actually generates the SET QUERY_BAND
FOR SESSION statement with the key/value pairs declared
in the Query Band parameters table. For further information
about this statement, see https://docs.teradata.com/search/all?query=End+logging+syntax.
This check box is not available when you have selected
the Use an existing connection check box. In this
situation, if you need to use the Query Band feature, set it
in the Advanced settings tab of the Teradata connection
component to be used.
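A sketch of how such a statement could be assembled from the key/value pairs, using the DpID/Finance pair from the example above:

```python
# Sketch: build the SET QUERY_BAND statement from key/value pairs, as the
# Query Band parameters table does (the pair below is from the example above).
pairs = {"DpID": "Finance"}

band = "".join(f"{k}={v};" for k, v in pairs.items())
stmt = f"SET QUERY_BAND = '{band}' FOR SESSION;"
print(stmt)
# SET QUERY_BAND = 'DpID=Finance;' FOR SESSION;
```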
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to
access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Prerequisite
Ensure that you have added an Oracle database connection in the Metadata > Db Connections section
prior to creating the Job. For more information, see the Centralizing database metadata section of the
Talend Data Integration Studio User Guide.
Design the Prejob that includes the data in this scenario as follows:
The PreferredSubject table contains the student's preferred subject data. To reproduce this scenario,
you can load the following data to the Oracle table from a CSV file:
SeqID;StuName;Subject;Detail
1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art.
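For reference, rows in this semicolon-delimited form can be parsed with Python's csv module before being loaded into the database, for example:

```python
import csv
import io

# The semicolon-delimited rows shown above, headed by the column names.
data = """SeqID;StuName;Subject;Detail
1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art."""

rows = list(csv.DictReader(io.StringIO(data), delimiter=";"))
print(rows[0]["StuName"], rows[0]["Subject"])
# Amanda art
```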
The CourseScore table contains the student's subject score data. To reproduce this scenario, you can
load the following data to the Oracle table from a CSV file:
SeqID;StuName;Subject;Course;Score;Detail
1;Amanda;science;math;85;science score
2;Amanda;science;physics;75;science score
3;Amanda;science;chemistry;80;science score
4;Amanda;art;chinese;85;art score
5;Amanda;art;history;95;art score
6;Amanda;art;geography;80;art score
7;Ford;science;math;95;science score
8;Ford;science;physics;85;science score
9;Ford;science;chemistry;80;science score
10;Ford;art;chinese;75;art score
11;Ford;art;history;80;art score
12;Ford;art;geography;85;art score
13;Kate;science;math;65;science score
14;Kate;science;physics;75;science score
15;Kate;science;chemistry;80;science score
16;Kate;art;chinese;85;art score
17;Kate;art;history;80;art score
18;Kate;art;geography;95;art score
Before the Job execution, the output table TotalScoreOfPreferredSubject does not contain any data:
SeqID;StuName;PreferredSubject;TotalScore
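After the Job runs, each row should hold the sum of a student's scores in their preferred subject. The query the Job builds can be sketched directly in SQL: an aggregation subquery totals the scores per student and subject, and the outer join keeps only each student's preferred subject. SQLite stands in for Oracle here, and only the CourseScore rows relevant to the join are loaded:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE PreferredSubject (SeqID INTEGER, StuName TEXT, Subject TEXT);
    CREATE TABLE CourseScore (StuName TEXT, Subject TEXT, Score INTEGER);
    INSERT INTO PreferredSubject VALUES
        (1, 'Amanda', 'art'), (2, 'Ford', 'science'), (3, 'Kate', 'art');
    INSERT INTO CourseScore VALUES
        ('Amanda', 'art', 85), ('Amanda', 'art', 95), ('Amanda', 'art', 80),
        ('Ford', 'science', 95), ('Ford', 'science', 85), ('Ford', 'science', 80),
        ('Kate', 'art', 85), ('Kate', 'art', 80), ('Kate', 'art', 95);
    """
)

# The subquery totals each student's scores per subject; the outer join
# keeps only the row matching each student's preferred subject.
result = con.execute(
    """
    SELECT p.SeqID, p.StuName, p.Subject AS PreferredSubject, t.TotalScore
    FROM PreferredSubject p
    JOIN (SELECT StuName, Subject, SUM(Score) AS TotalScore
          FROM CourseScore GROUP BY StuName, Subject) t
      ON p.StuName = t.StuName AND p.Subject = t.Subject
    ORDER BY p.SeqID
    """
).fetchall()
print(result)
# [(1, 'Amanda', 'art', 260), (2, 'Ford', 'science', 260), (3, 'Kate', 'art', 260)]
```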
Procedure
1. Create a Standard Job.
2. Add the following components:
• Prejob
• two tFixedFlowInput components
• two tOracleOutput components
• two tOracleInput components
• one tCreateTable component
• two tLogRow components
3. Configure the first tFixedFlowInput component:
a) Select the tFixedFlowInput component to display the Basic settings view.
b) Select Use Inline Content(delimited file) from the Mode options.
c) Add the following data to the Content field:
d) Click ... next to the Edit Schema field to open the Schema Editor.
e) Add four columns with the following names and corresponding parameters:
1;Amanda;science;math;85;science score
2;Amanda;science;physics;75;science score
3;Amanda;science;chemistry;80;science score
4;Amanda;art;chinese;85;art score
5;Amanda;art;history;95;art score
6;Amanda;art;geography;80;art score
7;Ford;science;math;95;science score
8;Ford;science;physics;85;science score
9;Ford;science;chemistry;80;science score
10;Ford;art;chinese;75;art score
11;Ford;art;history;80;art score
12;Ford;art;geography;85;art score
13;Kate;science;math;65;science score
14;Kate;science;physics;75;science score
15;Kate;science;chemistry;80;science score
16;Kate;art;chinese;85;art score
17;Kate;art;history;80;art score
18;Kate;art;geography;95;art score
c) Click ... next to the Edit Schema field to open the Schema Editor.
d) Add six columns with the following names and corresponding parameters:
5. Select the first tOracleOutput component to open the Basic settings view.
a) Select Repository from the Property Type drop-down list.
b) Specify the Oracle database connection that you have previously added by clicking .... This
automatically populates the database information in the fields provided.
Repeat this step and its substeps to configure the second tOracleOutput component.
6. Select the tCreateTable component to open the Basic settings view.
a) Select Oracle from the Database Type drop-down list.
Note: Specify the Oracle database connection information in the second ELTMap component in
the Job.
2. Click [...] next to ELT Oracle Map Editor to open its map editor.
3. Add the input table CourseScore by clicking [+] in the upper left corner of the map editor and
then selecting the relevant table name from the drop-down list in the pop-up dialog box.
4. Add an output table by clicking [+] in the upper right corner of the map editor and then entering
the table name TotalScore in the corresponding field in the pop-up dialog box.
5. Drag StuName, Subject, and Score columns in the input table and then drop them to the output
table.
6. Click the Add filter row button in the upper right corner of the output table and select Add an
other(GROUP...) clause from the pop-up menu. Then enter the GROUP BY clause in the Additional
other clauses (GROUP/ORDER) table that appears.
This SQL query will appear as a subquery in the SQL query generated by the ELTMap component.
8. Click OK to validate these changes and close the map editor.
9. Connect the first SubqueryMap to ELTMap using the Link > TotalScore (table1) link. Note that
the link is renamed automatically to TotalScore (Table_ref) since the output table TotalScore is a
reference table.
5. Click [...] next to ELT Oracle Map Editor to open its map editor.
6. Add the input table PreferredSubject by clicking the [+] button in the upper left corner
of the map editor and selecting the relevant table name from the drop-down list in the pop-up
dialog box.
Repeat the step to add another input table TotalScore.
7. Drag the StuName column in the input table PreferredSubject and drop it to the corresponding
column in the input table TotalScore. Then select the Explicit join check box for the StuName
column in the input table TotalScore.
Repeat the step for the Subject column.
8. Drag the SeqID column in the input table PreferredSubject and drop it to the corresponding
column in the output table.
Repeat the step to drag the StuName and Subject columns in the input table PreferredSubject and
the Score column in the input table TotalScore and drop them to the corresponding column in the
output table.
9. Click the Generated SQL Select query for "table2" output tab at the bottom of the map editor to
display the corresponding generated SQL statement.
The SQL query generated in the SubqueryMap component appears as a subquery in the SQL query
generated by this component. Aliases are automatically added for the selected columns in the
subquery.
10. Click OK to validate these changes and close the map editor.
The select statement is generated and the mapping data are written into the output table.
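The generated statement can be pictured as a join wrapped around the aggregation subquery. The following Python sketch only illustrates its shape: the table and column names come from the scenario, but the SUM() aggregation is an assumption based on the GROUP clause step, and the studio's exact formatting and aliasing may differ.

```python
# Illustrative sketch of the SQL shape generated for this scenario.
# Table and column names come from the scenario; the SUM() aggregation
# is an assumption, and the studio's exact formatting/aliases may differ.

# Subquery contributed by the SubqueryMap component: per-student,
# per-subject-type score totals.
subquery = (
    "SELECT CourseScore.StuName, CourseScore.Subject, "
    "SUM(CourseScore.Score) AS Score "
    "FROM CourseScore "
    "GROUP BY CourseScore.StuName, CourseScore.Subject"
)

# Outer query built by the ELTMap component: explicit joins on StuName
# and Subject between the preferred subjects and the aggregated totals.
query = (
    "SELECT PreferredSubject.SeqID, PreferredSubject.StuName, "
    "PreferredSubject.Subject, TotalScore.Score "
    "FROM PreferredSubject, (" + subquery + ") TotalScore "
    "WHERE PreferredSubject.StuName = TotalScore.StuName "
    "AND PreferredSubject.Subject = TotalScore.Subject"
)

print(query)
```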
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTTeradataOutput
Performs the action (insert, update or delete) on data in the specified Teradata table through the SQL
statement generated by the tELTTeradataMap component.
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic settings
Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new entries to the table. If duplicates are found, the Job stops.
• Update: Updates entries in the table.
• Delete: Deletes the entries which correspond to the entry flow.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
Default Table Name Enter a default name for the table, between double
quotation marks.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.
Advanced settings
Clause SET Select the column names that will be used to generate the
SET clauses.
SET clauses will not be generated for the columns that are
not selected.
This field appears when Update is selected from the Action
on data drop-down list in the Basic settings view.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTVerticaInput
Provides the Vertica table schema that will be used by the tELTVerticaMap component to generate the
SQL SELECT statement.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTVerticaMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTVerticaInput
components.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
ELT Vertica Map Editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of the schema can be different
from the column names in the database.
Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725
tELTVerticaOutput
Performs the action (insert, update or delete) on data in the specified Vertica table through the SQL
statement generated by the tELTVerticaMap component.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Basic settings
Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new entries to the table. If duplicates are
found, Job stops.
• Update: Updates entries in the table.
• Delete: Deletes entries which correspond to the entry
flow.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Sync columns Click this button to retrieve the schema from the previous
component connected in the Job.
Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operation.
This field is available only when Update or Delete is
selected from the Action on data drop-down list.
Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.
Use different table name Select this check box to use a different output table name.
Advanced settings
Direct Select this check box to write the data directly to disk,
bypassing memory.
This check box is not visible when the Set SQL Label check
box is selected.
Set SQL Label Select this check box and specify the label that identifies
the query. For more information, see How to label queries
for profiling.
This check box is not visible when the Direct check box is
selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Note:
The ELT components do not handle actual data flow but
only schema information.
Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725
tESBConsumer
Calls the defined method from the invoked Web service and returns the class as defined, based on the
given parameters.
Basic settings
Input Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Response Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Fault Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Use Service Registry This option is only available if you subscribed to Talend
Enterprise ESB solutions.
Select this check box to enable the Service Registry. It
provides dynamic endpoint lookup and allows services to
be redirected based upon information retrieved from the
registry. It works in runtime only.
Enter the authentication credentials in the Username and
Password field.
If SAML token is registered in the service registry, you need
to specify the client's role in the Role field. You can also
select the Propagate Credentials check box to make the call
on behalf of an already authenticated user by propagating
the existing credentials. You can enter the username and
the password to authenticate via STS to propagate using
username and password, or provide the alias, username
Use Service Locator Maintains the availability of the service to help meet
demands and service level agreements (SLAs).
This option will not show if the Use Service Registry check
box is selected.
Use Service Activity Monitor Captures events and stores this information to facilitate
in-depth analysis of service activity and track-and-trace
of messages throughout a business transaction. This can
be used to analyze service response times, identify traffic
patterns, perform root cause analysis and more.
This option is disabled when the Use Service Registry check
box is selected if you subscribed to Talend Enterprise ESB
solutions.
Use Authentication Select this check box to enable the authentication option.
Select from Basic HTTP, HTTP Digest, Username Token,
and SAML Token (ESB runtime only). Enter a username
and a password in the corresponding fields as required.
Authentication with Basic HTTP, HTTP Digest, and
Username Token works in both the studio and runtime.
Authentication with the SAML Token works in runtime only.
When SAML Token (ESB runtime only) is selected, you can
either provide the user credentials to send the request or
make the call on behalf of an already authenticated user by
propagating the existing credentials. Select from:
-: Enter the username and the password in the
corresponding fields to access the service.
Propagate using U/P: Enter the user name and the password
used to authenticate against STS.
Propagate using Certificate: Enter the alias and the
password used to authenticate against STS.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This option will not show if the Use Service Registry check
box is selected.
Use Business Correlation Select this check box to create a correlation ID in this
component.
You can specify a correlation ID in the Correlation Value
field. In this case the correlation ID will be passed on to the
service it calls so that chained service calls will be grouped
under this correlation ID. If you leave this field empty, this
value will be generated automatically at runtime.
When this option is enabled, tESBConsumer will also extract
the correlation ID from the response header and store it in
the component variable for further use in the flow.
This option will be enabled automatically when the Use
Service Registry check box is selected.
Use GZip Compress Select this check box to compress the incoming messages
into GZip format before sending.
Die on error Select this check box to kill the Job when an error occurs.
Advanced settings
Log messages Select this check box to log the message exchange
between the service provider and the consumer.
Service Locator Custom Properties This table appears when Use Service Locator is selected.
You can add as many lines as needed in the table to
customize the relevant properties. Enter the name and the
value of each property between double quotation marks
in the Property Name field and the Property Value field respectively.
Service Activity Custom Properties This table appears when Use Service Activity Monitor is
selected. You can add as many lines as needed in the table
to customize the relevant properties. Enter the name and
the value of each property between double quotation marks
in the Property Name field and the Property Value field respectively.
Connection time out(second) Set a value in seconds for Web service connection time out.
This option only works in the studio. To use it after the
component is deployed in runtime:
1. Create a configuration file with the name
org.apache.cxf.http.conduits-
<endpoint_name>.cfg in the <TalendRuntime
Path>/container/etc/ folder.
2. Specify the url of the Web service and the
client.ConnectionTimeout parameter in
milliseconds in the configuration file. If you need
to use the Receive time out option, specify the
client.ReceiveTimeout in milliseconds too.
The url can be a full endpoint address or a regular
expression containing wild cards, for example:
url = http://localhost:8040/*
client.ConnectionTimeout=10000000
client.ReceiveTimeout=20000000
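For instance, to mirror a 10-second connection timeout and a 20-second receive timeout after deployment, the configuration file contents could be produced as below. This Python sketch is only an illustration; the endpoint name myConsumer and the URL are placeholder values.

```python
# Sketch: build the contents of an
# org.apache.cxf.http.conduits-<endpoint_name>.cfg file. Timeouts are
# entered in seconds (as in the studio) and converted to the
# milliseconds expected at runtime. "myConsumer" is a placeholder.
connection_timeout_s = 10
receive_timeout_s = 20

cfg_name = "org.apache.cxf.http.conduits-myConsumer.cfg"
cfg = (
    "url = http://localhost:8040/*\n"
    f"client.ConnectionTimeout={connection_timeout_s * 1000}\n"
    f"client.ReceiveTimeout={receive_timeout_s * 1000}\n"
)
print(cfg_name)
print(cfg)
```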
Disable Chunking Select this check box to disable encoding the payload
as chunks. In general, chunking will perform better as
the streaming can take place directly. But sometimes the
payload is truncated with chunking enabled. If you are
getting strange errors when trying to interact with a service,
try turning off chunking to see if that helps.
Trust server with SSL/TrustStore file and TrustStore Select this check box to validate the server certificate to
password the client via an SSL protocol and fill in the corresponding
fields:
TrustStore file: Enter the path (including filename) to
the certificate TrustStore file that contains the list of
certificates that the client trusts.
TrustStore password: Enter the password used to check the
integrity of the TrustStore data.
Use http proxy/Proxy host, Proxy port, Proxy user, and Select this check box if you are using a proxy server and fill
Proxy password in the necessary information.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
HTTP Headers Click [+] as many times as required to add the name-value
pair(s) for HTTP headers to define the parameters of the
requested HTTP operation.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Use
Authentication or Use HTTP proxy option dynamically at
runtime. You can add two rows in the table to set both
options.
Once a dynamic parameter is defined, the corresponding
option becomes highlighted and unusable in the Basic
settings view or Advanced settings view.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
this row and accept the propagation that prompts you to get the schema from the tESBConsumer
component.
4. Right-click the tESBConsumer component, select Row > Response from the contextual menu and
click the second tXMLMap component.
5. Right-click the second tXMLMap component, select Row > *New Output* (Main) from the
contextual menu and click the second tLogRow component. Enter response in the popup dialog
box to name this row.
6. Right-click the tESBConsumer component again, select Row > Fault from the contextual menu
and click the other tLogRow component.
Procedure
1. In the design workspace, double-click the tESBConsumer component to open its Basic settings
view in the Component tab.
5. Select the Log messages check box to show the exchange log in the execution console.
Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view in the Component
tab.
2. For users of Talend Studio with ESB who have retrieved the schema from the service WSDL
definition in the configuration of the tESBConsumer component, select Repository from the
Schema list. Then click the [...] of the next field to show the Repository Content dialog box. Select
the metadata under the IsValidEmail node to use it as the schema of the input message. Click OK
to close the dialog box.
For users of Talend Studio without ESB, please go to the next step.
3. For users of Talend Studio without ESB, the schema needs to be created manually. Select Built-In
from the Schema list.
Click the three-dot button next to Edit Schema. In the schema dialog box, click the plus button to
add a new line of String type and name it Email. Click OK to close the dialog box.
Procedure
1. In the design workspace, double-click the tXMLMap component to open the Map Editor.
2. In the output table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmail in the dialog box that appears.
3. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter
http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu.
Enter Email in the dialog box that appears.
5. Right-click the Email node and select As loop element from the contextual menu.
6. Click the Email node in the input table and drop it to the Expression column in the row of the
Email node in the output table.
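The request document that this map produces can be sketched with standard Python XML tooling. This is an illustration only: the email address is a placeholder, and both elements are placed in the http://www.webservicex.net namespace that the steps above set on the root.

```python
import xml.etree.ElementTree as ET

# Sketch of the payload built by the tXMLMap above: an IsValidEmail
# root with an Email loop element, both in the service namespace.
# The address below is a placeholder value.
ns = "http://www.webservicex.net"
ET.register_namespace("", ns)  # serialize with a default namespace

root = ET.Element(f"{{{ns}}}IsValidEmail")
email = ET.SubElement(root, f"{{{ns}}}Email")
email.text = "someone@example.com"

payload = ET.tostring(root, encoding="unicode")
print(payload)
```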
Procedure
1. In the design workspace, double-click the tXMLMap component in the output flow to open the
Map Editor.
2. In the input table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmailResponse in the dialog box that appears.
3. Right-click the IsValidEmailResponse node and select Set A Namespace from the contextual menu.
Enter http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmailResponse node again and select Create Sub-Element from the contextual menu. Enter IsValidEmailResult in the dialog box that appears.
5. Right-click the IsValidEmailResult node and select As loop element from the contextual menu.
6. On the lower right part of the map editor, click [+] to add a row of String type to the output table and name it response.
7. Click the IsValidEmailResult node in the input table and drop it to the Expression column in the
row of the response node in the output table.
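The extraction performed by this second map can be sketched as follows, assuming a response document shaped like the one this scenario returns.

```python
import xml.etree.ElementTree as ET

# Sketch: pull IsValidEmailResult out of an IsValidEmailResponse
# document and expose it as a flat "response" column, as the tXMLMap
# above does. The sample document mimics the service response.
ns = "{http://www.webservicex.net}"
response_doc = (
    '<IsValidEmailResponse xmlns="http://www.webservicex.net">'
    "<IsValidEmailResult>false</IsValidEmailResult>"
    "</IsValidEmailResponse>"
)

root = ET.fromstring(response_doc)
response = root.find(f"{ns}IsValidEmailResult").text
print(response)  # → false
```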
The email address [email protected] is returned as false. The input and output SOAP
messages in XML are also shown in the console.
Procedure
1. In the design workspace, double-click the tESBConsumer component to open its Basic settings
view in the Component tab.
Click Finish to validate your settings and close the dialog box.
4. In the Advanced settings view, select the Log messages check box to log the content of the
messages.
Procedure
1. Double-click the first tFixedFlowInput component to open its Basic settings view in the
Component tab.
2. For users of Talend Studio with ESB who have retrieved the schema from the service WSDL
definition in the configuration of the tESBConsumer component, select Repository from the
Schema list. Then click the [...] of the next field to show the Repository Content dialog box. Select
the metadata under the IsValidEmail node to use it as the schema of the input message. Click OK
to close the dialog box.
For users of Talend Studio without ESB, please go to the next step.
3. For users of Talend Studio without ESB, the schema needs to be created manually. Select Built-In
from the Schema list.
Click the [...] button next to Edit Schema. In the schema dialog box, click the [+] button to add a
new line of String type and name it Email. Click OK to close the dialog box.
Give the value Hello world! to id and Talend to company, which are the headers of the request
message.
Procedure
1. In the design workspace, double-click the first tXMLMap component to open the Map Editor.
2. In the output table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmail in the dialog box that appears.
3. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter
http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu.
Enter Email in the dialog box that appears.
5. Right-click the Email node and select As loop element from the contextual menu.
6. Click the Email node in the input table and drop it to the Expression column in the row of the
Email node in the output table.
Procedure
1. In the design workspace, double-click tMap to open the Map Editor.
2. On the lower right part of the map editor, click [+] to add two rows of Document type to the output table and name them payload and headers respectively.
3. Click the payload node in the input table and drop it to the Expression column in the row of the
payload node in the output table.
4. Click the header node in the input table and drop it to the Expression column in the row of the
headers node in the output table.
Procedure
1. In the design workspace, double-click the tXMLMap component in the output flow to open the
Map Editor.
2. In the input table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmailResponse in the dialog box that appears.
3. Right-click the IsValidEmailResponse node and select Set A Namespace from the contextual menu.
Enter http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmailResponse node again and select Create Sub-Element from the contextual menu. Enter IsValidEmailResult in the dialog box that appears.
5. Right-click the IsValidEmailResult node and select As loop element from the contextual menu.
6. On the lower right part of the map editor, click [+] to add a row of String type to the output table and name it response.
7. Click the IsValidEmailResult node in the input table and drop it to the Expression column in the
row of the response node in the output table.
As shown in the execution log, the email address [email protected] is returned as false. The
input and output SOAP messages in XML are also shown in the console. The SOAP header is sent with
the request to the service.
tESBProviderFault
Serves a Talend Job cycle result as a Fault message of the Web service in case of a request response
communication style.
It acts as the Fault message of the Web service response at the end of a Talend Job cycle.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
ESB service settings Fault title: Value of the faultString column in the Fault
message.
Note:
The Row > Fault flow of tESBConsumer has a pre-defined
schema whose column, faultString, is filled up with the
content of the field Fault title of tESBProviderFault.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
For how to define a service in the Studio, see Talend Studio User Guide.
Procedure
1. Right-click getAirportInformationByISOCountryCode under the Web service airport and from the
contextual menu, select Assign Job.
2. In the Operation Choice window, select Create a new Job and Assign it to this Service Operation.
3. Click Next to open the Job description window. The Job name airportSoap_getAirportInformationByISOCountryCode is automatically filled in.
4. Click Finish to create the Job and open it in the workspace. Three components are already
available.
Procedure
1. Drop tXMLMap and tMysqlInput from the Palette to the workspace.
2. Link tESBProviderRequest to tXMLMap using a Row > Main connection.
3. Link tMysqlInput to tXMLMap using a Row > Main connection.
4. Link tXMLMap to tESBProviderResponse using a Row > *New Output*(Main) connection.
In the new Output name pop-up window, enter the output table name, airport_response.
Click OK in the pop-up window that asks whether to get the schema of the target component.
Procedure
1. Double-click tMysqlInput to display its Basic settings view.
2. Fill in the basic settings for the MySQL connection and database table.
Click the [...] button to open the schema editor.
3. Click the [+] button to add two columns, id and name, of the String type.
Click OK to close the editor.
Click Guess Query to retrieve the SQL query.
4. Double-click tXMLMap to open its mapper.
5. In the main : row1 table of the input flow side (left), right-click the column name payload and from
the contextual menu, select Import from Repository. Then the Metadata wizard is opened.
Select the schema of the request message and click OK to validate this selection. In this example,
the schema is getAirportInformationByISOCountryCode.
6. Do the same to import the hierarchical schemas for the response/fault messages (right). In this example, these schemas are getAirportInformationByISOCountryCodeResponse and getAirportInformationByISOCountryCodeFault respectively.
7. Then, to create the join to the lookup data, drop the CountryAbbreviation node from the main flow onto the id column of the lookup flow.
8. On the lookup flow table, click the wrench icon on the upper right corner to open the setting
panel.
Set Lookup Model as Reload at each row, Match Model as All matches and Join Model as Inner
join.
9. On the airport_response output flow table, click the wrench icon on the upper right corner to open
the setting panel.
Set the All in one option as true. This ensures that only one response is returned for each request
if multiple airport matches are found in the database.
10. On the fault_message output flow table, click the wrench icon on the upper right corner to open
the setting panel.
Set the Catch Lookup Inner Join Reject option as true to monitor the mismatches between the
country code in the request and the records in the database table. Once such a situation occurs, a
fault message will be generated by tESBConsumer and outputted via its Row > Fault flow.
Note:
The Row > Fault flow of tESBConsumer has a predefined schema in which the faultString
column is filled with the content of the field Fault title of tESBProviderFault.
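The lookup behavior configured in steps 8 through 10 (an inner join that reloads the lookup for each row, keeps all matches, and routes unmatched rows to a reject flow) can be sketched in plain Python. The function and data names here are illustrative only; the real component generates Java code inside the Job.

```python
# Sketch of tXMLMap's inner-join lookup with reject capture.
# Names and sample data are illustrative, not from the generated Job.

def inner_join_with_rejects(requests, lookup_rows):
    """For each request, return all lookup matches; route misses to rejects."""
    matches, rejects = [], []
    for req in requests:
        # "Reload at each row" + "All matches": scan the lookup per request.
        hits = [row for row in lookup_rows if row["id"] == req["country_code"]]
        if hits:  # inner join: only matched rows reach the response flow
            matches.extend({"code": req["country_code"], "airport": h["name"]}
                           for h in hits)
        else:     # "Catch Lookup Inner Join Reject": build a fault record
            rejects.append({"faultString": "unknown code",
                            "faultDetail": req["country_code"]})
    return matches, rejects

airports = [{"id": "CN", "name": "Beijing Capital"}]
ok, bad = inner_join_with_rejects(
    [{"country_code": "CN"}, {"country_code": "CC"}], airports)
```

As in the scenario's final run, the valid code CN yields a response while the unknown code CC produces a fault record.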
11. Drop the name column in the lookup flow onto the Expression area next to the tns:getAirportInformationByISOCountryCodeResult node in the airport_response output flow.
Drop the tns:CountryAbbreviation node in the main flow onto the Expression area next to the
tns:getAirportInformationByISOCountryCodeFaultString node in the fault_message output flow. This
way, the incorrect country code in the request will be shown in the faultDetail column of the Row
> Fault flow of tESBConsumer.
Click OK to close the editor and validate this configuration.
12. Double-click tESBProviderFault to display its Basic settings view:
13. In the field Fault title, enter the context variable context.fault_message.
For how to define context variables, see Talend Studio User Guide.
Procedure
1. Press Ctrl +S to save the Job.
2. Press F6 to run this Job.
Results
The data service is published and listens for requests until you click the Kill button to stop it, because the Keep listening option of tESBProviderRequest is selected by default.
Now is the time to configure the consumer Job that interacts with the data service.
Procedure
1. Drop a tFileInputDelimited, a tXMLMap, a tESBConsumer and two tLogRow from the Palette to
the workspace.
2. Rename one tLogRow as response and the other as fault_message.
3. Link tFileInputDelimited to tXMLMap using a Row > Main connection.
4. Link tXMLMap to tESBConsumer using a Row > *New Output*(Main) connection.
In the new Output name pop-up window, enter the output table name, for example request.
Click OK in the pop-up window that asks whether to get the schema of the target component.
5. Link tESBConsumer to response using the Row > Response connection.
6. Link tESBConsumer to fault_message using the Row > Fault connection.
Procedure
1. Double-click tFileInputDelimited to open its Basic settings view.
2. In the File name/stream field, enter the context variable for the file that has the country codes,
context.filepath.
3. Click the [...] button to open the schema editor.
4. Click the [+] button to add a column, country_code, for example, with the type of string.
Click OK to close the editor.
5. Double-click tXMLMap to open its Map editor.
6. In the request table of the output flow side, right-click the column name payload and from the
contextual menu, select Import from Repository. Then the Metadata wizard is opened.
Select the schema of the request message and click OK to validate this selection. In this example,
the schema is getAirportInformationByISOCountryCode.
7. Drop the country_code column in the main flow onto the Expression area next to the
tns:CountryAbbreviation node in the request output flow.
Click OK to close the editor and validate this configuration.
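The request payload built by the mapping in step 7 is a small XML document with the country code nested under the operation element. A sketch of that document, built with the standard library, is shown below; the namespace URI is an assumption for illustration and should be read from the actual WSDL.

```python
# Sketch of the request payload document that tXMLMap builds in step 7.
# The namespace URI below is a placeholder assumption, not taken from the WSDL.
import xml.etree.ElementTree as ET

NS = "http://www.webserviceX.NET"  # hypothetical target namespace
ET.register_namespace("tns", NS)

root = ET.Element(f"{{{NS}}}getAirportInformationByISOCountryCode")
code = ET.SubElement(root, f"{{{NS}}}CountryAbbreviation")
code.text = "CN"  # value mapped from the country_code input column

payload = ET.tostring(root, encoding="unicode")
```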
8. Double-click tESBConsumer to open its service configuration wizard:
9. Click the Browse... button to select the desired WSDL file. The Port name and Operation are
automatically filled up once the WSDL file is selected.
Click OK to close the wizard.
10. Double-click response to open its Basic settings view:
11. Select Vertical (each row is a key/value list) and then Print label for a better view of the results.
Do the same to the other tLogRow, fault_message.
Procedure
1. Press Ctrl +S to save the Job.
2. Press F6 to run this Job.
As shown above, two messages are returned, one giving the airport name that matches the
country code CN and the other giving the error details caused by the country code CC.
tESBProviderRequest
Wraps a Talend Job as a web service.
It waits for a request message from a consumer and passes it to the next component.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Keep listening Check this box when you want to ensure that the provider
(and therefore Talend Job) will continue listening for
requests after processing the first incoming request.
Advanced settings
Log messages (Studio only) Select this check box to log the message exchange
between the service provider and the consumer. This option
works in the Studio only.
Response timeout, sec Specify the time limit in seconds for sending response to
the consumer. This parameter is necessary to avoid locking
of message exchanges.
Request processing queue size Specify the maximum number of received requests that
can be processed in parallel by the components between
tESBProviderRequest and tESBProviderResponse. Note that
this parameter is different from the queueSize in the
<TalendRuntimePath>\etc\org.apache.cxf.workqueues-default.cfg
file, which defines the pool configuration for incoming
requests at the CXF level.
Request processing timeout, sec Specify the time limit in seconds for requests to be
processed by the components between tESBProviderRequest
and tESBProviderResponse.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule This component covers the possibility that a Talend Job can
be wrapped as a service, with the ability to input a request
to a service into a Job and return the Job result as a service
response.
The tESBProviderResponse component can deliver the
payload of a SOAP message and also access the HTTP and
SOAP headers of a service.
The tESBProviderRequest component should be used
with the tESBProviderResponse component to provide
a Job result as a response, in case of a request-response
communication style.
When the SAML Token or the Service Registry is enabled in
the service runtime options and the SAML Token exists
in the request header, the tESBProviderRequest component
gets and stores the SAML Token in the component variable
for further use in the flow.
The tESBProviderRequest component retrieves the
Correlation Value from the request header, if it exists, and
stores it in the component variable. When the Business
Correlation or the Service Registry is enabled in the service
runtime options, the Correlation Value is also added to the
response. In this case, tESBProviderRequest creates a
Correlation Value if one does not exist.
Note that the Service Registry option is only available if you
subscribed to Talend Enterprise ESB solutions. For more
information about how to set the runtime options, see the
corresponding section in the Talend Studio User Guide.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Keep
listening option dynamically at runtime.
When a dynamic parameter is defined, the corresponding
Keep listening option in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
In this scenario, a provider Job and a consumer Job are needed. In the meantime, the related
service should already exist in the Services node, with the WSDL URI being http://127.0.0.1:8088/
esb/provider/?WSDL, the port name being TEST_ProviderJobSoapBinding and the operation being
invoke(anyType):anyType.
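A consumer of the published invoke(anyType):anyType operation ultimately posts a SOAP envelope to the endpoint. The following sketch builds such an envelope; the body element name is an assumption for illustration, since the operation accepts an arbitrary (anyType) payload.

```python
# Sketch of the SOAP 1.1 envelope a consumer would POST to the endpoint
# http://127.0.0.1:8088/esb/provider. The <request> element is a hypothetical
# payload; invoke(anyType):anyType accepts any XML body.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_envelope(payload_xml: str) -> str:
    """Wrap an arbitrary XML payload in a SOAP 1.1 Body."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        f"<soap:Body>{payload_xml}</soap:Body>"
        f"</soap:Envelope>"
    )

envelope = build_envelope("<request>world</request>")
```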
The provider Job consists of a tESBProviderRequest, a tXMLMap, and two tLogRow components.
• Drop the following components from the Palette onto the design workspace: a tESBProviderRequest, a tXMLMap, and two tLogRow.
• Double-click tESBProviderRequest_1 in the design workspace to display its Component view and
set its Basic settings.
• Select Repository from the Property Type list and click the three-dot button to choose the service,
to the granularity of port name and operation.
• Click OK.
• Click the three-dot button next to Edit schema to view the schema of tESBProviderRequest_1.
• Click OK.
• Connect tESBProviderRequest_1 to tLogRow_1.
• Double-click tLogRow_1 in the design workspace to display its Component view and set its Basic
settings.
• Click the three-dot button next to Edit schema and define the schema as follows.
• On the lower right part of the map editor, click the plus button to add one row to the payload
table and name this row as payload.
• In the Type column of this payload row, select Document as the data type. The corresponding XML
root is added automatically to the top table on the right side which represents the output flow.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in response in the popup dialog
box.
• Right-click the response node and select As loop element from the contextual menu.
• Repeat this operation to create a sub-element request of the root node in the input table and set
the request node as loop element.
• Click the request node in the input table and drop it to the Expression column in the row of the
response node in the output table.
• Click the three-dot button next to Edit Schema and define the schema as follows.
• Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tXMLMap, a tESBConsumer, and two tLogRow.
• Double-click tFixedFlowInput_1 in the design workspace to display its Component view and set its
Basic settings.
• Click the plus button to add a new line of string type and name it payloadString.
• Click OK.
• In the Number of rows field, set the number of rows as 1.
• In the Mode area, select Use Single Table and enter "world" (between quotation marks) in the Value field.
• Connect tFixedFlowInput_1 to tXMLMap_1.
• Connect tXMLMap_1 to tESBConsumer_1 and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.
• In the output table, right-click the root node to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in request in the popup dialog
box.
• Right-click the request node and select As loop element from the contextual menu.
• Click the payloadString node in the input table and drop it to the Expression column in the row of the request node in the output table.
...
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
...
• Click the three-dot button next to the Service Configuration to open the editor.
• Click the three-dot button next to Edit Schema and define the schema as follows:
• In the Job Design, double-click tLogRow_2 to display its Component view and set its Basic
settings.
• Click the three-dot button next to Edit Schema and define the schema as follows.
• Run the provider Job. In the execution log you will see:
tESBProviderResponse
Serves a Talend Job cycle result as a response message.
It acts as a service provider response builder at the end of each Talend Job cycle.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
• Drop the following components from the Palette onto the design workspace: a tESBProviderRequest, a tESBProviderResponse, a tXMLMap, and two tLogRow.
• In the design workspace, double-click tESBProviderRequest_1 to display its Component view and
set its Basic settings.
• Select Repository from the Property Type list and click the three-dot button to choose the service,
to the granularity of port name and operation.
• Click OK.
• Click the three-dot button next to Edit schema to view its schema.
• Click the three-dot button next to Edit schema and define the schema as follows.
• In the Type column of this payload row, select Document as the data type. The corresponding XML
root is added automatically to the top table on the right side which represents the output flow.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in response in the popup dialog
box.
• Right-click the response node and select As loop element from the contextual menu.
• Repeat this operation to create a sub-element request of the root node in the input table and set
the request node as loop element.
• Click the request node in the input table and drop it to the Expression column in the row of the
response node in the output table.
• Click the three-dot button next to Edit schema and define the schema as follows.
• Click the three-dot button next to Edit schema and define the schema as follows.
• Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tXMLMap, a tESBConsumer, and two tLogRow.
• Double-click tFixedFlowInput_1 in the design workspace to display its Component view and set its
Basic settings.
• Click the plus button to add a new line of string type and name it payloadString.
• Click OK.
...
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
...
• Click the three-dot button next to the Service Configuration to open the editor.
• Click the three-dot button next to Edit Schema and define the schema as follows.
• In the Job Design, double-click tLogRow_2 to display its Component view and set its Basic
settings.
• Click the three-dot button next to Edit Schema and define the schema as follows:
2011-04-21 15:28:26.874:INFO::jetty-7.2.2.v20101205
2011-04-21 15:28:27.108:INFO::Started
[email protected]:8088
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
• Run the consumer Job. In the execution log of the Job you will see:
tEXABulkExec
Imports data into an EXASolution database table quickly, using the IMPORT command provided by
the EXASolution database.
The import is cancelled after a configurable number of records fail to import. Erroneous records
can be sent to a log table in the same database or to a local log file.
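Under the hood the component issues an EXASolution IMPORT statement. The sketch below assembles a plausible statement for a local CSV file; the exact SQL the component generates may differ, and the table and file names are taken from the scenario later in this section purely for illustration.

```python
# Sketch of the kind of IMPORT statement tEXABulkExec issues for a local
# CSV file. Clause names follow the EXASolution IMPORT syntax; the exact
# statement generated by the component may differ.
def build_import(table, path, column_sep=";", reject_limit=2):
    """Assemble an IMPORT ... FROM LOCAL CSV FILE statement."""
    return (
        f"IMPORT INTO {table} FROM LOCAL CSV FILE '{path}' "
        f"COLUMN SEPARATOR = '{column_sep}' "
        f"REJECT LIMIT {reject_limit}"
    )

stmt = build_import("TEST_DATA_LOAD", "E:/employee.csv")
```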
Basic settings
Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.
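The three host formats described above (single IP, IP range list, comma-separated host list) can be expanded to a concrete server list as sketched below; this helper is illustrative only and not part of the component.

```python
# Sketch of expanding the Host field formats into a list of servers.
# Hypothetical helper, for illustration of the formats described above.
def expand_hosts(spec: str):
    if ".." in spec:  # IP range list, e.g. 172.16.173.128..130
        left, right = spec.split("..")
        base, start = left.rsplit(".", 1)
        return [f"{base}.{n}" for n in range(int(start), int(right) + 1)]
    return spec.split(",")  # single host or comma-separated host list
```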
User and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Note:
Typically the table names are stored in upper case. If you
need mixed case identifiers, you have to enter the name
in double quotes. For example, "\"TEST_data_LOAD\"".
Action on table On the table defined, you can perform one of the following
operations before running the import:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets created.
• Create table if not exists: The table is created if it does
not exist.
• Truncate table: The table content is deleted. You do
not have the possibility to rollback the operation.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Note:
The columns in the schema must be in the same order
as they are in the CSV file. It is not necessary to fill all
columns of the defined table unless the use case or table
definition expects that.
Advanced settings
Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
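The semicolon-separated property string maps directly to key-value connection properties, as this small illustrative sketch shows:

```python
# Sketch of parsing the Additional JDBC Parameters string into key-value
# connection properties. Hypothetical helper, for illustration only.
def parse_jdbc_params(params: str) -> dict:
    """Split 'k1=v1;k2=v2' into a properties dict."""
    return dict(pair.split("=", 1) for pair in params.split(";") if pair)

props = parse_jdbc_params("encryption=1;clientname=Talend")
```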
This field is not available if the Use an existing connection
check box is selected.
Column Formats Specify the format for Date and numeric columns if the
default cannot be applied.
• Column: The cells in this column are automatically
filled with the defined schema column names.
• Has Thousand Delimiters: Select this check box if
the value of the corresponding numeric column (only
for numeric column) in the file contains thousand
separators.
• Alternative Format: Specify the necessary format
as String value if a special format is expected. The
necessary format will be created from the schema
column length and precision. For more information
about format models, see EXASolution User Manual.
Source table columns If the source is a database, configure the mapping between
the source columns and the target columns in this table.
Specifically configuring the mapping is optional. If you set
nothing here, it is assumed that the source table has the
same structure as the target table.
• Column: The schema column in the target table.
• Source column name: The name of the column in the
source table.
Column Separator Enter the separator for the columns of a row in the local
file.
Column Delimiter Enter the delimiter that encapsulates the field content in
the local file.
Row Separator Enter the character used to separate the rows in the local file.
Null representation Enter the string that represents a NULL value in the local
file. If not specified, NULL values are represented as the
empty string.
Skip rows Enter the number of rows (for example, header or any other
prefix rows) to be omitted.
Encoding Enter the character set used in the local file. By default, it is
UTF8.
Trim column values Specify whether spaces are deleted at the border of CSV
columns.
• No trim: no spaces are trimmed.
• Trim: spaces from both left and right sides are
trimmed.
• Trim only left: spaces from only the left side are
trimmed.
• Trim only right: spaces from only the right side are
trimmed.
Default Date Format Specify the format for datetime values. By default, it is
YYYY-MM-DD.
Default Timestamp Format Specify the timestamp format used. By default, it is YYYY-
MM-DD HH24:MI:SS.FF3.
Decimal Separator Specify the character used to separate the integer part
of a number from the fraction. In the numeric format, the
character will be applied to the placeholder D.
Note that this setting affects the connection property
NLS_NUMERIC_CHARACTERS that defines the decimal and
group characters used for representing numbers.
Minimal number errors to reject the transfer Specify the maximum number of invalid rows allowed
during the data loading process. For example, the value 2
means the loading process will stop if the third error occurs.
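The threshold behavior described above can be sketched as follows: with the limit set to 2, two bad rows are tolerated and the third aborts the transfer. The loader function is illustrative, not the component's actual implementation.

```python
# Sketch of the error-threshold behavior: tolerate up to max_errors bad
# rows, abort on the next one. Illustrative only.
def load_rows(rows, max_errors=2):
    errors, loaded = 0, []
    for row in rows:
        if row.get("bad"):
            errors += 1
            if errors > max_errors:  # one error too many: cancel the import
                raise RuntimeError("import cancelled")
            continue
        loaded.append(row)
    return loaded

loaded = load_rows([{"bad": True}, {"v": 1}, {"bad": True}])
```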
Log Error Destination Specify the location where error messages will be stored.
• No Logging: error messages will not be saved.
• Local Log File: error messages will be stored in a
specified local file.
• Local Error Log File: specify the path to the local
file that stores error messages.
• Add current timestamp to log file name (before
extension): select this check box to add the
current timestamp before the extension of the file
name for identification reasons in case you use
the same file multiple times.
• Logging Table: error messages will be stored in a
specified table. The table will be created if it does not
exist.
• Error Log Table: enter the name of the table that
stores error messages.
• Use current timestamp to build log table: select
this check box to use the current timestamp to
build the log table for identification reasons in
case you use the same table multiple times.
Transfer files secure Select this check box to transfer the file over HTTPS instead
of HTTP.
Test mode (no statements are executed) Select this check box to have the component running in test
mode, where no statements are executed.
Use precision and length from schema Select this check box to check column values that are of
numeric types (that is, Double, Float, BigDecimal, Integer,
Long, and Short) against the Length setting (which sets the
number of integer digits) and the Precision setting (which
sets the number of decimal digits) in the schema. Only
the values with neither their number of integer digits nor
number of decimal digits larger than the Length setting and
the Precision setting are loaded.
For example, with Length set to 4 and Precision set to 3,
the values 8888.8888 and 88888.888 will be dropped;
the values 8888.88 and 888.888 will be loaded.
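The check on the worked example above can be sketched as a digit count on each side of the decimal point; the helper below is illustrative, not the component's actual code.

```python
# Sketch of the schema Length/Precision check: Length bounds the integer
# digits, Precision bounds the decimal digits. Illustrative helper only.
def fits_schema(value: float, length: int, precision: int) -> bool:
    int_part, _, dec_part = f"{value}".partition(".")
    return (len(int_part.lstrip("-")) <= length
            and len(dec_part) <= precision)

# With Length 4 and Precision 3, only 8888.88 and 888.888 pass the check.
kept = [v for v in (8888.8888, 88888.888, 8888.88, 888.888)
        if fits_schema(v, 4, 3)]
```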
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
Local file
The local file is not transferred by uploading the file. Instead, the driver establishes a (secure)
web service that sends the URL to the database, and the database retrieves the file from this local
web service. Because the port of this service cannot be explicitly defined, this method requires a
transparent network between the local Talend Job and the EXASolution database.
Remote file
This method works with a file that is accessible on a server through the following protocols: SCP,
SFTP, FTP, HTTP, or HTTPS.
Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CO
NNECTIONS in the database.
The connection must contain a URL with one of the following protocols: SCP,
SFTP, FTP, HTTP, or HTTPS.
The URL must not contain the file name. The file name is always dynamic and
must be provided by the component configuration.
Remote file server URL Specify the URL to the file server, without the file name itself.
File name Specify the name of the file you want to fetch from the server.
Query parameters If the web service depends on query parameters, specify them here.
For example, if you want to get a file from an HDFS file system via the web
service, you need to add some additional parameters such as open=true.
Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the web server.
Remote user and Remote users password Enter the user name and password needed to access the web server.
EXASol database
An EXASolution database can also serve as a remote source for the data. The source can be a table or
a specific query.
Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CO
NNECTIONS in the database.
The username and password must be provided by the component and not as part
of the predefined connection.
EXASol database host Specify the host of the remote EXASolution database.
This field can also be used to access a cluster.
Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.
Source query If you want to use a specific query, enter the query in this field.
Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.
Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.
Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.
Remote user and Remote users password Enter the user name and password needed to access the source database.
Oracle database
An Oracle database can also serve as remote source for the data. Access to an Oracle database
requires an Enterprise license for the EXASolution database and does not work with the free edition.
The source can be a table or a specific query.
Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CO
NNECTIONS in the database.
The username and password must be provided by the component and not as part
of the predefined connection.
Oracle database URL Specify the JDBC URL to the Oracle database.
Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.
Source query If you want to use a specific query, enter the query in this field.
Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.
Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.
Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.
Remote user and Remote users password Enter the user name and password needed to access the source database.
JDBC-compliant database
The free edition of the EXASolution database supports MySQL and PostgreSQL databases; others
are available in the Enterprise edition. The source can be a table or a self-defined query.
Nearly all enterprise-grade databases provide a JDBC interface.
Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CO
NNECTIONS in the database.
The username and password must be provided by the component and not as part
of the predefined connection.
JDBC database URL Specify the JDBC URL to the source database.
Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.
Source query If you want to use a specific query, enter the query in this field.
Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.
Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.
Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.
Remote user and Remote users password Enter the user name and password needed to access the source database.
Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view.
2. Click the [...] button next to Edit schema to open the Schema dialog box.
3. Click the [+] button to add six columns: EmployeeID of the Integer type, EmployeeName, OrgTeam
and JobTitle of the String type, OnboardDate of the Date type with the yyyy-MM-dd date pattern,
and MonthSalary of the Double type.
4. Click OK to close the dialog box and accept schema propagation to the next component.
5. In the Mode area, select Use Inline Content (delimited file) and enter the following employee data
in the Content field.
12000;James;Dev Team;Developer;2008-01-01;15000.01
12001;Jimmy;Dev Team;Developer;2008-11-22;13000.11
12002;Herbert;QA Team;Tester;2008-05-12;12000.22
12003;Harry;Doc Team;Technical Writer;2009-03-10;12000.33
12004;Ronald;QA Team;Tester;2009-06-20;12500.44
12005;Mike;Dev Team;Developer;2009-10-15;14000.55
12006;Jack;QA Team;Tester;2009-03-25;13500.66
12007;Thomas;Dev Team;Developer;2010-02-20;16000.77
12008;Michael;Dev Team;Developer;2010-07-15;14000.88
12009;Peter;Doc Team;Technical Writer;2011-02-10;12500.99
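Each line above maps onto the six-column schema defined in step 3. As an illustration only (not the code Talend generates), one ';'-delimited row can be parsed like this, assuming the yyyy-MM-dd date pattern from the schema:

```python
from datetime import datetime

# Illustrative parser for one ';'-delimited employee row; the schema is
# EmployeeID (Integer), EmployeeName/OrgTeam/JobTitle (String),
# OnboardDate (Date, yyyy-MM-dd), MonthSalary (Double).
def parse_employee(line):
    emp_id, name, team, title, onboard, salary = line.split(";")
    return {
        "EmployeeID": int(emp_id),
        "EmployeeName": name,
        "OrgTeam": team,
        "JobTitle": title,
        "OnboardDate": datetime.strptime(onboard, "%Y-%m-%d").date(),
        "MonthSalary": float(salary),
    }

row = parse_employee("12000;James;Dev Team;Developer;2008-01-01;15000.01")
```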
7. In the File Name field, specify the file into which the input data will be written. In this example, it
is "E:/employee.csv".
8. Click Advanced settings to open the Advanced settings view of the tFileOutputDelimited
component.
9. Select the Advanced separator (for numbers) check box and, in the Thousands separator and
Decimal separator fields displayed, specify the thousands and decimal separators. In this
example, the default values "," and "." are used.
Loading the source data into a newly created EXASolution database table
Procedure
1. Double-click the tEXABulkExec component to open its Basic settings view.
2. Fill in the Host, Port, Schema, User and Password fields with your EXASolution database
connection details.
3. In the Table field, enter the name of the table into which the source data will be written. In this
example, the target database table is named "employee" and it does not exist.
4. Select Create table from the Action on table list to create the specified table.
5. In the Source area, select Local file as the source for the input data, and then specify the file that
contains the source data. In this example, it is "E:/employee.csv".
6. Click the [...] button next to Edit schema to open the Schema dialog box and define the schema,
which should be the same as that of the tFixedFlowInput component.
Then click OK to validate these changes and close the dialog box.
7. Click Advanced settings to open the Advanced settings view of the tEXABulkExec component.
8. In the Column Formats table, for the two numeric fields EmployeeID and MonthSalary, select the
corresponding check boxes in the Has Thousand Delimiters column, and then define their format
model strings in the corresponding fields of the Alternative Format column. In this example,
"99G999" for EmployeeID and "99G999D99" for MonthSalary.
9. Make sure that the Thousands Separator and Decimal Separator fields have values identical to
those of the tFileOutputDelimited component and keep the default settings for the other options.
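In format models such as "99G999D99", G stands for the group (thousands) separator and D for the decimal separator, so that model matches values like 14,000.55 when the separators are "," and ".". A rough sketch of how such a formatted value resolves back to a number (an illustration of the separator arithmetic only, not the component's actual conversion):

```python
# Illustrative conversion of a formatted value back to a number,
# assuming "," as the thousands separator and "." as the decimal
# separator, as configured in this scenario.
def parse_formatted(value, thousands=",", decimal="."):
    return float(value.replace(thousands, "").replace(decimal, "."))

salary = parse_formatted("14,000.55")  # matches model "99G999D99"
emp_id = parse_formatted("12,000")     # matches model "99G999"
```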
Procedure
1. Double-click the tEXAInput component to open its Basic settings view.
2. Fill in the Host name, Port, Schema name, Username and Password fields with your EXASolution
database connection details.
3. In the Table Name field, enter the name of the table from which the data will be retrieved. In this
example, it is "employee".
4. Click the [...] button next to Edit schema to open the Schema dialog box and define the schema,
which should be the same as that of the tFixedFlowInput component.
Then click OK to close the dialog box and accept schema propagation to the next component.
5. Click the Guess Query button to fill the Query field with the following auto-generated SQL
statement that will be executed on the specified table.
SELECT employee.EmployeeID,
employee.EmployeeName,
employee.OrgTeam,
employee.JobTitle,
employee.OnboardDate,
employee.MonthSalary
FROM employee
7. In the Mode area, select the Table (print values in cells of a table) option for better readability of
the output.
As shown above, the employee data is written into the specified EXASolution database table and
is then retrieved and displayed on the console.
tEXAClose
Closes an active connection to an EXASolution database instance to release the occupied resources.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job.
Related scenario
No scenario is available for the Standard version of this component yet.
tEXACommit
Validates the data processed through the Job into the connected EXASolution database.
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Component List Select the tEXAConnection component for which you want
the commit action to be performed.
Close Connection This check box is selected by default and it allows you
to close the database connection once the commit is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tEXACommit to your Job, your data will be committed row
by row. In this case, do not select the Close Connection
check box or your connection will be closed before the end
of your first row commit.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.
tEXAConnection
Opens a connection to an EXASolution database instance that can then be reused by other
EXASolution components.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.
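As an illustration of how the three accepted host notations resolve to individual servers (a hypothetical helper, not Talend code):

```python
# Illustrative expansion of an EXASol host specification: a simple IP,
# an IP range list such as 172.16.173.128..130, or a comma-separated
# host list such as server1,server2,server3.
def expand_hosts(spec):
    hosts = []
    for part in spec.split(","):
        if ".." in part:
            base, last = part.split("..")
            prefix, first = base.rsplit(".", 1)
            hosts += [f"{prefix}.{i}" for i in range(int(first), int(last) + 1)]
        else:
            hosts.append(part)
    return hosts

cluster = expand_hosts("172.16.173.128..130")
```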
Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can be
parent and child Jobs.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.
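The expected shape of this field can be sketched as follows, an illustration of the semicolon-separated key-value syntax rather than the component's own parsing:

```python
# Illustrative parsing of the Additional JDBC Parameters field:
# semicolon-separated key=value pairs, e.g. "encryption=1;clientname=Talend".
def parse_jdbc_params(spec):
    return dict(pair.split("=", 1) for pair in spec.split(";") if pair)

params = parse_jdbc_params("encryption=1;clientname=Talend")
```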
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.
tEXAInput
Retrieves data from an EXASolution database based on a query with a strictly defined order which
corresponds to the schema definition, and passes the data to the next component.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Host name Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.
Schema name Enter the name of the schema you want to use.
Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Query Type and Query Enter the database query, paying particularly attention to
the proper sequence of the fields in order to match the
schema definition.
Guess Query Click the button to generate the query that corresponds to
the table schema in the Query field.
Guess schema Click the button to retrieve the schema from the table.
Advanced settings
Change fetch size Select this check box to change the fetch size which
specifies the amount of resultset data sent during one
single communication step with the database. In the Fetch
size field displayed, you need to enter the size in KB.
Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespaces from all the String/Char columns.
Trim column Select the check box in the Trim column to remove leading
and trailing whitespaces from the corresponding field.
This table is not available if the Trim all the String/Char
columns check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenario
For a related scenario, see Importing data into an EXASolution database table from a local CSV file on
page 889.
For similar scenarios using other databases, see:
tEXAOutput
Writes, updates, modifies or deletes data in an EXASolution database by executing the action
defined on the table and/or on the data in the table, based on the flow incoming from the preceding
component.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.
Schema name Enter the name of the schema you want to use.
Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Table Enter the name of the table to be written. Note that only one
table can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets
created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exists and create: The table is removed if
it already exists and created again.
• Clear table: The table content is deleted.
• Truncate table: The table content is deleted. You do
not have the possibility to roll back the operation.
Action on data On the data of the table defined, you can perform:
• Insert: Add new entries to the table. If duplicates are
found, the Job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record. If the record with
the given reference already exists, an update would be
made.
• Update or insert: Update the record with the given
reference. If the record does not exist, a new record
would be inserted.
• Delete: Remove entries corresponding to the input
flow.
Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You
can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define
primary keys for the update and delete operations. To
do that: Select the Use field options check box and then
in the Key in update column, select the check boxes
next to the column name on which you want to base
the update operation. Do the same in the Key in delete
column for the deletion operation.
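The semantics of these actions can be sketched with an in-memory table keyed on the primary-key column, purely as an illustration of the behaviors described above (the component itself issues SQL):

```python
# Illustrative in-memory sketch of the Action on data semantics.
# table: dict keyed on the primary-key value; row: dict of column values.
def insert(table, key, row):
    # Insert: add a new entry; a duplicate key stops the Job.
    if key in table:
        raise ValueError("duplicate key: %s" % key)
    table[key] = row

def insert_or_update(table, key, row):
    # Insert a new record, or update it if the key already exists.
    table[key] = row

def delete(table, key):
    # Delete: remove the entry matching the input row's key.
    table.pop(key, None)

t = {}
insert(t, 12000, {"EmployeeName": "James"})
insert_or_update(t, 12000, {"EmployeeName": "Jim"})    # key exists: update
insert_or_update(t, 12001, {"EmployeeName": "Jimmy"})  # key absent: insert
```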
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Use commit control Select this box to display the Commit every field in which
you can define the number of rows to be processed before
committing.
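With Commit every set to n, a commit follows each group of n processed rows, plus a final commit for any remainder. The batching arithmetic can be sketched as follows (an illustration only, not the component's code):

```python
# Illustrative grouping of a row stream into per-commit batches.
def commit_batches(rows, commit_every):
    # Each sublist ends with one commit; the last may be smaller.
    return [rows[i:i + commit_every] for i in range(0, len(rows), commit_every)]

batches = commit_batches(list(range(10)), 3)  # 10 rows, commit every 3 rows
```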
Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update, or delete actions, or actions that require
particular preprocessing.
• Name: Enter the name of the column to be modified or
inserted.
• SQL expression: Enter the SQL expression to be
executed to modify or insert data in the corresponding
columns.
• Position: Select Before, After or Replace, depending on
the action to be carried out on the reference column.
• Reference column: Type in a column of reference that
can be used to place or replace the new or altered
column.
Use field options Select this check box to customize a request for the
corresponding column, particularly if multiple actions are
being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which the data is
updated.
• Key in delete: Select the check box for the
corresponding column based on which the data is
deleted.
• Updatable: Select the check box if the data in the
corresponding column can be updated.
• Insertable: Select the check box if the data in the
corresponding column can be inserted.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use batch mode Select this check box to activate the batch mode for data
processing, and in the Batch Size field displayed enter the
number of records to be processed in each batch.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an EXASolution database. It also allows you to
create a reject flow using a Row > Rejects link to filter data
in error. For a related scenario, see Retrieving data in error
with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For similar scenarios using other databases, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tEXARollback
Cancels the transaction commit in the connected EXASolution database.
It allows you to roll back any changes made in the EXASolution database to prevent partial
transaction commit if an error occurs.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Component List Select the tEXAConnection component for which you want
the rollback action to be performed.
Close Connection This check box is selected by default and it allows you
to close the database connection once the rollback is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related Scenario
For a similar scenario using another database, see Rollback from inserting data in mother/daughter
tables on page 2429.
tEXARow
Executes SQL queries on an EXASolution database.
Depending on the nature of the query and the database, tEXARow acts on the actual structure of the
database, or indeed the data, although without modifying them. The Row suffix indicates that it is
used to channel a flow in a Job although it does not produce any output data.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.
Schema name Enter the name of the schema you want to use.
Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Guess Query Click the Guess Query button to generate the query that
corresponds to the table schema in the Query field.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.
Propagate QUERY's recordset Select this check box to insert the query results in one of
the flow columns. Select the particular column from the use
column list.
Use PreparedStatement Select this check box to use prepared statements and in
the Set PreparedStatement Parameters table displayed,
add as many parameters as needed and set the following
attributes for each parameter:
• Parameter Index: enter the index of the prepared
statement parameter.
• Parameter Type: click in the cell and select the type of
the parameter from the list.
• Parameter Value: enter the value of the parameter.
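As an illustration of how the indexed parameters fill the "?" placeholders of a prepared statement (a sketch of the substitution only; in the real component this is delegated to the JDBC driver, which also handles escaping):

```python
# Illustrative binding of 1-based indexed parameters into a statement
# with "?" placeholders, mirroring the Parameter Index / Value table.
def bind(query, params):
    parts = query.split("?")
    out = parts[0]
    for i in range(1, len(parts)):
        value = params[i]  # value registered under the matching index
        literal = "'%s'" % value if isinstance(value, str) else str(value)
        out += literal + parts[i]
    return out

sql = bind("SELECT * FROM employee WHERE OrgTeam = ? AND MonthSalary > ?",
           {1: "Dev Team", 2: 13000})
```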
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Related Scenario
For similar scenarios using other databases, see:
• Procedure on page 622,
• Removing and regenerating a MySQL table index on page 2497.
tEXistConnection
Opens a connection to an eXist database so that a transaction can be carried out.
Basic settings
Note:
Users can enter a different driver, depending on their
needs.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Usage
Usage rule This component is more commonly used with other tEXist*
components, especially with the tEXistGet and tEXistPut
components. If you set the connection properties in the
tEXistConnection component, you can reuse the connection
for other tEXist* components in the same Job.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.
Related scenarios
For a tEXistConnection related scenario, see tMysqlConnection on page 2425.
tEXistDelete
Deletes specified resources from a remote eXist database.
Basic settings
Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Note:
Users can enter a different driver, depending on their
needs.
Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
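As a rough illustration, a filemask with * wildcards behaves like shell-style pattern matching. Python's fnmatch sketches the idea; the component's exact matching rules may differ:

```python
from fnmatch import fnmatch

# Sketch of filemask matching with '*' wildcards; this only illustrates
# the idea, it is not the component's own matcher.
files = ["dictionary_en.xml", "dictionary_fr.xml", "notes.txt"]
matched = [f for f in files if fnmatch(f, "dictionary_*.xml")]
print(matched)
# → ['dictionary_en.xml', 'dictionary_fr.xml']
```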
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tEXistGet
Retrieves selected resources from a remote eXist database to a defined local directory.
Basic settings
Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Note:
Users can enter a different driver, depending on their
needs.
Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Drop the tEXistGet component from the Palette into the design workspace.
2. Double-click the tEXistGet component to open the Component view and define the properties in
its Basic settings view.
3. Fill in the URI field with the URI of the eXist database you want to connect to.
In this scenario, the URI is xmldb:exist://192.168.0.165:8080/exist/xmlrpc. Note that the URI used in
this use case is for demonstration purposes only and is not an active address.
4. Fill in the Collection field with the path to the collection of interest on the database server, /db/
talend in this scenario.
5. Fill in the Driver field with the driver for the XML database, org.exist.xmldb.DatabaseImpl in this
scenario.
6. Fill in the Username and Password fields by typing in admin and talend respectively in this
scenario.
7. Click the three-dot button next to the Local directory field to set a path for saving the XML file
downloaded from the remote database server.
In this scenario, set the path to your desktop, for example C:/Documents and Settings/galano/Des
ktop/ExistGet.
8. In the Files field, click the plus button to add a new line in the Filemask area, and fill it with a
complete file name to retrieve data from a particular file on the server, or a filemask to retrieve
data from a set of files. In this scenario, fill in dictionary_en.xml.
9. Save your Job and press F6 to execute it.
The XML file dictionary_en.xml is retrieved and downloaded to the defined local directory.
tEXistList
Lists the resources stored on a remote eXist database.
Basic settings
Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Note:
Users can enter a different driver, depending on their
needs.
Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Global Variables NB_FILE: the number of files iterated upon. This is an After
variable, and it returns an integer.
CURRENT_FILE: the current file name. This is a Flow
variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Related scenario
For a related scenario, see Listing and getting files/folders on an FTP directory on page 1230.
tEXistPut
Uploads specified files from a defined local directory to a remote eXist database.
Basic settings
Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Note:
Users can enter a different driver, depending on their
needs.
Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tEXistXQuery
Queries XML files located on remote databases using local files containing XPath queries and outputs
the results to an XML file stored locally.
Basic settings
Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Collection Enter the path to the XML file location on the database.
Note:
Users can enter a different driver, depending on their
needs.
XQuery Input File Browse to the local file containing the query to be executed.
Local Output Browse to the directory in which the query results should be
saved.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tEXistXUpdate
Processes XML file records and updates the existing records on the database server.
Basic settings
Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Collection Enter the path to the collection and file of interest on the
database server.
Note:
Users can enter a different driver, depending on their
needs.
Update File Browse to the local file in the local directory to be used to
update the records on the database.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tExternalSortRow
Sorts input data based on one or several columns, by sort type and order, using an external sort
application.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
External command "sort" path Enter the path to the external file containing the sorting
algorithm to use.
Criteria Click the plus button to add as many lines as required for
the sort to be complete. By default the first column defined
in your schema is selected.
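The component hands the actual sorting off to the external command. As a rough analogy (assuming a GNU-style sort binary on the PATH, which is an assumption about your environment, not something the document states), sorting delimited rows on the second column looks like this:

```python
import subprocess

# Assumes a GNU-style 'sort' command is available, much as
# tExternalSortRow assumes an external sort binary.
# '-t' sets the field separator and '-k' selects the sort key column.
data = "3;Chris\n1;Adam\n2;Bill\n"
result = subprocess.run(
    ["sort", "-t", ";", "-k", "2"],
    input=data, capture_output=True, text=True, check=True,
)
print(result.stdout)
# → 1;Adam
#   2;Bill
#   3;Chris
```

The design point this illustrates: the data never has to fit in the Job's memory, because the external process does the heavy lifting (hence the Maximum memory and temporary-directory options below).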
Advanced settings
Maximum memory Type in the size of physical memory you want to allocate to
sort processing.
Set temporary input file directory Select the check box to activate the field in which you can
specify the directory to handle your temporary input file.
Add a dummy EOF line Select this check box when using the tAggregateSortedRow
component.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related use case, see tSortRow on page 3465.
tExtractDelimitedFields
Generates multiple columns from a delimited string column.
The extracted fields are written in new columns of the output schema. If you need to keep the original
columns in the output of this component, define these columns in the output schema using the same
column names as the original ones.
Basic settings
Field to split Select an incoming field from the Field to split list to split.
Ignore NULL as the source data Select this check box to ignore the Null value in the source
data.
Clear this check box to generate the Null records that
correspond to the Null value in the source data.
Note:
Since this component uses regex to split a field and the
regex syntax uses special characters as operators, make
sure to precede the regex operator you use as a field
separator by a double backslash. For example, you have
to use "\\|" instead of "|".
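The need for escaping can be sketched in Python, where a single backslash in a raw string plays the role of the double backslash in the Java string above:

```python
import re

record = "32|Component Team|Developer"

# Correct: escape the '|' so it is treated as a literal separator.
fields = re.split(r"\|", record)
print(fields)
# → ['32', 'Component Team', 'Developer']

# Wrong: an unescaped '|' is the regex alternation operator, so it does
# not split on the literal pipe character.
assert re.split("|", record) != fields
```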
Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Trim column Select this check box to remove leading and trailing
whitespace from all columns.
Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.
Validate date Select this check box to check the date format strictly
against the input schema.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.
2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: Id of Integer type, and Name and DelimitedField of String type.
Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
3. In the Mode area, select Use Inline Content(delimited file). Then in the Content field displayed,
enter the data to write to the database. This input data includes a delimited string column. In this
example, the input data is as follows:
1;Adam;32,Component Team,Developer
2;Bill;28,Component Team,Tester
3;Chris;30,Doc Team,Writer
4;David;35,Doc Team,Leader
5;Eddie;33,QA Team,Tester
5. Fill the Host, Port, Database, Username, Password fields with the MySQL database connection
information.
6. Fill the Table field with the name of the table to be written. In this example, it is employee.
7. Select Drop table if exists and create from the Action on table list.
8. Double-click the first tLogRow to open its Basic settings view.
In the Mode area, select Table (print values in cells of a table) for better readability of the result.
Extracting the delimited string column in the database table into multiple columns
Procedure
1. Double-click tMysqlInput to open its Basic settings view.
2. Fill the Host, Port, Database, Username, Password fields with the MySQL database connection
information.
3. Click the [...] button next to Edit schema and in the pop-up window define the schema of the
tMysqlInput component same as the schema of the tMysqlOutput component.
4. In the Table Name field, enter the name of the table into which the data was written. In this
example, it is employee.
5. Click the Guess Query button to fill the Query field with the SQL query statement to be executed
on the specified table. In this example, it is as follows:
SELECT
`employee`.`Id`,
`employee`.`Name`,
`employee`.`DelimitedField`
FROM `employee`
7. In the Field to split list, select the delimited string column to be extracted. In this example, it is
DelimitedField.
In the Field separator, enter the separator used to separate the fields in the delimited string
column. In this example, it is ,.
8. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
five columns: Id of Integer type, and Name, Age, Team, Title of String type.
In this example, the delimited string column DelimitedField is split into three columns Age, Team
and Title, and the Id and Name columns are kept as well.
Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
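The extraction configured in this step can be sketched as a plain split that carries the Id and Name columns through unchanged (the sample rows come from the input data defined earlier):

```python
# Sketch of what tExtractDelimitedFields does here: split DelimitedField
# on ',' into Age, Team and Title while keeping Id and Name.
rows = [
    (1, "Adam", "32,Component Team,Developer"),
    (2, "Bill", "28,Component Team,Tester"),
]

extracted = []
for id_, name, delimited in rows:
    age, team, title = delimited.split(",")
    extracted.append((id_, name, age, team, title))

print(extracted[0])
# → (1, 'Adam', '32', 'Component Team', 'Developer')
```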
9. Double-click the second tLogRow to open its Basic settings view.
In the Mode area, select Table (print values in cells of a table) for better readability of the result.
As shown above, the primitive input data and the data after extraction are displayed on the
console, and the delimited string column DelimitedField is extracted into three columns Age, Team,
and Title.
tExtractJSONFields
Extracts the desired data from JSON fields based on the JSONPath or XPath query.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Loop Jasonpath query Enter the path pointing to the node within the JSON field,
on which the loop is based.
Note that if you have selected Xpath from the Read by drop-
down list, the Loop Xpath query field is displayed instead.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Use the loop node as root Select this check box to use the loop node as the root for
querying the file.
The loop node is set in the Loop Json query text frame in
the Basic Settings view. If this option is checked, only the
child elements of the loop node are available for querying;
otherwise, both the parent elements and the child elements
of the loop node can be queried. You can specify a parent
element through JSON path syntax.
This check box is available only when JsonPath is selected
in the Read By drop-down list of the Basic settings view.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Double-click tFixedFlowInput to display its Basic settings view.
Click the [+] button to add three columns, namely firstname, lastname and dept, with the type of
string.
Click OK to close the editor.
3. Select Use Inline Content and enter the data below in the Content box:
Andrew;Wallace;Doc
John;Smith;R&D
Christian;Dior;Sales
Procedure
1. Click tWriteJSONField to display its Basic settings view.
Repeat the steps to add two more sub-nodes, namely lastname and dept.
6. Right-click firstname and select Set As Loop Element from the context menu.
7. Drop firstname from the Linker source panel to its counterpart in the Linker target panel.
In the pop-up dialog box, select Add linker to target node.
10. Click the [+] button in the right panel to add one column, namely staff, which will hold the JSON
data generated.
Click OK to close the editor.
Procedure
1. Double-click tExtractJSONFields to display its Basic settings view.
3. Click the [+] button in the right panel to add three columns, namely firstname, lastname and dept,
which will hold the data of their counterpart nodes in the JSON field staff.
Click OK to close the editor.
4. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.
5. In the Loop XPath query field, enter "/staff", which is the root node of the JSON data.
6. In the Mapping area, type in the node name of the JSON data under the XPath query part. The
data of those nodes will be extracted and passed to their counterpart columns defined in the
output schema.
7. Specifically, define the XPath query "firstname" for the column firstname, "lastname" for the column
lastname, and "" for the column dept. Note that "" is not a valid XPath query; it is used here
deliberately so that the extraction fails and the rows are passed to the reject flow.
Procedure
1. Double-click data_extracted to display its Basic settings view.
2. Select Table (print values in cells of a table) for a better display of the results.
3. Perform the same setup on the other tLogRow component, namely reject_info.
As shown above, the reject row offers such details as the data extracted, the JSON fields whose
data is not extracted and the cause of the extraction failure.
Click the [+] button to add one column, namely friends, of the String type.
Click OK to close the editor.
3. Click the [...] button to browse for the JSON file, facebook.json in this case:
7. Click the [+] button in the right panel to add five columns, namely id, name, like_id, like_name and
like_category, which will hold the data of relevant nodes in the JSON field friends.
Click OK to close the editor.
8. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.
10. In the Mapping area, type in the queries of the JSON nodes in the XPath query column. The data
of those nodes will be extracted and passed to their counterpart columns defined in the output
schema.
11. Specifically, define the XPath query "../../id" (querying the "/friends/id" node) for the column id, "../../
name" (querying the "/friends/name" node) for the column name, "id" for the column like_id, "name"
for the column like_name, and "category" for the column like_category.
12. Double-click tLogRow to display its Basic settings view.
13. Select Table (print values in cells of a table) for a better display of the results.
As shown above, the friends data of the Facebook user Kelly Clarkson is extracted correctly.
{
"Guid": "a2hdge9-5517-4e12-b9j6-887ft29e1711",
"Transactions": [
{
"TransactionId": 1,
"Products": [
{
"ProductId": "A1",
"Packs": [
{
"Quantity": 20,
"Price": 40.00,
"Due_Date": "2019/03/01"
}
]
}
]
},
{
"TransactionId": 2,
"Products": [
{
"ProductId": "B1",
"Packs": [
{
"Quantity": 1,
"Price": 15.00,
"Due_Date": "2019/01/01"
},
{
"Quantity": 21,
"Price": 315.00,
"Due_Date": "2019/02/14"
}
]
}
]
},
{
"TransactionId": 3,
"Products": [
{
"ProductId": "C1",
"Packs": [
{
"Quantity": 2,
"Price": 5.00,
"Due_Date": "2019/02/19"
},
{
"Quantity": 3,
"Price": 7.50,
"Due_Date": "2019/05/21"
}
]
}
]
}
]
}
Procedure
1. Create a Job and add a tFileInputJSON component, three tExtractJsonFields components, and a
tLogRow component.
2. Connect the components using Row > Main connections.
Procedure
1. In the Basic settings view of the tFileInputJSON component, select JsonPath from the Read By
drop-down list.
2. In the Filename field, specify the input JSON file, sample.json in this example.
3. In the schema editor, add two columns, Guid (type String) and Transactions (type Object).
4. Click Yes in the subsequent dialog box to propagate the schema to the next component.
The columns just added appear in the Mapping table of the Basic settings view.
5. In the Basic settings view, enter "$" in the Loop Json query text box to loop the elements within
the root elements.
6. In the Json query column of the Mapping table, enter the following Json query expressions in
double quotation marks.
• $.Guid to extract the value of the Guid element;
• $.Transactions to extract the content of the Transactions element.
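Using Python's standard json module as a stand-in for the component's JsonPath engine, these two queries correspond to simple key lookups on the parsed document (the sample here is trimmed from the file shown above):

```python
import json

# A trimmed version of sample.json; the JsonPath query '$.Guid'
# corresponds to doc["Guid"], and '$.Transactions' to doc["Transactions"].
sample = """
{
  "Guid": "a2hdge9-5517-4e12-b9j6-887ft29e1711",
  "Transactions": [
    {"TransactionId": 1, "Products": []}
  ]
}
"""

doc = json.loads(sample)
print(doc["Guid"])
print(len(doc["Transactions"]))
```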
Procedure
1. In the schema editor of the first tExtractJSONFields component, add the following columns in the
output table.
• Guid, type String;
• TransactionId, type Integer;
• Products, type Object
2. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.
The settings loop all the elements within the Transactions element and extract the values of the
TransactionId and the Products elements.
4. In the schema editor of the second tExtractJSONFields component, add the following columns in
the output table.
• Guid, type String;
• TransactionId, type Integer;
• ProductId, type String;
• Packs, type Object
5. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.
6. Set the other options in the Basic settings view as follows.
• JSON field: Products;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: empty, for receiving the TransactionId from the previous component;
• ProductId: "ProductId" (in double quotation marks);
• Packs: "Packs" (in double quotation marks);
• Others: unchanged
The settings in the above figure loop all the elements within the Products element and extract
the values of the ProductId and the Packs elements.
7. In the schema editor of the third tExtractJSONFields component, add the following columns in the
output table.
• Guid, type String;
• TransactionId, type Integer;
• ProductId, type String;
• Quantity, type Integer;
• Price, type Float;
• Due_Date, type Date
8. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.
9. Set the other options in the Basic settings view as follows.
• JSON field: Packs;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: empty, for receiving the TransactionId value from the previous component;
• ProductId: empty, for receiving the ProductId value from the previous component;
• Quantity: "Quantity" (in double quotation marks);
• Price: "Price" (in double quotation marks);
• Due_Date: "Due_Date" (in double quotation marks);
• Others: unchanged
The settings in the above figure loop all the elements within the Packs element and extract the
values of the Quantity, the Price, and the Due_Date elements.
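Taken together, the three chained extraction components flatten the nested structure, each level carrying its parent's fields down to the next. The equivalent nested loops can be sketched as follows (using an abbreviated version of the sample; this is a conceptual sketch, not the code Talend generates):

```python
import json

# Equivalent of the three chained extraction steps: loop over
# Transactions -> Products -> Packs, carrying parent fields down.
sample = json.loads("""
{
  "Guid": "g-1",
  "Transactions": [
    {"TransactionId": 1, "Products": [
      {"ProductId": "A1", "Packs": [
        {"Quantity": 20, "Price": 40.0, "Due_Date": "2019/03/01"}
      ]}
    ]},
    {"TransactionId": 2, "Products": [
      {"ProductId": "B1", "Packs": [
        {"Quantity": 1, "Price": 15.0, "Due_Date": "2019/01/01"},
        {"Quantity": 21, "Price": 315.0, "Due_Date": "2019/02/14"}
      ]}
    ]}
  ]
}
""")

rows = []
for txn in sample["Transactions"]:
    for product in txn["Products"]:
        for pack in product["Packs"]:
            rows.append((sample["Guid"], txn["TransactionId"],
                         product["ProductId"], pack["Quantity"],
                         pack["Price"], pack["Due_Date"]))

for row in rows:
    print(row)
```

Each emitted tuple corresponds to one row of the final schema (Guid, TransactionId, ProductId, Quantity, Price, Due_Date).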
Procedure
1. Open the Basic settings view of the tLogRow component.
Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to execute the Job. The following figure shows the result.
The values of the Guid element, the TransactionId element, the ProductId element, the Quantity
element, the Price element, and the Due_date element are extracted from the source JSON file
and displayed.
tExtractPositionalFields
Extracts data and generates multiple columns from a formatted string using positional fields.
tExtractPositionalFields generates multiple columns from one column using positional fields.
Basic settings
Ignore NULL as the source data Select this check box to ignore the Null value in the source
data.
Clear this check box to generate the Null records that
correspond to the Null value in the source data.
Customize Select this check box to customize the data format of the
positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in between inverted commas the
padding character used, in order for it to be removed from
the field. A space by default.
Alignment: Select the appropriate alignment parameter.
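The slicing described by the Column/Size/Padding char settings can be sketched as follows; the column sizes and record here are hypothetical, chosen only to illustrate the mechanism:

```python
# Sketch of positional extraction: slice fixed-width fields at known
# offsets, then strip the padding character (a space by default).
record = "0001" + "Adam".ljust(10) + "Developer".ljust(10)
sizes = [4, 10, 10]  # hypothetical column sizes

fields, start = [], 0
for size in sizes:
    fields.append(record[start:start + size].rstrip(" "))
    start += size

print(fields)
# → ['0001', 'Adam', 'Developer']
```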
Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Trim Column Select this check box to remove leading and trailing
whitespace from all columns.
Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Extracting name, domain and TLD from e-mail addresses on page 967.
tExtractRegexFields
Extracts data and generates multiple columns from a formatted string using regex matching.
Basic settings
Field to split Select an incoming field from the Field to split list to split.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.
Warning:
Make sure that the output schema does not contain any
column with the same name as the input column to be
split. Otherwise, the regular expression will not work as
expected.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
2. Click the [...] button next to the File name/Stream field to browse to the file from which you
want to extract information.
The input file used in this scenario is called test4. It is a text file that holds three columns: id,
email, and age.
id;email;age
1;[email protected];24
2;[email protected];31
3;[email protected];20
5. Select the column to split from the Field to split list: email in this scenario.
6. Enter the regular expression you want to use to perform data matching in the Regex panel. In
this scenario, the regular expression "([a-z]*)@([a-z]*).([a-z]*)" is used to match the
three parts of an email address: user name, domain name and TLD name.
For more information about the regular expression, see http://en.wikipedia.org/wiki/
Regular_expression.
7. Click Edit schema to open the Schema of tExtractRegexFields dialog box, and click the plus button
to add five columns for the output schema.
In this scenario, we want to split the input email column into three columns in the output flow,
name, domain, and tld. The two other input columns will be extracted as they are.
Results
The tExtractRegexFields component matches all given e-mail addresses with the defined regular
expression and extracts the name, domain, and TLD names and displays them on the console in three
separate columns. The two other columns, id and age, are extracted as they are.
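The matching logic of this scenario can be sketched in plain Java. This is an illustrative sketch, not the component's implementation; note that the dot is escaped here (\\.), which is stricter than the pattern quoted in the scenario.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexSketch {
    // Three capturing groups: user name, domain name, and TLD.
    private static final Pattern EMAIL = Pattern.compile("([a-z]*)@([a-z]*)\\.([a-z]*)");

    /** Returns {name, domain, tld}, or null when the value does not match. */
    public static String[] split(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) {
            return null; // such a row would go to a Row > Reject flow in the Job
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (String row : new String[] {"tal@talendbj.com", "bad-row"}) {
            String[] parts = split(row);
            System.out.println(row + " -> " + (parts == null ? "rejected" : String.join(" | ", parts)));
        }
    }
}
```

Rows that fail to match yield null here, mirroring how the component routes non-matching rows to a reject link.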
tExtractXMLField
Reads the XML structured data from an XML field and sends the data as defined in the schema to the
following component.
Basic settings
Schema type and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Loop XPath query Node of the XML tree on which the loop is based.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Ignore the namespaces Select this check box to ignore namespaces when reading
and extracting the XML data.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Drop the following components from the Palette onto the design workspace: tMysqlInput,
tExtractXMLField, and tFileOutputDelimited.
Connect the three components using Main links.
2. Double-click tMysqlInput to display its Basic settings view and define its properties.
3. If you have already stored the input schema in the Repository tree view, select Repository first
from the Property Type list and then from the Schema list to display the Repository Content
dialog box where you can select the relevant metadata.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.
If you have not stored the input schema locally, select Built-in in the Property Type and Schema
fields and enter the database connection and the data structure information manually. For more
information about tMysqlInput properties, see tMysqlInput on page 2437.
4. In the Table Name field, enter the name of the table holding the XML data, customerdetails in this
example.
Click Guess Query to display the query corresponding to your schema.
5. Double-click tExtractXMLField to display its Basic settings view and define its properties.
6. Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7. In the Xml field list, select the column from which you want to extract the XML data. In this
example, the field holding the XML data is called CustomerDetails.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
In the Xpath query column, enter between inverted commas the node of the XML field holding the
data you want to extract, CustomerName in this example.
8. Double-click tFileOutputDelimited to display its Basic settings view and define its properties.
9. In the File Name field, define or browse to the path of the output file you want to write the
extracted data in.
Click Sync columns to retrieve the schema from the preceding component. If needed, click the
three-dot button next to Edit schema to view the schema.
10. Save your Job and press F6 to execute it.
Results
tExtractXMLField read and extracted the client names under the CustomerName node of the
CustomerDetails field of the defined database table.
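The loop-and-extract behavior described above can be sketched with the standard Java XPath API. The XML sample and node names below are hypothetical stand-ins for the CustomerDetails data, not the scenario's actual records.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class XmlFieldSketch {
    /** Applies a loop XPath query to one XML field value and collects the text of each hit. */
    public static List<String> extract(String xmlField, String loopXPath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlField.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(loopXPath, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            values.add(hits.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample standing in for one CustomerDetails field value.
        String record = "<customers><customer><CustomerName>Griffith</CustomerName></customer>"
                + "<customer><CustomerName>Wilson</CustomerName></customer></customers>";
        System.out.println(extract(record, "/customers/customer/CustomerName"));
    }
}
```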
Procedure
1. Drop the following components from the Palette to the design workspace: tFileInputDelimited,
tExtractXMLField, tFileOutputDelimited and tLogRow.
Connect the first three components using Row Main links.
Connect tExtractXMLField to tLogRow using a Row Reject link.
2. Double-click tFileInputDelimited to open its Basic settings view and define the component
properties.
3. Select Built-in in the Schema list and fill in the file metadata manually in the corresponding
fields.
Click the three-dot button next to Edit schema to display a dialog box where you can define the
structure of your data.
Click the plus button to add as many columns as needed to your data structure. In this example,
we have one column in the schema: xmlStr.
Click OK to validate your changes and close the dialog box.
Note:
If you have already stored the schema in the Metadata folder under File delimited, select
Repository from the Schema list and click the three-dot button next to the field to display the
Repository Content dialog box where you can select the relevant schema from the list. Click Ok
to close the dialog box and have the fields automatically filled in with the schema metadata.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.
4. In the File Name field, click the three-dot button and browse to the input delimited file you want
to process, CustomerDetails_Error in this example.
This delimited file holds a number of simple XML lines separated by double carriage return.
Set the row and field separators used in the input file in the corresponding fields, double carriage
return for the first and nothing for the second in this example.
If needed, set Header, Footer and Limit. None is used in this example.
5. In the design workspace, double-click tExtractXMLField to display its Basic settings view and
define the component properties.
6. Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7. In the Xml field list, select the column from which you want to extract the XML data. In this
example, the field holding the XML data is called xmlStr.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
8. In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and
display the component properties.
9. In the File Name field, define or browse to the output file you want to write the correct data in,
CustomerNames_right.csv in this example.
Click Sync columns to retrieve the schema of the preceding component. You can click the three-
dot button next to Edit schema to view/modify the schema.
10. In the design workspace, double-click tLogRow to display its Basic settings view and define the
component properties.
Click Sync Columns to retrieve the schema of the preceding component. For more information on
this component, see tLogRow on page 1977.
11. Save your Job and press F6 to execute it.
Results
tExtractXMLField extracts into the output delimited file, CustomerNames_right, the client
information whose XML structure is correct, and displays the erroneous data on the console of the
Run view.
tFileArchive
Creates a new zip, gzip, or tar.gz archive file from one or more files or folders.
The archive file can be compressed using different compression methods.
Basic settings
Subdirectories Select this check box if you want to add the files in the
subdirectories to the archive file.
This field is available only when zip is selected from the
Archive format list.
Source File Specify the path to the file that you want to add to the
archive file.
This field is available only when gzip is selected from the
Archive format list.
Create directory if does not exist Select this check box to create the destination folder if it
does not exist.
Archive format Select an archive file format from the list: zip, gzip, or tar.gz.
All files Select this check box to add all the files in the specified
directory to the archive file. Clear it to specify the file(s)
you want to add to the archive file in the Files table.
Overwrite Existing Archive This check box is selected by default. It allows you to
save an archive by replacing the existing one. If you clear
the check box, an error is reported, the replacement fails,
and the new archive cannot be saved.
Note:
When the replacement fails, the Job continues to run.
Encrypt files Select this check box if you want the archive file to be
password protected.
Encrypt method: select an encryption method from the list: Java
Encrypt, Zip4j AES, or Zip4j STANDARD.
AES Key Strength: select a key strength for the Zip4j AES
method, either AES 128 or AES 256.
Enter Password: enter the encryption password.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This check box is available only when zip is selected
from the Archive format list. With this check box selected,
the compressed archive file can be decompressed only
by the tFileUnarchive component and not by a common
archiver. For more information about tFileUnarchive, see
tFileUnarchive on page 1168.
ZIP64 mode This option allows for archives with the .zip64 extension to
be created, with three modes available:
• ASNEEDED: archives with the .zip64 extension will be
automatically created based on the file size.
• ALWAYS: archives with the .zip64 extension will be
created, no matter what size the file may be.
• NEVER: no archives with the .zip64 extension will be
created, no matter what size the file may be.
Note that if the file size or the total size of the archive
exceeds 4GB or there are more than 65536 files inside the
archive, you need to set the mode to ALWAYS.
Advanced settings
Use sync flush Select this check box to flush the compressor before
flushing the output stream. Clear this check box to flush
only the output stream.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Drop the tFileArchive component from the Palette onto the workspace.
2. Double-click it to display its Component view.
3. In the Directory field, click the [...] button, browse your directory and select the directory or the
file you want to compress.
4. Select the Subdirectories check box if you want to include the subfolders and their files in the
archive.
5. Then set the Archive file field by filling in the destination path and the name of your archive file.
6. Select the Create directory if not exists check box if you do not have a destination directory yet
and you want to create it.
7. In the Compress level list, select the compression level you want to apply to your archive. In this
example, we use the normal level.
8. Clear the All Files check box if you only want to zip specific files.
9. Add a row to the table by clicking the [+] button and click the name that appears. Between two
star symbols (e.g., *RG*), type part of the name of the file that you want to compress.
10. Press F6 to execute your Job.
Results
The tFileArchive component has compressed the selected file(s) and created the archive in the selected directory.
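As a rough sketch of what such archiving involves, a zip entry can be produced with the standard java.util.zip classes. This is an illustration under simplified assumptions (a single in-memory entry), not Talend's actual implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSketch {
    /** Compresses a single named entry into an in-memory zip archive. */
    public static byte[] zipBytes(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipBytes("data.txt", "hello from tFileArchive".getBytes());
        // Every zip archive starts with the "PK" signature bytes.
        System.out.println(zip.length + " bytes, signature: " + (char) zip[0] + (char) zip[1]);
    }
}
```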
tFileCompare
Compares two files and provides comparison data based on a read-only schema.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component.
The schema of this component is read-only.
If differences are detected, display and If no difference Type in a message to be displayed in the Run console based
detected, display on the result of the comparison.
Advanced settings
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Drag and drop the following components: tFileUnarchive, tFileCompare, and tFileOutputDelimited.
2. Link tFileUnarchive to tFileCompare with an Iterate connection.
3. Connect the tFileCompare to the output component, using a Main row link.
4. In the tFileUnarchive component Basic settings, fill in the path to the archive to unzip.
5. In the Extraction Directory field, fill in the destination folder for the unarchived file.
6. In the tFileCompare Basic settings, set the File to compare. Press Ctrl+Space to display the
list of global variables. Select $_globals{tFileUnarchive_1}{CURRENT_FILEPATH} or
"((String)globalMap.get("tFileUnarchive_1_CURRENT_FILEPATH"))" according to the language you
work with, to fetch the file path from the tFileUnarchive component.
11. Then set the output component as usual, with a semicolon as the data separator.
12. Save your Job and press F6 to run it.
The message set is displayed to the console and the output shows the schema information data.
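A minimal sketch of the comparison itself, using only the standard library (an illustration of the principle, not the component's read-only comparison schema):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CompareSketch {
    /** Returns true when both files have identical content, byte for byte. */
    public static boolean sameContent(Path a, Path b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("compare-demo");
        Path ref = Files.write(dir.resolve("ref.txt"), "1;2;3".getBytes());
        Path copy = Files.write(dir.resolve("copy.txt"), "1;2;3".getBytes());
        // Mirrors the "If no difference detected, display" message of the component.
        System.out.println(sameContent(ref, copy) ? "Files are identical" : "Files differ");
    }
}
```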
tFileCopy
Copies a source file or folder into a target directory.
Basic settings
Copy a directory Select this check box to copy a directory, including all
subdirectories and files in it.
Destination directory Specify the directory to copy the source file or directory to.
Rename Select this check box if you want to rename the file copied
to the destination.
This field does not appear when the Copy a directory check
box is selected.
Remove source file Select this check box to remove the source file after it is
copied to the destination directory.
This field does not appear when the Copy a directory check
box is selected.
Replace existing file Select this check box to overwrite any existing file with the
newly copied file.
This field does not appear when the Copy a directory check
box is selected.
Create the directory if it doesn't exist Select this check box to create the specified destination
directory if it does not exist.
This field does not appear when the Copy a directory check
box is selected.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Create a new Job and add a tFileList component and a tFileCopy component by typing their names
in the design workspace or dropping them from the Palette.
2. Connect tFileList to tFileCopy using a Row > Iterate link.
3. Double-click tFileList to open its Basic settings view.
6. In the File Name field, press Ctrl+Space to access the global variable list and select the
tFileList_1.CURRENT_FILEPATH variable from the list to fill the field with
((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
7. In the Destination directory field, browse to or type in the directory to copy each file to.
8. Select the Remove source file check box to get rid of the files that have been copied.
9. Press Ctrl+S to save your Job and press F6 to execute it.
All the files in the defined source directory are copied to the destination directory and are
removed from the source directory.
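The copy-then-remove behavior configured in this scenario can be sketched with java.nio.file; the paths below are hypothetical placeholders.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopySketch {
    /** Copies src into destDir (replacing any existing file), then removes the source. */
    public static Path copyAndRemove(Path src, Path destDir) throws IOException {
        Files.createDirectories(destDir); // "Create the directory if it doesn't exist"
        Path target = destDir.resolve(src.getFileName());
        Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING); // "Replace existing file"
        Files.delete(src); // "Remove source file"
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copy-demo");
        Path src = Files.write(dir.resolve("in.txt"), "payload".getBytes());
        Path copied = copyAndRemove(src, dir.resolve("out"));
        System.out.println("copied: " + Files.exists(copied) + ", source removed: " + !Files.exists(src));
    }
}
```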
tFileDelete
Deletes files from a given directory.
Basic settings
File Name Path to the file to be deleted. This field is hidden when
you select the Delete folder check box or the Delete file or
folder check box.
File or directory to delete Enter the path to the file or to the folder you want to
delete. This field is available only when you select the
Delete file or folder check box.
Fail on error Select this check box to prevent the main Job from being
executed if an error occurs, for example, if the file to be
deleted does not exist.
Delete Folder Select this check box to display the Directory field, where
you can indicate the path to the folder to be deleted.
Delete file or folder Select this check box to display the File or directory to
delete field, where you can indicate the path to the file or to
the folder you want to delete.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
DELETE_PATH: the path to the deleted file or folder. This is
an After variable and it returns a string.
Usage
Deleting files
This very simple scenario describes a Job deleting files from a given directory.
Procedure
1. Drop the following components: tFileList, tFileDelete, tJava from the Palette to the design
workspace.
2. In the tFileList Basic settings, set the directory to loop on in the Directory field.
5. Press Ctrl+Space to access the list of global variables. In Java, the relevant variable to collect
the current file is ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
6. Then in the tJava component, define the message to be displayed in the standard output (Run
console). In this Java use case, type the following script in the Code field:
System.out.println(((String)globalMap.get("tFileList_1_CURRENT_FILE")) + " has been deleted!");
7. Then save your Job and press F6 to run it.
Results
The message set in the tJava component displays in the log, for each file that has been deleted
through the tFileDelete component.
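The tFileList + tFileDelete + tJava pattern above can be sketched in plain Java; the console message mirrors the one set in the tJava component, and the directory is a hypothetical stand-in.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteSketch {
    /** Iterates over a directory and deletes each regular file, printing a message per file. */
    public static int deleteAll(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) { // plays the role of the tFileList iteration
                if (Files.isRegularFile(f)) {
                    Files.delete(f); // plays the role of tFileDelete
                    System.out.println(f.getFileName() + " has been deleted!"); // the tJava message
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("delete-demo");
        Files.write(dir.resolve("a.txt"), new byte[0]);
        Files.write(dir.resolve("b.txt"), new byte[0]);
        System.out.println(deleteAll(dir) + " file(s) deleted");
    }
}
```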
tFileExist
Checks if a file exists or not.
Basic settings
File name/Stream Path to the file whose existence you want to check.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
2. In the File name field, enter the path to the file whose existence you want to check, or browse to it.
3. In the design workspace, select tFileInputDelimited and click the Component tab to define its
basic settings.
4. Browse to the input file you want to read to fill out the File Name field.
Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.
The schema in file2 consists of five columns: Num, Ref, Price, Quant, and tax.
8. In the design workspace, select the tFileOutputDelimited component.
9. Click the Component tab to define the basic settings of tFileOutputDelimited.
17. Click the If link to display its properties in the Basic settings view.
18. In the Condition panel, press Ctrl+Space to access the variable list and select the global variable
EXISTS. Type an exclamation mark before the variable to negate the meaning of the variable.
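The EXISTS variable and the negated If condition can be sketched as follows; route is a hypothetical helper illustrating which branch an If trigger with condition !EXISTS would take.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExistsSketch {
    /** Hypothetical helper: decides the branch the way an If trigger on !EXISTS would. */
    public static String route(boolean exists) {
        return exists ? "main flow" : "error branch";
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("exists-demo", ".txt");
        boolean exists = Files.exists(file); // plays the role of the EXISTS global variable
        System.out.println(file.getFileName() + " -> " + route(exists));
    }
}
```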
tFileFetch
Retrieves a file through the given protocol (HTTP, HTTPS, FTP, or SMB).
Basic settings
Protocol Select the protocol you want to use from the list and fill in
the corresponding fields: http, https, ftp, smb.
The properties differ slightly depending on the type of
protocol selected. The additional fields are defined in this
table, after the basic settings.
URI Type in the URI of the site from which the file is to be
fetched.
Use cache to save resource Select this check box to save the data in the cache.
This option allows you to process the file data flow (in
streaming mode) without saving it on your drive. This is
faster and improves performance.
Username and Password Enter the authentication information required to access the
server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available for the smb protocol.
Destination Directory Browse to the destination folder where the file fetched is to
be placed.
Create full path according to URI It allows you to reproduce the URI directory path. To save
the file at the root of your destination directory, clear the
check box.
Available for the http, https and ftp protocols.
Add header Select this check box if you want to add one or more HTTP
request headers as fetch conditions. In the Headers table,
enter the name(s) of the HTTP header parameter(s) in the
Name field and the corresponding value(s) in the Value
field.
Available for the http and https protocols.
POST method This check box is selected by default. It allows you to use
the POST method. In the Parameters table, enter the name
of the variable(s) in the Name field and the corresponding
value in the Value field.
Clear the check box if you want to use the GET method.
Available for the http and https protocols.
Die on error Clear this check box to skip the rows in error and to
complete the process for the error-free rows.
Available for the http, https and ftp protocols.
Read Cookie Select this check box for tFileFetch to load a web
authentication cookie.
Available for the http, https, ftp and smb protocols.
Save Cookie Select this check box to save the web page authentication
cookie. This means you will not have to log on to the same
web site in the future.
Available for the http, https, ftp and smb protocols.
Cookie file Type in the full path to the file which you want to use to
save the cookie or click [...] and browse to the desired file to
save the cookie.
Available for the http, https, ftp and smb protocols.
Cookie policy Choose a cookie policy from this drop-down list. Four
options are available, BROWSER_COMPATIBILITY, DEFAULT,
NETSCAPE and RFC_2109.
Available for the http, https, ftp and smb protocols.
Single cookie header Check this box to put all cookies into one request header for
maximum compatibility among different servers.
Available for the http, https, ftp and smb protocols.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at each
component level.
Print response to console Select this check box to print the server response in the
console.
Available for the http and https protocols.
Upload file Select this check box to upload one or more files to the
server. For each file to be uploaded, click the [+] button
beneath the table displayed and set the following fields:
• Name: the value of the name attribute of the <input
type="file"> field in the original HTML form.
• File: the full path of the file to upload, e.g. "D:/filefetch.txt".
• Content-Type: the content type of the file to upload.
The default value is "application/octet-stream".
• Charset: the character set of the file to upload. The
default value is "ISO-8859-1".
This option is available for the http and https protocols,
with the POST method option in the Basic settings view
selected.
With this option selected, the upload response will be saved
in the file specified in the Destination filename field in the
Basic settings view.
Enable proxy server Select this check box if you are connecting via a proxy
and complete the fields which follow with the relevant
information.
Available for the http, https and ftp protocols.
Enable NTLM Credentials Select this check box if you are using an NTLM
authentication protocol.
Domain: The client domain name.
Host: The client's IP address.
Available for the http and https protocols.
Need authentication Select this check box and enter the username and password
in the relevant fields, if they are required to access the
protocol.
Available for the http and https protocols.
Support redirection Select this check box to repeat the redirection request until
redirection is successful and the file can be retrieved.
Available for the http, https and ftp protocols.
Global Variables
Usage
2. Select the protocol you want to use from the list. Here, http is selected.
3. In the URI field, type in the URI where the file to be fetched can be retrieved from. You can paste
the URI directly in your browser to view the data in the file.
4. In the Destination directory field, browse to the folder where the fetched file is to be stored. In
this example, it is D:/Output.
5. In the Destination filename field, type in a new name for the file if you want it to be changed. In
this example, new.txt.
6. If needed, select the Add header check box and define one or more HTTP request headers as fetch
conditions. For example, to fetch the file only if it has been modified since 19:43:31 GMT, October
29, 1994, fill in the Name and Value fields with "If-Modified-Since" and "Sat, 29 Oct 1994 19:43:31
GMT" respectively in the Headers table. For details about HTTP request header definitions, see
Header Field Definitions.
7. Double-click tFileInputDelimited to open its Basic settings view.
8. In the File name field, type in the full path to the fetched file which had been stored locally.
9. Click the [...] button next to Edit schema to open the Schema dialog box. In
this example, add one column output to store the data from the fetched file.
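The fetch with an optional request header such as If-Modified-Since can be sketched with HttpURLConnection. A throwaway local server (a stand-in for the remote site, not part of the scenario) makes the example self-contained.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FetchSketch {
    /** Fetches a URI over HTTP, optionally sending one extra request header, and returns the body. */
    public static String fetch(String uri, String headerName, String headerValue) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(uri).openConnection();
        if (headerName != null) {
            conn.setRequestProperty(headerName, headerValue); // e.g. "If-Modified-Since"
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Throwaway local server standing in for the remote site.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", ex -> {
            byte[] body = "id;name\n1;Griffith".getBytes(StandardCharsets.UTF_8);
            ex.sendResponseHeaders(200, body.length);
            ex.getResponseBody().write(body);
            ex.close();
        });
        server.start();
        String body = fetch("http://localhost:" + server.getAddress().getPort() + "/", null, null);
        server.stop(0);
        System.out.println(body);
    }
}
```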
Procedure
1. Double click tFileFetch_1 to open its component view.
2. Select the protocol you want to use from the Protocol list. Here, we use the https protocol.
3. In the URI field, type in the URI through which you can log in to the website and fetch the web
page accordingly. In this example, the URI is
https://www.codeproject.com/script/Membership/LogOn.aspx?download=true.
4. In the Destination directory field, browse to the folder where the fetched web page is to be stored.
This folder will be created on the fly if it does not exist. In this example, type in D:/download.
5. In the Destination Filename field, type in a new name for the file if you want it to be changed. In
this example, codeproject.html.
6. Under the Parameters table, click the plus button to add two rows and fill in the credentials for
accessing the desired website.
In the Name column, type in a new name respectively for the two rows. In this example, they are
Email and Password, which are required by the website you are logging in.
In the Value column, type in the authentication information.
7. Select the Save cookie check box.
8. In the Cookie file field, type in the full path to the file which you want to use to save the cookie.
In this example, it is D:/download/cookie.
9. Click Advanced settings to open its view.
10. Select the Support redirection check box so that the redirection request will be repeated until the
redirection is successful.
Procedure
1. Double-click tFileFetch_2 to open its Component view.
Related scenario
For an example of transferring data in streaming mode, see Reading data from a remote file in
streaming mode on page 1020
tFileInputARFF
Reads an ARFF file row by row, splits each row into fields, and then sends the fields as defined in
the schema to the next component.
Basic settings
File Name Name and path of the ARFF file and/or variable to be
processed.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read a file and separate the fields
with the specified separator.
It is generally made up of two parts: the first describes the data structure, that is, the rows which
begin with @attribute; the second comprises the raw data, which follows the
expression @data.
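A minimal ARFF file illustrating these two parts might look like this (a hypothetical sample, not the scenario's actual source file):

```
@relation customers

@attribute name string
@attribute age numeric

@data
'Griffith',32
'Wilson',27
```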
5. Click the plus button as many times as needed to create the required number of columns,
according to the source file. Name the columns as follows.
6. For every column, the Nullable check box is selected by default. Leave the check boxes selected,
for all of the columns.
7. Click OK.
8. In the workspace, double-click the tLogRow to display its Component view.
9. Click the [...] button next to Edit schema to check that the schema has been propagated. If not,
click the Sync columns button.
The console displays the data contained in the ARFF file, delimited using a vertical line (the
default separator).
tFileInputDelimited
Reads a delimited file row by row, splits each row into fields, and then sends the fields as defined
in the schema to the next component.
Basic settings
File Name/Stream File name: Name and path of the file to be processed.
Stream: The data flow to be processed. The data must be
added to the flow so that tFileInputDelimited can fetch
it via the corresponding variable. This variable may be
pre-defined in your Studio or provided by the context or by
the components you are using along with this component;
otherwise, you can define it manually and use it according
to the design of your Job, for example, using tJava or
tJavaFlex.
To avoid the inconvenience of typing by hand, you can
select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field, provided that this
variable has been properly defined.
For the available variables, see Talend Studio
User Guide.
CSV options Select this check box to specify the following CSV
parameters:
• Escape char: enter the escape character between
double quotation marks.
• Text enclosure: enter the enclosure character (only
one character) between double quotation marks.
For example, """ needs to be entered when double
quotation marks (") are used as the enclosure character.
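As an illustration of how an enclosure character affects parsing (shown here with Python's csv module rather than Talend itself; the ';' delimiter and the sample row are assumptions):

```python
import csv
import io

# A row whose second and third fields are enclosed in double quotes;
# the second contains the delimiter, and the third a doubled enclosure
# character ("" stands for a literal " inside an enclosed field).
raw = io.StringIO('1;"Smith; John";"5 ""A"" Street"\n')

reader = csv.reader(raw, delimiter=";", quotechar='"', doublequote=True)
row = next(reader)
print(row)   # ['1', 'Smith; John', '5 "A" Street']
```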
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Note that if the input value of any non-nullable primitive
field is null, the row of data including that field will be
rejected.
Built-In: You create and store the schema locally for this
component only.
Skip empty rows Select this check box to skip the empty rows.
Uncompress as zip file Select this check box to uncompress the input file.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
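Outside Talend, the effect of swapping the separators can be sketched as follows (the European-style sample value is an assumption):

```python
def parse_number(text, thousands_sep=".", decimal_sep=","):
    """Convert a numeric string that uses custom thousands and decimal
    separators to a float. Talend's defaults are the opposite pairing:
    ',' for thousands and '.' for decimals."""
    return float(text.replace(thousands_sep, "").replace(decimal_sep, "."))

print(parse_number("1.234.567,89"))   # 1234567.89
```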
Extract lines at random Select this check box to set the number of lines to be
extracted randomly.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Trim all column Select this check box to remove the leading and trailing
whitespaces from all columns. When this check box is
cleared, the Check column to trim table is displayed, which
lets you select particular columns to trim.
Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.
Check date Select this check box to check the date format strictly
against the input schema.
Check columns to trim This table is filled automatically with the schema being
used. Select the check box(es) corresponding to the
column(s) to be trimmed.
Split row before field Select this check box to split rows before splitting fields.
Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric
types - it will act the opposite for Byte
Select this check box if any of your numeric types (long,
integer, short, or byte) will be parsed from a hexadecimal
or octal string.
In the table that appears, select the check box next to the
column or columns of interest to transform the input string
of each selected column to the type defined in the schema.
Select the Permit hexadecimal or octal check box to select
all the columns.
This table appears only when the Permit hexadecimal
(0xNNN) or octal (0NNNN) for numeric types - it will act the
opposite for Byte check box is selected.
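The kind of parsing this option enables can be sketched in plain Python (a simplified stand-in for Talend's internal conversion):

```python
def parse_numeric(text):
    """Parse a numeric string that may be hexadecimal (0xNNN),
    octal (0NNNN, with a leading zero), or plain decimal."""
    s = text.strip()
    if s.lower().startswith("0x"):
        return int(s, 16)          # hexadecimal form
    if len(s) > 1 and s.startswith("0") and s.isdigit():
        return int(s, 8)           # legacy leading-zero octal form
    return int(s)                  # plain decimal

print(parse_numeric("0x1A"))   # 26
print(parse_numeric("017"))    # 15
print(parse_numeric("42"))     # 42
```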
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read a file and separate fields
contained in this file using a defined separator. It allows
you to create a data flow using a Row > Main link or via a
Row > Reject link in which case the data is filtered by data
that does not correspond to the type defined. For further
information, please see Procedure on page 975.
2. Fill in a path to the file in the File Name field. This field is mandatory.
Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.
3. Define the Row separator used to identify the end of a row. Then define the Field separator
used to delimit fields in a row.
4. In this scenario, the header and footer limits are not set, and the Limit number of processed rows
field is set to 50.
5. Set the Schema as either local (Built-in) or remotely managed (Repository) to define the data
to pass on to the tLogRow component.
6. You can load and/or edit the schema via the Edit Schema function.
Related topics: see Talend Studio User Guide.
7. Enter the encoding standard the input file is encoded in. This setting is meant to ensure encoding
consistency throughout all input and output files.
8. Select the tLogRow and define the Field separator to use for the output display. Related topic:
tLogRow on page 1977.
9. Select the Print schema column name in front of each value check box to retrieve the column
labels in the output displayed.
The Log sums up all parameters in a header followed by the result of the Job.
2. From the Protocol list, select the appropriate protocol to access the server on which your data is
stored.
3. In the URI field, enter the URI required to access the server on which your file is stored.
4. Select the Use cache to save the resource check box to add your file data to the cache memory.
This option allows you to use the streaming mode to transfer the data.
5. In the workspace, click tSleep to display the Basic settings tab in the Component view and set the
properties.
By default, tSleep's Pause field is set to 1 second. Do not change this setting. It pauses the second
Job in order to give the first Job, containing tFileFetch, the time to read the file data.
6. In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the
Component view and set the properties.
8. From the Schema list, select Built-in and click [...] next to the Edit schema field to describe the
structure of the file that you want to fetch. The US_Employees file is composed of six columns: ID,
Employee, Age, Address, State, EntryDate.
Click [+] to add the six columns and set them as indicated in the above screenshot. Click OK.
9. In the workspace, double-click tLogRow to display its Basic settings in the Component view and
click Sync Columns to ensure that the schema structure is properly retrieved from the preceding
component.
2. Select the Multi thread execution check box in order to run the two Jobs at the same time. Bear
in mind that the second Job has a one-second delay according to the properties set in tSleep.
This option allows you to fetch the data almost as soon as it is read by tFileFetch, thanks to the
tFileInputDelimited component.
3. Save the Job and press F6 to run it.
tFileInputExcel
Reads an Excel file row by row, splits each row into fields using regular expressions, and then sends
the fields as defined in the schema to the next component.
Basic settings
Read excel2007 file format (xlsx / xlsm) Select this check box to read the .xlsx or .xlsm file of Excel
2007.
File Name/Stream File name: Name of the file and/or the variable to be
processed.
Stream: Data flow to be processed. The data must be added
to the flow in order to be collected by tFileInputExcel via
the INPUT_STREAM variable in the auto-completion list
(Ctrl+Space).
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Password Provide the password set for the Excel file in double
quotation marks by clicking the three-dot button to the
right of this frame.
This field is for Excel 2007 (and higher versions) files
protected by passwords and is available when Read
excel2007 file format (xlsx / xlsm) is selected.
This component supports standard encryption and agile
encryption.
All sheets Select this check box to process all sheets of the Excel file.
Sheet list Click the plus button to add as many lines as needed to the
list of the excel sheets to be processed:
Sheet (name or position): enter the name or position of the
excel sheet to be processed.
Use Regex: select this check box if you want to use a regular
expression to filter the sheets to process.
Affect each sheet(header&footer) Select this check box if you want to apply the parameters
set in the Header and Footer fields to all excel sheets to be
processed.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
First column and Last column Define the range of the columns to be processed through
setting the first and last columns in the First column and
Last column fields respectively.
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
Advanced separator Select this check box to change the data separators used.
Trim all columns Select this check box to remove the leading and trailing
whitespaces from all columns. When this check box is
cleared, the Check column to trim table is displayed, which
lets you select particular columns to trim.
Check column to trim This table is filled automatically with the schema being
used. Select the check box(es) corresponding to the
column(s) to be trimmed.
Convert date column to string Available when Read excel2007 file format (xlsx) is
selected in the Basic settings view.
Select this check box to show the table Check need convert
date column. Here you can parse the string columns that
contain date values based on the given date pattern.
Column: all the columns available in the schema of the
source .xlsx file.
Convert: select this check box to choose all the columns for
conversion (only if they are all of the string type). You can
also select the individual check box next to each column for
conversion.
Date pattern: set the date format here.
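How a date pattern turns a string cell into a date value can be illustrated outside Talend. Talend uses Java-style patterns such as dd-MM-yyyy, which corresponds to %d-%m-%Y in the Python sketch below; the cell value is an assumption.

```python
from datetime import datetime

cell = "16-12-2012"                           # hypothetical string cell
parsed = datetime.strptime(cell, "%d-%m-%Y")  # Java pattern: dd-MM-yyyy
print(parsed.date())   # 2012-12-16
```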
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Read real values for numbers Select this check box to read numbers in real values. This
check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.
Stop reading on encountering empty rows Select this check box to stop reading when an empty row is
encountered; the empty row and any rows that follow it are ignored.
This check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.
Don't validate the cells Select this check box to skip validating the data. This
check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.
Ignore the warning Select this check box to ignore all warnings generated to
indicate errors in the Excel file. This check box becomes
unavailable when you select Read excel2007 file format
(xlsx) in the Basic settings view.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read an Excel file and to output the
data separately depending on the schemas identified in the
file. You can use a Row > Reject link to filter the data which
doesn't correspond to the type defined. For an example of
how to use these two links, see Procedure on page 975.
Related scenarios
No scenario is available for the Standard version of this component yet.
tFileInputFullRow
Reads a file row by row and sends complete rows of data as defined in the schema to the next
component via a Row link.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Skip empty rows Select this check box to skip the empty rows.
Advanced settings
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Extract lines at random Select this check box to set the number of lines to be
extracted randomly.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read full rows in delimited files that
can get very large.
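Reading complete rows without splitting fields can be pictured like this (a plain-Python sketch using sample content similar to the states.csv file from the scenario):

```python
import io

# Sample content similar to the states.csv file: a header row
# followed by ';'-delimited records.
content = io.StringIO("id;state\n1;Alabama\n2;Alaska\n")

content.readline()   # skip the header row (Header field set to 1)
rows = [line.rstrip("\n") for line in content if line.strip()]
print(rows)          # complete rows; field separators are left untouched
```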
5;California
6;Colorado
7;Connecticut
8;Delaware
9;Florida
10;Georgia
3. Double-click the tFileInputFullRow component to open its Basic settings view on the Component
tab.
4. Click the [...] button next to Edit schema to view the data to be passed onto the tLogRow
component. Note that the schema is read-only and it consists of only one column line.
5. In the File Name field, browse to or enter the path to the file to be processed. In this scenario, it is
E:/states.csv.
6. In the Row Separator field, enter the separator used to identify the end of a row. In this example,
it is the default value \n.
7. In the Header field, enter 1 to skip the header row at the beginning of the file.
8. Double-click the tLogRow component to open its Basic settings view on the Component tab.
In the Mode area, select Table (print values in cells of a table) for better readability of the result.
9. Press Ctrl+S to save your Job and then F6 to execute it.
As shown above, ten rows of data in the delimited file states.csv are read one by one, ignoring
field separators, and the complete rows of data are displayed on the console.
To extract fields from rows, you must use tExtractDelimitedFields, tExtractPositionalFields,
or tExtractRegexFields. For more information, see tExtractDelimitedFields on page 937,
tExtractPositionalFields on page 963 and tExtractRegexFields on page 966.
tFileInputJSON
Extracts JSON data from a file and transfers the data to a file, a database table, etc.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Use Url Select this check box to retrieve data directly from the Web.
URL Enter the URL path from which you will retrieve data.
This field is available only when the Use Url check box is
selected.
Filename Specify the file from which you will retrieve data.
This field is not visible if the Use Url check box is selected.
Loop Jsonpath query Enter the path pointing to the node within the JSON field,
on which the loop is based.
Note that if you have selected Xpath from the Read by drop-
down list, the Loop Xpath query field is displayed instead.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Use the loop node as root Select this check box to use the loop node as the root for
querying the file.
The loop node is set in the Loop Json query text frame in
the Basic Settings view. If this option is checked, only the
child elements of the loop node are available for querying.
Validate date Select this check box to check the date format strictly
against the input schema.
This check box is available only if the Read By XPath check
box is selected.
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
The JSON file Store.json contains information about a department store and the content of the file is
as follows:
{"store": {
"name": "Sunshine Department Store",
"address": "Wangfujing Street",
"goods": {
"book": [
{
"category": "Reference",
"title": "Sayings of the Century",
"author": "Nigel Rees",
"price": 8.88
},
{
"category": "Fiction",
"title": "Sword of Honour",
"author": "Evelyn Waugh",
"price": 12.66
}
],
"bicycle": {
"type": "GIANT OCR2600",
"color": "White",
"price": 276
}
}
}}
In the following example, we will extract the store name, the store address, and the bicycle
information from this file.
2. Select JsonPath without loop from the Read By drop-down list. With this option, you need to
specify the complete JSON path for each node of interest in the JSONPath query fields of the
Mapping table.
3. Click the [...] button next to Edit schema to open the schema editor.
4. Click the [+] button to add five columns, store_name, store_address, bicycle_type, and bicycle_color
of String type, and bicycle_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
5. In the Filename field, specify the path to the JSON file that contains the data to be extracted. In
this example, it is "E:/Store.json".
6. In the Mapping table, the Column fields are automatically filled with the schema columns you
have defined.
In the JSONPath query fields, enter the JSONPath query expressions between double quotation
marks to specify the nodes that hold the desired data.
• For the columns store_name and store_address, enter the JSONPath query expressions
"$.store.name" and "$.store.address" relative to the nodes name and address respectively.
• For the columns bicycle_type, bicycle_color, and bicycle_price, enter the JSONPath query
expressions "$.store.goods.bicycle.type", "$.store.goods.bicycle.color", and "$.store.goods
.bicycle.price" relative to the child nodes type, color, and price of the bicycle node respectively.
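Because each JSONPath query above starts at the document root ($), the extraction is equivalent to the following plain dictionary lookups, sketched here with Python's standard json module instead of a JSONPath engine (the sample is a trimmed-down version of Store.json):

```python
import json

store_json = """{"store": {
  "name": "Sunshine Department Store",
  "address": "Wangfujing Street",
  "goods": {"bicycle": {"type": "GIANT OCR2600",
                        "color": "White", "price": 276}}}}"""

data = json.loads(store_json)
store = data["store"]
bicycle = store["goods"]["bicycle"]

row = {
    "store_name": store["name"],          # $.store.name
    "store_address": store["address"],    # $.store.address
    "bicycle_type": bicycle["type"],      # $.store.goods.bicycle.type
    "bicycle_color": bicycle["color"],    # $.store.goods.bicycle.color
    "bicycle_price": bicycle["price"],    # $.store.goods.bicycle.price
}
print(row)
```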
7. Double-click the tLogRow component to display its Basic settings view.
8. In the Mode area, select Table (print values in cells of a table) for better readability of the result.
As shown above, the store name, the store address, and the bicycle information are extracted
from the source JSON data and displayed in a flat table on the console.
Procedure
1. In the Studio, open the Job used in Extracting JSON data from a file using JSONPath without
setting a loop node on page 1034 to display it in the design workspace.
Select the five columns added previously and click the x button to remove all of them.
Click the [+] button to add four columns, book_title, book_category, and book_author of String type,
and book_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
6. In the Json query fields of the Mapping table, enter the JSONPath query expressions between
double quotation marks to specify the nodes that hold the desired data. In this example, enter the
JSONPath query expressions "title", "category", "author", and "price" relative to the four child nodes
of the book node respectively.
7. Press Ctrl+S to save the Job.
As shown above, the book information is extracted from the source JSON data and displayed in a
flat table on the console.
Procedure
1. In the Studio, open the Job used in Extracting JSON data from a file using JSONPath without
setting a loop node on page 1034 to display it in the design workspace.
2. Double-click the tFileInputJSON component to open its Basic settings view.
Select the five columns added previously and click the x button to remove all of them.
Click the [+] button to add five columns, store_name, book_title, book_category, and book_author of
String type, and book_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
5. In the Loop XPath query field, enter the XPath query expression between double quotation marks
to specify the node on which the loop is based. In this example, it is "/store/goods/book".
6. In the XPath query fields of the Mapping table, enter the XPath query expressions between do
uble quotation marks to specify the nodes that hold the desired data.
• For the column store_name, enter the XPath query "../../name" relative to the name node.
• For the columns book_title, book_category, book_author, and book_price, enter the XPath query
expressions "title", "category", "author", and "price" relative to the four child nodes of the book
node respectively.
7. Press Ctrl+S to save the Job.
8. Press F6 to execute the Job.
As shown above, the store name and the book information are extracted from the source JSON
data and displayed in a flat table on the console.
The JSON file facebook.json is deployed on the Tomcat server, specifically in the folder
<tomcat path>/webapps/docs, and the content of the file is as follows:
{"user": {
"id": "9999912398",
"name": "Kelly Clarkson",
"friends": [
{
"name": "Tom Cruise",
"id": "55555555555555",
"likes": {"data": [
{
"category": "Movie",
"name": "The Shawshank Redemption",
"id": "103636093053996",
"created_time": "2012-11-20T15:52:07+0000"
},
{
"category": "Community",
"name": "Positiveretribution",
"id": "471389562899413",
"created_time": "2012-12-16T21:13:26+0000"
}
]}
},
{
"name": "Tom Hanks",
"id": "88888888888888",
"likes": {"data": [
{
"category": "Journalist",
"name": "Janelle Wang",
"id": "136009823148851",
"created_time": "2013-01-01T08:22:17+0000"
},
{
"category": "Tv show",
"name": "Now With Alex Wagner",
"id": "305948749433410",
"created_time": "2012-11-20T06:14:10+0000"
}
]}
}
]
}}
4. Link the tExtractJSONFields component to the second tLogRow component using a Row > Main
connection.
2. Select JsonPath without loop from the Read By drop-down list. Then select the Use Url check box
and in the URL field displayed enter the URL of the file facebook.json from which the data
will be retrieved. In this example, it is http://localhost:8080/docs/facebook.json.
3. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding one column friends of String type.
Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.
4. In the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends
column to retrieve the entire friends node from the source file.
5. Double-click tExtractJSONFields to open its Basic settings view.
Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.
9. In the XPath query fields of the Mapping table, type in the XPath query expressions between
double quotation marks to specify the JSON nodes that hold the desired data. In this example:
• "../../id" (querying the "/friends/id" node) for the column id,
• "../../name" (querying the "/friends/name" node) for the column name,
• "id" for the column like_id,
• "name" for the column like_name, and
• "category" for the column like_category.
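The two-stage extraction (the friends array first, then each friend's nested likes) can be sketched in plain Python as a stand-in for the tFileInputJSON and tExtractJSONFields pair (the sample is a trimmed-down version of facebook.json):

```python
import json

# Trimmed-down version of the facebook.json sample shown above.
doc = json.loads("""{"user": {"friends": [
  {"name": "Tom Cruise", "id": "55555555555555",
   "likes": {"data": [{"category": "Movie",
                       "name": "The Shawshank Redemption"}]}}
]}}""")

rows = []
for friend in doc["user"]["friends"]:      # $.user.friends[*]
    for like in friend["likes"]["data"]:   # loop node inside each friend
        rows.append({
            "id": friend["id"],            # "../../id"
            "name": friend["name"],        # "../../name"
            "like_name": like["name"],     # "name"
            "like_category": like["category"],  # "category"
        })
print(rows)
```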
10. Double-click the second tLogRow component to open its Basic settings view.
In the Mode area, select Table (print values in cells of a table) for better readability of the result.
As shown above, the friends data in the JSON file specified using the URL is extracted and then
the data from the node friends is extracted and displayed in a flat table.
tFileInputLDIF
Reads an LDIF file row by row, splits each row into fields, and sends the fields as defined in the
schema to the next component using a Row connection.
Basic settings
add operation as prefix when the entry is modify type Select this check box to display the operation mode.
Value separator Type in the separator required for parsing data in the given
file. By default, the separator used is ",".
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
Schema and Edit schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
Advanced settings
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
Use field options (for Base64 decode checked) Select this check box to specify the Base64-encoded
columns of the input flow. Once selected, this check box
activates the Decode Base64 encoding values table, which
lets you specify the columns to be decoded from Base64.
Note:
The columns handled by this check box must be of the byte
type that you define in the input schema editor.
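In the LDIF format, a double colon after an attribute name marks a Base64-encoded value; decoding one such value can be sketched as follows (the attribute line is a made-up example):

```python
import base64

# Hypothetical LDIF attribute line; '::' marks a Base64-encoded value.
line = "description:: VGFsZW5kIE9wZW4gU3R1ZGlv"

name, _, value = line.partition("::")
decoded = base64.b64decode(value.strip()).decode("utf-8")
print(name, decoded)
```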
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read full rows in a voluminous LDIF
file. This component enables you to create a data flow
using a Row > Main link, and to create a reject flow with
a Row > Reject link that filters out the data whose type
does not match the defined type. For an example of usage,
see Procedure on page 1096 from tFileInputXML.
Related scenario
For a related scenario, see Writing data from a database table into an LDIF file on page 1133.
tFileInputMail
Reads the standard key data of a given MIME or MSG email file.
Basic settings
File Name Specify the email file to read and extract data from.
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Mail type Select a type of email from the drop-down list, either MIME
or MSG.
Attachment export directory Specify the directory to which you want to export email
attachments.
Mail parts Specify the header fields to extract from the MIME email file
specified in the File Name field.
• Column: The Column cells are automatically filled with
the column names defined in the schema.
MSG Mail parts Specify what to extract from the defined MSG email file for
each schema column.
• Column: The Column cells are automatically filled with
the column name defined in the schema.
• Mail part: Click each cell and then select an email part
to be extracted.
This table appears only when MSG is selected from the Mail
type drop-down list.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Procedure
Procedure
1. Drop a tFileInputMail and a tLogRow component from the Palette to the design workspace.
2. Connect the two components together using a Main Row link.
3. Double-click tFileInputMail to display its Basic settings view and define the component
properties.
4. Click the three-dot button next to the File Name field and browse to the mail file to be processed.
5. Set schema type to Built-in and click the three-dot button next to Edit schema to open a dialog
box where you can define the schema including all columns you want to retrieve on your output.
6. Click the plus button in the dialog box to add as many columns as you want to include in the
output flow. In this example, the schema has four columns: Date, Author, Object and Status.
7. Once the schema is defined, click OK to close the dialog box and propagate the schema into the
Mail parts table.
8. Click the three-dot button next to Attachment export directory and browse to the directory in
which you want to export email attachments, if any.
9. In the Mail part column of the Mail parts table, type in the actual header or body standard keys
that will be used to retrieve the values to be displayed.
10. Select the Multi Value check box next to any of the standard keys if more than one value for the
relative standard key is present in the input file.
11. If needed, define a separator for the different values of the relative standard key in the Separator
field.
12. Double-click tLogRow to display its Basic settings view and define the component properties in
order for the values to be separated by a carriage return. On Windows OS, type in \n between
double quotes.
13. Save your Job and press F6 to execute it and display the output flow on the console.
Results
The header key values are extracted as defined in the Mail parts table. Mail reception date, author,
subject and status are displayed on the console.
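Conceptually, retrieving standard header keys from a MIME file resembles the following minimal Java sketch. This is illustrative only, not the code the Studio generates: it handles simple, unfolded headers, and the sample message content is invented.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class MimeHeaderSketch {
    // Collect the top-level MIME header keys (everything before the first
    // blank line). Folded (multi-line) headers are not handled in this sketch.
    public static Map<String, String> readHeaders(BufferedReader in) throws Exception {
        Map<String, String> headers = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(),
                            line.substring(colon + 1).trim());
            }
        }
        return headers;
    }

    public static void main(String[] args) throws Exception {
        String mail = "Date: Mon, 1 Apr 2019 10:00:00 +0000\n"
                    + "From: alice@example.com\n"
                    + "Subject: Quarterly report\n"
                    + "\n"
                    + "Body text...";
        Map<String, String> h = readHeaders(new BufferedReader(new StringReader(mail)));
        System.out.println(h.get("Subject"));   // prints: Quarterly report
    }
}
```

The keys you type in the Mail part column (Date, From, Subject, and so on) play the role of the map keys above.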
tFileInputMSDelimited
Reads the data structures (schemas) of a multi-structured delimited file and sends the fields as
defined in the different schemas to the next components using Row connections.
Basic settings
Multi Schema Editor The Multi Schema Editor helps to build and configure the
data flow in a multi-structure delimited file to associate
one schema per output.
For more information, see The Multi Schema Editor on page
1053.
Output Lists all the schemas you define in the Multi Schema
Editor, along with the related record type and the field
separator that corresponds to every schema, if different field
separators are used.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.
Advanced settings
Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.
Validate date Select this check box to check the date format strictly
against the input schema.
Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Warning: Use an absolute path (instead of a relative path) for this field to avoid possible errors.
In the Multi Schema Editor, you can:
• define the source file properties,
• define the data structure for each of the output schemas.
When you define the data structure for each of the output schemas in the Multi Schema Editor, column
names in the different data structures automatically appear in the input schema lists of the
components that come after tFileInputMSDelimited. However, you can still define data structures
directly in the Basic settings view of each of these components.
The Multi Schema Editor also helps to declare the schema that should act as the source schema
(primary key) from the incoming data to ensure its uniqueness. The editor uses this mapping to associate all
schemas processed in the delimited file to the source schema in the same file.
The editor opens with the first column, which usually holds the record type indicator, selected by
default. However, once the editor is open, you can select the check box of any of the schema columns
to define it as a primary key.
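Conceptually, the component dispatches each row to one of several output schemas according to the record type indicator in the first column. A minimal Java sketch of that dispatch (illustrative only, not code generated by the Studio; the sample rows and separator are invented):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiSchemaSketch {
    // Route each delimited row to a per-record-type bucket based on column 0,
    // mimicking how one multi-structure file is split into several outputs.
    public static Map<String, List<String[]>> split(List<String> rows, String sep) {
        Map<String, List<String[]>> byType = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split(sep, -1);
            byType.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(fields);
        }
        return byType;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "A;Hits;Various;2001",   // schema A: Type, DiscName, Author, Date
            "B;Track One",           // schema B: Type, SongName
            "B;Track Two");
        Map<String, List<String[]>> out = split(rows, ";");
        System.out.println(out.get("B").size());  // prints: 2
    }
}
```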
The figure below illustrates an example of the Multi Schema Editor.
For detailed information about the usage of the Multi Schema Editor, see Reading a multi structure
delimited file on page 1054.
2. Click Browse... next to the File name field to locate the multi schema delimited file you need to
process.
3. In the File Settings area:
-Select from the list the encoding type of the source file. This setting is meant to
ensure encoding consistency throughout all input and output files.
-Select the field and row separators used in the source file.
Note:
Select the Use Multiple Separator check box and define the fields that follow accordingly if
different field separators are used to separate schemas in the source file.
A preview of the source file data displays automatically in the Preview panel.
Note:
Column 0, which usually holds the record type indicator, is selected by default. However, you can
select the check box of any of the other columns to define it as a primary key.
4. Click Fetch Codes to the right of the Preview panel to list the type of schema and records you
have in the source file. In this scenario, the source file has three schema types (A, B, C).
Click each schema type in the Fetch Codes panel to display its data structure below the Preview
panel.
5. Click in the name cells and set column names for each of the selected schemas.
In this scenario, column names read as the following:
-Schema A: Type, DiscName, Author, Date,
-Schema B: Type, SongName,
8. Click anywhere in the editor; the false in the Key cell will become true.
You now need to declare the parent schema by which you want to group the other "children"
schemas (DiscName in this scenario). To do that:
9. In the Fetch Codes panel, select schema B and click the right arrow button to move it to the right.
Then, do the same with schema C.
Note:
The Cardinality field is not compulsory. It helps you to define the number (or range) of fields
in "children" schemas attached to the parent schema. However, if you set the wrong number or
range and try to execute the Job, an error message will display.
10. In the Multi Schema Editor, click OK to validate all the changes you made and close the editor.
The three defined schemas along with the corresponding record types and field separators display
automatically in the Basic settings view of tFileInputMSDelimited.
The three schemas you defined in the Multi Schema Editor are automatically passed to the three
tLogRow components.
11. If needed, click the Edit schema button in the Basic settings view of each of the tLogRow
components to view the input and output data structures you defined in the Multi Schema Editor
or to modify them.
tFileInputMSPositional
Reads the data structures (schemas) of a multi-structured positional file and sends the fields as
defined in the different schemas to the next components using Row connections.
Basic settings
Skip from footer Number of rows to be skipped at the end of the file.
Die on parse error Let the component die if a parsing error occurs.
Die on unknown header type Length values separated by commas, interpreted as a string
between quotes. Make sure the values entered in this field
are consistent with the schema defined.
Advanced settings
Process long rows (needed for processing rows longer than Select this check box to process long rows (this is necessary
100,000 characters wide) to process rows longer than 100 000 characters).
Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.
Validate date Select this check box to check the date format strictly
against the input schema.
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read a multi-schema positional file
and separate fields using a position separator value. You
can also create a rejection flow using a Row > Reject link
to filter the data which does not correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.
2. In the File name/Stream field, type in the path to the input file. Also, you can click the [...] button
to browse and choose the file.
3. In the Header Field Position field, enter the start-end position for the schema identifier in the
input file, 0-1 in this case as the first character in each row is the schema identifier.
4. Click the [+] button twice to add two rows in the Records table.
5. Click the cell under the Schema column to show the [...] button.
Click the [...] button to show the schema naming box.
7. Define the schema car_owner, which has four columns: schema_id, car_make, owner and age.
8. Repeat the steps to define the schema car_insurance, which has four columns: schema_id,
car_owner, age and car_insurance.
9. Connect tFileInputMSPositional to the car_owner component with the Row > car_owner link, and
the car_insurance component with the Row > car_insurance link.
10. In the Header value column, type in the schema identifier value for the schema, 1 for the schema
car_owner and 2 for the schema car_insurance in this case.
11. In the Pattern column, type in the length (number of characters) of each field in the schema:
1,8,10,3 for the schema car_owner and 1,10,3,3 for the schema
car_insurance in this case.
12. In the Skip from header field, type in the number of beginning rows to skip, 2 in this case as the
two rows in the beginning just describe the two schemas, instead of the values.
13. Choose Table (print values in cells of a table) in the Mode area of the components car_owner and
car_insurance.
The file is read row by row based on the length values defined in the Pattern field and output in
two tables with different schemas.
tFileInputMSXML
Reads the data structures (schemas) of a multi-structured XML file and sends the fields as defined in
the different schemas to the next components using Row connections.
Basic settings
Root XPath query The root of the XML tree, which the query is based on.
Enable XPath in column "Schema XPath loop" but lose the Select this check box if you want to define an XPath query in
order the Schema XPath loop field of the Outputs table while not
keeping the order of the data shown in the source XML file.
Warning:
This option takes effect only if you select the Dom4j
generation mode in the Advanced settings view.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.
Advanced settings
Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.
Validate date Select this check box to check the date format strictly
against the input schema.
Ignore DTD file Select this check box to ignore the DTD file indicated in the
XML file being processed.
Note:
This option allows you to use dom4j to process the
XML files of high complexity.
Encoding Select the encoding type from the list or select CUSTOM
and define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
2. Browse to the XML file you want to process. In this example, it is D:/Input/multischema_xml.xml,
which contains the following data:
<root>
<toy>Cat</toy>
<record>We Belong Together</record>
<book>As You Like It</book>
<book>All's Well That Ends Well</book>
<record>When You Believe</record>
<toy>Dog</toy>
</root>
3. In the Root XPath query field, enter the root of the XML tree, which the query will be based on. In
this example, it is "/root".
4. Select the Enable XPath in column "Schema XPath loop" but lose the order check box.
In this example, to extract the desired fields, you need to define a XPath path in the Schema
XPath loop field in the Outputs table for each output flow while not keeping the order of the data
shown in the source XML file.
5. Click the plus button to add lines in the Outputs table where you can define the output schemas,
record and book in this example.
6. In the Outputs table, click in the Schema cell and then click a three-dot button to display a dialog
box where you can define the schema name.
Enter a name for the output schema and click OK to close the dialog box.
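The Root XPath query and the per-output Schema XPath loop boil down to evaluating XPath queries against the XML tree. A minimal sketch with the JDK's built-in XPath API (illustrative only, not the code the Studio generates), reusing the sample data from step 2:

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathLoopSketch {
    // Return the text of every node matched by the loop query,
    // like one Schema XPath loop producing one output flow.
    public static List<String> loop(String xml, String query) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(query, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><toy>Cat</toy><record>We Belong Together</record>"
                   + "<record>When You Believe</record><toy>Dog</toy></root>";
        System.out.println(loop(xml, "/root/record"));
        // prints: [We Belong Together, When You Believe]
    }
}
```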
tFileInputPositional
Reads a positional file row by row, splits each row into fields based on a given pattern, and sends
the fields as defined in the schema to the next component.
Basic settings
File name/Stream File name: Name and path of the file to be processed.
Use byte length as the cardinality Select this check box to enable support of double-byte
characters in this component. JDK 1.6 is required for this
feature.
Customize Select this check box to customize the data format of the
positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.
Pattern Units The unit of the length values specified in the Pattern field.
• Bytes: With this option selected, the length values in
the Pattern field should be the count of bytes that
represent symbols in original encoding of the input file.
• Symbols: With this option selected, the length values
in the Pattern field should be the count of regular
symbols, not including surrogate pairs.
• Symbols (including rare): With this option selected,
the length values in the Pattern field should be the
count of symbols, including rare symbols such as
surrogate pairs, and each surrogate pair counts as a
single symbol. For performance reasons, it is
not recommended to use this option when your input
data consists of only regular symbols.
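The difference between the three units can be seen in a short Java sketch (illustrative only): a surrogate pair such as an emoji counts as two UTF-16 units, one symbol when rare symbols are included, and four UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

public class LengthUnitsSketch {
    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";  // 'a', an emoji (surrogate pair), 'b'
        System.out.println(s.length());                       // 4 UTF-16 units
        System.out.println(s.codePointCount(0, s.length()));  // 3 symbols, incl. rare
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // 6 bytes
    }
}
```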
Skip empty rows Select this check box to skip the empty rows.
Uncompress as zip file Select this check box to uncompress the input file.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
Needed to process rows longer than 100 000 characters Select this check box if the rows to be processed in the
input file are longer than 100 000 characters.
Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.
Validate date Select this check box to check the date format strictly
against the input schema.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read a file and separate fields using
a position separator value. You can also create a rejection
flow using a Row > Reject link to filter the data which does
not correspond to the type defined. For an example of how
to use these two links, see Procedure on page 975.
Procedure
1. Drop a tFileInputPositional component from the Palette to the design workspace.
2. Drop a tFileOutputXML component as well. This file is meant to receive the references in a
structured way.
3. Right-click the tFileInputPositional component and select Row > Main. Then drag it onto the
tFileOutputXML component and release when the plug symbol shows up.
2. Define the Job Property type if needed. For this scenario, we use the built-in Property type.
As opposed to the Repository, this means that the Property type is set for this station only.
3. Fill in a path to the input file in the File Name field. This field is mandatory.
4. Define the Row separator identifying the end of a row if needed, by default, a carriage return.
5. If required, select the Use byte length as the cardinality check box to enable the support of
double-byte character.
6. Define the Pattern to delimit fields in a row. The pattern is a series of length values corresponding
to the values of your input files. The values should be entered between quotes, and separated by
a comma. Make sure the values you enter match the schema defined.
7. Fill in the Header, Footer and Limit fields according to your input file structure and your need. In
this scenario, we only need to skip the first row when reading the input file. To do this, fill the
Header field with 1 and leave the other fields as they are.
8. Next to Schema, select Repository if the input schema is stored in the Repository. In this use case,
we use a Built-In input schema to define the data to pass on to the tFileOutputXML component.
9. You can load and/or edit the schema via the Edit Schema function. For this schema, define three
columns, respectively Contract, CustomerRef and InsuranceNr matching the structure of the input fi
le. Then, click OK to close the Schema dialog box and propagate the changes.
6. Click the plus button to add a line in the Root tags table, and enter a root tag (or more) to wrap
the XML output structure, in this case ContractsList.
7. Define parameters in the Output format table if needed. For example, select the As attribute
check box for a column if you want to use its name and value as an attribute for the parent XML
element, or clear the Use schema column name check box for a column to reuse the column label
from the input schema as the tag label. In this use case, we keep all the default output format
settings as they are.
8. To group output rows according to the contract number, select the Use dynamic grouping check
box, add a line in the Group by table, select Contract from the Column list field, and enter an
attribute for it in the Attribute label field.
tFileInputProperties
Reads a text file row by row and separates the fields according to the model key = value.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
For this component, the schema is read-only. It is made of
two columns, Key and Value, corresponding to the parameter
name and the parameter value to be copied.
File format Select from the list your file format, either: .properties or
.ini.
File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable, see Talend Studio User Guide.
Calculate MD5 Hash Select this check box to verify that the file to be processed
has been correctly downloaded.
Advanced settings
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read a text file and separate data
according to the structure key = value.
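The key = value model handled by this component is the same one supported by the JDK's java.util.Properties class, as the following minimal sketch shows (illustrative only; the sample pairs are invented):

```java
import java.io.StringReader;
import java.util.Properties;

public class KeyValueSketch {
    public static void main(String[] args) throws Exception {
        // A .properties source holds one key = value pair per line.
        String src = "hello = bonjour\nthank_you = merci\n";
        Properties p = new Properties();
        p.load(new StringReader(src));
        System.out.println(p.getProperty("hello"));  // prints: bonjour
    }
}
```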
The glossary displays on the console listing three columns holding: the key name in the first column,
the English term in the second, and the corresponding French term in the third.
6. Select all columns from the English_terms table and drop them to the output table.
Select the key column from the English_terms table and drop it to the key column in the
French_terms table.
7. In the glossary table in the lower right corner of the tMap editor, rename the value field to EN
because it will hold the values of the English file.
8. Click the plus button to add a line to the glossary table and rename it to FR.
9. In the Length field, set the maximum length to 255.
10. In the upper left corner of the tMap editor, select the value column in the English_terms table and
drop it to the FR column in the French_terms table. When done, click OK to validate your changes
and close the map editor and propagate the changes to the next component.
tFileInputRaw
Reads all data in a raw file and sends it to a single output column for subsequent processing by
another component.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Mode Read the file as a string: The content of the file is read as a
string.
Read the file as a bytes array: The content of the file is read
as a bytes array.
Stream the file: As soon as the first character is entered in
the source file, it is read immediately.
Encoding If you are using the Read the file as a string mode, select
the encoding type from the list or select Custom and define
it manually.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
To catch the FileNotFoundException, you also need to
select this check box.
Advanced settings
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule Use this component to provide input data for Jobs that
require a single column of data or that require a whole file
to be read as a single column.
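The two non-streaming modes amount to reading the whole file into a single value, as in this minimal Java sketch (illustrative only, not the code the Studio generates; the temporary file and its content are invented):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawReadSketch {
    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("raw", ".txt");
        Files.write(f, "entire file as one value".getBytes(StandardCharsets.UTF_8));
        // "Read the file as a string" mode: one row, one column, whole content.
        String content = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
        // "Read the file as a bytes array" mode: same content, kept as raw bytes.
        byte[] raw = Files.readAllBytes(f);
        System.out.println(content.length() == raw.length);  // prints: true (ASCII-only content)
        Files.delete(f);
    }
}
```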
Related scenario
For a related use case, see:
• Uploading files to Dropbox on page 655
tFileInputRegex
Reads a file row by row, splits each row into fields using regular expressions, and sends the fields as
defined in the schema to the next component.
This powerful component can replace a number of other components of the File family. It requires
some advanced knowledge of regular expression syntax.
Basic settings
File name/Stream File name: Name of the file and/or the variable to be
processed.
Warning:
• The regular expression needs to be in double
quotes.
• To extract all the desired strings, make sure the
regular expression contains the corresponding
subpatterns that match the strings. Also, each
subpattern in the regular expression needs to be in
a pair of brackets.
Ignore error message for the unmatched record Select this check box to avoid outputting error messages for
records that do not match the specified regex. This check
box is cleared by default.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Skip empty rows Select this check box to skip the empty rows.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
In the Map/Reduce version of tFileInputRegex, you need to
select the Custom encoding check box to display this list.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to read a file and separate fields
contained in this file according to the defined Regex. You
can also create a rejection flow using a Row > Reject link
to filter the data which does not correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.
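The rule that each extracted field must correspond to a parenthesized subpattern can be illustrated with plain Java regular expressions (an illustrative sketch; the expression and input data are invented):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldsSketch {
    public static void main(String[] args) {
        // Each parenthesized subpattern becomes one extracted field (schema column).
        Pattern p = Pattern.compile("^(\\w+);(\\d+)$");
        Matcher m = p.matcher("widget;42");
        if (m.matches()) {
            System.out.println(m.group(1));  // prints: widget
            System.out.println(m.group(2));  // prints: 42
        }
    }
}
```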
2. The Job is built-in for this scenario. Hence, the Properties are set for this station only.
3. Fill in a path to the file in File Name field. This field is mandatory.
4. Define the Row separator identifying the end of a row.
5. Then define the Regular expression in order to delimit fields of a row, which are to be passed on
to the next component. You can type in a regular expression using Java code, and on multiple lines
if needed.
Warning:
Regex syntax requires double quotes.
6. In this expression, make sure you include all subpatterns matching the fields to be extracted.
7. In this scenario, ignore the header, footer and limit fields.
8. Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional
component.
9. You can load or create the schema through the Edit Schema function.
10. Then define the second component properties:
tFileInputXML
Reads an XML structured file row by row, splits each row into fields, and sends the fields as defined in
the schema to the next component.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
File name/Stream File name: Name and path of the file to be processed.
Loop XPath query Node of the tree, which the loop is based on.
Note:
The Get Nodes option functions in the DOM4j and SAX
modes, although in SAX mode namespaces are not
supported. For further information concerning the DOM4j
and SAX modes, please see the properties noted in the
Generation mode list of the Advanced Settings tab.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Ignore DTD file Select this check box to ignore the DTD file indicated in the
XML file being processed.
Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Ignore the namespaces Select this check box to ignore namespaces.
Generate a temporary file: click the three-dot button to
browse to the XML temporary file and set its path in the
field.
Use Separator for mode Xerces Select this check box if you want to separate concatenated
children node values.
Note:
This field can only be used if the selected Generation
mode is Xerces.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Generation mode From the drop-down list select the generation mode for the
XML file, according to the memory available and the desired
speed:
• Slow and memory-consuming (Dom4j)
Note:
This option allows you to use dom4j to process the
XML files of high complexity.
• Memory-consuming (Xerces).
• Fast with low memory consumption (SAX)
Validate date Select this check box to check the date format strictly
against the input schema.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Drop tFileInputXML and tLogRow from the Palette to the design workspace.
2. Connect both components together using a Main Row link.
3. Double-click tFileInputXML to open its Basic settings view and define the component properties.
4. As the street dir file used as input has already been defined in the Metadata area, select
Repository as Property type. This way, the properties are automatically leveraged and the rest
of the property fields are filled in (apart from Schema). For more information regarding the
metadata creation wizards, see Talend Studio User Guide.
5. In the same way, select the relevant schema in the Repository metadata list. Edit the schema if
you want to make any change to the schema loaded.
6. The Filename field shows the structured file to be used as input.
7. In Loop XPath query, change, if needed, the node of the structure on which the loop is based.
8. In the Mapping table, fill in the fields to be extracted and displayed in the output.
9. If the file is large, fill in a Limit of rows to be read.
10. Enter the encoding if needed then double-click on tLogRow to define the separator character.
11. Save your Job and press F6 to execute it.
Results
The fields defined in the input properties are extracted from the XML structure and displayed on the
console.
Procedure
1. Drop the following components from the Palette to the design workspace: tFileInputXML,
tFileOutputXML and tLogRow.
Right-click tFileInputXML and select Row > Main in the contextual menu and then click
tFileOutputXML to connect the components together.
Right-click tFileInputXML and select Row > Reject in the contextual menu and then click tLogRow
to connect the components together using a reject link.
2. Double-click tFileInputXML to display the Basic settings view and define the component
properties.
3. In the Property Type list, select Repository and click the three-dot button next to the field to
display the Repository Content dialog box where you can select the metadata relative to the input
file if you have already stored it in the File xml node under the Metadata folder of the Repository
tree view. The fields that follow are automatically filled with the fetched data. If not, select
Built-in and fill in the fields that follow manually.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.
4. In the Schema Type list, select Repository and click the three-dot button to open the dialog box
where you can select the schema that describes the structure of the input file if you have already
stored it in the Repository tree view. If not, select Built-in and click the three-dot button next to
Edit schema to open a dialog box where you can define the schema manually.
The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState
and id2.
5. Click the three-dot button next to the Filename field and browse to the XML file you want to
process.
6. In the Loop XPath query, enter between inverted commas the path of the XML node on which to
loop in order to retrieve data.
In the Mapping table, Column is automatically populated with the defined schema.
In the XPath query column, enter between inverted commas the node of the XML file that holds
the data you want to extract from the corresponding column.
7. In the Limit field, enter the number of lines to be processed, the first 10 lines in this example.
8. Double-click tFileOutputXML to display its Basic settings view and define the component
properties.
9. Click the three-dot button next to the File Name field and browse to the output XML file you want
to collect data in, customer_data.xml in this example.
In the Row tag field, enter between inverted commas the name you want to give to the tag that
will hold the retrieved data.
Click Edit schema to display the schema dialog box and make sure that the schema matches that
of the preceding component. If not, click Sync columns to retrieve the schema from the preceding
component.
10. Double-click tLogRow to display its Basic settings view and define the component properties.
Click Edit schema to open the schema dialog box and make sure that the schema matches that
of the preceding component. If not, click Sync columns to retrieve the schema of the preceding
component.
Results
The output file customer_data.xml holding the correct XML data is created in the defined path and
erroneous XML data is displayed on the console of the Run view.
tFileList
Iterates a set of files or folders in a given directory based on a filemask pattern.
Note: This component iterates over every file in a directory, including system files, hidden files,
zero-byte files, and so on, as long as the file meets the conditions set in the Files field.
Basic settings
FileList Type Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.
Include subdirectories Select this check box if the selected input source type
includes sub-directories.
Case Sensitive Set the case mode from the list to create or not a
case-sensitive filter on filenames.
Generate Error if no file found Select this check box to generate an error message if no
files or directories are found.
Use Glob Expressions as Filemask This check box is selected by default. It filters the results
using glob expressions.
Files Click the plus button to add as many filter lines as needed:
Filemask: in the added filter lines, type in a filename or a
filemask using special characters or regular expressions.
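How a glob filemask filters filenames can be sketched with the JDK's PathMatcher, which implements standard glob syntax; the filemask and filenames below are hypothetical examples, not part of the scenario:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class FilemaskSketch {
    // Returns true when the filename matches the glob filemask, similar to what
    // tFileList does when Use Glob Expressions as Filemask is selected
    static boolean matches(String filemask, String filename) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + filemask);
        return m.matches(Paths.get(filename));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.csv", "customers_01.csv")); // true
        System.out.println(matches("*.csv", "customers_01.txt")); // false
    }
}
```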
Order by The folders are listed first, then the files. You can
choose to prioritise the folder and file order:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical
order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent
to most recent.
Note:
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then file
name takes precedence. If ordering by modified date,
in the event of identical dates then file name takes
precedence.
Order action Select a sort order by clicking one of the following radio
buttons:
ASC: ascending order;
DESC: descending order;
Advanced settings
Use Exclude Filemask Select this check box to enable the Exclude Filemask field
and exclude files based on file type:
Exclude Filemask: Fill in the field with file types to be
excluded from the Filemasks in the Basic settings view.
Note:
File types in this field should be quoted with double
quotation marks and separated by commas.
Format file path to slash(/) style(useful on Windows) Select this check box to format the file path to slash (/)
style, which is useful on Windows.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
2. Browse to the Directory that holds the files you want to process. To display the path on the Job
itself, use the label (__DIRECTORY__) that shows up when you put the pointer anywhere in the
Directory field, and type this label in the Label Format field, found on the View tab of the Basic
settings view.
3. In the Basic settings view and from the FileList Type list, select the source type you want to
process, Files in this example.
4. In the Case sensitive list, select a case mode, Yes in this example, to create a case-sensitive filter
on file names.
5. Keep the Use Glob Expressions as Filemask check box selected if you want to use global
expressions to filter files, and define a file mask in the Filemask field.
6. Double-click tFileInputDelimited to display its Basic settings view and set its properties.
7. Fill in the File Name field with a variable containing the current filename path, as set in
the Basic settings of tFileList. Press Ctrl+Space to access the autocomplete list of variables,
and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
This way, all files in the input directory can be processed.
8. Fill in all other fields as detailed in the tFileInputDelimited section. Related topic:
tFileInputDelimited on page 1015.
9. Select the last component, tLogRow, to display its Basic settings view and fill in the separator to
be used to distinguish field content displayed on the console. Related topic: tLogRow on page
1977.
The Job iterates on the defined directory, and reads all included files. Then delimited data is passed
on to the last component which displays it on the console.
2. Double-click the first tIterateToFlow component to show its Basic settings view.
3. Double-click the [...] button next to Edit schema to open the Schema dialog box and define the
schema of the text file the next component will write filenames to. When done, click OK to close
the dialog box and propagate the schema to the next component.
In this example, the schema contains only one column: Filename.
4. In the Value field of the Mapping table, press Ctrl+Space to access the autocomplete list of
variables, and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILE"))
to read the name of each file in the input directory, which will be put into a data flow and
passed to the next component.
5. In the Basic settings view of the first tFileOutputDelimited component, fill the File Name field
with the path of the text file that will store the filenames from the incoming flow,
D:/temp/tempdata.csv in this example. This completes the configuration of the first subJob.
6. Repeat the steps above to complete the configuration of the second subJob, but:
• fill the Directory field in the Basic settings view of the second tFileList component with the
other folder you want to read filenames from, E:/DataFiles/DQ/images in this example.
• select the Append check box in the Basic settings view of the second tFileOutputDelimited
component so that the filenames previously written to the text file will not be overwritten.
7. In the Basic settings view of the tFileInputDelimited component, fill the File name/Stream
field with the path of the text file that stores the list of filenames, D:/temp/tempdata.csv in this
example, and define the file schema, which contains only one column in this example, Filename.
8. In the Basic settings view of the tUniqRow component, select the Key attribute check box for the
only column, Filename in this example.
9. In the Basic settings view of the tLogRow component, select the Table (print values in cells of a
table) option for better display effect.
Results
For other scenarios using tFileList, see tFileCopy on page 988.
tFileOutputARFF
Writes an ARFF file that holds data organized according to the defined schema.
Basic settings
File name Name or path to the output file and/or the variable to be
used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Attribute Define Displays the schema you defined in the Edit schema dialog
box.
Column: Name of the column.
Type: Data type.
Pattern: Enter the data model (pattern), if necessary.
Append Select this check box to add the new rows at the end of the
file.
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
Built-in: You can create the schema and store it locally for
this component. Related topic: see Talend Studio User Guide.
Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if it does not exist.
Advanced settings
Don't generate empty file Select this check box if you do not want to generate empty
files.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component along with a Row link to collect data
from another component and to re-write the data to an
ARFF file.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For a tFileOutputARFF related scenario, see Displaying the content of a ARFF file on page 1011.
tFileOutputDelimited
Outputs the input data to a delimited file according to the defined schema.
Basic settings
Use Output Stream Select this check box to process the data flow of interest. Once
you have selected it, the Output Stream field displays and
you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid the inconvenience of writing it by hand, you
can select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field, on condition that
this variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.
File Name Name or path to the output file and/or the variable to be
used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Append Select this check box to add the new rows at the end of the
file.
Include Header Select this check box to include the column header to the
file.
Compress as zip file Select this check box to compress the output file in zip
format.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.
Advanced settings
Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
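The effect of these two separators can be sketched with the JDK's DecimalFormat; the pattern and sample value below are illustrative only, not what the component generates internally:

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class SeparatorSketch {
    // Formats a number with custom thousands and decimal separators, mirroring
    // what the Advanced separator (for numbers) option changes in the output file
    static String format(double value, char thousands, char decimal) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator(thousands);
        symbols.setDecimalSeparator(decimal);
        return new DecimalFormat("#,##0.00", symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234567.89, ',', '.')); // default style: 1,234,567.89
        System.out.println(format(1234567.89, '.', ',')); // swapped style: 1.234.567,89
    }
}
```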
CSV options Select this check box to specify the following CSV
parameters:
• Escape char: enter the escape character between
double quotation marks.
• Text enclosure: enter the enclosure character (only
one character) between double quotation marks.
For example, """ needs to be entered when double
quotation marks (") are used as the enclosure character.
It is recommended to use the standard escape character, that
is "\". Otherwise, you should set the same character for
Escape char and Text enclosure.
Create directory if not exists This check box is selected by default. It creates the directory
that holds the output delimited file, if it does not already
exist.
Split output in several files In case of very big output files, select this check box to
divide the output delimited file into several files.
Rows in each output file: set the number of lines in each of
the output files.
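A quick way to reason about this option: the number of files produced is the total row count divided by Rows in each output file, rounded up. A minimal sketch, with hypothetical sample numbers:

```java
public class SplitSketch {
    // Number of output files produced when splitting totalRows rows into files
    // of rowsPerFile lines each (ceiling division; last file may be shorter)
    static int fileCount(int totalRows, int rowsPerFile) {
        return (totalRows + rowsPerFile - 1) / rowsPerFile;
    }

    public static void main(String[] args) {
        System.out.println(fileCount(1000000, 300000)); // 4 files: 3 full + 1 partial
    }
}
```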
Custom the flush buffer size Select this check box to define the number of lines to write
before emptying the buffer.
Row Number: set the number of lines to write.
Output in row mode Select this check box to ensure atomicity of the flush so
that each row of data can remain consistent as a set and
incomplete rows of data are never written to a file.
This check box is mostly useful when using this component
in the multi-thread situation.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Don't generate empty file Select this check box if you do not want to generate empty
files.
Throw an error if the file already exist Select this check box to throw an exception if the output
file specified in the File Name field on the Basic settings
view already exists.
Clear this check box to overwrite the existing file.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to write a delimited file and separate
fields using a field separator value.
2. Click tFileInputDelimited and then OK to close the dialog box. A tFileInputDelimited component
holding the name of your input schema appears on the design workspace.
3. Drop a tMap component and a tFileOutputDelimited component from the Palette to the design
workspace.
4. Link the components together using Row > Main connections.
Procedure
1. Double-click tFileInputDelimited to open its Basic settings view. All its property fields are
automatically filled in because you defined your input file locally.
2. If you do not define your input file locally in the Repository tree view, fill in the details manually
after selecting Built-in in the Property type list.
3. Click the [...] button next to the File Name field and browse to the input file, customer.csv in this
example.
Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.
4. In the Row Separators and Field Separators fields, enter respectively "\n" and ";" as line and field
separators.
5. If needed, set the number of lines used as header and the number of lines used as footer in the
corresponding fields and then set a limit for the number of processed rows.
In this example, Header is set to 6 while Footer and Limit are not set.
6. In the Schema field, schema is automatically set to Repository and your schema is already defined
since you have stored your input file locally for this example. Otherwise, select Built-in and click
the [...] button next to Edit Schema to open the Schema dialog box where you can define the
input schema, and then click OK to close the dialog box.
Procedure
1. In the design workspace, double-click tMap to open its editor.
2. In the tMap editor, click the [+] button on top of the panel to the right to open the Add a new
output table dialog box.
3. Enter a name for the table you want to create, row2 in this example.
4. Click OK to validate your changes and close the dialog box.
5. In the table to the left, row1, select the first three lines (Id, CustomerName and CustomerAddress)
and drop them onto the table to the right.
6. In the Schema editor view situated in the lower left corner of the tMap editor, change the type of
RegisterTime to String in the table to the right.
Procedure
1. In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and
define the component properties.
2. In the Property Type field, set the type to Built-in and fill in the fields that follow manually.
3. Click the [...] button next to the File Name field and browse to the output file you want to write
data in, customerselection.txt in this example.
4. In the Row Separator and Field Separator fields, set "\n" and ";" respectively as row and field
separators.
5. Select the Include Header check box if you want to output columns headers as well.
6. Click Edit schema to open the schema dialog box and verify that the retrieved schema
corresponds to the input schema. If not, click Sync Columns to retrieve the schema from the
preceding component.
The three specified columns Id, CustomerName and CustomerAddress are output in the defined
output file.
// Create the target folder, then open an output stream to the file (overwrite mode)
new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file", new java.io.FileOutputStream("C:/myFolder/customerselection.txt", false));
Note:
In this scenario, the command we use in the Code area of tJava will create a new folder C:/
myFolder where the output file customerselection.txt will be saved. You can customize the
command in accordance with actual practice.
4. Select the Use Output Stream check box to enable the Output Stream field, in which you can
define the output stream using a command.
Fill in the Output Stream field with the following command:
(java.io.OutputStream)globalMap.get("out_file")
Note:
You can customize the command in the Output Stream field by pressing Ctrl+Space to select
a built-in command from the list, or by typing the command into the field manually, in
accordance with actual practice. In this scenario, the command used in the Output Stream field
calls the java.io.OutputStream class to output the filtered data stream to the local file defined
in the Code area of tJava.
5. Click Sync columns to retrieve the schema defined in the preceding component.
tFileOutputExcel
Writes an MS Excel file with separated data values according to a defined schema.
Basic settings
Write excel 2007 file format (xlsx / xlsm) Select this check box to write the processed data into the
.xlsx or .xlsm format of Excel 2007.
Use Output Stream Select this check box to process the data flow of interest. Once
you have selected it, the Output Stream field displays and
you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid the inconvenience of writing it by hand, you
can select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field, on condition that
this variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.
Include header Select this check box to include a header row to the output
file.
Append existing file Select this check box to add the new lines at the end of the
file.
Append existing sheet: Select this check box to add the new
lines at the end of the Excel sheet.
Is absolute Y pos. Select this check box to add information in specified cells:
First cell X: cell position on the X-axis (X-coordinate or
Abscissa).
First cell Y: cell position on the Y-axis (Y-coordinate).
Keep existing cell format: select this check box to retain the
original layout and format of the cell you want to write into.
Define all columns auto size Select this check box if you want the size of all your
columns to be defined automatically. Otherwise, select the
Auto size check boxes next to the column names whose
size you want to be defined automatically.
Protect file Select this check box and enter the password in the
Password field to protect the file using a password.
This component supports agile encryption.
This option is available when Write excel2007 file
format(xlsx) is selected and Use Output Stream is not
selected.
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced settings
Create directory if not exists This check box is selected by default. This option creates
the directory that will hold the output files if it does not
already exist.
Custom the flush buffer size Available when Write excel2007 file format (xlsx) is
selected in the Basic settings view.
Select this check box to set, in the Row number field, the
maximum number of rows allowed in the buffer.
Advanced separator (for numbers) Select this check box to modify the separators you want to
use for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
Don't generate empty file Select the check box to avoid the generation of an empty
file.
Recalculate formula Select this check box if you need to recalculate formula(s) in
the specified Excel file.
This check box appears only when you select all these three
check boxes: Write excel2007 file format(xlsx), Append
existing file, and Append existing sheet.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to write an MS Excel file with data
passed on from other components using a Row link.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For a tFileOutputExcel related scenario, see tSugarCRMInput (deprecated).
For a scenario about the usage of the Use Output Stream check box, see Utilizing Output Stream to
save filtered data to a local file on page 1120.
tFileOutputJSON
Receives data and rewrites it in a JSON structured data block in an output file.
Basic settings
Generate an array json Select this check box to generate an array JSON file.
Name of data block Enter a name for the data block to be written, between
double quotation marks.
This field disappears when the Generate an array json check
box is selected.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.
Advanced settings
Create directory if not exists This check box is selected by default. This option creates
the directory that will hold the output files if it does not
already exist.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
Procedure
1. Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the Palette.
2. Link the components using a Row > Main connection.
3. Double click tRowGenerator to define its Basic Settings properties in the Component view.
4. Click [...] next to Edit Schema to display the corresponding dialog box and define the schema.
10. Under Functions, select pre-defined functions for the columns, if required, or select [...] to set
customized function parameters in the Function parameters tab.
11. Enter the number of rows to be generated in the corresponding field.
12. Click OK to close the dialog box.
13. Click tFileOutputJSON to set its Basic Settings properties in the Component view.
14. Click [...] to browse to where you want the output JSON file to be generated and enter the file
name.
15. Enter a name for the data block to be generated in the corresponding field, between double
quotation marks.
16. Select Built-In as the Schema type.
17. Click Sync Columns to retrieve the schema from the preceding component.
18. Press F6 to run the Job.
Results
The data from the input schema is written in a JSON structured data block in the output file.
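As an illustration, with a hypothetical data block named "cars" and a two-column schema (id, make), the generated file could look roughly like this (the exact field order and formatting depend on the schema and options; with the Generate an array json check box selected, the outer object wrapper is replaced by a bare array):

```json
{"cars":[{"id":"1","make":"Volkswagen"},{"id":"2","make":"Honda"}]}
```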
tFileOutputLDIF
Writes or modifies an LDIF file with data separated in respective entries based on the schema defined,
or else deletes content from an LDIF file.
tFileOutputLDIF outputs data to an LDIF type of file which can then be loaded into an LDAP directory.
Basic settings
Change type Select a changetype that defines the operation you want to
perform on the entries in the output LDIF file.
• Add: the LDAP operation for adding the entry.
• Modify: the LDAP operation for modifying the entry.
• Delete: the LDAP operation for deleting the entry.
• Modrdn: the LDAP operation for modifying an entry's
RDN (Relative Distinguished Name).
• Default: the default LDAP operation.
Multi-Values / Modify Detail Specify the attributes for multi-value fields when Add or
Default is selected from the Change type list or provide the
detailed modification information when Modify is selected
from the Change type list.
• Column: The Column cells are automatically filled with
the defined schema column names.
• Operation: Select an operation to be performed on
the corresponding field. This column is available only
when Modify is selected from the Change type list.
• MultiValue: Select the check box if the corresponding
field is a multi-value field.
• Separator: Specify the value separator in the
corresponding multi-value field.
• Binary: Select the check box if the corresponding field
represents binary data.
• Base64: Select the check box if the corresponding
field should be base-64 encoded. The base-64
encoded data in the LDIF file is represented by the ::
symbol.
This table is available only when Add, Modify, or Default is
selected from the Change type list.
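For reference, an LDIF entry with the Add change type, a multi-value attribute, and a base-64 encoded value (marked by the :: symbol) might look like the following; the DN and attribute names are hypothetical, and RFC 2849 defines the full syntax:

```ldif
dn: cn=John Smith,ou=people,dc=example,dc=com
changetype: add
cn: John Smith
mail: john@example.com
mail: j.smith@example.com
description:: VGFsZW5k
```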
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.
Append Select this check box to add the new rows at the end of the
file.
Advanced settings
Enforce safe base 64 conversion Select this check box to enable the safe base-64 encoding.
For more detailed information about the safe base-64
encoding, see https://www.ietf.org/rfc/rfc2849.txt.
Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.
Custom the flush buffer size Select this check box to specify the number of lines to
write before emptying the buffer.
Row number Type in the number of lines to write before emptying the
buffer.
This field is available only when the Custom the flush
buffer size check box is selected.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Don't generate empty file Select this check box if you do not want to generate empty
files.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is used to write an LDIF file with data
passed on from an input component using a Row > Main
connection.
Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.
2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: dn, id_owners, registration, and make, all of String type.
3. Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
4. In the Mode area, select Use Inline Content (delimited file), and then in the Content field
displayed, enter the following input data:
24;24;5382 KC 94;Volkswagen
32;32;9591 0E 79;Honda
35;35;3129 VH 61;Volkswagen
5. Double-click tMysqlOutput to open its Basic settings view.
6. Fill in the Host, Port, Database, Username, and Password fields with your MySQL database
connection details.
7. In the Table field, enter the name of the table into which the data will be written. In this example,
it is ldifdata.
8. Select Drop table if exists and create from the Action on table drop-down list.
Extracting data from the database table and writing it into an LDIF file
Procedure
1. Double-click tMysqlInput to open its Basic settings view.
2. Fill in the Host, Port, Database, Username, and Password fields with your MySQL database
connection details.
3. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: dn, id_owners, registration, and make, all of String type.
4. In the Table Name field, enter the name of the table from which the data will be read. In this
example, it is ldifdata.
5. Click the Guess Query button to fill in the Query field with the auto-generated query.
6. Double-click tFileOutputLDIF to open its Basic settings view.
7. In the File Name field, browse to or enter the path to the LDIF file to be generated. In this
example, it is E:/out.ldif.
The LDIF file created contains the data from the database table and the change type for the
entries is set to add.
tFileOutputMSDelimited
Creates a complex multi-structured delimited file, using data structures (schemas) coming from
several incoming Row flows.
Basic settings
File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Use Multi Field Separators Select this check box to set a different field separator for
each of the schemas using the Field separator field in the
Schemas area.
Advanced settings
Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
CSV options Select this check box to take into account all parameters
specific to CSV files, in particular Escape char and Text
enclosure parameters.
Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Don't generate empty file Select this check box if you do not want to generate empty
files.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tFileOutputMSPositional
Creates a complex multi-structured file, using data structures (schemas) coming from several
incoming Row flows.
Basic settings
File Name Name and path to the file to be created and/or variable to
be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Advanced settings
Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tFileOutputMSXML
Creates a complex multi-structured XML file, using data structures (schemas) coming from several
incoming Row flows.
Basic settings
File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Configure XML tree Opens the dedicated interface to help you set the XML
mapping. For details about the interface, see Defining the
MultiSchema XML tree on page 1143.
Advanced settings
Create directory only if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.
Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
Don't generate empty file Select this check box if you do not want to generate empty
files.
Trim the whitespace characters Select this check box to remove leading and trailing
whitespace from the columns.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
To the left of the mapping interface, under Linker source, the drop-down list includes all the input
schemas to be added to the multi-schema output XML file (only if more than one input flow is
connected to the tFileOutputMSXML component).
Under Schema List are listed all the columns retrieved from the currently selected input data flow.
The right part of the interface shows all the XML structures you want to create in the output XML
file.
You can create the XML structures manually or import them. Then map the input schema columns
onto each element of the XML tree, for each of the input schemas selected under Linker source.
Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Import XML tree.
4. Browse to the file to import and click OK.
• You can import an XML tree from files in XML, XSD and DTD formats.
• When importing an XML tree structure from an XSD file, you can choose an element as the
root of your XML tree.
The XML Tree column is then automatically filled out with the correct elements.
5. If you need to add or remove an element or sub-elements, right-click the relevant element of the
tree to display the contextual menu.
6. Select Delete to remove the selection from the tree or select the relevant option among: Add sub-
element, Add attribute, Add namespace to enrich the tree.
Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Add sub-element to create the first element of the structure.
4. If you need to add an attribute or a child element to any element or remove any element, right-
click the left of the corresponding element name to display the contextual menu.
5. Right-click to the left of the element name to display the contextual menu.
6. On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace
or Delete.
Procedure
1. Click one of the Schema column names.
2. Drag it onto the relevant sub-element to the right.
3. Release the mouse button to implement the actual mapping.
A light blue link displays that illustrates this mapping. If available, use the Auto-Map button,
located to the bottom left of the interface, to carry out this operation automatically.
4. If you need to disconnect any mapping on any element of the XML tree, select the element and
right-click to the left of the element name to display the contextual menu.
5. Select Disconnect link.
The light blue link disappears.
Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Loop Element.
Results
The Node Status column shows the newly added status.
There can only be one loop element at a time.
Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Group Element.
Results
The Node Status column shows the newly added status, and any required group statuses are
defined automatically.
Click OK once the mapping is complete to validate the definition and continue the Job configuration
where needed.
Related scenarios
No scenario is available for the Standard version of this component yet.
tFileOutputPositional
Writes a file row by row according to the length and the format of the fields or columns in a row.
Basic settings
Use Output Stream Select this check box to process the data flow of interest.
Once you have selected it, the Output Stream field displays
and you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing mistakes, you can select the variable of
interest from the auto-completion list (Ctrl+Space) to fill
the current field, provided that this variable has been
properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.
File Name Name or path to the file to be processed and/or the variable
to be used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Append Select this check box to add the new rows at the end of the
file.
Include header Select this check box to include the column header in the
file.
Compress as zip file Select this check box to compress the output file in zip
format.
Formats Customize the positional file data format and fill in the
columns in the Formats table.
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in between quotes the padding
characters used. A space by default.
Alignment: Select the appropriate alignment parameter.
Keep: If the data in the column or in the field are too long,
select the part you want to keep.
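The combined effect of Size, Padding char, Alignment, and Keep can be sketched in plain Java. This is an assumed illustration of fixed-width column formatting, not the component's actual implementation; the method name and the "keep leftmost part" truncation rule are illustrative choices:

```java
// Sketch: fit a value into a fixed-width positional column.
public class PositionalColumn {
    static String fit(String value, int size, char pad, boolean alignLeft) {
        if (value.length() >= size) {
            // Keep: here we keep the leftmost part when the value is too long
            return value.substring(0, size);
        }
        StringBuilder padding = new StringBuilder();
        for (int i = value.length(); i < size; i++) padding.append(pad);
        // Alignment: pad on the right for left alignment, on the left otherwise
        return alignLeft ? value + padding : padding + value;
    }

    public static void main(String[] args) {
        System.out.println(fit("42", 6, '0', false));        // right-aligned, zero-padded
        System.out.println(fit("Honda", 8, ' ', true) + "|"); // left-aligned, space-padded
    }
}
```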
Advanced settings
Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Use byte length as the cardinality Select this check box to add support for double-byte
characters to this component. JDK 1.6 is required for this
feature.
Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if it does not exist.
Custom the flush buffer size Select this check box to define the number of lines to write
before emptying the buffer.
Row Number: set the number of lines to write.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Don't generate empty file Select this check box if you do not want to generate empty
files.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to write a file row by row according to
the defined length and format of the fields.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view.
Related scenario
For a related scenario, see Reading data using a Regex and outputting the result to Positional file on
page 1089.
For scenario about the usage of Use Output Stream check box, see Utilizing Output Stream to save
filtered data to a local file on page 1120.
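As a complement to that scenario, the general pattern behind Use Output Stream — registering a stream in the globalMap (for example, in a tJava component) and consuming it from the Output Stream field — can be sketched as follows. The globalMap here is simulated with a plain HashMap, and the key name out_stream is hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

public class OutputStreamSketch {
    // Stand-in for the globalMap that the Talend runtime provides.
    static final Map<String, Object> globalMap = new HashMap<>();

    static String demo() {
        try {
            // In a tJava component: create the stream and register it
            globalMap.put("out_stream", new ByteArrayOutputStream());

            // In the output component's Output Stream field: fetch it and write rows
            OutputStream out = (OutputStream) globalMap.get("out_stream");
            out.write("id;name\n1;Smith\n".getBytes("UTF-8"));
            out.close();
            return ((ByteArrayOutputStream) globalMap.get("out_stream")).toString("UTF-8");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```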
tFileOutputProperties
Writes a configuration file, of the type .ini or .properties, containing text data organized according to
the model key = value.
Basic settings
Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
For this component, the schema is read-only. It is made of
two columns, Key and Value, corresponding to the parameter
name and the parameter value to be copied.
File format Select the file format from the list: either .properties or .ini.
File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Advanced settings
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to write files where data is organized
according to the structure key = value.
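For comparison, the key = value structure the component produces for the .properties format matches what the standard java.util.Properties class writes (the keys and values here are hypothetical):

```java
import java.io.StringWriter;
import java.util.Properties;

public class PropertiesSketch {
    static String write() {
        try {
            Properties p = new Properties();
            p.setProperty("host", "localhost");
            p.setProperty("port", "3306");
            StringWriter w = new StringWriter();
            // A null comment writes only a date line followed by key=value pairs
            p.store(w, null);
            return w.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```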
Related scenarios
For a related scenario, see Reading and matching the keys and the values of different .properties files
and outputting the results in a glossary on page 1080 of tFileInputProperties on page 1079.
tFileOutputRaw
Provides data coming from another component, in the form of a single column of output data.
Basic settings
Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Encoding If the output is a string, select the encoding type from the
list or select Custom and define it manually.
Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
To catch the FileNotFoundException, you also need to
select this check box.
Advanced settings
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
tFileOutputXML
Writes an XML file with separated data values according to a defined schema.
Basic settings
File Name Name or path to the output file and/or the variable to be
used.
Related topic: see Defining variables from the Component
view section in Talend Studio User Guide
Incoming record is a document Select this check box if the data from the preceding
component is in XML format.
When this check box is selected, a Column list appears
allowing you to select a Document type column of the
schema that holds the data, and the Row tag field d
isappears.
When this check box is selected, in the Advanced settings
view, only the check boxes Create directory if not exists,
Don't generate empty file, Trim data, tStatCatcher Statistics
and the list Encoding are available.
Row tag Specify the tag that will wrap data and structure per row.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the input component.
Advanced settings
Split output in several files If the output is big, you can split the output into several
files, each containing the specified number of rows.
Rows in each output file: Specify the number of rows in each
output file.
Create directory if not exists This check box is selected by default. It creates a directory
to hold the output XML files if required.
Root tags Specify one or more root tags to wrap the whole output file
structure and data. The default root tag is root.
Note:
If the same column is selected in both the Output format
table as an attribute and in the Use dynamic grouping
setting as the criterion for dynamic grouping, only the
dynamic group setting will take effect for that column.
Use dynamic grouping Select this check box if you want to dynamically group the
output columns. Click the plus button to add one or more
grouping criteria in the Group by table.
Column: Select a column you want to use as a wrapping
element for the grouped output rows.
Attribute label: Enter an attribute label for the group
wrapping element, between quotation marks.
Custom the flush buffer size Select this check box to define the number of rows to buffer
before the data is written into the target file and the buffer
is emptied.
Row Number: Specify the number of rows to buffer.
Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Don't generate empty file Select the check box to avoid the generation of an empty
file.
Trim data Select this check box to remove the spaces at the beginning
and at the end of the text, and merge multiple consecutive
spaces into one within the text.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Usage rule Use this component to write an XML file with data passed
on from other components using a Row link.
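As an illustration, with the default root tag root, a Row tag of row, and a hypothetical two-column schema (id, make), the generated file would have roughly this shape:

```xml
<root>
  <row>
    <id>1</id>
    <make>Volkswagen</make>
  </row>
  <row>
    <id>2</id>
    <make>Honda</make>
  </row>
</root>
```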
Related scenarios
For related scenarios using tFileOutputXML, see Reading a Positional file and saving filtered results to
XML on page 1075 and Using a SOAP message from an XML file to get country name information and
saving the information to an XML file on page 3454.
tFileProperties
Creates a single row flow that displays the main properties of the processed file.
Basic settings
Schema and Edit schema A schema is a row description, it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. It describes the
main properties of the specified file. You can click the [...]
button next to Edit schema to view the predefined schema
which contains the following fields:
• abs_path: the absolute path of the file.
• dirname: the directory of the file.
• basename: the name of the file.
• mode_string: the access mode of the file, r and w for
read and write permissions respectively.
• size: the file size in bytes.
• mtime: the timestamp indicating when the file was
last modified, in milliseconds that have elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
• mtime_string: the date and time the file was last
modified.
Calculate MD5 Hash Select this check box to check the MD5 of the downloaded
file.
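The MD5 check can be reproduced outside the Studio with java.security.MessageDigest; this sketch hashes an in-memory byte array rather than a downloaded file:

```java
import java.security.MessageDigest;

public class Md5Sketch {
    static String md5(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            // Render the 16-byte digest as 32 lowercase hex characters
            for (byte b : md.digest(data)) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // RFC 1321 test vector
        System.out.println(md5("abc".getBytes()));  // 900150983cd24fb0d6963f7d28e17f72
    }
}
```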
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Procedure
Procedure
1. Drop a tFileProperties component and a tLogRow component from the Palette onto the design
workspace.
2. Right-click on tFileProperties and connect it to tLogRow using a Main Row link.
Results
The properties of the defined file are displayed on the console.
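The displayed properties correspond closely to what the standard java.io.File accessors expose; this sketch (with a hypothetical file name) shows the mapping, not the component's own code:

```java
import java.io.File;

public class FilePropsSketch {
    public static void main(String[] args) {
        File f = new File("example.txt");          // hypothetical file
        System.out.println(f.getAbsolutePath());   // abs_path
        System.out.println(f.getParent());         // dirname
        System.out.println(f.getName());           // basename
        // mode_string: r and w for read and write permissions
        System.out.println((f.canRead() ? "r" : "-") + (f.canWrite() ? "w" : "-"));
        System.out.println(f.length());            // size, in bytes
        System.out.println(f.lastModified());      // mtime, in ms since the Unix epoch
    }
}
```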
tFileRowCount
Opens a file and reads it row by row in order to determine the number of rows inside.
Basic settings
File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.
Row separator String (ex: "\n"on Unix) to distinguish rows in the output
file.
Ignore empty rows Select this check box to ignore the empty rows while the
component is counting the rows in the file.
Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.
Advanced settings
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Global Variables COUNT: the number of rows in a file. This is a Flow variable
and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
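The counting logic behind the COUNT variable, including the Ignore empty rows option, can be sketched in plain Java; here the input comes from a string rather than a file:

```java
import java.io.BufferedReader;
import java.io.StringReader;

public class RowCountSketch {
    static int count(String content, boolean ignoreEmpty) {
        try (BufferedReader r = new BufferedReader(new StringReader(content))) {
            int n = 0;
            String line;
            while ((line = r.readLine()) != null) {
                // Skip blank rows when Ignore empty rows is selected
                if (ignoreEmpty && line.trim().isEmpty()) continue;
                n++;
            }
            return n;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String data = "1;andy\n\n2;mike\n";
        System.out.println(count(data, true));   // 2
        System.out.println(count(data, false));  // 3
    }
}
```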
Usage
1;andy
2;mike
2. In the File Name field, type in the full path of the .txt file. You can also click the [...] button to
browse for this file.
Select the Ignore empty rows check box.
3. Double-click tJava to open its Basic settings view.
In the Code box, enter the function to print out the number of rows in the file:
System.out.println(globalMap.get("tFileRowCount_1_COUNT"));
In the Condition box, enter the statement to judge if the number of rows is 2:
((Integer)globalMap.get("tFileRowCount_1_COUNT"))==2
This if trigger means that if the row count equals 2, the rows of the .txt file will be written to
MySQL.
5. Double-click tFileInputDelimited to open its Basic settings view.
In the File name/Stream field, type in the full path of the .txt file. You can also click the [...]
button to browse for this file.
6. Click the Edit schema button to open the schema editor.
7. Click the [+] button to add two columns, namely id and name, of the integer and
string types respectively.
8. Click the Yes button in the pop-up box to propagate the schema setup to the following
component.
10. In the Host and Port fields, enter the connection details.
In the Database field, enter the database name.
In the Username and Password fields, enter the authentication details.
In the Table field, enter the table name, for instance "staff".
11. In the Action on table list, select Create table if not exists.
In the Action on data list, select Insert.
As shown above, the Job has been executed successfully and the number of rows in the .txt file
has been printed out.
3. Go to the MySQL GUI and open the table staff.
As shown above, the table has been created with the two records inserted.
tFileTouch
tFileTouch
Creates an empty file or, if the specified file already exists, updates its date of modification and of last
access while keeping the contents unchanged.
Basic settings
File Name Path and name of the file to be created and/or the variable
to be used.
Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if it does not exist.
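In plain Java terms, the touch behavior amounts to the following sketch (an illustration only; the file name used in the demo is a placeholder):

```java
import java.io.File;
import java.io.IOException;

public class FileTouch {
    // Create the file if it does not exist; otherwise refresh its
    // last-modified timestamp while leaving the contents unchanged,
    // mirroring tFileTouch. Parent directories are created first, like
    // the "Create directory if not exists" check box.
    public static void touch(File file) throws IOException {
        File parent = file.getParentFile();
        if (parent != null && !parent.exists()) {
            parent.mkdirs();
        }
        if (!file.createNewFile()) {
            // The file already existed: only update its timestamp.
            file.setLastModified(System.currentTimeMillis());
        }
    }

    public static void main(String[] args) throws IOException {
        File f = new File(System.getProperty("java.io.tmpdir"), "touch_demo.txt");
        touch(f);
        System.out.println(f.getAbsolutePath() + " exists: " + f.exists());
    }
}
```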
Advanced settings
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tFileUnarchive
tFileUnarchive
Decompresses an archive file for further processing, in one of the following formats: *.tar.gz, *.tgz,
*.tar, *.gz and *.zip.
Basic settings
Use archive file name as root directory Select this check box to create a folder named as the
archive, if it does not exist, under the specified directory and
extract the zipped file(s) to that folder.
Check the integrity before unzip Select this check box to run an integrity check before
unzipping the archive.
Extract file paths Select this check box to reproduce the file path structure
zipped in the archive.
Need a password Select this check box and provide the correct decrypt
method and password if the archive to be unzipped is
password protected. Note that the encrypted archive must
be one created by the tFileArchive component; otherwise
you will see error messages or get nothing extracted even if
no error message is displayed.
Decrypt method: select the decrypt method from the list,
either Java Decrypt or Zip4j Decrypt.
Enter the password: enter the decryption password.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
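For *.zip archives, the extraction described above can be approximated with the standard java.util.zip API. The following is a minimal sketch that reproduces file paths; the integrity-check and password features are not covered:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class Unarchive {
    // Extract a .zip archive into targetDir, reproducing the directory
    // structure stored in the archive ("Extract file paths" behavior).
    // Note: a production version should also reject entry names
    // containing ".." (the "zip slip" issue), which this sketch omits.
    public static void unzip(File archive, File targetDir) throws IOException {
        try (ZipInputStream zis =
                 new ZipInputStream(new FileInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                File out = new File(targetDir, entry.getName());
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                out.getParentFile().mkdirs();
                try (FileOutputStream fos = new FileOutputStream(out)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zis.read(buf)) > 0) {
                        fos.write(buf, 0, n);
                    }
                }
            }
        }
    }
}
```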
Advanced settings
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Usage
Limitation
Warning:
Only files in the following formats can be decompressed: *.tar.gz, *.tgz, *.tar, *.gz and
*.zip.
Related scenario
For tFileUnarchive related scenario, see tFileCompare on page 984.
tFilterColumns
tFilterColumns
Homogenizes schemas either by ordering the columns, removing unwanted columns or adding new
columns.
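In plain Java, filtering and reordering the columns of a delimited row amounts to something like the following sketch (the column indexes and separator here are illustrative assumptions, not component settings):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class FilterColumns {
    // Keep and reorder columns of a delimited row by 0-based index,
    // a sketch of what schema-based column filtering amounts to.
    public static String filter(String row, String sep, int... keep) {
        String[] fields = row.split(sep, -1);
        return Arrays.stream(keep)
                .mapToObj(i -> fields[i])
                .collect(Collectors.joining(sep));
    }

    public static void main(String[] args) {
        // Input columns: id;name;city -> keep city and name, in that order.
        System.out.println(filter("1;andy;Chicago", ";", 2, 1));
    }
}
```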
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component in the Job.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related Scenario
For more information regarding the tFilterColumns component in use, see Cleaning up and filtering a
CSV file on page 3027.
tFilterRow
tFilterRow
Filters input rows by setting one or more conditions on the selected columns.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is built-in only.
Logical operator used to combine conditions Select a logical operator to combine simple conditions and
to combine the filter results of both modes if any advanced
conditions are defined.
And: returns the boolean value of true if all conditions are true;
otherwise false. For each two conditions combined using
a logical AND, the second condition is evaluated only if the
first condition is evaluated to be true.
Or: returns the boolean value of true if any condition is true;
otherwise false. For each two conditions combined using a
logical OR, the second condition is evaluated only if the first
condition is evaluated to be false.
Use advanced mode Select this check box when the operations you want to
perform cannot be carried out through the standard
functions offered, for example, different logical operations
in the same component. In the text field, type in the regular
expression as required.
If multiple advanced conditions are defined, use a logical
operator between two conditions:
&& (logical AND): returns the boolean value of true if both
conditions are true; otherwise false. The second condition is
evaluated only if the first condition is evaluated to be true.
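The short-circuit evaluation described for And, Or, and && follows standard Java semantics, which the following snippet illustrates:

```java
public class ShortCircuit {
    public static int calls = 0;

    static boolean secondCondition() {
        calls++;            // count how often the second condition runs
        return true;
    }

    public static void main(String[] args) {
        calls = 0;
        // With &&, the second condition is skipped when the first is false;
        // with ||, it is skipped when the first is true.
        boolean a = false && secondCondition();
        boolean o = true || secondCondition();
        System.out.println("a=" + a + " o=" + o + " calls=" + calls);
    }
}
```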
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
When done, click OK to validate the schema setting and close the dialog box. A new dialog box
opens and asks you if you want to propagate the schema. Click Yes.
3. Set the row and field separators in the corresponding fields if needed. In this example, use the
default settings for both, namely the row separator is a carriage return and the field separator is a
semi-colon.
4. Select the Use Inline Content(delimited file) option in the Mode area and type in the input data in
the Content field.
Van Buren;M;73;Chicago
Adams;M;40;Albany
Jefferson;F;66;New York
Adams;M;9;Albany
Jefferson;M;30;Chicago
Carter;F;26;Chicago
Harrison;M;40;New York
Roosevelt;F;15;Chicago
Monroe;M;8;Boston
Arthur;M;20;Albany
Pierce;M;18;New York
Quincy;F;83;Albany
McKinley;M;70;Boston
Coolidge;M;4;Chicago
Monroe;M;60;Chicago
5. Double-click tFilterRow to display its Basic settings view and define its properties.
6. In the Conditions table, add four conditions and fill in the filtering parameters.
• From the InputColumn list field of the first row, select LastName, from the Function list field,
select Length, from the Operator list field, select Lower than, and in the Value column, type in
9 to limit the length of last names to nine characters.
• From the InputColumn list field of the second row, select Gender, from the Operator list field,
select Equals, and in the Value column, type in M in double quotes to filter records of male
persons.
Warning:
In the Value field, you must type in your values between double quotes for all types of
values, except for integer values, which do not need quotes.
• From the InputColumn list field of the third row, select Age, from the Operator list field, select
Greater than, and in the Value column, type in 10 to set the lower limit to 10 years.
• From the InputColumn list field of the four row, select Age, from the Operator list field, select
Lower than, and in the Value column, type in 80 to set the upper limit to 80 years.
7. To combine the conditions, select And so that only those records that meet all the defined
conditions are accepted.
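Taken together, the four conditions combined with And are equivalent to the following plain-Java predicate (a sketch for clarity, not the code the component generates):

```java
public class FilterPredicate {
    // true when a record passes all four conditions from the scenario:
    // last name shorter than 9 characters, male, aged between 10 and 80.
    public static boolean accept(String lastName, String gender, int age) {
        return lastName.length() < 9
                && gender.equals("M")
                && age > 10
                && age < 80;
    }

    public static void main(String[] args) {
        System.out.println(accept("Van Buren", "M", 73)); // name has 9 characters
        System.out.println(accept("Harrison", "M", 40));
    }
}
```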
8. In the Basic settings of tLogRow components, select Table (print values in cells of a table) in the
Mode area.
As shown above, the first table lists the records of male persons aged between 10 and 80 years,
whose last names are fewer than nine characters long, and the second table lists all the records
that do not match the filter conditions. Each rejected record has a corresponding error message that
explains the reason for rejection.
Procedure
Procedure
1. Double-click the tFilterRow component to show its Basic settings view.
2. Select the Use advanced mode check box, and type in the following expression in the text field:
input_row.City.equals("Chicago") || input_row.City.equals("New York")
This defines two conditions on the City column of the input data to filter records that contain the
cities of Chicago and New York, and uses a logical OR to combine the two conditions so that
records satisfying either condition will be accepted.
3. Press Ctrl+S to save the Job and press F6 to execute it.
As shown above, the result list of the previous scenario has been further filtered, and only the
records containing the cities of New York and Chicago are accepted.
tFirebirdClose
tFirebirdClose
Closes a transaction with a Firebird database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
No scenario is available for the Standard version of this component yet.
tFirebirdCommit
tFirebirdCommit
Commits a global transaction instead of doing so on every row or every batch, thus providing a gain in
performance.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tFirebirdCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenario
For tFirebirdCommit related scenario, see Inserting data in mother/daughter tables on page 2426.
tFirebirdConnection
tFirebirdConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component
commits only after all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Usage
Related scenarios
For tFirebirdConnection related scenario, see tMysqlConnection on page 2425.
tFirebirdInput
tFirebirdInput
Executes a database query on a Firebird database with a strictly defined order which must correspond
to the schema definition, then passes on the field list to the next component via a Main row link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields so that they match the
schema definition.
Advanced Settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for Firebird
databases.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
See also related topic: Reading data from different MySQL databases using dynamically loaded
connection parameters on page 497.
tFirebirdOutput
tFirebirdOutput
Executes the action defined on the table in a Firebird database and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
tFirebirdOutput writes, updates, makes changes or suppresses entries in a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update, or delete actions, or actions that require
particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.
Note:
Make sure the Nullable check box is selected for the
corresponding columns in the schema.
Use Batch Select this check box to activate the batch mode for data
processing.
Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a Firebird database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tFirebirdRollback
tFirebirdRollback
Cancels the transaction committed in the connected Firebird database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For tFirebirdRollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429.
tFirebirdRow
tFirebirdRow
Executes the stated SQL query on the specified Firebird database.
Depending on the nature of the query and the database, tFirebirdRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements.
tFirebirdRow is the specific component for this database query. The row suffix means the component
implements a flow in the Job design although it doesn't provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times, as performance levels are
increased.
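To clarify how the 1-based Parameter Index maps "?" placeholders to values, here is a plain-Java illustration of the substitution concept. Note this only models the mapping; a real java.sql.PreparedStatement binds values without rewriting the SQL string, which also protects against SQL injection:

```java
import java.util.List;

public class PreparedStatementDemo {
    // Substitute each "?" placeholder, in order, with the value at the
    // corresponding position in the list (index 1 = first "?").
    public static String bind(String sql, List<String> values) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : sql.toCharArray()) {
            if (c == '?' && next < values.size()) {
                out.append("'").append(values.get(next++)).append("'");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM staff WHERE name = ? AND city = ?";
        System.out.println(bind(sql, List.of("mike", "Chicago")));
    }
}
```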
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
Usage
Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tFixedFlowInput
Generates a fixed flow from internal variables.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Mode From the three options, select the mode that you want to
use.
Use Single Table : Enter the data that you want to generate
in the relevant value field.
Use Inline Table : Add the row(s) that you want to generate.
Use Inline Content : Enter the data that you want to
generate, separated by the separators that you have already
defined in the Row and Field Separator fields.
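As a hypothetical illustration (the column names and values are invented, not taken from this guide), with a two-column schema (id, name), a newline Row Separator, and ";" as the Field Separator, the Use Inline Content mode could generate two rows from:

```
1;Alice
2;Bob
```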
Advanced settings
tStat Catcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Related scenarios
For related scenarios, see:
• Buffering output data on the webapp server on page 421.
• Iterating on a DB table and listing its column names on page 2419.
• Filtering a list of names using simple conditions on page 1173.
tFlowMeter
Counts the number of rows processed in the defined flow, so this number can be caught by the
tFlowMeterCatcher component for logging purposes.
Basic settings
Use input connection name as label Select this check box to reuse the name given to the input
main row flow as label in the logged data.
Mode Select the type of values for the data measured:
Absolute: the actual number of rows is logged.
Relative: a ratio (%) of the number of rows is logged. When
this option is selected, a Connections List appears to let you
select a reference connection.
Global Variables
Usage
If you need logs, statistics, or other measurements of your data flows, see the Talend Studio User
Guide.
Related scenario
For a related scenario, see Catching flow metrics from a Job on page 1205.
tFlowMeterCatcher
Operates as a log function triggered by the use of a tFlowMeter component in the Job.
Based on a defined schema, the tFlowMeterCatcher catches the processing volumetrics from the
tFlowMeter component and passes them on to the output component.
Basic settings
Schema and Edit Schema A schema is a row description; it defines the fields to be
processed and passed on to the next component. In this
particular case, the schema is read-only, as this component
gathers standard log information including:
Pid: Process ID
Global Variables
Usage
• Drop the following components from the Palette to the design workspace: tMysqlInput,
tFlowMeter (x2), tMap, tLogRow, tFlowMeterCatcher and tFileOutputDelimited.
• Link the components using Row > Main connections and click each label to give consistent
names throughout the Job, such as US_States for the flow from the input component and
filtered_states for the output from the tMap component.
• Link the tFlowMeterCatcher to the tFileOutputDelimited component using a Row > Main link as
well, since data is passed.
• On the tMysqlInput Component view, set the connection properties to Repository if the
table metadata are stored in the Repository. Otherwise, set the Type to Built-in and configure
the connection and schema details manually for this Job.
• The 50 States of the USA are recorded in the table states. In order for all 50 entries of the table to
get selected, the query to run against the MySQL database is as follows:
select * from states.
• Select the relevant encoding type on the Advanced settings vertical tab.
• Then select the following component which is a tFlowMeter and set its properties.
• Select the check box Use input connection name as label, in order to reuse the label you chose in
the log output file (tFileOutputDelimited).
• The mode is Absolute as there is no reference flow to meter against; no Threshold is set for this
example either.
• Then launch the tMap editor to set the filtering properties.
• For this use case, drag and drop the ID and State columns from the Input area of the tMap towards
the Output area. No variable is used in this example.
• On the Output flow area (labelled filtered_states in this example), click the arrow & plus button to
activate the expression filter field.
• Drag the State column from the Input area (row2) towards the expression filter field and type in
the rest of the expression in order to filter the state labels starting with the letter M. The final
expression looks like: row2.State.startsWith("M")
• Click OK to validate the setting.
• Then select the second tFlowMeter component and set its properties.
• Select the Append check box in order to log all tFlowMeter measures.
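Outside the Studio, the filter applied by the tMap expression row2.State.startsWith("M") can be sketched in plain Java (the state names below are sample values, not the full table):

```java
import java.util.List;
import java.util.stream.Collectors;

public class StateFilter {
    // Keep only the states whose label starts with "M",
    // mirroring the tMap filter expression row2.State.startsWith("M").
    public static List<String> startingWithM(List<String> states) {
        return states.stream()
                     .filter(s -> s.startsWith("M"))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> states = List.of("Maine", "Texas", "Michigan", "Ohio");
        System.out.println(startingWithM(states)); // prints [Maine, Michigan]
    }
}
```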
The Run view shows the filtered state labels as defined in the Job.
In the delimited CSV file, the number of rows shown in the count column varies between tFlowMeter1
and tFlowMeter2, as the filtering has been carried out between them. The reference column also shows
this difference.
tFlowToIterate
Reads data line by line from the input flow and stores the data entries in iterative global variables.
Basic settings
Use the default (key, value) in global variables When selected, the system uses the default value of the
global variable in the current Job.
Customize key: Type in a name for the new global variable. Press Ctrl
+Space to access all available variables either global or
user-defined.
Global Variables
Usage
Note:
The File Name field is mandatory.
The input file used in this scenario is Customers.txt. It is a text file that contains a list of names
of three other simple text files: Name.txt, E-mail.txt and Address.txt. The first text file, Name.txt,
is made of one column holding customers' names. The second text file, E-mail.txt, is made of
one column holding customers' e-mail addresses. The third text file, Address.txt, is made of one
column holding customers' postal addresses.
Fill in all other fields as needed. For more information, see tFileInputDelimited Standard
properties on page 1015. In this scenario, the header and the footer are not set and there is no
limit for the number of processed rows.
3. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is
made of one column, FileName.
Click the plus button to add new parameter lines and define your variables, and click in
the key cell to enter the variable name as desired. In this scenario, one variable is defined:
"Name_of_File".
Alternatively, you can select the Use the default (key, value) in global variables check box to use
the default in global variables.
5. Double-click the second tFileInputDelimited to display its Basic settings view.
In the File name field, enter the directory of the files to be read, and then press Ctrl+Space to
select the global variable "Name_of_File". In this scenario, the syntax is as follows:
"C:/scenario/flow_to_iterate/"+((String)globalMap.get("Name_of_File"))
Click Edit schema to define the schema column name. In this scenario, it is RowContent.
Fill in all other fields as needed. For more information, see tFileInputDelimited Standard
properties on page 1015.
6. In the design workspace, select the last component, tLogRow, and click the Component tab to
define its basic settings.
Define your settings as needed. For more information, see tLogRow Standard properties on page
1977.
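The way the expression in step 5 builds the file path can be sketched in plain Java, with an ordinary HashMap standing in for the Studio's globalMap (the directory and file name are the sample values used in this scenario):

```java
import java.util.HashMap;
import java.util.Map;

public class PathFromGlobalMap {
    public static String buildPath(Map<String, Object> globalMap) {
        // Same pattern as the Studio expression:
        // "C:/scenario/flow_to_iterate/" + ((String) globalMap.get("Name_of_File"))
        return "C:/scenario/flow_to_iterate/" + ((String) globalMap.get("Name_of_File"));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("Name_of_File", "Name.txt");
        System.out.println(buildPath(globalMap)); // prints C:/scenario/flow_to_iterate/Name.txt
    }
}
```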
Results
Customers' names, customers' e-mails, and customers' postal addresses appear on the console
preceded by the schema column name.
tForeach
Creates a loop on a list for an iterate link.
Basic settings
Values Use the [+] button to add rows to the Values table. Then
click on the fields to enter the list values to be iterated
upon, between double quotation marks.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Global Variables
Usage
Results
2. Click the [+] button to add as many rows to the Values list as required.
3. Click on the Value fields to enter the list values, between double quotation marks.
4. Double-click tJava to open its Basic settings view:
Results
The tJava run view displays the list values retrieved from tForeach, each one suffixed with _out:
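The behavior described above can be sketched as a plain Java loop (the list values are hypothetical, since the scenario's actual values are not shown here):

```java
import java.util.List;

public class SuffixValues {
    // Append "_out" to a value, as the tJava component does with
    // each value retrieved from tForeach in this scenario.
    public static String suffix(String value) {
        return value + "_out";
    }

    public static void main(String[] args) {
        for (String v : List.of("file1", "file2", "file3")) {
            System.out.println(suffix(v));
        }
    }
}
```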
tFTPClose
Closes an active FTP connection to release the occupied resources.
Basic settings
Component list Select the component that opens the connection you need
to close from the list.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other FTP
components, especially with the tFTPConnection component.
Related scenarios
• Listing and getting files/folders on an FTP directory on page 1230
• Putting files onto an FTP server on page 1246
• Renaming a file located on an FTP server on page 1253
tFTPConnection
Opens an FTP connection to transfer files in a single transaction.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Connection mode Select the connection mode from the list, either Passive or
Active.
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value will be ignored; in this case, the
default value (60000 ms) will be used.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
• Listing and getting files/folders on an FTP directory on page 1230
• Putting files onto an FTP server on page 1246
• Renaming a file located on an FTP server on page 1253
tFTPDelete
Deletes files or folders in a specified directory on an FTP server.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the
or Files field as file filters. This is useful when the name of
the file to be processed contains special characters such as
parentheses.
For more information about Perl5 regular expression syntax,
see Perl5 Regular Expression Syntax.
Target Type Select the type of the target to be deleted, either File or
Directory.
Connection mode Select the connection mode from the list, either Passive or
Active.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.
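As a hypothetical illustration of the Use Perl5 Regex Expression as Filemask option above (the file name and pattern are invented; Java's java.util.regex syntax is used here and is close to, though not identical with, Perl5):

```java
import java.util.regex.Pattern;

public class FilemaskRegex {
    // Check whether a file name matches a regular-expression filemask.
    public static boolean matches(String filemask, String fileName) {
        return Pattern.matches(filemask, fileName);
    }

    public static void main(String[] args) {
        // Parentheses are regex metacharacters, so they must be escaped
        // to match a file name that contains them literally.
        String filemask = "report \\(final\\)\\.txt";
        System.out.println(matches(filemask, "report (final).txt")); // prints true
    }
}
```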
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value will be ignored; in this case, the
default value (60000 ms) will be used.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tFTPFileExist
Checks if a file or a directory exists on an FTP server.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Remote directory The remote directory under which the file or the directory
will be checked.
Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.
Target Type Select the type of the target to be checked, either File or
Directory.
File Name The name of the file or the path to the file to be checked.
Directory Name The name of the directory or the path to the directory to be
checked.
This property is available only when Directory is
selected from the Target Type list.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Connection mode Select the connection mode from the list, either Passive or
Active.
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value will be ignored; in this case, the
default value (60000 ms) will be used.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tFTPFileList
Lists all files and folders directly under a specified directory based on a filemask pattern.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Remote directory The remote directory where the files and folders to be listed
are located.
Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.
File detail Select this check box to list the details of each file/folder.
The informative details include the file/folder permissions,
the name of the author, the name of the group of users
that have read/write permissions, the file size, and the last
modification date.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Connection mode Select the connection mode from the list, either Passive or
Active.
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value will be ignored; in this case, the
default value (60000 ms) will be used.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
CURRENT_FILE The current file name. This is a Flow variable and it returns
a string.
CURRENT_FILEPATH The current file path. This is a Flow variable and it returns a
string.
Usage
Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPFileList component, a
tIterateToFlow component, a tLogRow component, a tFTPGet component, and a tFTPClose
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFTPFileList component to the tIterateToFlow component using a Row > Iterate
connection.
3. Link the tIterateToFlow component to the tLogRow component using a Row > Main connection.
4. Link the tFTPConnection component to the tFTPFileList component using a Trigger > OnSubjobOk
connection.
5. Do the same to link the tFTPFileList component to the tFTPGet component, and the tFTPGet
component to the tFTPClose component.
Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.
Procedure
1. Double-click the tFTPFileList component to open its Basic settings view.
2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and, from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Remote directory field, specify the FTP server directory on which the files and folders will
be iterated. In this example, it is /, which means the root directory of the FTP server.
4. Clear the Move to the current directory check box.
5. Double-click the tIterateToFlow component to open its Basic settings view.
6. Click the button next to Edit schema to open the schema dialog box.
7. Click the button to add two String type columns filename and filepath that will hold the names
and paths of the files to be iterated respectively. When done, click OK to close the dialog box.
8. In the Mapping table, set the values for the filename and filepath columns. In this example, use the
global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")) for filename and the global
variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) for filepath.
Note that you can fill the values by pressing Ctrl + Space to access the global variables list and
then selecting tFTPFileList_1_CURRENT_FILE and tFTPFileList_1_CURRENT_FILEPATH from the list.
9. Double-click the tLogRow component to open its Basic settings view, and then select Table (print
values in cells of a table) in the Mode area for better readability of the result.
Procedure
1. Double-click the tFTPGet component to open its Basic settings view.
2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and, from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Local directory field, specify the local directory to which the files and folders will be
downloaded. In this example, it is D:/FtpDownloads.
4. In the Remote directory field, specify the FTP server directory from which the files and folders
will be downloaded. In this example, it is /, which means the root directory of the FTP server.
5. In the Files table, click the [+] button to add a line and in the Filemask column field, enter *.txt
between double quotation marks to get only the text files on the FTP directory to the local
directory.
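The *.txt filemask follows glob semantics: it matches file names, not full paths. As a rough local illustration of that selection (using java.nio's PathMatcher; Talend itself applies the mask to the FTP listing):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;

public class FilemaskDemo {
    // Return true when a file name matches the given glob-style mask.
    static boolean matches(String mask, String name) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        return m.matches(Path.of(name));
    }

    public static void main(String[] args) {
        for (String name : List.of("notes.txt", "image.png", "readme.txt")) {
            System.out.println(name + " -> " + matches("*.txt", name));
        }
    }
}
```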
Procedure
1. Double-click the tFTPClose component to open its Basic settings view.
2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.
Executing the Job to list and get files/folders on the FTP directory
After setting up the Job and configuring the components used in the Job for listing and getting files/
folders on the FTP directory, you can then execute the Job and verify the Job execution result.
Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.
As shown above, the names and paths of the files and folders on the FTP server root directory are
displayed on the console, and only the text files are downloaded to the specified local directory.
tFTPFileProperties
Retrieves the properties of a specified file on an FTP server.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. It describes the
main properties of the specified file. You can click the [...]
button next to Edit schema to view the predefined schema
which contains the following fields:
• abs_path: the absolute path of the file.
• dirname: the directory of the file.
• basename: the name of the file.
• size: the file size in bytes.
• mtime: the timestamp indicating when the file was last
modified, in milliseconds that have elapsed since the
Unix epoch (00:00:00 UTC, Jan 1, 1970).
• mtime_string: the date and time the file was last
modified.
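The relationship between mtime and mtime_string is a plain epoch-milliseconds conversion, which can be sketched with java.time (the exact output pattern Talend uses may differ; the pattern below is an assumption for illustration):

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class MtimeDemo {
    // Convert an mtime value (milliseconds elapsed since the Unix epoch,
    // 00:00:00 UTC, Jan 1, 1970) into a readable timestamp, as mtime_string does.
    static String toDateString(long mtimeMillis) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(mtimeMillis));
    }

    public static void main(String[] args) {
        System.out.println(toDateString(0L)); // 1970-01-01 00:00:00
    }
}
```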
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Remote directory The path to the directory where the file is available.
File The name of the file or the path to the file whose properties
will be retrieved.
Transfer mode Select the transfer mode from the list, either ascii or binary.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Connection mode Select the connection mode from the list, either Passive or
Active.
Calculate MD5 Hash Select this check box to verify the file's MD5 checksum.
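Checking a file's MD5 boils down to computing a digest of its bytes. A minimal sketch with java.security.MessageDigest (not Talend's internal code; a real check would stream the downloaded file rather than hold it in memory):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Demo {
    // Compute the hex MD5 digest of a byte array.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello".getBytes(StandardCharsets.UTF_8)));
        // 5d41402abc4b2a76b9719d911017c592
    }
}
```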
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value is ignored; in that case, the
default value of 60000 ms is used.
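The timeout rule above amounts to a one-line normalization, sketched here to make the documented behavior explicit (not Talend code):

```java
public class TimeoutDemo {
    static final int DEFAULT_TIMEOUT_MS = 60000;

    // Documented rule: zero or negative values are ignored and the
    // default of 60000 ms is used instead.
    static int effectiveTimeout(int configuredMs) {
        return configuredMs > 0 ? configuredMs : DEFAULT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeout(0));     // 60000
        System.out.println(effectiveTimeout(-5));    // 60000
        System.out.println(effectiveTimeout(15000)); // 15000
    }
}
```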
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
Displaying the properties of a processed file on page 1159
tFTPGet
Downloads files to a local directory from an FTP directory.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Local directory The local directory in which downloaded files will be saved.
Remote directory The FTP directory from which files will be downloaded.
Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.
Transfer mode Select the transfer mode from the list, either ascii or binary.
Overwrite file Select the action to be performed when the file already
exists.
• never: Never overwrite the file.
• always: Always overwrite the file.
• size different: Overwrite the file when the file size is different.
Append Select this check box to append data at the end of the file in
order to avoid overwriting data.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the
Filemask or Files field as file filters. This is useful when the name of
the file to be processed contains special characters such as
parentheses.
For more information about Perl5 regular expression syntax,
see Perl5 Regular Expression Syntax.
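With this option enabled, the mask is interpreted as a regular expression rather than a glob, so literal parentheses must be escaped. A rough sketch of the difference, using java.util.regex as a stand-in for the Perl5 engine (the two agree on these basic constructs):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexFilemaskDemo {
    // Keep only the file names that fully match the regular expression mask.
    static List<String> filter(List<String> names, String mask) {
        Pattern p = Pattern.compile(mask);
        return names.stream()
                    .filter(n -> p.matcher(n).matches())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("report(1).txt", "report.txt", "data.csv");
        // Parentheses and dots are regex metacharacters, so they are escaped:
        System.out.println(filter(names, "report\\(\\d+\\)\\.txt"));
    }
}
```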
Connection mode Select the connection mode from the list, either Passive or
Active.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value is ignored; in that case, the
default value of 60000 ms is used.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Print message Select this check box to display the list of files downloaded
on the console.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
Listing and getting files/folders on an FTP directory on page 1230
tFTPPut
Uploads files from a local directory to an FTP directory.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Local directory The local directory from which the files will be uploaded to
the FTP server.
Remote directory The FTP directory where the uploaded files will be placed.
Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.
Transfer mode Select the transfer mode from the list, either ascii or binary.
Overwrite file Select the action to be performed when the file already
exists.
• never: Never overwrite the file.
• always: Always overwrite the file.
• size different: Overwrite the file when the file size is different.
Append Select this check box to append data at the end of the file in
order to avoid overwriting data.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the
Filemask or Files field as file filters. This is useful when the name of
the file to be processed contains special characters such as
parentheses.
For more information about Perl5 regular expression syntax,
see Perl5 Regular Expression Syntax.
Connection mode Select the connection mode from the list, either Passive or
Active.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value is ignored; in that case, the
default value of 60000 ms is used.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
CURRENT_FILE_EXISTS Indicates whether the current file exists. This is a Flow
variable and it returns a boolean.
Usage
Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPPut component, and a tFTPClose
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFTPConnection component to the tFTPPut component using a Trigger > OnSubjobOk
connection.
3. Link the tFTPPut component to the tFTPClose component using a Trigger > OnSubjobOk
connection.
Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.
4. From the Connection Mode drop-down list, select the FTP connection mode you want to use,
Active in this example.
Procedure
1. Double-click the tFTPPut component to open its Basic settings view.
2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and, from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Local directory field, specify the local directory that contains the files to be put onto the
FTP server. In this example, it is D:/components.
4. In the Remote directory field, specify the FTP server directory onto which the files will be put. In
this example, it is /, which means the root directory of the FTP server.
5. Clear the Move to the current directory check box.
6. In the Files table, click the [+] button twice to add two lines, and in the two Filemask column
fields, enter *.txt and *.png respectively, so that only the text and png files in the specified
local directory are put onto the FTP server root directory.
Procedure
1. Double-click the tFTPClose component to open its Basic settings view.
2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.
Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.
2. Connect to the FTP server to verify the result.
As shown above, only the text and png files in the local directory are put onto the FTP server.
tFTPRename
Renames files in an FTP directory.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Remote directory The path to the FTP directory where the files to be renamed
are available.
Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.
Overwrite file Select the action to be performed when the file already
exists.
• never: Never overwrite the file.
• always: Always overwrite the file.
• size different: Overwrite the file when the file size is
different.
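The three options amount to a simple decision rule, sketched below with a hypothetical helper (not part of the Talend API):

```java
public class OverwriteDemo {
    enum Policy { NEVER, ALWAYS, SIZE_DIFFERENT }

    // Decide whether an existing file should be overwritten, following the
    // documented never / always / size different options.
    static boolean shouldOverwrite(Policy p, long existingSize, long incomingSize) {
        switch (p) {
            case NEVER:          return false;
            case ALWAYS:         return true;
            case SIZE_DIFFERENT: return existingSize != incomingSize;
            default:             return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldOverwrite(Policy.SIZE_DIFFERENT, 100, 100)); // false
        System.out.println(shouldOverwrite(Policy.SIZE_DIFFERENT, 100, 200)); // true
    }
}
```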
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Connection mode Select the connection mode from the list, either Passive or
Active.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value is ignored; in that case, the
default value of 60000 ms is used.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPRename component, and a
tFTPClose component by typing their names in the design workspace or dropping them from the
Palette.
2. Link the tFTPConnection component to the tFTPRename component using a Trigger >
OnSubjobOk connection.
3. Link the tFTPRename component to the tFTPClose component using a Trigger > OnSubjobOk
connection.
Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.
Procedure
1. Double-click the tFTPRename component to open its Basic settings view.
2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and, from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Remote directory field, enter the directory on the FTP server where the file to be renamed
exists. In this example, it is /movies.
4. Clear the Move to the current directory check box.
5. In the Files table, click the [+] button to add a line, and then enter the existing file name in the
Filemask column field and the new file name in the New name column field. In this example, they
are movies.json and action_movies.json respectively.
Procedure
1. Double-click the tFTPClose component to open its Basic settings view.
2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.
Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.
2. Connect to the FTP server to verify the result.
As shown above, the file on the FTP server has been renamed from movies.json to action_movies.json.
tFTPTruncate
Truncates files in an FTP directory.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Remote directory The path to the FTP directory in which the files will be
truncated.
Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.
SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.
Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.
Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.
FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.
Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.
Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.
Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the
Filemask or Files field as file filters. This is useful when the name of
the file to be processed contains special characters such as
parentheses.
For more information about Perl5 regular expression syntax,
see Perl5 Regular Expression Syntax.
Connection mode Select the connection mode from the list, either Passive or
Active.
Advanced settings
Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.
Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.
Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.
Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative value is ignored; in that case, the
default value of 60000 ms is used.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tFuzzyMatch
Compares a column from the main flow with a reference column from the lookup flow and outputs
the main flow data displaying the distance.
Basic settings
Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Two read-only columns, Value and Match, are added to the
output schema automatically.
Matching column Select the column of the main flow that needs to be
checked against the reference (lookup) key column.
Unique matching Select this check box if you want to get the best match
possible, in case several matches are available.
Matching item separator In case several matches are available, all of them are
displayed unless the Unique matching check box is selected.
Define the delimiter to be used between the matches.
Advanced settings
tStatCatcher Select this check box to collect log data at the component level.
Statistics
Global Variables
Usage
Warning:
Make sure the reference column is set as key column in the schema of the lookup flow.
4. Double-click the tFuzzyMatch component to open its Basic settings view, and check its schema.
The Schema should match the Main input flow schema in order for the main flow to be checked
against the reference.
Note that two columns, Value and Matching, are added to the output schema. These are standard
matching information and are read-only.
5. Select the method to be used to check the incoming data. In this scenario, Levenshtein is the
Matching type to be used.
6. Then set the distance. In this method, the distance is the number of character changes
(insertions, deletions, or substitutions) that need to be carried out for the entry to fully
match the reference.
In this use case, we set both the minimum distance and the maximum distance to 0. This means
only the exact matches will be output.
7. Also, clear the Case sensitive check box.
8. Check that the matching column and look up column are correctly selected.
9. Leave the other parameters as default.
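The Levenshtein edit distance described in step 6 can be sketched as follows. This is a minimal illustration with made-up sample strings, not Talend's internal implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# A minimum and maximum distance of 0 keeps only exact matches;
# a maximum distance of 2 would also keep entries such as these:
print(levenshtein("Brian", "Brayan"))  # 2
```

With both bounds at 0, only entries whose distance to the reference is exactly 0, that is, identical strings, pass through to the output.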
Results
As the edit distance has been set to 0 (both min and max), the output shows the result of a regular
join between the main flow and the lookup (reference) flow; hence only full matches with a Value of
0 are displayed.
A more illustrative example uses a minimum distance of 1 and a maximum distance of 2; see
Procedure on page 1263.
Procedure
1. In the Component view of the tFuzzyMatch component, change the minimum distance from 0 to 1.
This immediately excludes exact matches (which would show a distance of 0).
2. Also change the maximum distance to 2. The output will provide all matching entries showing a
discrepancy of two characters at most.
3. Make sure the Matching item separator is defined, as several references might match the
main flow entry.
4. Save the new Job and press F6 to run it.
As the edit distance has been set to 2, some entries of the main flow match more than one
reference entry.
Results
You can also use another method, Metaphone, to assess the distance between the main flow and
the reference, as described in the next scenario.
Procedure
1. Change the Matching type to Metaphone. There is no minimum or maximum distance to set, as
this matching method is based on phonetic discrepancies with the reference.
2. Save the Job and press F6. The phonetic value is displayed along with the possible matches.
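To illustrate the idea behind phonetic matching, the sketch below uses the simpler Soundex algorithm rather than the Metaphone implementation Talend actually uses: entries that sound alike share the same key regardless of spelling.

```python
def soundex(word: str) -> str:
    """Simplified Soundex: keep the first letter, encode the remaining
    consonants by sound group, skip vowels, collapse repeats, pad to 4."""
    codes = {c: d for group, d in [("bfpv", "1"), ("cgjkqsxz", "2"),
                                   ("dt", "3"), ("l", "4"),
                                   ("mn", "5"), ("r", "6")]
             for c in group}
    word = word.lower()
    key, last = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != last:
            key += digit
        last = digit
    return (key + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))  # S530 S530: a phonetic match
```

Metaphone works on the same principle but with richer rules for English pronunciation, which is why no numeric distance needs to be configured for it.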
tGoogleDataprocManage
Creates or deletes a Dataproc cluster in the Global region on Google Cloud Platform.
Basic settings
Provide Google Credentials in file Leave this check box clear when you launch your Job
from a machine on which the Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
contact the administrator of your Google Cloud Platform or
see the Google Cloud Platform Auth Guide.
Region From this drop-down list, select the Google Cloud region to
be used.
Instance configuration Enter the parameters that determine how many master and
worker nodes the Dataproc cluster to be created should use,
and the performance of these nodes.
Advanced settings
Wait for cluster ready Select this check box to keep this component running until the cluster is completely set up.
When you clear this check box, this component stops running immediately after sending the
creation command.
Master disk size Enter a number without quotation marks to determine the disk size of each master instance.
Master local SSD Enter a number without quotation marks to determine the number of local solid-state drive (SSD)
storage devices to be added to each master instance.
According to Google, these local SSDs are suitable only for temporary storage such as caches,
processing space, or low-value data. It is recommended to store important data in Google's durable
storage options. For further information about the Google storage options, see Durable storage
options.
Worker disk size Enter a number without quotation marks to determine the disk size of each worker instance.
Worker local SSD Enter a number without quotation marks to determine the number of local solid-state drive (SSD)
storage devices to be added to each worker instance.
According to Google, these local SSDs are suitable only for temporary storage such as caches,
processing space, or low-value data. It is recommended to store important data in Google's durable
storage options. For further information about the Google storage options, see Durable storage
options.
Network or Subnetwork Select either check box to use a Google Compute Engine network or subnetwork for the cluster to be
created, to enable intra-cluster communications.
As Google does not allow network and subnetwork to be used concurrently, selecting one check box
hides the other check box.
For further information about Google Dataproc cluster network configuration, see Dataproc Network.
Initialization action In this table, select the initialization actions that are available in the shared bucket on Google Cloud
Storage to run on all the nodes in your Dataproc cluster immediately after this cluster is set up.
If you need to use custom initialization scripts, upload them to this shared Google bucket so that
tGoogleDataprocManage can read them.
• In the Executable file column, enter the Google Cloud Storage URI of the scripts to be used,
for example gs://dataproc-initialization-actions/MyScript.
• In the Executable timeout column, enter the amount of time within double quotation marks
to determine the duration of the execution. If the executable has not completed by the end of
this timeout, an explanatory error message is returned. The value is a string with up to nine
fractional digits, for example, "3.5s" for 3.5 seconds.
For further information about this shared bucket and the initialization actions, see Initialization
actions.
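The timeout format described above can be checked with a simple validation routine; the pattern below is an assumption derived from the description in this guide (seconds, an optional fraction of up to nine digits, and an "s" suffix), not an official Google specification:

```python
import re

# Assumed pattern: whole seconds, optionally a fraction of up to
# nine digits, terminated by the "s" suffix, e.g. "3.5s".
TIMEOUT_RE = re.compile(r"^\d+(\.\d{1,9})?s$")

for value in ("3.5s", "600s", "0.123456789s", "3.5", "1.0123456789s"):
    print(value, "->", "valid" if TIMEOUT_RE.match(value) else "invalid")
```

Such a pre-check lets a Job reject a malformed timeout value before the cluster creation request is ever sent.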
tStatCatcher Statistics Select this check box to collect log data at the component level.
Usage
tGoogleDriveConnection
Opens a Google Drive connection that can be reused by other Google Drive components.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
OAuth Method From the drop-down list, select the OAuth method used to
access Google Drive.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications.
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.
Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.
Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.
Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.
Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.
Advanced settings
DataStore Path The path to the credential file that stores the refresh token.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other Google
Drive components. In a Job design, it is usually used to
open a Google Drive connection that can be reused by other
Google Drive components.
Procedure
1. Go to Google API Console and select an existing project or create a new one. In this example, we
create a new project TalendProject.
2. Go to the Library page, find Google Drive API in the right panel, and enable it so that you can
access resources from Google Drive.
3. Go to the Credentials page, click OAuth consent screen in the right panel and set a product name
in the Product name shown to users field. In this example, it is TalendProduct. When done,
click Save.
4. Click Create credentials > OAuth client ID, and in the Create client ID page, create a new client ID
TalendApplication with Application type set to Other.
5. Click Create. You will be shown your client ID and client secret, which can be used by Google
Drive components and the metadata wizard to access Google Drive using the OAuth method
Installed Application (Id & Secret).
Procedure
1. Go to Google API Console.
2. Go to the Credentials page.
3. Click the Download JSON button to download the client secret JSON file and store it securely in a
local folder. This JSON file can then be used by Google Drive components and the metadata wizard
to access Google Drive via the OAuth method Installed Application (JSON).
Procedure
1. Go to Google API Console.
2. Open the Service accounts page. If prompted, select your project.
5. Click Create. In the pop-up window, choose a folder and click Save to store your service account
JSON file securely. This JSON file can then be used by Google Drive components and the metadata
wizard to access Google Drive via the OAuth method Service Account.
Procedure
1. Go to Google Developers OAuth Playground.
2. Click OAuth 2.0 Configuration, select the Use your own OAuth credentials check box, and enter
the client ID and client secret you have already created in the OAuth Client ID and OAuth
Client secret fields respectively.
4. In OAuth 2.0 Playground Step 2, click Exchange authorization code for tokens to generate the
OAuth access token.
The OAuth access token is displayed in the right panel. It can be used by Google Drive
components and the metadata wizard to access Google Drive via the OAuth method
Access Token.
Note that the access token expires every 3600 seconds. You can click Refresh access token in
OAuth 2.0 Playground Step 2 to refresh it.
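Because the token expires after 3600 seconds, a long-running Job that holds one needs to track when a refresh is due. The sketch below uses hypothetical names (needs_refresh, margin); it is not part of the Talend or Google API:

```python
import time

TOKEN_LIFETIME = 3600  # seconds, the expiry noted above

def needs_refresh(issued_at: float, margin: int = 300) -> bool:
    """Return True when fewer than `margin` seconds of validity remain.
    Both the function and the margin are illustrative, not Talend API."""
    return time.time() - issued_at > TOKEN_LIFETIME - margin

# A token issued 3500 seconds ago has under 5 minutes of validity left:
print(needs_refresh(time.time() - 3500))  # True
```

Refreshing a little before the deadline, rather than exactly at it, avoids failures from clock skew or request latency.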
Related scenario
Managing files with Google Drive on page 1297
tGoogleDriveCopy
Creates a copy of a file/folder in Google Drive.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the Google Drive
connection to be reused by this component.
OAuth Method From the drop-down list, select the OAuth method used to
access Google Drive.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications.
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.
Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.
Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.
Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.
Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.
Source Access Mode Select the method by which the source file/folder is
specified, either by Name or by Id.
Destination Folder Name The name or ID of the destination folder in which the copy
of the source file/folder will be saved.
Destination Access Mode Select the method by which the destination folder is
specified, either by Name or by Id.
Rename (set new title) Select this check box to rename the copy of the file/folder
in the destination folder. In the Destination Name field
displayed, enter the name for the file/folder after being
copied to the destination folder.
Remove Source File Select this check box to remove the source file after it is
copied to the destination folder.
This property is available only when File is selected from
the Copy Mode drop-down list.
Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• sourceId: The ID of the source file/folder.
• destinationId: The ID of the destination file/folder.
Advanced settings
DataStore Path The path to the credential file that stores the refresh token.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
Managing files with Google Drive on page 1297
tGoogleDriveCreate
Creates a new folder in Google Drive.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the Google Drive
connection to be reused by this component.
OAuth Method From the drop-down list, select the OAuth method used to
access Google Drive.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications.
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.
Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.
Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.
Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.
Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.
Parent Folder The name or ID of the parent folder in which a new folder
will be created.
Access Method Select the method by which the parent folder is specified,
either by Name or by Id.
Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• parentFolderId: the ID of the parent folder.
• newFolderId: the ID of the new folder.
Advanced settings
DataStore Path The path to the credential file that stores the refresh token.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
Managing files with Google Drive on page 1297
tGoogleDriveDelete
Deletes a file/folder in Google Drive.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the Google Drive
connection to be reused by this component.
OAuth Method From the drop-down list, select the OAuth method used to
access Google Drive.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications.
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.
Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.
Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.
Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.
Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.
Use Trash Select this check box to move the file/folder to be deleted
to the trash.
Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema with only one field named fileId which describes
the ID of the file/folder.
Advanced settings
DataStore Path The path to the credential file that stores the refresh token.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tGoogleDriveGet
Gets a file's content and downloads the file to a local directory.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the Google Drive
connection to be reused by this component.
OAuth Method From the drop-down list, select the OAuth method used to
access Google Drive.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications.
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.
Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.
Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.
Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.
Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.
Save as File Select this check box to save the file to a local directory.
In the Save to field displayed, browse to or enter the path
where the downloaded file is to be saved.
Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema with only one field named content which describes
the content of the file to be downloaded.
Advanced settings
DataStore Path The path to the credential file that stores the refresh token.
Export Google Doc as Select the type for the Google Doc to be exported.
Export Google Draw as Select the type for the Google Draw to be exported.
Export Google Presentation as Select the type for the Google Presentation to be exported.
Export Google Spreadsheet as Select the type for the Google Spreadsheet to be exported.
Add extension Select this check box to add a file extension to the exported file.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
No scenario is available for this component yet.
tGoogleDriveList
Lists all files, folders, or both in a specified Google Drive folder.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the Google Drive
connection to be reused by this component.
OAuth Method From the drop-down list, select the OAuth method used to
access Google Drive.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications.
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.
Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.
Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.
Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.
Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.
Folder Name The name or ID of the folder in which the files/folders will
be listed.
Access Method Select the method by which the folder is specified, either by
Name or by Id.
Include SubDirectories Select this check box to also list the files/folders in
subdirectories.
Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• id: The ID of the file/folder.
• name: The name of the file/folder.
Advanced settings
DataStore Path The path to the credential file that stores the refresh token.
Include trashed files Select this check box to also take into account files and
folders that have been moved to the trash.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
Managing files with Google Drive on page 1297
tGoogleDrivePut
Uploads data from a data flow or a local file to Google Drive.
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the Google Drive
connection to be reused by this component.
OAuth Method From the drop-down list, select the OAuth method used to
access Google Drive.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications.
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.
Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.
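For orientation, a client secret file downloaded from the Google API Console for an installed application typically looks like the sketch below. All values are placeholders; only the overall shape is meant to be informative:

```json
{
  "installed": {
    "client_id": "1234567890-abc.apps.googleusercontent.com",
    "project_id": "my-talend-project",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "client_secret": "YOUR_CLIENT_SECRET",
    "redirect_uris": ["http://localhost"]
  }
}
```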
Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.
Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.
Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.
File Name The name for the file after being uploaded.
Destination Folder The name or ID of the folder in which uploaded data will be
stored.
Replace if Existing Select this check box to overwrite any existing file with the
newly uploaded one.
Upload Mode Select one of the following upload modes from the drop-
down list:
• Upload Incoming content as File: Select this option
to upload data from an input flow of the preceding
component.
• Upload Local File: Select this option to upload data
from a local file. In the File field displayed, specify the
path to the file to be uploaded.
• Expose As OutputStream: Select this option to expose
the output stream of this component, which can be
used by other components to write the file content.
Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• content: The content of the uploaded data.
• parentFolderId: The ID of the parent folder.
• fileId: The ID of the file.
Advanced settings
DataStore Path The path to the credential file that stores the refresh token.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
the two files to the new folder Talend Backup, and finally lists and displays all files and folders in
the root directory of Google Drive on the console.
2. Link the first tGoogleDrivePut component to the first tLogRow component using a Row > Main
connection.
3. Do the same to link the tFileInputRaw component to the second tGoogleDrivePut component,
the second tGoogleDrivePut component to the second tLogRow component, the tGoogleDriveCreate
component to the third tLogRow component, the tGoogleDriveCopy component to the fourth
tLogRow component, and the tGoogleDriveList component to the fifth tLogRow component.
4. Link the tGoogleDriveConnection component to the first tGoogleDrivePut component using a
Trigger > On Subjob Ok connection.
5. Do the same to link the first tGoogleDrivePut component to the tFileInputRaw component,
the tFileInputRaw component to the tGoogleDriveCreate component, the tGoogleDriveCreate
component to the tGoogleDriveCopy component, and the tGoogleDriveCopy component to the
tGoogleDriveList component.
Procedure
1. Double-click the tGoogleDriveConnection component to open its Basic settings view in the
Component tab.
2. In the Application Name field, enter the application name required by Google Drive to get access
to its API. In this example, it is TalendProject.
3. Select Installed Application (JSON) from the OAuth Method drop-down list.
4. In the Client Secret JSON field, specify the path to the client secret JSON file you have generated,
D:/client_secret.json in this example.
2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. Select by Name from the Access Method drop-down list and in the Destination Folder field, enter
the name of the folder in which the file will be uploaded, Talend in this example.
Note: When accessing a Google Drive resource by its name, if the name matches more than one
resource, an error will be thrown because the resource cannot be identified precisely. In this
case, you can specify the Google Drive resource using a pseudo path hierarchy, like /Talend/
Documentation. This example specifies a folder named Documentation under the folder
Talend under the Google Drive root folder.
4. In the File Name field, enter the name for the file after being uploaded. In this example, it is
Talend Customers.csv.
5. Select Upload Local File from the Upload Mode drop-down list and in the File field, browse
to or enter the path to the file to be uploaded. In this example, it is D:/Downloads/Talend
Customers.csv.
6. Double-click the tFileInputRaw component and on its Basic settings view, select Read the
file as a bytes array in the Mode area and specify the path to the file whose content will
be uploaded in the Filename field, D:/Downloads/Talend Products.txt in this example.
7. Double-click the second tGoogleDrivePut component to open its Basic settings view in the
Component tab.
8. Repeat step 2 on page 1301 to step 3 on page 1301 to configure this component.
9. In the File Name field, enter the name for the file after being uploaded. In this example, it is
Talend Products.txt.
10. Select Upload Incoming content as File from the Upload Mode drop-down list.
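The pseudo path hierarchy mentioned in the note above is essentially a list of folder names resolved by name, one level at a time, starting from the Drive root. A minimal illustration of that reading (the helper function is hypothetical, not part of the component):

```python
def split_drive_path(pseudo_path):
    """Split a Google-Drive-style pseudo path into folder segments.

    Each segment would then be resolved by name, one level at a time,
    starting from the Drive root folder.
    """
    return [segment for segment in pseudo_path.split("/") if segment]

# "/Talend/Documentation" names the folder "Documentation" under
# the folder "Talend" under the Google Drive root folder.
print(split_drive_path("/Talend/Documentation"))  # ['Talend', 'Documentation']
```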
2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. In the Parent Folder field, enter the name of the folder in which a new folder will be created. In
this example, it is root.
4. In the New Folder Name field, enter the name of the folder to be created. In this example, it is
Talend Backup.
5. Double-click the third tLogRow component to open its Basic settings view in the Component tab.
6. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.
2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. Select File from the Copy Mode drop-down list.
4. In the Source field, enter the name of the file to be copied. In this example, it is Talend
Customers.csv.
5. In the Destination Folder Name field, enter the name of the folder to which the file will be copied.
In this example, it is Talend Backup.
6. Select the Rename (set new title) check box and in the Destination Name field, enter a new
name for the file after being copied to the destination folder. In this example, it is Talend
Customers v1.0.csv.
7. Double-click the fourth tLogRow component to open its Basic settings view in the Component tab.
8. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.
2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. In the Folder Name field, enter the name of the folder in which the files/folders will be listed. In
this example, it is the root directory of Google Drive and you can use the alias root to refer to it.
4. Select Both from the FileList Type drop-down list to list both files and folders in the root
directory.
5. Select the Include SubDirectories check box to list also the files/folders in the subdirectories.
6. Double-click the fifth tLogRow component to open its Basic settings view in the Component tab.
7. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.
As shown above, two files Talend Products.txt and Talend Customers.csv were
uploaded to the folder Talend, then a new folder Talend Backup was created in the root
folder and the file Talend Customers.csv was copied to the new folder and renamed to
Talend Customers v1.0.csv, and finally all files and folders in the root directory are listed
on the console.
tGPGDecrypt
Calls the gpg -d command to decrypt a GnuPG-encrypted file and saves the decrypted file in the
specified directory.
Basic settings
No TTY Terminal Select this check box to specify that no TTY terminal is
used by adding the --no-tty option to the decryption
command.
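For reference, the decryption call described above boils down to a gpg command line. The sketch below assembles such a command in Python purely for illustration; the file names are placeholders and the component's exact invocation (passphrase handling in particular) may differ:

```python
def build_gpg_decrypt_cmd(encrypted_file, output_file, no_tty=True):
    """Assemble a gpg decrypt command similar to what tGPGDecrypt runs."""
    cmd = ["gpg"]
    if no_tty:
        cmd.append("--no-tty")  # corresponds to the No TTY Terminal option
    cmd += ["--output", output_file, "-d", encrypted_file]
    return cmd

print(build_gpg_decrypt_cmd("data.csv.gpg", "data.csv"))
# ['gpg', '--no-tty', '--output', 'data.csv', '-d', 'data.csv.gpg']
```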
Advanced settings
tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.
Global Variables
Global Variables FILE: the name of the output file. This is a Flow variable and
it returns a string.
FILEPATH: the path of the output file. This is a Flow variable
and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
Usage
Warning:
If the file path contains accented characters, you will get an error message when running the
Job.
4. In the GPG binary path field, browse to the GPG command file.
5. In the Passphrase field, enter the passphrase used when encrypting the input file.
6. Double-click the tFileInputDelimited component to open its Component view and set its
properties:
7. In the File name/Stream field, define the path to the decrypted file, which is the output path you
have defined in the tGPGDecrypt component.
8. In the Header, Footer and Limit fields, define respectively the number of rows to be skipped in the
beginning of the file, at the end of the file and the number of rows to be processed.
9. Use a Built-In schema. This means that it is available for this Job only.
10. Click Edit schema and edit the schema for the component. Click the [+] button twice to add two
columns that you will call idState and labelState.
11. Click OK to validate your changes and close the editor.
Results
The specified file is decrypted and the defined number of rows of the decrypted file are printed on the
Run console.
tGreenplumBulkExec
Improves performance when loading data in a Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
statement used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec
component, detailed in a separate section. The advantage of using a two-step process is
that it makes it possible to transform data before it is loaded in the database.
tGreenplumBulkExec performs an Insert action on the data.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
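In SQL terms these table actions correspond to ordinary DDL/DML statements. A sketch using Python's built-in sqlite3 as a stand-in database (Greenplum would use its own dialect, and the table definition here is an invented placeholder):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Drop a table if exists and create": remove any previous version
# of the table, then create it afresh (schema is a placeholder).
conn.execute("DROP TABLE IF EXISTS customers")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'alice')")

# "Clear a table": delete the content but keep the table definition.
conn.execute("DELETE FROM customers")
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(remaining)  # 0
```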
Warning:
This file is located on the machine specified by the URI
in the Host field so it should be on the same machine as
the database server.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Copy the OID for each row Retrieve the ID item for each row.
Contains a header line with the names of each column in
the file Select this check box to specify that the file contains a header line.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenarios
For more information about tGreenplumBulkExec, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.
• Truncating and inserting file data into an Oracle database on page 2681.
tGreenplumClose
Closes a connection to the Greenplum database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
No scenario is available for the Standard version of this component yet.
tGreenplumCommit
Commits a global transaction in one go, instead of repeating the operation for every row or every batch,
and thus improves performance.
tGreenplumCommit validates the data processed through the Job into the connected database. This
component uses a unique connection.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tGreenplumCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.
Advanced settings
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job.
Related scenarios
For tGreenplumCommit related scenarios, see:
• Mapping data using a simple implicit join on page 686.
• Inserting data in mother/daughter tables on page 2426.
tGreenplumConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tGreenplumConnection opens a connection to the database for a current transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component
commits only after all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
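The difference between per-statement auto commit and a single explicit commit is a general database notion, not Greenplum-specific. It can be illustrated with Python's built-in sqlite3 module used as a stand-in database:

```python
import sqlite3

# Explicit-commit mode: statements accumulate in one transaction that
# can be rolled back as a whole (analogous to using tGreenplumCommit).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.commit()
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("INSERT INTO t VALUES (2)")
conn.rollback()  # nothing was committed yet, so both inserts vanish
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 0

# Auto-commit mode: each statement is committed immediately, so a
# later rollback has nothing left to undo (analogous to Auto Commit).
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (id INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")
auto.rollback()  # the insert was already committed
print(auto.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1
```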
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Usage
Related scenarios
For tGreenplumConnection related scenarios, see:
• Mapping data using a simple implicit join on page 686.
• tMysqlConnection on page 2425.
tGreenplumGPLoad
Bulk loads data into a Greenplum table either from an existing data file, an input flow, or directly from
a data flow in streaming mode through a named-pipe.
tGreenplumGPLoad inserts data into a Greenplum database table using Greenplum's gpload utility.
Basic settings
Action on table On the table defined, you can perform one of the following
operations before loading the data:
None: No operation is carried out.
Clear table: The table content is deleted before the data is
loaded.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop and create table: The table is removed and created
again.
Drop table if exists and create: The table is removed if it
already exists and created again.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Merge: Updates or adds data to the table.
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Merge operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to set
as primary key(s). To define the Update/Merge options,
select in the Match Column column the check boxes
corresponding to the column names that you want to use as
a base for the Update and Merge operations, and select in
the Update Column column the check boxes corresponding
to the column names that you want to update. To define
the Update condition, type in the condition that will be
used to update the data.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Data file Full path to the data file to be used. If this component is
used in standalone mode, this is the name of an existing
data file to be loaded into the database. If this component
is connected with an input flow, this is the name of the file
to be generated and written with the incoming data to later
be used with gpload to load into the database. This field is
hidden when the Use named-pipe check box is selected.
Use named-pipe Select this check box to use a named-pipe. This option is
only applicable when the component is connected with an
input flow. When this check box is selected, no data file is
generated and the data is transferred to gpload through a
named-pipe. This option greatly improves performance in
both Linux and Windows.
Note:
This component on named-pipe mode uses a JNI
interface to create and write to a named-pipe on any
Windows platform. Therefore the path to the associated
JNI DLL must be configured inside the java library path.
The component comes with two DLLs for both 32 and
64 bit operating systems that are automatically provided
in the Studio with the component.
Named-pipe name Specify a name for the named-pipe to be used. Ensure that
the name entered is valid.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
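The named-pipe transfer described for the Use named-pipe option can be sketched on Linux with a plain FIFO: a producer writes rows into the pipe while a consumer (standing in for gpload) reads them, with no intermediate data file on disk. This is a conceptual POSIX-only illustration, not the component's actual implementation:

```python
import os
import tempfile
import threading

pipe_dir = tempfile.mkdtemp()
pipe_path = os.path.join(pipe_dir, "gpload_pipe")
os.mkfifo(pipe_path)  # create the named pipe (POSIX only)

def producer():
    # Plays the role of the component pushing incoming rows.
    with open(pipe_path, "w") as w:
        w.write("1|alice\n2|bob\n")

t = threading.Thread(target=producer)
t.start()

# Plays the role of gpload reading the stream: no data file on disk.
with open(pipe_path) as r:
    received = r.read()
t.join()
os.unlink(pipe_path)
print(received.splitlines())  # ['1|alice', '2|bob']
```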
Advanced settings
Use existing control file (YAML formatted) Select this check box to provide a control file to be used
with the gpload utility instead of specifying all the options
explicitly in the component. When this check box is
selected, Data file and the other gpload related options no
longer apply. Refer to Greenplum's gpload manual for
details on creating a control file.
Control file Enter the path to the control file to be used, between
double quotation marks, or click [...] and browse to the
control file. This option is passed on to the gpload utility
via the -f argument.
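For orientation, a minimal gpload control file is a YAML document roughly along these lines. Host, database, file, and table names below are invented placeholders; refer to the Greenplum gpload manual for the authoritative format:

```yaml
VERSION: 1.0.0.1
DATABASE: mydb
USER: gpadmin
HOST: mdw.example.com
PORT: 5432
GPLOAD:
  INPUT:
    - SOURCE:
        FILE:
          - /data/customers.dat
    - FORMAT: text
    - DELIMITER: '|'
  OUTPUT:
    - TABLE: public.customers
    - MODE: insert
```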
CSV mode Select this check box to include CSV specific parameters
such as Escape char and Text enclosure.
Warning:
This is gpload's delim argument. The default value is |. To
improve performance, use the default value.
Header (skips the first row of data file) Select this check box to skip the first row of the data file.
Additional options Set the gpload arguments in the corresponding table. Click
[+] as many times as required to add arguments to the
table. Click the Parameter field and choose among the
Log file Browse to or enter the access path to the log file in your
directory.
Specify gpload path Select this check box to specify the full path to the gpload
executable. You must check this option if the gpload path is
not specified in the PATH environment variable.
Full path to gpload executable Full path to the gpload executable on the machine in
use. It is advisable to specify the gpload path in the PATH
environment variable instead of selecting this option.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenario
For a related use case, see Inserting data in bulk in MySQL database on page 2489.
tGreenplumInput
Reads a database and extracts fields based on a query.
tGreenplumInput executes a DB query with a strictly defined order which must correspond to the
schema definition and then it passes on the field list to the next component via a Main row link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Query type and Query Enter your DB query paying particularly attention to
properly sequence the fields in order to match the schema
definition.
Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.
Guess schema Click the Guess schema button to retrieve the table schema.
Advanced settings
Use cursor When selected, helps to decide the row set to work with at a
time and thus optimize performance.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for
Greenplum databases.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
• Mapping data using a simple implicit join on page 686.
See also related topic: Reading data from different MySQL databases using dynamically loaded
connection parameters on page 497.
tGreenplumOutput
Executes the action defined on the table and/or on the data of a table, according to the input flow
from the previous component.
tGreenplumOutput writes, updates, modifies or deletes the data in a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
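As an illustration of the Insert or update semantics only (not the SQL the component actually generates), the following sketch emulates the behavior with SQLite's ON CONFLICT clause, an invented stock table, and sku as the key column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")

def insert_or_update(sku, qty):
    # "Insert or update": try the insert; when a row with the same
    # key already exists, fall back to an update of that row.
    cur.execute(
        "INSERT INTO stock (sku, qty) VALUES (?, ?) "
        "ON CONFLICT(sku) DO UPDATE SET qty = excluded.qty",
        (sku, qty),
    )

insert_or_update("A-1", 10)   # no row yet: behaves as Insert
insert_or_update("A-1", 25)   # key exists: behaves as Update
qty = cur.execute("SELECT qty FROM stock WHERE sku = 'A-1'").fetchone()[0]
print(qty)  # 25
```

This is also why a key column is mandatory for Update and Delete: without it there is no WHERE clause to identify the target row.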
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
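The behavior with the check box cleared can be pictured as follows; this is an illustrative sketch with invented data and table names, not the component's generated code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

rows = [(1, "Alice"), (1, "Bob"), (2, None), (3, "Carol")]
rejects = []
for row in rows:
    try:
        cur.execute("INSERT INTO users VALUES (?, ?)", row)
    except sqlite3.IntegrityError as err:
        # With "Die on error" cleared, the failing row is skipped and
        # routed to the rejects flow together with its error message.
        rejects.append((row, str(err)))

inserted = cur.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(inserted, len(rejects))  # 2 2
```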
Advanced settings
Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection.
Additional Columns This option is not available if you create (with or without
drop) the database table. It allows you to call SQL
functions to perform actions on columns, provided these are
not insert, update, or delete actions, or actions that require
particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Use Batch Select this check box to activate the batch mode for data
processing.
Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for
Greenplum databases. It allows you to carry out actions on
a table or on the data of a table in a Greenplum database.
It enables you to create a reject flow, with a Row > Rejects
link filtering the data in error. For a usage example, see
Retrieving data in error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
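The mechanism can be pictured as a lookup keyed by a context variable: the value of the variable decides at run time which registered connection is used. The connection names and hosts below are invented for illustration:

```python
# Hypothetical connection catalog; names and hosts are invented.
connections = {
    "dev":  {"host": "dev-db.example.com",  "port": 5432, "dbname": "sales"},
    "prod": {"host": "prod-db.example.com", "port": 5432, "dbname": "sales"},
}

def resolve_connection(context):
    # The Code field of Dynamic settings plays the role of this lookup:
    # a context variable selects the connection when the Job runs,
    # so the Job itself never has to change.
    name = context["db_target"]
    return connections[name]

params = resolve_connection({"db_target": "prod"})
print(params["host"])  # prod-db.example.com
```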
Related scenarios
For related scenarios, see:
tGreenplumOutputBulk
Prepares the file to be used as parameter in the INSERT query to feed the Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the
tGreenplumOutputBulkExec component, detailed in a separate section. The advantage of using a
two-step process is that it makes it possible to transform data before it is loaded into the database.
Writes a file with columns based on the defined delimiter and the Greenplum standards.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Warning:
This file is generated on the local machine or a shared
folder on the LAN.
Append Select this check box to add the new rows at the end of the
records.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenarios
For use cases in relation with tGreenplumOutputBulk, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.
tGreenplumOutputBulkExec
Provides performance gains during Insert operations to a Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the
tGreenplumOutputBulkExec component.
tGreenplumOutputBulkExec executes the action on the data provided.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted. You can roll
back the operation.
Warning:
This file is generated on the machine specified by
the URI in the Host field so it should be on the same
machine as the database server.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Copy the OID for each row Retrieve the ID item for each row.
Contains a header line with the names of each column in the file Specify that the file contains a header line.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Related scenarios
For use cases in relation with tGreenplumOutputBulkExec, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.
tGreenplumRollback
Avoids committing part of a transaction involuntarily.
tGreenplumRollback cancels the transaction committed in the connected DB.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For tGreenplumRollback related scenario, see Rollback from inserting data in mother/daughter tables
on page 2429.
tGreenplumRow
Acts on the actual DB structure or on the data (although without handling data), depending on the
nature of the query and the database.
The SQLBuilder tool helps you write your SQL statements easily.
tGreenplumRow is the specific component for this database query. It executes the SQL query stated
onto the specified database. The row suffix means the component implements a flow in the Job
design although it doesn't provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
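The "?" placeholder mechanism works as in any parameterized query. The sketch below uses Python's DB-API with SQLite and an invented orders table; each value is bound by position, which is what Parameter Index, Parameter Type, and Parameter Value describe:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0), (3, 35.0)])

# Each "?" is bound by position; the statement is prepared once and
# can be re-executed with different values, which is where the
# performance gain comes from.
query = "SELECT COUNT(*) FROM orders WHERE amount > ? AND amount < ?"
count = cur.execute(query, (10.0, 40.0)).fetchone()[0]
print(count)  # 2
```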
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related scenarios, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tGreenplumSCD
Addresses Slowly Changing Dimension needs, reading regularly a source of data and logging the
changes into a dedicated SCD table.
tGreenplumSCD reflects and tracks changes in a dedicated Greenplum SCD table.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.
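For orientation, an SCD Type 2 change closes the current row version and inserts a new one rather than overwriting it. The sketch below shows that pattern with SQLite and invented column names; the actual table layout is defined in the SCD editor:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical SCD table layout: surrogate key, source key, tracked
# attribute, validity interval, and an active flag.
cur.execute("""CREATE TABLE dim_customer (
    sk INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER, city TEXT,
    start_date TEXT, end_date TEXT, active INTEGER)""")
cur.execute("INSERT INTO dim_customer (customer_id, city, start_date, end_date, active) "
            "VALUES (42, 'Paris', '2019-01-01', NULL, 1)")

def scd2_change(customer_id, new_city, change_date):
    # Close the current version instead of overwriting it...
    cur.execute("UPDATE dim_customer SET end_date = ?, active = 0 "
                "WHERE customer_id = ? AND active = 1", (change_date, customer_id))
    # ...and insert a new active version, preserving the full history.
    cur.execute("INSERT INTO dim_customer (customer_id, city, start_date, end_date, active) "
                "VALUES (?, ?, ?, NULL, 1)", (customer_id, new_city, change_date))

scd2_change(42, "Lyon", "2020-02-01")
versions = cur.execute("SELECT city, active FROM dim_customer "
                       "WHERE customer_id = 42 ORDER BY sk").fetchall()
print(versions)  # [('Paris', 0), ('Lyon', 1)]
```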
Use memory saving Mode Select this check box to maximize system performance.
Source keys include Null Select this check box to allow the source key columns to
have Null values.
Warning:
Special attention should be paid to the uniqueness of the
source key(s) value when this option is selected.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.
Debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Limitation This component does not support using SCD type 0 together
with other SCD types.
Related scenario
For related scenarios, see tMysqlSCD on page 2508.
tGroovy
tGroovy broadens the functionality of the Talend Job, using the Groovy language, which is a
simplified Java syntax.
tGroovy allows you to enter customized code which you can integrate into the Talend program. The
code is run only once.
Basic settings
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at component
level.
Global Variables
Usage
Related scenarios
• For a scenario using the Groovy code, see Calling a file which contains Groovy code on page
1355.
• For a functional example, see Printing out a variable content on page 1823
tGroovyFile
Broadens the functionality of Talend Jobs using the Groovy language, which is a simplified Java
syntax.
tGroovyFile allows you to call an existing Groovy script.
Basic settings
Groovy File Name and path of the file containing the Groovy code.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at component
level.
Global Variables
Usage
2. In the Groovy File field, enter the path to the file containing the Groovy code, or browse to the
file in your directory. In this example, it is D:/Input/Ageducapitaine.txt, and the file contains the
following Groovy code:
tGSBucketCreate
Creates a new bucket which you can use to organize data and control access to data in Google Cloud
Storage.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Bucket name Specify the name of the bucket which you want to create.
Note that the bucket name must be unique across the
Google Cloud Storage system.
For more information about the bucket naming convention,
see https://developers.google.com/storage/docs/
bucketnaming.
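As an aid only, the core of the naming rules can be checked with a simplified validator; the full rules (dotted names, reserved prefixes, and so on) are in the Google documentation linked above:

```python
import re

def looks_like_valid_bucket_name(name):
    # Simplified check of the core naming rules: 3-63 characters,
    # lowercase letters, digits, dashes, underscores, and dots only,
    # starting and ending with a letter or digit. This is not the
    # complete rule set Google enforces.
    return re.fullmatch(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]", name) is not None

print(looks_like_valid_bucket_name("my-travel-maps"))   # True
print(looks_like_valid_bucket_name("My-Bucket"))        # False (uppercase)
print(looks_like_valid_bucket_name("ab"))               # False (too short)
```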
Special configure Select this check box to provide the additional configuration
for the bucket to be created.
Location Select from the list the location where the new bucket
will be created. Currently, Europe and US are available. By
default, the bucket location is in the US.
Note that once a bucket is created in a specific location, it
cannot be moved to another location.
Acl Select from the list the desired access control list (ACL) for
the new bucket.
Depending on the ACL on the bucket, the access requests
from users may be allowed or rejected. If you do not specify
a predefined ACL for the new bucket, the predefined
project-private ACL applies.
For more information about ACL, see https://develo
pers.google.com/storage/docs/accesscontrol?hl=en.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.
tGSBucketDelete
Deletes an empty bucket in Google Cloud Storage so as to release occupied resources.
Note that bucket deletion cannot be undone, so you need to back up any data that you want to keep
before the deletion.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Bucket name Specify the name of the bucket that you want to delete.
Make sure that the bucket to be deleted is empty.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tGSBucketExist
Checks the existence of a bucket in Google Cloud Storage so as to make further operations.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Bucket name Specify the name of the bucket for which you want to
perform a check to confirm it exists in Google Cloud Storage.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.
tGSBucketList
Retrieves a list of buckets from all projects or one specific project in Google Cloud Storage.
tGSBucketList iterates on all buckets within all projects or one specific project in Google Cloud
Storage.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Specify project ID Select this check box and in the Project ID field specify a
project ID from which you want to retrieve a list of buckets.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.
tGSClose
Closes an active connection to Google Cloud Storage in order to release the occupied resources.
Basic settings
Component List Select the tGSConnection component in the list if more than
one connection is planned for the current Job.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is generally used with other Google Cloud
Storage components, particularly tGSConnection.
Related scenario
For a scenario in which tGSClose is used, see Managing files with Google Cloud Storage on page
1378.
tGSConnection
Provides the authentication information for making requests to the Google Cloud Storage system and
enables the reuse of the connection it creates to Google Cloud Storage.
Basic settings
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is generally used with other Google Cloud
Storage components, particularly tGSClose.
Related scenario
For a scenario in which tGSConnection is used, see Managing files with Google Cloud Storage on page
1378.
tGSCopy
Copies or moves objects within a bucket or between buckets in Google Cloud Storage.
tGSCopy streamlines processes by automating copy tasks.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Source bucket name Specify the name of the bucket from which you want to
copy or move objects.
Source is folder Select this check box if the source object is a folder.
Target bucket name Specify the name of the bucket to which you want to copy
or move objects.
Target folder Specify the target folder to which the objects will be copied
or moved.
Action Select the action that you want to perform on objects from
the list.
• Copy: copies objects from the source bucket or folder
to the target bucket or folder.
• Move: moves objects from the source bucket or folder
to the target bucket or folder.
Rename Select this check box and in the New name field enter a new
name for the object to be copied or moved.
The Rename check box will not be available if you select
the Source is folder check box.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a scenario in which tGSCopy is used, see Managing files with Google Cloud Storage on page 1378.
tGSDelete
Deletes the objects which match the specified criteria in Google Cloud Storage so as to release the
occupied resources.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Key prefix Specify the prefix to delete only objects whose keys begin
with the specified prefix.
Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to delete objects.
Delete object from bucket list Select this check box and complete the Bucket table to
delete objects in the specified buckets.
• Bucket name: type in the name of the bucket from
which you want to delete objects.
• Key prefix: type in the prefix to delete objects whose
keys begin with the specified prefix in the specified
bucket.
• Delimiter: type in the delimiter to delete those objects
with key names up to the delimiter in the specified
bucket.
If you select the Delete object from bucket list check box,
the Key prefix and Delimiter fields as well as the Specify
project ID check box will not be available.
Die on error This check box is cleared by default, meaning that the
row in error is skipped and the process is completed for
error-free rows.
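Which keys a given Key prefix and Delimiter pair selects can be sketched as follows. This is a plain-Python illustration of the selection rule with illustrative names, mirroring Cloud Storage listing semantics (a key matches when it starts with the prefix and the remainder contains no delimiter); it is not the component's actual implementation:

```python
# Sketch: which object keys a prefix + delimiter pair selects.
# A key matches when it starts with the prefix and the rest of the
# key contains no delimiter (deeper keys roll up and are not selected).

def select_keys(keys, prefix="", delimiter=""):
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            continue  # grouped under a common prefix, not returned itself
        selected.append(key)
    return selected

keys = ["logs/2020/a.txt", "logs/b.txt", "data/c.txt"]
print(select_keys(keys, prefix="logs/", delimiter="/"))  # ['logs/b.txt']
```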
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component can be used together with the tGSList
component to check if the objects which match the
specified criteria are deleted successfully.
Related scenario
For a scenario in which tGSDelete is used, see Managing files with Google Cloud Storage on page
1378.
tGSGet
Retrieves objects which match the specified criteria from Google Cloud Storage and outputs them to a
local directory.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Key prefix Specify the prefix to download only objects whose keys
begin with the specified prefix.
Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to obtain objects.
Use keys Select this check box and complete the Keys table to define
the criteria for objects to be downloaded from Google Cloud
Storage.
• Bucket name: type in the name of the bucket from
which you want to download objects.
• Key: type in the key of the object to be downloaded.
• New name: type in a new name for the object to be
downloaded.
If you select the Use keys check box, the Key prefix and
Delimiter fields as well as the Specify project ID check box
and the Get files from bucket list check box will not be
available.
Get files from bucket list Select this check box and complete the Bucket table to
define the criteria for objects to be downloaded from
Google Cloud Storage.
• Bucket name: type in the name of the bucket from
which you want to download objects.
• Key prefix: type in the prefix to download objects
whose keys start with the specified prefix from the
specified bucket.
• Delimiter: specify the delimiter to download those
objects with key names up to the delimiter from the
specified bucket.
If you select the Get files from bucket list check box, the
Key prefix and Delimiter fields as well as the Specify project
ID check box and the Use keys check box will not be
available.
Output directory Specify the directory where you want to store the
downloaded objects.
Die on error This check box is cleared by default, meaning that the
row in error is skipped and the process is completed for
error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is usually used together with other Google
Cloud Storage components, particularly tGSPut.
Related scenarios
No scenario is available for the Standard version of this component yet.
tGSList
Retrieves a list of objects from Google Cloud Storage one by one.
tGSList iterates on a list of objects which match the specified criteria in Google Cloud Storage.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Key prefix Specify the key prefix so that only the objects whose keys
begin with the specified string will be listed.
Delimiter Specify the delimiter in order to list only those objects with
key names up to the delimiter.
Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to retrieve a list of objects.
List objects in bucket list Select this check box and complete the Bucket table to
retrieve objects in the specified buckets.
• Bucket name: type in the name of the bucket from
which you want to retrieve objects.
• Key prefix: type in the prefix to list only objects whose
keys begin with the specified string in the specified
bucket.
• Delimiter: type in the delimiter to list only those
objects with key names up to the delimiter.
If you select the List objects in bucket list check box, the
Key prefix and Delimiter fields as well as the Specify project
ID check box will not be available.
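The combined effect of Key prefix and Delimiter on a listing can be illustrated with a short standalone sketch: keys without the delimiter past the prefix are listed as objects, while deeper keys collapse into folder-like common prefixes. Names are illustrative; this is not the component's implementation:

```python
# Sketch of delimiter-based grouping when listing a bucket: shallow
# keys are returned directly, deeper keys collapse into common prefixes.

def list_bucket(keys, prefix="", delimiter="/"):
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(prefixes)

keys = ["bed_room/chair.txt", "bed_room/desk/lamp.txt", "notes.txt"]
objects, prefixes = list_bucket(keys, prefix="bed_room/")
print(objects)   # ['bed_room/chair.txt']
print(prefixes)  # ['bed_room/desk/']
```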
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a scenario in which tGSList is used, see Managing files with Google Cloud Storage on page 1378.
tGSPut
Uploads files from a local directory to Google Cloud Storage so that you can manage them with
Google Cloud Storage.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.
Bucket name Type in the name of the bucket into which you want to
upload files.
Local directory Type in the full path of or browse to the local directory
where the files to be uploaded are located.
Google Storage directory Type in the Google Storage directory to which you want to
upload files.
Use files list Select this check box and complete the Files table.
• Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
• New name: enter a new name for the file after being
uploaded.
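The Filemask matching and the New name option can be pictured with Python's fnmatch, whose * wildcard behaves like the one described above. This is a sketch with illustrative names, not the component's code:

```python
# Sketch of the Use files list behavior: match local files against a
# filemask with wildcards, then apply the optional New name.
from fnmatch import fnmatch

def plan_uploads(local_files, filemask, new_name=None):
    """Return (local_name, uploaded_name) pairs for matching files."""
    matched = [f for f in local_files if fnmatch(f, filemask)]
    return [(f, new_name or f) for f in matched]

files = ["computer_01.txt", "computer_03.csv", "readme.md"]
print(plan_uploads(files, "computer_*"))
# [('computer_01.txt', 'computer_01.txt'), ('computer_03.csv', 'computer_03.csv')]
```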
Die on error This check box is cleared by default, meaning that the
row in error is skipped and the process is completed for
error-free rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Managing files with Google Cloud Storage
Prerequisites: You have purchased a Google Cloud Storage account and created three buckets under
the same Google Storage directory. In this example, the buckets created are bighouse, bed_room, and
study_room.
Procedure
1. Drop the following components from the Palette to design the workspace: one tGSConnection
component, one tGSPut component, two tGSCopy components, one tGSDelete component, one
tGSList component, one tIterateToFlow component, one tLogRow component and one tGSClose
component.
2. Connect tGSConnection to tGSPut using a Trigger > On Subjob Ok link.
3. Connect tGSPut to the first tGSCopy using a Trigger > On Subjob Ok link.
4. Do the same to connect the first tGSCopy to the second tGSCopy, connect the second tGSCopy to
tGSDelete, connect tGSDelete to tGSList, and connect tGSList to tGSClose.
5. Connect tGSList to tIterateToFlow using a Row > Iterate link.
6. Connect tIterateToFlow to tLogRow using a Row > Main link.
Procedure
1. Double-click the tGSConnection component to open its Basic settings view in the Component tab.
2. Navigate to the Google APIs Console in your web browser to access the Google project hosting
the Cloud Storage services you need to use.
3. Click Google Cloud Storage > Interoperable Access to open its view, and copy the access key and
secret key.
4. In the Component view of the Studio, paste the access key and secret key to the corresponding
fields respectively.
Procedure
1. Double-click the tGSPut component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Bucket name field, enter the name of the bucket into which you want to upload files. In this
example, bighouse.
4. In the Local directory field, browse to the directory from which the files will be uploaded, D:/Input/
House in this example.
Procedure
1. Double-click the first tGSCopy component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to copy files,
bighouse in this example.
4. Select the Source is folder check box. All files from the bucket bighouse will be copied.
5. In the Target bucket name field, enter the name of the bucket into which you want to copy files,
bed_room in this example.
6. Select Copy from the Action list.
Procedure
1. Double-click the second tGSCopy component to open its Basic settings view in the Component
tab.
2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to move files,
bighouse in this example.
4. In the Source object key field, enter the key of the object to be moved, computer_01.txt in this
example.
5. In the Target bucket name field, enter the name of the bucket into which you want to move files,
study_room in this example.
6. Select Move from the Action list. The specified source file computer_01.txt will be moved from the
bucket bighouse to study_room.
7. Select the Rename check box. In the New name field, enter a new name for the moved file. In this
example, the new name is laptop.txt.
8. Leave other settings as they are.
Procedure
1. Double-click the tGSDelete component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. Select the Delete object from bucket list check box. Fill in the Bucket table with the file
information that you want to delete.
In this example, the file computer_03.csv will be deleted from the bucket bed_room, whose files
were copied from the bucket bighouse.
Procedure
1. Double-click the tGSList component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. Select the List objects in bucket list check box. In the Bucket table, enter the name of the three
buckets in the Bucket name column, bighouse, study_room, and bed_room.
4. Double-click the tIterateToFlow component to open its Basic settings view in the Component tab.
6. The Mapping table will be populated with the defined columns automatically.
In the Value column, enter globalMap.get("tGSList_2_CURRENT_BUCKET") for the bucketName
column and globalMap.get("tGSList_2_CURRENT_KEY") for the key column. You can also press Ctrl +
Space and then choose the appropriate variable.
7. Double-click the tLogRow component to open its Basic settings view in the Component tab.
8. Select Table (print values in cells of a table) for a better view of the results.
Procedure
1. Double-click the tGSClose component to open its Basic settings view in the Component tab.
2. Select the connection you want to close from the Component List.
The files in the three buckets are displayed. As expected, the files from the bucket bighouse are
first copied to the bucket bed_room, then the file computer_01.txt from the bucket bighouse is
moved to the bucket study_room and renamed laptop.txt, and finally the file computer_03.csv is
deleted from the bucket bed_room.
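The end state described above can be replayed as a small in-memory walk-through of the same four operations. Dict-based buckets and a placeholder file set (computer_02.txt is invented for illustration) stand in for Cloud Storage; this is a sketch only, not the Job itself:

```python
# In-memory walk-through of the scenario: upload, copy-all, move+rename,
# then delete. Buckets are plain dicts; contents are placeholder strings.

bighouse = {f: f for f in ("computer_01.txt", "computer_02.txt", "computer_03.csv")}
bed_room, study_room = {}, {}

bed_room.update(bighouse)                                   # first tGSCopy: Copy all
study_room["laptop.txt"] = bighouse.pop("computer_01.txt")  # second tGSCopy: Move + Rename
del bed_room["computer_03.csv"]                             # tGSDelete

for bucket, contents in [("bighouse", bighouse), ("bed_room", bed_room),
                         ("study_room", study_room)]:       # tGSList + tLogRow
    for key in sorted(contents):
        print(bucket, key)
```

Note that bed_room keeps its copy of computer_01.txt, because the copy ran before the move.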
tHashInput
Reads from the cache memory data loaded by tHashOutput to offer high-speed data feed, facilitating
transactions involving a large amount of data.
The components of the Technical family are normally hidden from the Palette by default. For more
information about how to show them on the Palette, see Talend Studio User Guide.
Basic settings
Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Clear cache after reading Select this check box to clear the cache after reading the
data loaded by a certain tHashOutput component. This way,
the following tHashInput components, if any, will not be
able to read the cached data loaded by that tHashOutput
component.
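The cache contract between tHashOutput and tHashInput, including the Clear cache after reading option, can be pictured as a shared in-memory store. The function and variable names here are illustrative; this is a plain-Python sketch of the behavior, not Talend's implementation:

```python
# Sketch of the tHashOutput/tHashInput contract: one component loads
# rows into a shared in-memory cache, a later component reads them,
# optionally clearing the cache so further readers see nothing.

cache = {}  # shared across the "Job"; keyed by tHashOutput component name

def hash_output(name, rows, append=True):
    cache.setdefault(name, [])
    if not append:
        cache[name].clear()
    cache[name].extend(rows)

def hash_input(name, clear_after_reading=False):
    rows = list(cache.get(name, []))
    if clear_after_reading:
        cache.pop(name, None)
    return rows

hash_output("tHashOutput_1", [{"ID": 1, "ID_Insurance": 3}])
first = hash_input("tHashOutput_1", clear_after_reading=True)
second = hash_input("tHashOutput_1")  # cache already cleared -> []
```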
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Reading data from the cache memory for high-speed data access
6. Connect the second subJob to the last subJob using an OnSubjobOk link.
Procedure
1. Double-click the first tFixedFlowInput component to display its Basic settings view.
Note:
You can select Repository from the Schema drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored in the Repository. For more information
about Metadata, see the Talend Studio User Guide.
3. Click Edit schema to define the data structure of the input flow. In this case, the input has two
columns: ID and ID_Insurance, and then click OK to close the dialog box.
4. Fill in the Number of rows field to specify the entries to output, e.g. 50000.
5. Select the Use Single Table check box. In the Values table and in the Value column, assign values
to the columns, e.g. 1 for ID and 3 for ID_Insurance.
6. Perform the same operations for the second tFixedFlowInput component, with the only difference
in the values. That is, 2 for ID and 4 for ID_Insurance in this case.
7. Double-click the first tHashOutput to display its Basic settings view.
8. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. Select Keep all from the Keys management drop-down list and
keep the Append check box selected.
9. Perform the same operations for the second tHashOutput component, and select the Link with a
tHashOutput check box.
Procedure
1. Double-click tHashInput to display its Basic settings view.
2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
which is the same as that of tHashOutput.
3. Select tHashOutput_1 from the Component list drop down list.
4. Double-click tFileOutputDelimited to display its Basic settings view.
5. Select Built-In from the Property Type drop-down list. In the File Name field, enter the full path
and name of the file, e.g. "E:/Allr70207V5.0/Talend-All-r70207-V5.0.0NB/workspace/out.csv".
6. Select the Include Header check box and click Sync columns to retrieve the schema from the
previous component.
Results
You can find that mass entries are written and read very rapidly.
Clearing the memory before loading data to it in case an iterator exists in the same subJob
Procedure
1. Double-click the tLoop component to display its Basic settings view.
2. Select For as the loop type. Type in 1, 2, and 1 in the From, To and Step fields respectively. Keep the
Values are increasing check box selected.
3. Double-click the tFixedFlowInput component to display its Basic settings view.
Note:
You can select Repository from the Schema drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored in the Repository. For more information
about Metadata, see the Talend Studio User Guide.
5. Click Edit schema to define the data structure of the input flow. In this case, the input has one
column: Name.
10. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. Select Keep all from the Keys management drop-down list and
deselect the Append check box.
Procedure
1. Double-click tHashInput to display its Basic settings view.
2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
which is the same as that of tHashOutput.
3. Select tHashOutput_2 from the Component list drop-down list.
4. Double-click tLogRow to display its Basic settings view.
5. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. In the Mode area, select Table (print values in cells of a table).
tHashOutput
Loads data to the cache memory to offer high-speed access, facilitating transactions involving a large
amount of data.
Note that loading data consumes a lot of memory, because storing each record
has an overhead. The number of input entries also impacts memory usage.
The components of the Technical family are normally hidden from the Palette by default. For more
information about how to show them on the Palette, see Talend Studio User Guide.
Basic settings
Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Note:
If multiple tHashOutput components are linked in this
way, the data loaded to the cache by all of them can be
read by a tHashInput component that is linked with any
of them.
Note:
If Link with a tHashOutput is selected, this check box will
be hidden but is always enabled.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component writes data to the cache memory and
is closely related to tHashInput. Together, these twin
components offer high-speed data access, facilitating
transactions involving a large amount of data.
Related scenarios
For related scenarios, see:
• Reading data from the cache memory for high-speed data access on page 1387.
• Clearing the memory before loading data to it in case an iterator exists in the same subJob on
page 1391.
tHBaseClose
Closes an HBase connection you have established in your Job.
tHBaseClose closes an active connection to an HBase database.
Basic settings
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
Related scenario
For a scenario in which tHBaseClose is used, see Exchanging customer data with HBase on page
1411.
tHBaseConnection
Establishes an HBase connection to be reused by other HBase components in your Job.
tHBaseConnection opens a connection to an HBase database.
Basic settings
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your Job.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.
Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.
Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
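For reference, these settings correspond to standard HBase properties in the cluster's hbase-site.xml. The property names below are the standard HBase ones; the host names, port, and znode path are placeholder values:

```xml
<configuration>
  <!-- ZooKeeper quorum: comma-separated ZooKeeper hosts -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk-host-1,zk-host-2,zk-host-3</value>
  </property>
  <!-- ZooKeeper client port -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <!-- Root znode containing the znodes used by HBase -->
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
```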
Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.
Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Global Variables
Usage
Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar
file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.
Related scenario
For a scenario in which tHBaseConnection is used, see Exchanging customer data with HBase on page
1411.
tHBaseInput
Reads data from a given HBase database and extracts columns of selection.
HBase is a distributed, column-oriented database that hosts very large, sparsely populated tables on
clusters.
tHBaseInput extracts columns corresponding to schema definition. Then it passes these columns to
the next component via a Main row link.
HBase filters
This table presents the HBase filters available in Talend Studio and the parameters required by those
filters.
• Single Column Value Filter: compares the values of a given
column against the value defined for the Filter value
parameter. If the filtering condition is met, all columns of
the row will be returned.
• Column range filter: allows intra-row scanning and returns
all matching columns of a scanned row. The ends of a range
are separated by a comma.
• Row filter: filters on row keys and returns all rows that
match the filtering condition.
• Value filter: returns only columns that have a specific
value.
The use of the listed HBase filters explained above is subject to revisions made by Apache in its
Apache HBase project; therefore, in order to fully understand how to use these HBase filters, we
recommend reading Apache's HBase documentation.
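The behavior of the Single Column Value Filter, for instance, can be sketched in a few lines. This is a plain-Python illustration of the filter's semantics with made-up row data, not the HBase client API:

```python
# Sketch of Single Column Value Filter semantics: when the named
# column of a row satisfies the comparison, the whole row (all of
# its columns) is returned; otherwise the row is dropped.
import operator

OPS = {"EQUAL": operator.eq, "NOT_EQUAL": operator.ne,
       "GREATER": operator.gt, "LESS": operator.lt}

def single_column_value_filter(rows, column, op, value):
    keep = OPS[op]
    return [row for row in rows if keep(row.get(column), value)]

rows = [
    {"id": "r1", "f:age": 30, "f:name": "Ford"},
    {"id": "r2", "f:age": 18, "f:name": "Arthur"},
]
print(single_column_value_filter(rows, "f:age", "GREATER", 20))
# [{'id': 'r1', 'f:age': 30, 'f:name': 'Ford'}]
```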
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip for the custom distribution to be
used. This zip file should contain the libraries of the different Hadoop elements and the
index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.
Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.
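The zookeeper.znode.parent property mentioned above normally lives in the cluster's hbase-site.xml. A minimal sketch of such an entry, assuming the root znode is /hbase-unsecure (the actual value depends on your cluster):

```xml
<!-- Example only: the actual root znode depends on your cluster. -->
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>
```

The value you find there is what you enter after selecting the Set Zookeeper znode parent check box.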
Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Set table Namespace mappings Enter the string to be used to construct the mapping
between an Apache HBase table and a MapR table.
For the valid syntax you can use, see http://doc.mapr.com/display/MapR40x/Mapping+Table+Namespace+Between+Apache+HBase+Tables+and+MapR+Tables.
Table name Type in the name of the table from which you need to
extract columns.
Define a row selection Select this check box and then in the Start row and the
End row fields, enter the corresponding row keys to specify
the range of the rows you want the current component to
extract.
Unlike the filters you can set using Is by filter, which require loading all records before
filtering out the ones to be used, this feature allows you to directly select only the rows
to be used.
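HBase stores row keys in lexicographic order, so the Start row / End row selection behaves like a range scan. The following minimal sketch mimics that behavior over a sorted in-memory map, assuming the end row is exclusive as in an HBase scan; the row keys are invented for the example.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RowRangeDemo {

    // Returns the rows whose keys fall in [startRow, endRow), mirroring an
    // HBase scan: row keys are ordered lexicographically and the end row is
    // exclusive.
    static SortedMap<String, String> selectRows(
            TreeMap<String, String> table, String startRow, String endRow) {
        return table.subMap(startRow, endRow);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("France1", "Albert");
        table.put("France2", "Alexandre");
        table.put("France3", "Alfred-Hubert");
        // France1 and France2 are selected; France3 is excluded.
        System.out.println(selectRows(table, "France1", "France3").keySet());
    }
}
```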
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Note:
This table is not available when you are using an existing connection by selecting the Use
an existing connection check box in the Basic settings view.
Logical operation Select the operator you need to use to define the logical
relation between filters. The available operators are And
and Or.
Filter Click the button under this table to add as many rows as
required, each row representing a filter. The parameters you
may need to set for a filter are:
• Filter type: the drop-down list presents pre-existing
filter types that are already defined by HBase. Select
the type of the filter you need to use.
• Filter column: enter the column qualifier on which you
need to apply the active filter. This parameter becomes
mandatory depending on the type of the filter and
of the comparator you are using. For example, it is
not used by the Row Filter type but is required by the
Single Column Value Filter type.
• Filter family: enter the column family on which you
need to apply the active filter. This parameter becomes
mandatory depending on the type of the filter and
of the comparator you are using. For example, it is
not used by the Row Filter type but is required by the
Single Column Value Filter type.
• Filter operation: select from the drop-down list the
operation to be used for the active filter.
• Filter Value: enter the value on which you want to use
the operator selected from the Filter operation drop-
down list.
• Filter comparator type: select the type of the
comparator to be combined with the filter you are
using.
Depending on the Filter type you are using, some or all of
the parameters become mandatory. For further information,
see HBase filters on page 1405.
Retrieve timestamps Select this check box to load the timestamps of an HBase
column into the data flow.
• Retrieve from an HBase column: select the HBase
column which is tracked for changes in order to
retrieve its corresponding timestamps.
• Write to a schema column: select the column you
have defined in the schema to store the retrieved
timestamps.
The type of this column must be Long.
Global Variables
Usage
Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.
Exchanging customer data with HBase
Note:
Before starting this scenario, ensure that your HBase and Zookeeper services are correctly
installed and configured. This scenario explains only how to use the Talend solution to
exchange data with a given HBase.
Procedure
1. Drop tHBaseConnection, tFixedFlowInput, tHBaseOutput, tHBaseInput, tLogRow and tHBaseClose
from Palette onto the Design workspace.
2. Right-click tHBaseConnection to open its contextual menu and select the Trigger > On Subjob Ok
link from this menu to connect this component to tFixedFlowInput.
3. Do the same to create the OnSubjobOk link from tFixedFlowInput to tHBaseInput and then to
tHBaseClose.
4. Right-click tFixedFlowInput and select the Row > Main link to connect this component to
tHBaseOutput.
5. Do the same to create the Main link from tHBaseInput to tLogRow.
Results
The components to be used in this scenario are all placed and linked. You then need to
configure them successively.
Procedure
1. On the Design workspace of your Studio, double-click the tHBaseConnection component to open
its Component view.
2. Select Hortonworks Data Platform 1.0 from the HBase version list.
3. In the Zookeeper quorum field, type in the name or the URL of the Zookeeper service you are
using. In this example, the name of the service in use is hbase.
4. In the Zookeeper client port field, type in the number of client listening port. In this example, it is
2181.
5. If the Zookeeper znode parent location has been defined in the Hadoop cluster you are
connecting to, you need to select the Set zookeeper znode parent check box and enter the value
of this property in the field that is displayed.
Procedure
1. On the Design workspace, double-click the tFixedFlowInput component to open its Component
view.
2. In this view, click the three-dot button next to Edit schema to open the schema editor.
3. Click the plus button three times to add three rows and in the Column column, rename the three
rows respectively as: id, name and age.
4. In the Type column, click each of these rows and from the drop-down list, select the data type of
every row. In this scenario, they are Integer for id and age, String for name.
5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
6. In the Mode area, select the Use Inline Content (delimited file) to display the fields for editing.
7. In the Content field, type in the delimited data to be written into the HBase, separated with the
semicolon ";". In this example, they are:
1;Albert;23
2;Alexandre;24
3;Alfred-Hubert;22
4;Andre;40
5;Didier;28
6;Anthony;35
7;Artus;32
8;Catherine;34
9;Charles;21
10;Christophe;36
11;Christian;67
12;Danniel;54
13;Elisabeth;58
14;Emile;32
15;Gregory;30
Note: If this component does not have the same schema of the preceding component, a
warning icon appears. In this case, click the Sync columns button to retrieve the schema from
the preceding one and once done, the warning icon disappears.
8. On the Design workspace, double-click the tHBaseOutput component to open its Component
view.
9. Select the Use an existing connection check box and then select the connection you have
configured earlier. In this example, it is tHBaseConnection_1.
10. In the Table name field, type in the name of the table to be created in the HBase. In this example,
it is customer.
11. In the Action on table field, select the action of interest from the drop-down list. In this scenario,
select Drop table if exists and create. This way, if a table named customer exists already in the
HBase, it will be disabled and deleted before creating this current table.
12. Click the Advanced settings tab to open the corresponding view.
13. In the Family parameters table, add two rows by clicking the plus button, rename them as family1
and family2 respectively and then leave the other columns empty. These two column families will
be created in the HBase using the default family performance options.
Note: The Family parameters table is available only when the action you have selected in the
Action on table field is to create a table in HBase. For further information about this Family
parameters table, see tHBaseOutput on page 1419.
14. In the Families table of the Basic settings view, enter the family names in the Family name
column, each corresponding to the column this family contains. In this example, the id and the
age columns belong to family1 and the name column to family2.
Note: These column families should already exist in the HBase to be connected to; if not, you
need to define them in the Family parameters table of the Advanced settings view for creating
them at runtime.
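The inline content entered in the tFixedFlowInput component above is plain delimited text. The following minimal sketch shows how one such line maps onto the id, name and age schema columns; the Studio performs this parsing itself, so the code is only illustrative.

```java
public class InlineContentDemo {

    // Holds one parsed row, matching the id, name and age schema columns
    // defined in the tFixedFlowInput schema.
    static class Customer {
        final int id;
        final String name;
        final int age;

        Customer(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Splits one semicolon-delimited line into the three schema columns.
    static Customer parse(String line) {
        String[] fields = line.split(";");
        return new Customer(Integer.parseInt(fields[0]), fields[1],
                Integer.parseInt(fields[2]));
    }

    public static void main(String[] args) {
        Customer c = parse("1;Albert;23");
        System.out.println(c.id + " " + c.name + " " + c.age);
    }
}
```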
Procedure
1. Double-click tHBaseInput to open its Component view.
2. Select the Use an existing connection check box and then select the connection you have
configured earlier. In this example, it is tHBaseConnection_1.
3. Click the three-dot button next to Edit schema to open the schema editor.
4. Click the plus button three times to add three rows and rename them as id, name and age
respectively in the Column column. This means that you extract these three columns from the
HBase.
5. Select the types for each of the three columns. In this example, Integer for id and age, String for
name.
6. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
7. In the Table name field, type in the table from which you extract the columns of interest. In this
scenario, the table is customer.
8. In the Mapping table, the Column column has been already filled automatically since the schema
was defined, so simply enter the name of every family in the Column family column, each
corresponding to the column it contains.
9. Double-click tHBaseClose to open its Component view.
10. In the Component List field, select the connection you need to close. In this example, this
connection is tHBaseConnection_1.
The columns of interest are extracted and you can process them according to your needs.
By logging in to your HBase database, you can check the customer table this Job has created.
tHBaseOutput
Writes columns of data into a given HBase database.
tHBaseOutput receives data from its preceding component, creates a table in a given HBase database
and writes the received data into this HBase table.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.
Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.
Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Set table Namespace mappings Enter the string to be used to construct the mapping
between an Apache HBase table and a MapR table.
For the valid syntax you can use, see http://doc.mapr.com/display/MapR40x/Mapping+Table+Namespace+Between+Apache+HBase+Tables+and+MapR+Tables.
Table name Type in the name of the HBase table you need to create.
Action on table Select the action you need to take for creating an HBase
table.
Custom Row Key Select this check box to use the customized row keys. Once
selected, the corresponding field appears. Then type in the
user-defined row key to index the rows of the HBase table
being created.
For example, you can type in "France"+Numeric.sequence("s1",1,1) to produce the row key
series: France1, France2, France3 and so on.
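The Numeric.sequence routine used above keeps a named counter and increments it on each call. The following simplified stand-in (not the Studio's actual implementation) produces the same France1, France2, France3 series:

```java
import java.util.HashMap;
import java.util.Map;

public class RowKeyDemo {

    // One counter per sequence name, as with Numeric.sequence.
    private static final Map<String, Integer> counters = new HashMap<>();

    // Simplified stand-in for Numeric.sequence(name, start, step): returns
    // start on the first call for a given name, then adds step on each
    // subsequent call.
    static int sequence(String name, int start, int step) {
        Integer current = counters.get(name);
        int next = (current == null) ? start : current + step;
        counters.put(name, next);
        return next;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            // Mirrors the row key expression "France" + Numeric.sequence("s1", 1, 1)
            System.out.println("France" + sequence("s1", 1, 1));
        }
    }
}
```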
Custom timestamp column Select a Long column from your schema to provide
timestamps for the HBase columns to be created or updated
by tHBaseOutput.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
Use batch mode Select this check box to activate the batch mode for data
processing.
Note:
This table is not available when you are using an existing connection by selecting the Use
an existing connection check box in the Basic settings view.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Family parameters Type in the names and, when need be, the custom
performance options of the column families to be created.
These options are all attributes defined by the HBase data
model, so for further explanation about these options, see
Apache's HBase documentation.
Global Variables
Usage
Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
Related scenario
For related scenario to the Standard version of tHBaseOutput, see Exchanging customer data with
HBase on page 1411.
tHCatalogInput
Reads data from an HCatalog managed Hive database and sends data to the component that follows.
The tHCatalogInput component reads data from the specified HCatalog managed database and
sends the data flow to the console or to a specified local file, depending on the component
you connect it to.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Templeton hostname Fill this field with the URL of Templeton Webservice.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
Templeton port Fill this field with the port of the Templeton Webservice
URL. By default, this value is 50111.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
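Because Templeton (WebHCat) is a plain REST service, the hostname and port above combine into request URLs. The sketch below builds the base URL for a DDL call; the /templeton/v1/ddl path is the standard WebHCat endpoint, but treat the exact path as an assumption to verify against your cluster's WebHCat documentation.

```java
public class WebHCatUrlDemo {

    // Builds the base URL of a WebHCat (Templeton) DDL call from the
    // Templeton hostname and port configured in the component.
    static String ddlUrl(String host, int port, String database) {
        return "http://" + host + ":" + port
                + "/templeton/v1/ddl/database/" + database;
    }

    public static void main(String[] args) {
        // 50111 is the default Templeton port mentioned above; the host and
        // database names here are invented for the example.
        System.out.println(ddlUrl("talend-hdp", 50111, "talend"));
    }
}
```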
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
Partition Fill this field to specify one or more partitions for the
partition operation on a specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.
Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.
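Following the rules above (commas between partitions, the whole string wrapped in double quotation marks in the Studio), the following small sketch assembles such a partition string; the partition names country and year are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionSpecDemo {

    // Joins key='value' pairs with commas, as expected by the Partition
    // field; in the Studio the whole string is then wrapped in double
    // quotation marks.
    static String partitionSpec(Map<String, String> partitions) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partitions.entrySet()) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(e.getKey()).append("='").append(e.getValue()).append("'");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("country", "US");
        parts.put("year", "2020");
        System.out.println(partitionSpec(parts));
    }
}
```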
Username Fill this field with the username for the Hive database
authentication.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
Custom encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://hadoop.apache.org/docs and then select
the version of the documentation you want.
Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.
Standard Output Folder Fill this field with the path to which log files are stored.
Note:
This field is enabled only when you select the Retrieve the HCatalog logs check box.
Error Output Folder Fill this field with the path to which error log files are
stored.
Note:
This field is enabled only when you select the Retrieve the HCatalog logs check box.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.
tHCatalogLoad
Reads data directly from HDFS and writes this data into an established HCatalog managed table.
Basic settings
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use in your
connection. However, because of the ongoing evolution
of the different Hadoop-related projects, you might
not be able to find the configuration zip corresponding
to your distribution version in that list; in that case,
it is recommended to use the Import from existing version
option to take an officially supported distribution as
base and add the jars required by your distribution.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Templeton hostname Fill this field with the URL of Templeton Webservice.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
Templeton port Fill this field with the port of the Templeton Webservice
URL. By default, this value is 50111.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
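Before configuring the component, you may want to verify that WebHCat is reachable at the hostname and port you plan to enter. The sketch below builds the WebHCat v1 status URL; the host and user name are example values, not defaults prescribed by this guide (only port 50111 is the documented default):

```python
# Sketch: build the WebHCat (Templeton) status endpoint URL for a quick
# reachability check. Host and user.name below are hypothetical examples;
# 50111 is the default port documented in this section.
def webhcat_status_url(host, port=50111, user="hdp"):
    """Return the WebHCat v1 status endpoint URL."""
    return "http://%s:%d/templeton/v1/status?user.name=%s" % (host, port, user)

url = webhcat_status_url("192.168.0.131")
# Fetching this URL (e.g. with urllib.request.urlopen) should return a
# small JSON status document when WebHCat is up.
```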
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
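The keytab login the component performs is equivalent to a kinit invocation against the keytab file. A minimal sketch, in which the principal and keytab path are hypothetical examples:

```python
# Sketch: build the kinit argument list for a keytab-based Kerberos login,
# equivalent to what a keytab-enabled Job does. Principal and keytab path
# are hypothetical examples.
def kinit_command(principal, keytab_path):
    """Return the kinit command for logging in with a keytab file."""
    return ["kinit", "-kt", keytab_path, principal]

cmd = kinit_command("guest@EXAMPLE.COM", "/etc/security/keytabs/guest.keytab")
# The OS user running the Job (e.g. user1) only needs read access to the
# keytab file; it does not have to match the principal:
# subprocess.run(cmd, check=True)
```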
Database Enter the name of the database you need to write data in.
This database must already exist.
Table Enter the name of the table you need to write data in. This
table must already exist.
Partition Fill this field to specify one or more partitions for the
partition operation on the specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.
Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.
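A multi-partition value for the Partition field can be assembled as comma-separated key=value pairs, with the whole string then quoted. A small sketch, using hypothetical column names:

```python
# Sketch: assemble a multi-partition value for the Partition field.
# Column names and values are hypothetical examples.
def partition_spec(partitions):
    """Join key=value partition pairs with commas, as the field expects."""
    return ",".join("%s=%s" % (k, v) for k, v in partitions)

spec = partition_spec([("match_age", "27"), ("country", "US")])
# In the component, the whole string is quoted: "match_age=27,country=US"
```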
Username Fill this field with the username for the DB authentication.
File location Enter the absolute path pointing to the HDFS location from
which data is read.
Die on error This check box is cleared by default, meaning that rows
with errors are skipped and the process completes for
error-free rows.
Advanced settings
Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.
Standard Output Folder Fill this field with the path to which log files are stored.
Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.
Error Output Folder Fill this field with the path to which error log files are
stored.
Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.
tHCatalogOperation
Prepares the HCatalog managed database/table/partition to be processed.
tHCatalogOperation manages the data stored in HCatalog managed Hive database/table/partition.
Basic settings
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-related
projects, you might not be able to find the configuration
zip corresponding to your distribution from this list;
then it is recommended to use the Import from existing
version option to take an existing distribution as base to
add the jars required by your distribution.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Templeton hostname Fill this field with the URL of Templeton Webservice.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
Templeton port Fill this field with the port of the Templeton Webservice URL.
By default, the value for this field is 50111.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
Operation Select an action from the list for the DB operation. For
further information about the DB operation in HDFS, see
https://cwiki.apache.org/Hive/.
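As an illustration only (not the component's internal implementation), the Operation choices for a table map roughly onto sequences of Hive DDL statements, as sketched below with a hypothetical table name:

```python
# Illustrative sketch, not the component's internal code: the table
# Operation choices roughly correspond to these Hive DDL sequences.
def table_ddl(operation, table):
    """Map an Operation choice to the Hive DDL statements it implies."""
    ddl = {
        "Create": ["CREATE TABLE %s" % table],
        "Drop": ["DROP TABLE %s" % table],
        "Drop if exist": ["DROP TABLE IF EXISTS %s" % table],
        "Drop and create": ["DROP TABLE %s" % table,
                            "CREATE TABLE %s" % table],
        "Drop if exist and create": ["DROP TABLE IF EXISTS %s" % table,
                                     "CREATE TABLE %s" % table],
    }
    return ddl[operation]
```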
Create the table only if it doesn't exist already Select this check box to avoid creating a duplicate table
when you create a table.
Note:
This check box is enabled only when you have selected
Table from the Operation on list.
Database Fill this field with the name of the database in which the
HCatalog managed tables are placed.
Note:
This field is enabled only when you have selected Table
from the Operation on list. For further information about
the operation on Table, see https://cwiki.apache.org/Hive/.
Partition Fill this field to specify one or more partitions for the
partition operation on a specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.
Note:
This field is enabled only when you select Partition from
the Operation on list. For further information about the
operation on Partition, see https://cwiki.apache.org/Hive/.
Username Fill this field with the username for the DB authentication.
Database location Fill this field with the location of the database file in HDFS.
Note:
This field is enabled only when you select Database from
the Operation on list.
Create an external table Select this check box to create an external table in an
alternative path defined in the Set HDFS location field in
the Advanced settings view. For further information about
creating external tables, see https://cwiki.apache.org/Hive/.
Note:
This check box is enabled only when you select Table
from the Operation on list and Create/Drop and create/
Drop if exist and create from the Operation list.
Format Select a file format from the list to specify the format of the
external table you want to create:
TEXTFILE: Plain text files.
RCFILE: Record Columnar files. For further information
about RCFILE, see https://cwiki.apache.org/confluence/
display/Hive/RCFile.
Note:
RCFILE is only available starting with Hive 0.6.0. This
list is enabled only when you select Table from the
Operation on list and Create/Drop and create/Drop if
exist and create from the Operation list.
Set partitions Select this check box to set the partition schema by
clicking Edit schema to the right of the Set partitions
check box. The partition schema is either built-in or
stored remotely in the Repository.
Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list.
You must follow the rules of using partition schema in
HCatalog managed tables. For more information about
the rules in using partition schema, see https://cwiki.
apache.org/confluence/display/Hive/HCatalog.
Set the user group to use Select this check box to specify the user group.
Note:
This check box is enabled only when you select
Drop/Drop if exist/Drop and create/Drop if exist and
create from the Operation list. By default, the value for
this field is root. For more information about the user
group in the server, contact your system administrator.
Note:
This list is enabled only when you select Database from
the Operation on list and Drop/Drop if exist/Drop and
create/Drop if exist and create from the Operation list.
For more information about Drop operation on database,
see https://cwiki.apache.org/Hive/.
Set the permissions to use Select this check box to specify the permissions needed by
the operation you select from the Operation list.
Note:
This check box is enabled only when you select
Drop/Drop if exist/Drop and create/Drop if exist and
create from the Operation list. By default, the value for
this field is rwxrw-r-x. For more information on user
permissions, contact your system administrator.
Set File location Enter the directory in which partitioned data is stored.
Note:
This check box is enabled only when you select
Partition from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list.
For further information about storing partitioned data in
HDFS, see https://cwiki.apache.org/Hive/.
Die on error This check box is cleared by default, meaning that rows
with errors are skipped and the process completes for
error-free rows.
Advanced settings
Comment Fill this field with the comment for the table you want to
create.
Note:
This field is enabled only when you select Table from
the Operation on list and Create/Drop and create/Drop
if exist and create from the Operation list in the Basic
settings view.
Set HDFS location Select this check box to specify an HDFS location to which
the table you want to create is saved. Deselect it to save the
table you want to create in the warehouse directory defined
in the key hive.metastore.warehouse.dir in Hive configuration
file hive-site.xml.
Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list
in the Basic settings view. For further information about
saving data in HDFS, see https://cwiki.apache.org/Hive/.
Set row format(terminated by) Select this check box to use and define the row formats
when you want to create a table:
Field: Select this check box to use Field as the row format.
The default value for this field is "\u0001". You can also
specify a customized char in this field.
Collection Item: Select this check box to use Collection
Item as the row format. The default value for this field is
"\u0002". You can also specify a customized char in this
field.
Map Key: Select this check box to use Map Key as the row
format. The default value for this field is "\u0003". You can
also specify a customized char in this field.
Line: Select this check box to use Line as the row format.
The default value for this field is "\n". You can also specify a
customized char in this field.
Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list
in the Basic settings view. For further information about
row formats in the HCatalog managed table, see https://
cwiki.apache.org/Hive/.
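The four delimiter settings translate to a ROW FORMAT clause in the Hive DDL for the created table. A sketch of that translation, using the default delimiter values documented above (this is an illustration, not the exact DDL the component emits):

```python
# Sketch: the ROW FORMAT clause implied by the four delimiter settings.
# Defaults are the ones documented in this section; this is illustrative,
# not the component's exact generated DDL.
def row_format_clause(field="\\u0001", collection="\\u0002",
                      map_key="\\u0003", line="\\n"):
    """Build a Hive ROW FORMAT DELIMITED clause from the four delimiters."""
    return ("ROW FORMAT DELIMITED"
            " FIELDS TERMINATED BY '%s'"
            " COLLECTION ITEMS TERMINATED BY '%s'"
            " MAP KEYS TERMINATED BY '%s'"
            " LINES TERMINATED BY '%s'" % (field, collection, map_key, line))
```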
Properties Click [+] to add one or more lines to define table properties.
The table properties allow you to tag the table definition
with your own metadata key/value pairs. Make sure that
values in both the Key and Value rows are quoted in
double quotation marks.
Note:
This table is enabled only when you select
Database/Table from the Operation on list and
Create/Drop and create/Drop if exist and create from
the Operation list in the Basic settings view. For further
information about table properties, see https://cwiki.
apache.org/Hive/.
Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.
Standard Output Folder Browse to, or enter the directory where the log files are
stored.
Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.
Error Output Folder Browse to, or enter the directory where the error log files
are stored.
Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Note:
Knowledge of Hive Data Definition Language and HCatalog Data Definition Language is required.
For further information about Hive Data Definition Language, see https://cwiki.apache.org/
confluence/display/Hive/LanguageManual+DDL. For further information about HCatalog Data
Definition Language, see https://cwiki.apache.org/confluence/display/HCATALOG/Design
+Document+-+Java+APIs+for+HCatalog+DDL+Commands.
2. Click Edit schema to define the schema for the table to be created.
3. Click [+] to add at least one column to the schema and click OK when you finish setting the
schema. In this scenario, the columns added to the schema are: name, country and age.
4. Fill the Templeton hostname field with URL of the Templeton webservice you are using. In this
scenario, fill this field with "192.168.0.131".
5. Fill the Templeton port field with the port for Templeton hostname. By default, the value for this
field is "50111".
6. Select Table from the Operation on list and Drop if exist and create from the Operation list to
create a table in HDFS.
7. Fill the Database field with an existing database name in HDFS. In this scenario, the database
name is "talend".
8. Fill the Table field with the name of the table to be created. In this scenario, the table name is
"Customer".
9. Fill the Username field with the username for the DB authentication.
10. Select the Set the user group to use check box to specify the user group. The default user group is
"root"; specify the value for this field according to your environment.
11. Select the Set the permissions to use check box to specify the user permission. The default value
for this field is "rwxrwxr-x".
12. Select the Set partitions check box to enable the partition schema.
13. Click the Edit schema button next to the Set partitions check box to define the partition schema.
14. Click [+] to add one column to the schema and click OK when you finish setting the schema. In
this scenario, the column added to the partition schema is: match_age.
2. Click Edit schema to define the same schema as the one you defined in tHCatalogOperation.
3. Fill the Number of rows field with the integer 8.
7. Click Sync columns to retrieve the schema defined in the preceding component.
8. Fill the NameNode URI field with the URI to the NameNode. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.
9. Fill the File name field with the HDFS location of the file you write data to. In this scenario, the
file location is "/user/hdp/Customer/Customer.csv".
10. Select Overwrite from the Action list.
11. Fill the Templeton hostname field with URL of the Templeton webservice you are using. In this
scenario, fill this field with "192.168.0.131".
12. Fill the Templeton port field with the port for Templeton hostname. By default, the value for this
field is "50111".
13. Fill the Database field, the Table field, the Username field with the same value you specified in
tHCatalogOperation.
14. Fill the Partition field with "match_age=27".
15. Fill the File location field with the HDFS location to which the table will be saved. In this
example, use "hdfs://192.168.0.131:8020/user/hdp/Customer".
2. Click Edit schema to define the schema of the table to be read from the database.
3. Click [+] to add at least one column to the schema. In this scenario, the columns added to the
schema are age and name.
4. Fill the Partition field with "match_age=26".
5. Do the rest of the settings in the same way as configuring tHCatalogOperation.
Outputting the data read from the table in HDFS to the console
Procedure
1. Double-click tLogRow to open its Basic settings view.
2. Click Sync columns to retrieve the schema defined in the preceding component.
3. Select Table from the Mode area.
Job execution
Press CTRL+S to save your Job and F6 to execute it.
The data of the restricted table read from HDFS is displayed on the console.
Type http://talend-hdp:50075/browseDirectory.jsp?dir=/user/hdp/Customer&namenodeInfoPort=50070
in the address bar of your browser to view the table you created:
Click the Customer.csv link to view the content of the table you created.
tHCatalogOutput
Receives data from its incoming flow and writes this data into an HCatalog managed table.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
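The NameNode URI differs only in scheme between plain HDFS and WebHDFS. A minimal sketch, where the hostname and ports are example values (WebHDFS with SSL is not supported, so no swebhdfs:// case is included):

```python
# Sketch: form the NameNode URI for the two schemes this section mentions.
# Hostname and ports below are hypothetical examples.
def namenode_uri(host, port, webhdfs=False):
    """Return the hdfs:// or webhdfs:// URI for the NameNode."""
    scheme = "webhdfs" if webhdfs else "hdfs"
    return "%s://%s:%d" % (scheme, host, port)
```

For example, namenode_uri("masternode", 8020) yields the hdfs:// form, while passing webhdfs=True yields the webhdfs:// form.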
File name Browse to, or enter the location of the file which you write
data to. This file is created automatically if it does not exist.
Templeton hostname Fill this field with the URL of Templeton Webservice.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
Templeton port Fill this field with the port of the Templeton Webservice URL.
By default, this value is 50111.
Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
Partition Fill this field to specify one or more partitions for the
partition operation on the specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.
Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.
Username Fill this field with the username for the DB authentication.
File location Fill this field with the path where the source data file is
stored.
Die on error This check box is cleared by default, meaning that rows
with errors are skipped and the process completes for
error-free rows.
Advanced settings
Custom encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
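The override behaviour described above amounts to a simple precedence rule: a property entered in the Hadoop properties table replaces the Studio default of the same name at runtime. A sketch (the property names are real Hadoop keys; the values are examples):

```python
# Sketch of the runtime precedence rule: custom properties entered in the
# Hadoop properties table override the engine defaults of the same name.
# Property names are real Hadoop keys; the values are example settings.
defaults = {"dfs.replication": "3", "dfs.blocksize": "134217728"}
custom = {"dfs.replication": "1"}  # entered in the Hadoop properties table

# Customized values win; untouched defaults remain in effect.
effective = {**defaults, **custom}
```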
Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.
Standard Output Folder Browse to, or enter the directory where the log files are
stored.
Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.
Error Output Folder Browse to, or enter the directory where the error log files
are stored.
Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.
tHDFSCompare
Compares two files in HDFS and, based on the read-only schema, generates a row flow that presents
the comparison information.
tHDFSCompare helps to control the quality of the data processed.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have
chosen a machine called masternode as the NameNode,
the location is hdfs://masternode:portnumber. If you are
using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
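The URI syntax described above can be sketched with a small helper. The function and the host/port values below are illustrative placeholders, not part of Talend:

```python
# Map the Scheme drop-down choice to the NameNode URI syntax shown above.
# This helper is a sketch for illustration only; it is not a Talend API.
def namenode_uri(scheme: str, host: str, port: int) -> str:
    prefixes = {"HDFS": "hdfs", "WebHDFS": "webhdfs"}
    return f"{prefixes[scheme]}://{host}:{port}"

print(namenode_uri("HDFS", "masternode", 8020))     # hdfs://masternode:8020
print(namenode_uri("WebHDFS", "localhost", 50070))  # webhdfs://localhost:50070
```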
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
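The read-permission requirement described above can be checked with a short sketch; the keytab path is a placeholder:

```python
import os

# A keytab-enabled Job fails if the executing OS user (e.g. user1) cannot
# read the keytab file, even when the Kerberos principal (e.g. guest) is valid.
def keytab_readable(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.R_OK)

# Placeholder path for illustration; returns False when the file is absent.
print(keytab_readable("/etc/security/keytabs/guest.keytab"))
```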
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
File to compare Browse to, or enter the path to the file in HDFS you need to
check for quality control.
Reference file Browse to, or enter the path to the file in HDFS the comparison
is based on.
If differences detected, display and If no differences detected, display
Type in a message to be displayed in the Run console
based on the result of the comparison.
Print to console Select this check box to display the message in the Run
console.
Advanced settings
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
No scenario is available for the Standard version of this component yet.
tHDFSConnection
Connects to a given HDFS so that the other Hadoop components can reuse the connection it creates
to communicate with this HDFS.
tHDFSConnection provides connection to the Hadoop distributed file system (HDFS) of interest at
runtime.
Basic settings
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip files.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. The scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have
chosen a machine called masternode as the NameNode,
the location is hdfs://masternode:portnumber. If you are
using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.
If you want to use certain parameters such as the Kerberos
parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:
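A minimal sketch of such a file, using the standard Hadoop XML configuration syntax; the property shown is an illustrative placeholder, not the verbatim Talend example:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Placeholder: add the Kerberos or other parameters that your
       Hadoop configuration files do not provide. -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
```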
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
Use datanode hostname Select the Use datanode hostname check box to allow the
Job to access datanodes via their hostnames. This actually
sets the dfs.client.use.datanode.hostname property to true.
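In Hadoop configuration terms, selecting this check box corresponds to the following hdfs-site.xml property setting:

```xml
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```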
Setup HDFS encryption configurations If the HDFS transparent encryption has been enabled
in your cluster, select the Setup HDFS encryption
configurations check box and in the HDFS encryption key
provider field that is displayed, enter the location of the
KMS proxy.
For further information about the HDFS transparent
encryption and its KMS proxy, see Transparent Encryption in
HDFS.
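The KMS proxy location typically follows Hadoop's KMS provider URI format; the host and port below are placeholders, assumed from the standard Hadoop KMS syntax:

```
kms://http@kms-host:16000/kms
```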
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenarios
No scenario is available for the Standard version of this component yet.
tHDFSCopy
Copies a source file or folder into a target directory in HDFS and removes the source if required.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. The scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have
chosen a machine called masternode as the NameNode,
the location is hdfs://masternode:portnumber. If you are
using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
Source file or directory Browse to, or enter the path pointing to the data to be used
in the file system.
Target location Browse to, or enter the directory in HDFS to which you need
to copy the data.
Copy merge Select this check box to merge the part files generated at
the end of a MapReduce computation.
Once you select it, enter the name of the final merged
file in the Merge name field.
Remove source Select this check box to remove the source file or folder
once this source is copied to the target location.
Override target file (This option does not override the directory)
Select this check box to override the file already existing in
the target location. This option does not override the folder.
Advanced settings
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For related topics, see Procedure on page 990 and
Iterating on a HDFS directory on page 1523.
tHDFSDelete
Deletes a file located on a given Hadoop distributed file system (HDFS).
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. The scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have
chosen a machine called masternode as the NameNode,
the location is hdfs://masternode:portnumber. If you are
using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
File or Directory Path Browse to, or enter the path to the file or folder to be
deleted on HDFS.
Advanced settings
Hadoop properties If you need to use custom configuration for the Hadoop of
interest, complete this table with the property or properties
to be customized. Then at runtime, the customized property
or properties will override those corresponding ones
defined earlier for the same Hadoop.
For further information about the properties required by
Hadoop, see the Hadoop documentation.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
DELETE_PATH: the path to the deleted file or folder. This is
an After variable and it returns a string.
CURRENT_STATUS: the execution result of the component.
This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
Related scenarios
No scenario is available for the Standard version of this component yet.
tHDFSExist
Checks whether a file exists in a specific directory in HDFS.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
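As a hedged illustration (plain Python, not Talend-generated code), the NameNode location string can be assembled from the selected scheme, host, and port; the host name and the port 8020 below are placeholder values, not defaults mandated by this component:

```python
def namenode_uri(scheme, host, port):
    """Build the NameNode location string for the selected scheme.
    Only plain WebHDFS is covered here, since WebHDFS with SSL
    is not supported yet."""
    prefixes = {"HDFS": "hdfs", "WebHDFS": "webhdfs"}
    return "%s://%s:%s/" % (prefixes[scheme], host, port)

print(namenode_uri("HDFS", "masternode", 8020))    # → hdfs://masternode:8020/
print(namenode_uri("WebHDFS", "localhost", 50070)) # → webhdfs://localhost:50070/
```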
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
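The read-permission requirement described above can be checked up front. The sketch below is a plain Python emulation under stated assumptions (the keytab paths are placeholders), not part of the Job's generated code:

```python
import os
import tempfile

def keytab_readable(keytab_path):
    """Return True when the OS user running the Job can read the keytab.
    The executing user (e.g. user1) may differ from the Kerberos
    principal (e.g. guest); what matters is read permission on the
    keytab file itself."""
    return os.path.isfile(keytab_path) and os.access(keytab_path, os.R_OK)

# A freshly created temporary file is readable by its owner:
with tempfile.NamedTemporaryFile(suffix=".keytab") as tmp:
    print(keytab_readable(tmp.name))  # → True
print(keytab_readable("/nonexistent/guest.keytab"))  # → False
```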
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.
File name or relative path Enter the name of the file whose existence you want to
check. If need be, browse to the file or enter the path
to the file, relative to the directory you entered in
HDFS directory.
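The two fields combine into a single target path. A minimal sketch of that combination (HDFS paths follow POSIX conventions, so Python's posixpath is used; the helper name is hypothetical):

```python
import posixpath

def resolve_target(hdfs_directory, name_or_relative_path):
    """Combine the HDFS directory field with the file name or
    relative path field into the full path that is checked."""
    return posixpath.join(hdfs_directory, name_or_relative_path)

print(resolve_target("/user/ychen/data/hdfs/out/dest", "output.csv"))
# → /user/ychen/data/hdfs/out/dest/output.csv
```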
Advanced settings
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Launch the Hadoop distribution in which you want to check the existence of a particular file. Then,
proceed as follows:
2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may use tHDFSConnection to create a connection and reuse it from
the current component. For further information, see tHDFSConnection on page 1466.
4. In the HDFS Directory field, browse to, or enter the path to the folder where the file to be checked
is. In this example, browse to /user/ychen/data/hdfs/out/dest.
5. In the File name or relative path field, enter the name of the file whose existence you want to
check. For example, output.csv.
2. In the Title field, enter the title to be used for the pop-up message box to be created.
3. In the Buttons list, select OK. This defines the button to be displayed on the message box.
4. In the Icon list, select Icon information.
5. In the Message field, enter the message you want displayed once the file check is done. In
this example, enter "This file does not exist!".
2. In the Condition box, press Ctrl+Space to access the variable list and select the global variable
EXISTS. Type an exclamation mark before the variable to negate the meaning of the variable.
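The negated condition drives the Run if trigger: the message-box branch fires only when the file is absent. The following is a plain Python emulation of that logic, not the Java expression Talend actually generates; the variable key tHDFSExist_1_EXISTS is an illustrative name:

```python
# Emulation of the Run if trigger over the EXISTS flow variable.
global_map = {"tHDFSExist_1_EXISTS": False}  # tHDFSExist reported: not found

def run_if_condition(gm):
    # Equivalent of typing "!" before the EXISTS variable
    # in the Condition box to negate its meaning.
    return not gm["tHDFSExist_1_EXISTS"]

if run_if_condition(global_map):
    print("This file does not exist!")
```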
Results
Once done, a message box pops up to indicate that this file called output.csv does not exist in the
directory you defined earlier.
In the HDFS system where you checked for the file, browse to the specified directory; you can
confirm that this file does not exist.
tHDFSGet
Copies files from the Hadoop distributed file system (HDFS), pastes them in a user-defined directory
and, if need be, renames them.
tHDFSGet connects to Hadoop distributed file system, helping to obtain large-scale files with
optimized performance.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.
Local directory Browse to, or enter the local directory to store the files
obtained from HDFS.
Overwrite file Select whether the existing file should be overwritten
with the new one.
Append Select this check box to add the new rows at the end of the
records.
Include subdirectories Select this check box if the selected input source type
includes sub-directories.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
2. Click the [...] button next to the File Name field and browse to the output file you want to write
data in, in.txt in this example.
2. Select, for example, Apache 0.20.2 from the Hadoop version list.
3. In the NameNode URI, the Username and the Group fields, enter the connection parameters to
the HDFS. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
4. Next to the Local directory field, click the three-dot [...] button to browse to the folder with
the file to be loaded into the HDFS. In this scenario, the directory has been specified while
configuring tFileOutputDelimited: C:/hadoopfiles/putFile/.
5. In the HDFS directory field, type in the intended location in HDFS to store the file to be loaded. In
this example, it is /testFile.
6. Click the Overwrite file field to expand the drop-down list.
7. From the menu, select always.
8. In the Files area, click the plus button to add a row in which you define the file to be loaded.
9. In the File mask column, enter "*.txt" to replace the default newLine value between the quotation
marks, and leave the New name column as it is. This allows you to extract all the .txt files in the
specified directory without changing their names. In this example, the file is in.txt.
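The File mask column accepts glob-style patterns. As a minimal sketch (a Python emulation with fnmatch, not the component's internal matching code), a mask such as *.txt selects files like this:

```python
import fnmatch

def match_mask(file_names, mask):
    """Select the files a mask such as *.txt would pick up,
    leaving their names unchanged (empty New name column)."""
    return [f for f in file_names if fnmatch.fnmatch(f, mask)]

print(match_mask(["in.txt", "notes.csv", "out.txt"], "*.txt"))
# → ['in.txt', 'out.txt']
```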
2. Select, for example, Apache 0.20.2 from the Hadoop version list.
3. In the NameNode URI, the Username, the Group fields, enter the connection parameters to the
HDFS. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
4. In the HDFS directory field, type in the location where the loaded file is stored in HDFS. In this
example, it is /testFile.
5. Next to the Local directory field, click the three-dot [...] button to browse to the folder intended to
store the files that are extracted out of the HDFS. In this scenario, the directory is: C:/hadoopfiles/
getFile/.
6. Click the Overwrite file field to expand the drop-down list.
7. From the menu, select always.
8. In the Files area, click the plus button to add a row in which you define the file to be extracted.
9. In the File mask column, enter "*.txt" to replace the default newLine value between the quotation
marks, and leave the New name column as it is. This allows you to extract all the .txt files from the
specified directory in the HDFS without changing their names. In this example, the file is in.txt.
Reading data from the HDFS and saving the data locally
Procedure
1. Double-click tFileInputDelimited to define the component in its Basic settings view.
The file is also extracted from the HDFS by tHDFSGet and is read by tFileInputDelimited.
tHDFSInput
Extracts the data in an HDFS file for other components to process it.
tHDFSInput reads a file located on a given Hadoop distributed file system (HDFS) and puts the data of
interest from this file into a Talend schema. Then it passes the data to the component that follows.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
File Name Browse to, or enter the path pointing to the data to be used
in the file system.
If the path you set points to a folder, this component will
read all of the files stored in that folder. Furthermore, if
sub-folders exist in that folder and you need to read files in
the sub-folders, select the Include sub-directories if path is
directory check box in the Advanced settings view.
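The folder-reading behavior described above (direct files always; nested files only with the Include sub-directories option) can be sketched as a plain Python emulation over a flat list of paths; this is an illustration, not the component's actual traversal code:

```python
def files_under(paths, folder, include_subdirs):
    """Emulate folder reading: direct children are always read;
    files in sub-folders only when the option is selected."""
    prefix = folder.rstrip("/") + "/"
    selected = []
    for p in paths:
        if not p.startswith(prefix):
            continue
        relative = p[len(prefix):]
        if include_subdirs or "/" not in relative:
            selected.append(p)
    return selected

paths = ["/data/a.txt", "/data/sub/b.txt", "/other/c.txt"]
print(files_under(paths, "/data", False))  # → ['/data/a.txt']
print(files_under(paths, "/data", True))   # → ['/data/a.txt', '/data/sub/b.txt']
```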
Type Select the type of the file to be processed. The type of the
file may be:
• Text file.
• Sequence file: a Hadoop sequence file consists of
binary key/value pairs and is suitable for the Map/
Reduce framework. For further information, see http://
wiki.apache.org/hadoop/SequenceFile.
Header Enter the number of header rows to ignore in the
transferred data. For example, enter 0 to ignore no rows
for data without a header, or 1 for data with a header in
the first row.
This field is not available for a Sequence file.
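The Header value simply drops that many leading rows. A minimal sketch of the behavior (plain Python, sample rows invented for illustration):

```python
def skip_header(rows, header):
    """Drop the first `header` rows: 0 keeps every row,
    1 drops a single header row, and so on."""
    return rows[header:]

rows = [["id", "name"], ["1", "Alice"], ["2", "Bob"]]
print(skip_header(rows, 1))  # → [['1', 'Alice'], ['2', 'Bob']]
print(skip_header(rows, 0) == rows)  # → True
```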
Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.
Advanced settings
Include sub-directories if path is directory Select this check box to read not only the folder you have
specified in the File name field but also the sub-folders in
that folder.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
tStatCatcher Statistics Select this check box to collect log data at the component
level. Note that this check box is not available in the Map/
Reduce version of the component.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Procedure
1. Create your Azure Data Lake Storage Gen2 account if you do not have it yet.
• For more details, see Create an Azure Data Lake Storage Gen2 account from the Azure
documentation.
2. Create an Azure Active Directory application on your Azure portal. For more details about how to
do this, see the "Create an Azure Active Directory application" section in Azure documentation:
Use portal to create an Azure Active Directory application.
3. Obtain the application ID, object ID and the client secret of the application to be used from the
portal.
a) On the list of the registered applications, click the application you created and registered in
the previous step to display its information blade.
b) Click Overview to open its blade, and from the top section of the blade, copy the Object ID and
the application ID displayed as Application (client) ID. Keep them somewhere safe for later
use.
c) Click Certificates & secrets to open its blade and then create the authentication key (client
secret) to be used on this blade in the Client secrets section.
4. Go back to the Overview blade of the application to be used, click Endpoints at the top of this blade,
copy the value of OAuth 2.0 token endpoint (v1) from the endpoint list that appears and keep it
somewhere safe for later use.
5. Set the read and write permissions to the ADLS Gen2 filesystem to be used for the service
principal of your application.
It is very likely that the administrator of your Azure system has included your account and your
applications in the group that has access to a given ADLS Gen2 storage account and a given ADLS
Gen2 filesystem. In this case, ask your administrator to ensure that you have the proper access and
then ignore this step.
a) Start your Microsoft Azure Storage Explorer and find your ADLS Gen2 storage account on the
Storage Accounts list.
If you have not installed Microsoft Azure Storage Explorer, you can download it from the
Microsoft Azure official site.
b) Expand this account and the Blob Containers node under it; then click the ADLS Gen2
hierarchical filesystem to be used under this node.
Example
The filesystem in this image is for demonstration purposes only. Create the filesystem to be
used under the Blob Containers node in your Microsoft Azure Storage Explorer, if you do not
have one yet.
c) On the blade that is opened, click Manage Access to open its wizard.
d) At the bottom of this wizard, add the object ID of your application to the Add user or group
field and click Add.
e) Select the object ID just added from the Users and groups list and select all the permissions for
Access and Default.
f) Click Save to validate these changes and close this wizard.
Results
Configuring the HDFS components to work with Azure Data Lake Storage
Procedure
1. Double-click tFixedFlowInput to open its Component view to provide sample data to the Job.
The sample data to be used contains only one row with two columns: id and name.
2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the [+] button to add the two columns and rename them to id and name.
4. Click OK to close the schema editor and validate the schema.
5. In the Mode area, select Use single table.
The id and the name columns automatically appear in the Value table and you can enter the
values you want within double quotation marks in the Value column for the two schema values.
6. Double-click tHDFSOutput to open its Component view.
Example
7. In the Version area, select Hortonworks or Cloudera, depending on the distribution you are using.
In the Standard framework, these are the only two distributions supported with ADLS by the HDFS
components.
8. From the Scheme drop-down list, select ADLS. The ADLS related parameters appear in the
Component view.
9. In the URI field, enter the URI of the NameNode service of your application. The location of this
service is actually the address of your Data Lake Store.
For example, if your Data Lake Storage name is data_lake_store_name, the NameNode URI
to be used is adl://data_lake_store_name.azuredatalakestore.net.
10. In the Client ID and the Client key fields, enter, respectively, the authentication ID and the
authentication key generated upon the registration of the application that the current Job you are
developing uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate permissions to access Azure Data Lake.
You can check this on the Required permissions view of this application on Azure. For further
information, see Azure documentation Assign the Azure AD application to the Azure Data Lake
Storage account file or folder.
This application must be the one to which you assigned permissions to access your Azure Data
Lake Storage in the previous step.
11. In the Token endpoint field, copy-paste the OAuth 2.0 token endpoint that you can obtain from
the Endpoints list accessible on the App registrations page on your Azure portal.
12. In the File name field, enter the directory to be used to store the sample data on Azure Data Lake
Storage.
13. From the Action drop-down list, select Create if the directory to be used does not exist yet on
Azure Data Lake Storage; if this folder already exists, select Overwrite.
14. Do the same configuration for tHDFSInput.
15. If you run your Job on Windows, follow this procedure to add the winutils.exe program to your
Job.
16. Press F6 to run your Job.
tHDFSList
tHDFSList retrieves a list of files or folders based on a filemask pattern and iterates on each of them.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. The scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have chosen
a machine called masternode as the NameNode, the
location is hdfs://masternode:portnumber. If you are
using WebHDFS, the location should be
webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
HDFS Directory Browse to, or enter the path pointing to the data to be used
in the file system.
FileList Type Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.
Include subdirectories Select this check box if the selected input source type
includes sub-directories.
Case Sensitive Set the case mode from the list to determine whether the
filter on filenames is case sensitive.
Use Glob Expressions as Filemask This check box is selected by default. It filters the results
using glob expressions.
Files Click the plus button to add as many filter lines as needed:
Filemask: in the added filter lines, type in a filename or a
filemask using special characters or regular expressions.
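A glob filemask behaves like shell-style wildcard matching. As an illustration (the filemasks and file names below are made up, not taken from the component), Java's built-in glob PathMatcher applies the same style of pattern:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobFilemaskDemo {
    // Returns true when the file name matches the glob filemask.
    public static boolean matches(String filemask, String fileName) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + filemask);
        Path path = Paths.get(fileName);
        return matcher.matches(path.getFileName());
    }

    public static void main(String[] args) {
        // "*" matches any file name, as in the scenario later in this chapter.
        System.out.println(matches("*", "customers.csv"));        // true
        // A more selective filemask only keeps matching names.
        System.out.println(matches("out_*.csv", "out_2020.csv")); // true
        System.out.println(matches("out_*.csv", "in_2020.csv"));  // false
    }
}
```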
Order by The folders are listed first, then the files. You can choose
the order of the folders and files in one of the following ways:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical
order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent
to most recent.
Note:
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then file
name takes precedence. If ordering by modified date,
in the event of identical dates then file name takes
precedence.
Order action Select a sort order by clicking one of the following radio
buttons:
ASC: ascending order;
DESC: descending order;
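The tie-breaking rules above can be sketched with standard Java comparators. The FileEntry class below is a hypothetical stand-in for a listed file, not a Talend type, and the comparators only illustrate the documented precedence (name ties broken by modified date, size ties broken by name):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderByDemo {
    // Hypothetical stand-in for a listed file: name, size, modified timestamp.
    public static class FileEntry {
        public final String name;
        public final long size;
        public final long modified;
        public FileEntry(String name, long size, long modified) {
            this.name = name;
            this.size = size;
            this.modified = modified;
        }
    }

    // "By file name": alphabetical, with modified date breaking name ties.
    public static final Comparator<FileEntry> BY_NAME =
        Comparator.comparing((FileEntry f) -> f.name)
                  .thenComparingLong(f -> f.modified);

    // "By file size": smallest to largest, with file name breaking size ties.
    public static final Comparator<FileEntry> BY_SIZE =
        Comparator.comparingLong((FileEntry f) -> f.size)
                  .thenComparing(f -> f.name);

    public static void main(String[] args) {
        List<FileEntry> files = new ArrayList<>();
        files.add(new FileEntry("b.txt", 10, 2000));
        files.add(new FileEntry("a.txt", 10, 1000));
        files.sort(BY_SIZE); // equal sizes, so the file name breaks the tie
        System.out.println(files.get(0).name); // a.txt
    }
}
```

Reversing a comparator (the DESC radio button) corresponds to calling reversed() on it.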
Advanced settings
Use Exclude Filemask Select this check box to enable the Exclude Filemask field,
which excludes files from the results based on the filter
conditions (file types) you specify:
Note:
File types in this field should be enclosed in double
quotation marks and separated by commas.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
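For example, a customized row in this table could override the default block replication factor. Property names and values are entered as double-quoted Java strings; the property name below is a standard hdfs-default.xml key, but the value is purely illustrative:

```
Property             Value
"dfs.replication"    "1"
```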
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
You can design a Job in the Studio to create the two files. For further information, see tHDFSPut on
page 1548 or tHDFSOutput on page 1528.
2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In real-world practice, you may use tHDFSConnection to create a connection and reuse it from
the current component. For further information, see tHDFSConnection on page 1466.
4. In the HDFS Directory field, enter the path to the folder where the files to be iterated on are. In
this example, as presented earlier, the directory is /user/ychen/data/hdfs/out/.
5. From the FileList Type list, select Files.
6. In the Files table, click the [+] button to add one row and enter * between the quotation marks to
iterate on all existing files.
2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In real-world practice, you may have used tHDFSConnection to create a connection; then you
can reuse it from the current component. For further information, see tHDFSConnection on page
1466.
4. In the HDFS directory field, enter the path to the folder holding the files to be retrieved.
To do this with the auto-completion list, place the mouse pointer in this field, then press
Ctrl+Space to display the list and select the tHDFSList_1_CURRENT_FILEDIRECTORY variable to reuse
the directory you have defined in tHDFSList. In this variable, tHDFSList_1 is the label of the
component. If you label it differently, select the variable accordingly.
Once this variable is selected, the directory reads, for example,
((String)globalMap.get("tHDFSList_1_CURRENT_FILEDIRECTORY")) in this field.
For further information about how to label a component, see the Talend Studio User Guide.
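This expression works because a Talend Job keeps component variables in globalMap, a java.util.Map<String, Object>, and the generated code casts each entry back to its concrete type on retrieval. A minimal, self-contained sketch: here the map is filled by hand, whereas in a real Job the tHDFSList component populates it during the iteration:

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    // Returns the directory stored by tHDFSList, exactly as the generated
    // expression ((String)globalMap.get(...)) does in the field.
    public static String currentFileDirectory(Map<String, Object> globalMap) {
        return ((String) globalMap.get("tHDFSList_1_CURRENT_FILEDIRECTORY"));
    }

    public static void main(String[] args) {
        // Filled by hand for illustration only.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tHDFSList_1_CURRENT_FILEDIRECTORY",
                      "/user/ychen/data/hdfs/out/");
        System.out.println(currentFileDirectory(globalMap));
    }
}
```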
5. In the Local directory field, enter the path, or browse to the folder you want to place the selected
files in. This folder will be created if it does not exist. In this example, it is C:/hdfsFiles.
6. In the Overwrite file field, select always.
7. In the Files table, click the [+] button to add one row and enter * between the quotation marks in
the Filemask column to retrieve all existing files.
Results
Once done, you can check the files created in the local directory.
tHDFSOutput
tHDFSOutput writes the data flows it receives into a given Hadoop distributed file system (HDFS).
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. The scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have chosen
a machine called masternode as the NameNode, the
location is hdfs://masternode:portnumber. If you are
using WebHDFS, the location should be
webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
File Name Browse to, or enter the location of the file to which you
write data. This file is created automatically if it does not exist.
Type Select the type of the file to be processed. The type of the
file may be:
• Text file.
• Sequence file: a Hadoop sequence file consists of
binary key/value pairs and is suitable for the Map/
Reduce framework. For further information, see http://
wiki.apache.org/hadoop/SequenceFile.
Once you select the Sequence file format, the Key
column list and the Value column list appear to allow
you to select the keys and the values of that Sequence
file to be processed.
Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.
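To see why the charset choice matters, the JDK-only sketch below decodes the same bytes with a matching and a mismatching charset (the sample text is arbitrary, written with a Unicode escape so it compiles regardless of source-file encoding):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Decode raw bytes with an explicit charset instead of the JVM default.
    public static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café"; its UTF-8 form uses two bytes for the accent.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Decoded with the matching charset, the text round-trips intact.
        System.out.println(decode(raw, "UTF-8").equals("caf\u00e9"));      // true
        // Decoded with the wrong charset, the accented character is garbled.
        System.out.println(decode(raw, "ISO-8859-1").equals("caf\u00e9")); // false
    }
}
```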
Compression Select the Compress the data check box to compress the
output data.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.
Note that when the type of the file to be written is
Sequence File, the compression algorithm is embedded
within the container files (the part- files) of this sequence
file. These files can be read by a Talend component
such as tHDFSInput within MapReduce Jobs and other
applications that understand the sequence file format.
Alternatively, when the type is Text File, the output files
can be accessed with standard compression utilities that
understand the bzip2 or gzip container files.
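Because a compressed Text File output is a standard gzip (or bzip2) container, any gzip-aware tool can read it back. The JDK-only round trip below illustrates the container format; the sample row is hypothetical and this is not the component's own code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress text into a standard gzip container.
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompress, as a gzip-aware reader (or the Studio) would.
    public static String gunzip(byte[] compressed) {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String row = "1;Andrew"; // hypothetical delimited output row
        byte[] packed = gzip(row);
        // The gzip magic bytes (0x1f, 0x8b) mark a standard container.
        System.out.println((packed[0] & 0xff) == 0x1f
                           && (packed[1] & 0xff) == 0x8b); // true
        System.out.println(gunzip(packed).equals(row));    // true
    }
}
```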
Include header Select this check box to output the header of the data.
This option is not available for a Sequence file.
Advanced settings
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
• Related topic, see Writing data in a delimited file on page 1116.
• Related topic, see Computing data with Hadoop distributed file system on page 1498.
tHDFSOutputRaw
Transfers data of different formats such as hierarchical data in the form of a single column into a
given HDFS file system.
tHDFSOutputRaw receives a single-column input flow and writes the data into HDFS.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. The scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have chosen
a machine called masternode as the NameNode, the
location is hdfs://masternode:portnumber. If you are
using WebHDFS, the location should be
webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
Use Datanode hostname Select the Use datanode hostname check box to allow the
Job to access datanodes via their hostnames. This actually
sets the dfs.client.use.datanode.hostname property to true.
When connecting to an S3N filesystem, you must select this
check box.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
File Name Browse to, or enter the location of the file to which you
write data. This file is created automatically if it does not exist.
Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.
Compression Select the Compress the data check box to compress the
output data.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
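The override behavior described above amounts to a map merge in which the entries of the Hadoop properties table win over the engine defaults. A minimal sketch (Python; the property names are real Hadoop keys, the values purely illustrative):

```python
# Default engine configuration (illustrative subset, not Talend's actual defaults).
defaults = {
    "dfs.client.use.datanode.hostname": "false",
    "dfs.replication": "3",
}

# Properties entered in the component's Hadoop properties table.
custom = {
    "dfs.client.use.datanode.hostname": "true",  # e.g. to reach datanodes by hostname
}

# At runtime the customized properties override the default ones.
effective = {**defaults, **custom}
print(effective["dfs.client.use.datanode.hostname"])  # true
```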
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
Once you have properly configured the connection to HDFS for this component, this component works
exactly the same way as tFileOutputRaw.
For further information about tFileOutputRaw, see tFileOutputRaw on page 1153.
tHDFSProperties
Creates a single row flow that displays the properties of a file processed in HDFS.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have chosen
a machine called masternode as the NameNode, the location
is hdfs://masternode:portnumber. If you are using WebHDFS,
the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
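The location syntax above can be sketched as follows (Python; masternode is the document's example host, while port 8020 is only a commonly seen HDFS port, not an assertion about your cluster):

```python
def namenode_uri(scheme: str, host: str, port: int) -> str:
    """Build the NameNode location string for the selected scheme."""
    prefixes = {"HDFS": "hdfs", "WebHDFS": "webhdfs"}
    return f"{prefixes[scheme]}://{host}:{port}"

print(namenode_uri("HDFS", "masternode", 8020))     # hdfs://masternode:8020
print(namenode_uri("WebHDFS", "localhost", 50070))  # webhdfs://localhost:50070
```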
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
File Browse to, or enter the path pointing to the data to be used
in the file system.
Get file checksum Select this check box to generate and output the MD5
information of the file processed.
Note that this is an HDFS-only checksum and not a true
MD5 hash that can be compared with the MD5 value
obtained, for example, from tFileInputProperties. For further
information about this component, see tFileInputProperties
on page 1079.
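For contrast with the HDFS-side value, the sketch below computes a true MD5 hash locally (Python, standard library only). HDFS's reported checksum is derived from per-block CRC checksums, so it will not match this value for the same content:

```python
import hashlib

def local_md5(data: bytes) -> str:
    """True MD5 of the raw bytes, the kind of value tFileInputProperties reports."""
    return hashlib.md5(data).hexdigest()

print(local_md5(b"hello"))  # 5d41402abc4b2a76b9719d911017c592
```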
Advanced settings
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For related topics, see Procedure on page 1159 and Iterating on a HDFS directory on page 1523.
tHDFSPut
Connects to a Hadoop distributed file system to load large-scale files into it with optimized
performance.
tHDFSPut copies files from a user-defined directory, pastes them into a given Hadoop distributed
file system (HDFS) and, if need be, renames these files.
Basic settings
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have chosen
a machine called masternode as the NameNode, the location
is hdfs://masternode:portnumber. If you are using WebHDFS,
the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
Local directory The local directory where the files to be loaded into
HDFS are stored.
HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.
Overwrite file Select whether or not to overwrite the existing file with
the new one.
Use Perl5 Regex Expression as Filemask Select this check box if you want to use Perl5 regular
expressions in the Files field as file filters. This is useful
when the name of the file to be used contains special
characters such as parentheses.
For information about Perl5 regular expression syntax, see
Perl5 Regular Expression Syntax.
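To illustrate why regular expressions help with such names, the sketch below filters hypothetical filenames with a pattern that escapes the parentheses (Python's re engine stands in for Perl5 syntax here):

```python
import re

# Parentheses must be escaped to be matched literally.
filemask = re.compile(r"report \(final\)_\d+\.csv")

files = ["report (final)_1.csv", "report (draft)_1.csv", "notes.txt"]
matched = [f for f in files if filemask.fullmatch(f)]
print(matched)  # ['report (final)_1.csv']
```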
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
Related scenario
For related scenario, see Computing data with Hadoop distributed file system on page 1498.
tHDFSRename
Renames the selected files or specified directory on HDFS.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have chosen
a machine called masternode as the NameNode, the location
is hdfs://masternode:portnumber. If you are using WebHDFS,
the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.
Overwrite file Select whether or not to overwrite the existing file with
the new one.
Files Click the [+] button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using wildcard
characters (*) or regular expressions.
New name: name to give to the HDFS file after the transfer.
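The Filemask/New name pairs behave like a small rename table. A sketch of that lookup (Python; the masks and target names are hypothetical, and glob-style masks stand in for the component's filemasks):

```python
import fnmatch

# Each row of the Files table: (filemask, new name).
rename_rules = [("*.log", "archived.log"), ("data_*.csv", "data.csv")]

def new_name(filename: str) -> str:
    """Return the renamed target for the first matching filemask,
    or the original name if no rule matches.

    Note: in the real component each row renames the files it matches;
    this sketch only shows the mask-to-name lookup.
    """
    for mask, target in rename_rules:
        if fnmatch.fnmatch(filename, mask):
            return target
    return filename

print(new_name("data_2020.csv"))  # data.csv
print(new_name("readme.txt"))     # readme.txt
```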
Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For related scenario, see Computing data with Hadoop distributed file system on page 1498.
tHDFSRowCount
Reads a file in HDFS row by row in order to determine the number of rows this file contains.
tHDFSRowCount counts the number of rows in a file in HDFS. If the file to be processed is a Hadoop
sequence file type or a large dataset, it is recommended to use a tAggregateRow to count the records.
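The counting behavior, including the Ignore empty rows option in the Basic settings below, can be sketched as follows (Python; local lines stand in for HDFS rows, and treating whitespace-only rows as empty is an assumption of this sketch):

```python
def count_rows(lines, ignore_empty=False):
    """Count rows, optionally skipping empty ones, as tHDFSRowCount does."""
    if ignore_empty:
        return sum(1 for line in lines if line.strip())
    return len(list(lines))

rows = ["a;1", "", "b;2", "   ", "c;3"]
print(count_rows(rows))                     # 5
print(count_rows(rows, ignore_empty=True))  # 3
```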
Basic settings
Property Type Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme can be:
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, if you have chosen
a machine called masternode as the NameNode, the location
is hdfs://masternode:portnumber. If you are using WebHDFS,
the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
1562
tHDFSRowCount
User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.
File name Browse to, or enter the path pointing to the data to be used
in the file system.
Ignore empty rows Select this check box to skip the empty rows.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
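Since the supported encodings depend on the JVM, you can list them from the JVM itself. The following standalone Java sketch (an illustration, not Studio code) prints every canonical charset name known to the current JVM:

```java
import java.nio.charset.Charset;

// Quick check of the encodings the current JVM supports; run outside the Studio.
public class ListEncodings {
    public static boolean isSupported(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // Print every canonical charset name known to this JVM.
        Charset.availableCharsets().keySet().forEach(System.out::println);
    }
}
```

Any name printed here is a valid value for the Custom option of the Encoding list on this JVM.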
Advanced settings
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
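For illustration, a property in hdfs-default.xml is declared as follows; adding the same property name with a different value to the Hadoop properties table overrides this default at runtime (dfs.replication is just one example of such a property):

```xml
<!-- Example entry from hdfs-default.xml -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.</description>
</property>
```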
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Global Variables COUNT: the number of rows in a file. This is a Flow variable
and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
System.out.print(((Integer)globalMap.get("tHDFSRowCount_1_COUNT")));
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
Related scenarios
No scenario is available for the Standard version of this component yet.
tHiveClose
Closes connection to a Hive database.
tHiveClose closes an active connection to a database.
Basic settings
Component list If there is more than one connection used in the Job, select
tHiveConnection from the list.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenarios
No scenario is available for the Standard version of this component yet.
tHiveConnection
Establishes a Hive connection to be reused by other Hive components in your Job.
tHiveConnection opens a connection to a Hive database.
Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:
API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.
Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.
Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.
Region From this drop-down list, select the Google Cloud region
to be used.
Google Storage staging bucket Because a Talend Job requires its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
them at execution time.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.
Provide Google Credentials in file Leave this check box clear when you launch your Job
from a machine on which the Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.
HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.
Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but only where to deploy your Job.
Therefore, always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
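The Hostname value is therefore the endpoint with its scheme removed; the following standalone Java sketch illustrates this (the storage account name myaccount is a placeholder):

```java
import java.net.URI;

public class BlobHost {
    // Derive the Hostname value from the Primary Blob Service Endpoint
    // by stripping the https:// part.
    static String hostOf(String endpoint) {
        return URI.create(endpoint).getHost();
    }

    public static void main(String[] args) {
        System.out.println(hostOf("https://myaccount.blob.core.windows.net"));
        // prints myaccount.blob.core.windows.net
    }
}
```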
Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.
Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as HortonWorks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1) does.
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.
Note:
This field is not available when you select Embedded
from the Connection mode list.
Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.
Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration of this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use.datanode.hostname
property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.
If you want to use certain parameters such as the Kerberos
parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:
<configuration>
    <property>
        <name>talend.encryption</name>
        <value>none</value>
        <description>Set the encryption method to use. Valid values are: none or ssl.</description>
    </property>
    <property>
        <name>talend.ssl.trustStore.path</name>
        <value>ssl</value>
        <description>Set SSL trust store path.</description>
    </property>
    <property>
        <name>talend.ssl.trustStore.password</name>
        <value>ssl</value>
        <description>Set SSL trust store password.</description>
    </property>
</configuration>
The parameters read from these configuration files override
the default ones used by the Studio. When a parameter
does not exist in these configuration files, the default one is
used.
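This precedence rule behaves like a simple map merge. The following standalone Java sketch (an illustration, not Studio code) shows it with the talend.encryption property used in talend-site.xml; the default values here are stand-ins:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigOverride {
    // Values read from the configuration files override the Studio defaults;
    // any key absent from the files falls back to the default value.
    static Map<String, String> effective(Map<String, String> defaults,
                                         Map<String, String> fromFiles) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(fromFiles);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("talend.encryption", "none");      // stand-in Studio default
        defaults.put("talend.ssl.trustStore.path", ""); // stand-in Studio default

        Map<String, String> fromFiles = new HashMap<>();
        fromFiles.put("talend.encryption", "ssl");      // set in talend-site.xml

        System.out.println(effective(defaults, fromFiles)); // encryption becomes "ssl"
    }
}
```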
Note that this option is available only in Hive Standalone
mode with Hive 2.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the Tez libraries to be used in the Advanced settings view.
Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.
Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.
Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, which is
by default registered along with your Hive installation.
Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).
Advanced settings
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
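For reference, the hive.execution.engine entry mentioned above corresponds to the following hive-site.xml declaration (shown for illustration only; in the component you enter the quoted strings in the Hive properties table instead):

```xml
<property>
  <name>hive.execution.engine</name>
  <value>Tez</value>
</property>
```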
Mapred job map memory mb and Mapred job reduce memory mb
You can tune the map and reduce computations by selecting
the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000, which is normally appropriate for running the
computations.
Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Global Variables
Usage
Connecting to a custom Hadoop distribution
After selecting this Custom option, click the button to display the Import custom definition dialog
box and proceed as follows:
Procedure
1. Depending on your situation, select Import from existing version or Import from zip to configure
the custom Hadoop distribution to be connected to.
• If you have the zip file of the custom Hadoop distribution you need to connect to, select
Import from zip. Talend community provides this kind of zip files that you can download
from http://www.talendforge.org/exchange/index.php.
• Otherwise, select Import from existing version to import an officially supported Hadoop
distribution as base so as to customize it by following the wizard.
Note that the check boxes in the wizard allow you to select the Hadoop element(s) you need to
import. Not all the check boxes are displayed in every wizard; which ones appear depends on the
context in which you are creating the connection. For example, if you are creating this connection
for a Hive component, then only the Hive check box appears.
2. Whether you have selected Import from existing version or Import from zip, verify that each check
box next to the Hadoop element you need to import has been selected.
3. Click OK and then in the pop-up warning, click Yes to accept overwriting any custom setup of jar
files previously implemented.
Once done, the Custom Hadoop version definition dialog box becomes active.
This dialog box lists the Hadoop elements and their jar files you are importing.
4. If you have selected Import from zip, click OK to validate the imported configuration.
If you have selected Import from existing version as base, you may still need to add more jar
files to customize that version. Then from the tab of the Hadoop element you need to customize,
for example, the HDFS/HCatalog tab, click the [+] button to open the Select libraries dialog box.
5. Select the External libraries option to open its view.
6. Browse to and select any jar file you need to import.
7. Click OK to validate the changes and to close the Select libraries dialog box.
Once done, the selected jar file appears on the list in the tab of the Hadoop element being
configured.
Note that if you need to share the custom Hadoop setup with another Studio, you can
export this custom connection from the Custom Hadoop version definition window using the
button.
8. In the Custom Hadoop version definition dialog box, click OK to validate the customized
configuration. This brings you back to the Distribution list in the Basic settings view of the
component.
Results
Now that the configuration of the custom Hadoop version has been set up and you are back to the
Distribution list, you are able to continue to enter other parameters required by the connection.
If the custom Hadoop version you need to connect to contains YARN and you want to use it, select the
Use YARN check box next to the Distribution list.
A video available at the following link demonstrates, taking HDFS as an example, how to set up
the connection to a custom Hadoop cluster, also referred to as an unsupported Hadoop distribution:
How to add an unsupported Hadoop distribution to the Studio.
The sample data to be used in this scenario is employee information of a company, reading as follows:
1;Lyndon;Fillmore;21-05-2008;US
2;Ronald;McKinley;15-08-2008
3;Ulysses;Roosevelt;05-10-2008
4;Harry;Harrison;23-11-2007
5;Lyndon;Garfield;19-07-2007
6;James;Quincy;15-07-2008
7;Chester;Jackson;26-02-2008
8;Dwight;McKinley;16-07-2008
9;Jimmy;Johnson;23-12-2007
10;Herbert;Fillmore;03-04-2008
The information contains some employees' names and the dates when they were registered in an HR
system. Since these employees work for the US subsidiary of the company, you will create a US
partition for this sample data.
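Each line of the sample data is a semicolon-delimited record. As a quick illustration (not part of the scenario itself), splitting one such row in Java looks like this:

```java
public class ParseEmployee {
    public static void main(String[] args) {
        String row = "1;Lyndon;Fillmore;21-05-2008;US"; // first line of the sample data
        String[] fields = row.split(";");
        // fields: Id, FirstName, LastName, registration date, country code
        System.out.println(fields[1] + " " + fields[2] + ", registered on " + fields[3]);
        // prints Lyndon Fillmore, registered on 21-05-2008
    }
}
```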
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to
access the Hive database to be used.
Note that if you are using the Windows operating system, you have to create a tmp folder at the root
of the disk where the Studio is installed.
Then proceed as follows:
For further information about how to create a Job, see the chapter describing how to design a
Job in the Talend Studio User Guide.
2. Drop tHiveConnection, tHiveCreateTable and tHiveLoad onto the workspace.
3. Connect them using the Trigger > On Subjob OK link.
Configuring tHiveConnection
Procedure
1. Double-click tHiveConnection to open its Component view.
2. From the Property type list, select Built-in. If you have created the connection to be used in
Repository, then select Repository, click the button to open the Repository content dialog
box and select that connection. This way, the Studio will reuse that set of connection information
for this Job.
For further information about how to create a Hadoop connection in Repository, see the chapter
describing the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide.
3. In the Version area, select the Hadoop distribution to be used and its version. If you cannot find the distribution corresponding to yours in the list, select Custom to connect to a Hadoop distribution that is not officially supported in the Studio.
For a step-by-step example about how to use this Custom option, see Connecting to a custom
Hadoop distribution on page 1579.
4. In the Connection area, enter the connection parameters to the Hive database to be used.
5. In the Name node field, enter the location of the master node, the NameNode, of the distribution
to be used. For example, talend-hdp-all:50300. If you are using WebHDFS, the location should be
webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.
6. In the Job tracker field, enter the location of the JobTracker of your distribution. For example,
hdfs://talend-hdp-all:8020.
Note that the word Job in the term JobTracker refers to the MapReduce (MR) jobs described in Apache's documentation on http://hadoop.apache.org/.
Procedure
1. Double-click tHiveCreateTable to open its Component view.
2. Select the Use an existing connection check box and, from the Component list, select the connection configured in the tHiveConnection component you are using for this Job.
3. Click the button next to Edit schema to open the schema editor.
4. Click the button four times to add four rows and in the Column column, rename them to Id,
FirstName, LastName and Reg_date, respectively.
Note that you cannot use Hive reserved keywords, such as location or date, to name the columns.
5. In the Type column, select the type of the data in each column. In this scenario, Id is of the Integer
type, Reg_date is of the Date type and the others are of the String type.
6. In the DB type column, select the Hive type of each column corresponding to the data type you have defined. For example, Id is of the INT type and Reg_date is of the TIMESTAMP type.
7. In the Data pattern column, define the pattern corresponding to that of the raw data. In this
example, use the default one.
8. Click OK to validate these changes.
Procedure
1. In the Table name field, enter the name of the Hive table to be created. In this scenario, it is employees.
2. From the Action on table list, select Create table if not exists.
3. From the Format list, select the data format for which this Hive table is created. In this scenario, it is TEXTFILE.
4. Select the Set partitions check box to add the US partition as explained at the beginning of this
scenario. To define this partition, click the button next to Edit schema that appears.
5. Leave the Set file location check box clear to use the default path for the Hive table.
6. Select the Set Delimited row format check box to display the available options of row format.
7. Select the Field check box and enter a semicolon (;) as field separator in the field that appears.
8. Select the Line check box and leave the default value as line separator.
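Taken together, the settings above correspond roughly to the following HiveQL DDL. This is a sketch of the kind of statement the component issues, not the exact generated statement; the column and partition names come from this scenario.

```sql
-- Sketch only: the exact DDL generated by tHiveCreateTable may differ in detail.
CREATE TABLE IF NOT EXISTS employees (
  id INT,
  firstname STRING,
  lastname STRING,
  reg_date TIMESTAMP
)
PARTITIONED BY (country STRING)   -- the US partition itself is added at load time
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ';'        -- the Field separator configured above
STORED AS TEXTFILE;
```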
Configuring tHiveLoad
Procedure
1. Double-click tHiveLoad to open its Component view.
2. Select the Use an existing connection check box and, from the Component list, select the connection configured in the tHiveConnection component you are using for this Job.
3. From the Load action field, select LOAD to write data from the file holding the sample data that is
presented at the beginning of this scenario.
4. In the File path field, enter the directory where the sample data is stored. In this example, the data is stored in the HDFS system to be used. In real-world practice, you can use tHDFSOutput to write data into the HDFS system, and you need to ensure that the Hive application has the appropriate rights and permissions to read or even move the data. For further information about tHDFSOutput, see tHDFSOutput on page 1528.
For further information about the related rights and permissions, see the documentation or contact the administrator of the Hadoop cluster to be used.
Note that if you need to read data from a local file system instead of the HDFS system, ensure that the data to be read is stored in the local file system of the machine on which the Job is run, and then select the Local check box in this Basic settings view. For example, when the connection mode to Hive is Standalone, the Job runs on the machine where the Hive application is installed, and thus the data should be stored on that machine.
5. In the Table name field, enter the name of the target table you need to load data in. In this
scenario, it is employees.
6. From the Action on file list, select APPEND.
7. Select the Set partitions check box and in the field that appears, enter the partition you need to
add data to. In this scenario, this partition is country='US'.
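The tHiveLoad settings above are roughly equivalent to the following HiveQL statement; this is a sketch, and the HDFS path shown is hypothetical.

```sql
-- Sketch only (hypothetical path). With the Local check box selected,
-- the statement would use LOAD DATA LOCAL INPATH instead.
LOAD DATA INPATH '/user/talend/employees_us'
INTO TABLE employees
PARTITION (country = 'US');
```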
If you need more details about the Job, it is recommended to use the web console of the JobTracker provided by the Hadoop distribution you are using.
Prerequisites
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to
access the Hive database to be used.
Procedure
1. In the Repository view, expand the Metadata node.
2. Right-click Db Connections, and then click Create Connection.
3. Give a name to your connection.
4. Click Next.
5. Set up the connection configuration similarly to the following table:
6. Click Test Connection to ensure that Talend Studio connects successfully to the cluster.
18. Connect the tPostJob component to the tHiveClose component using an On Component Ok connection to close the opened connection.
19. From the Run tab, click Run to run the Job and verify that the connection to Hive on HDInsight succeeds and that the table data can be read.
tHiveCreateTable
Creates Hive tables that fit a wide range of Hive data formats.
A proper Hive data format such as RC or ORC allows you to obtain better performance when processing data with Hive.
tHiveCreateTable connects to the Hive database to be used and creates a Hive table that is dedicated
to data of the format you specify.
Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:
API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.
Cluster label Select the Cluster label check box and enter the name of the Qubole cluster to be used. If you leave this check box clear, the default cluster is used.
If you need details about your default cluster, ask the administrator of your Qubole service. You can also read this article from the Qubole documentation to find more information about configuring a default Qubole cluster.
Change API endpoint Select the Change API endpoint check box and select the region to be used. If you leave this check box clear, the default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.
Region From this drop-down list, select the Google Cloud region
to be used.
Google Storage staging bucket A Talend Job expects its dependent jar files to be available for execution, so specify the Google Storage directory to which these jar files are transferred so that your Job can access these files at execution.
The directory to be entered must end with a slash (/). If it does not exist, the directory is created on the fly, but the bucket to be used must already exist.
Provide Google Credentials in file Leave this check box clear when you launch your Job from a machine on which the Google Cloud SDK has been installed and authorized to use your user account credentials to access Google Cloud Platform. In this situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.
HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.
Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.
Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that better supports concurrent connections of multiple clients than HiveServer (Hive 1) does.
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.
Note:
This field is not available when you select Embedded
from the Connection mode list.
Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not necessarily the one the principal designates but must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.
Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use.datanode.hostname property to true. When connecting to an S3N filesystem, you must select this check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insightcluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Action on table Select the action to be carried out for creating a table.
Inputformat class and Outputformat class These fields appear only when you have selected
INPUTFORMAT and OUTPUTFORMAT from the Format list.
These fields allow you to enter the name of the jar files to
be used for the data formats not available in the Format list.
Storage class Enter the name of the storage handler to be used for
creating a non-native table (Hive table stored and managed
in other systems than Hive, for example, Cassandra or
MongoDB).
This field is available only when you have selected
STORAGE from the Format list.
For further information about a storage handler, see https://
cwiki.apache.org/confluence/display/Hive/StorageHandlers.
Set partitions Select this check box to add partition columns to the table
to be created. Once you select it, you need to define the schema of the partition columns you need to add.
Set file location If you want to create a Hive table in a directory other
than the default one, select this check box and enter the
directory in HDFS you want to use to hold the table content.
This is typically useful when you need to create an external Hive table by selecting the Create an external table check box in the Advanced settings tab.
Use S3 endpoint The Use S3 endpoint check box is displayed when you
have selected the Set file location check box to create an
external Hive table.
Once this Use S3 endpoint check box is selected, you need
to enter the following parameters in the fields that appear:
• S3 bucket: enter the name of the bucket in which you
need to create the table.
• Bucket name: enter the name of the bucket in which
you want to store the dependencies of your Job. This
bucket must already exist on S3.
• Temporary resource folder: enter the directory in
which you want to store the dependencies of your Job.
For example, enter temp_resources to write the
dependencies in the /temp_resources folder in the
bucket.
If this folder already exists at runtime, its contents are
overwritten by the upcoming dependencies; otherwise,
this folder is automatically created.
• Access key and Secret key: enter the authentication
information required to connect to the Amazon S3
bucket to be used.
To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog box
enter the password between double quotes and click
OK to save the settings.
Note that the format of the S3 file is S3N (S3 Native
Filesystem).
Since a Hive table created in S3 is actually an external table, this Use S3 endpoint check box must be used with the Create an external table check box selected.
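As an illustration of the Storage class option described above, a non-native table is declared in HiveQL with a STORED BY clause naming the handler class. The sketch below uses the standard Hive-HBase storage handler; the table and column names are hypothetical.

```sql
-- Sketch only (hypothetical names): a non-native Hive table managed by HBase.
CREATE TABLE hbase_employees (key INT, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:name');
```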
Advanced settings
Like table Select this check box and enter the name of the Hive table
you want to copy. This allows you to copy the definition of
an existing table without copying its data.
For further information about the Like parameter, see
Apache's information about Hive's Data Definition
Language.
Create an external table Select this check box to make the table to be created an
external Hive table. This kind of Hive table leaves the raw
data where it is if the data is in HDFS.
An external table is usually the better choice for accessing
shared data existing in a file system.
For further information about an external Hive table, see
Apache's documentation about Hive.
Table comment Enter the description you want to use for the table to be created.
As select Select this check box and enter the As select statement for creating a Hive table that is based on a Select statement.
Set clustered_by or skewed_by statement Enter the Clustered by statement to cluster the data of
a table or a partition into buckets, and/or enter the Skewed
by statement to allow Hive to extract the heavily skewed
data and put it into separate files. This is typically used for
obtaining better performance during queries.
SerDe properties If you are using the SerDe row format, you can add any
custom SerDe properties to override the default ones used
by the Hadoop engine of the Studio.
Table properties Add any custom Hive table properties you want to override
the default ones used by the Hadoop engine of the Studio.
Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information about Hive dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce computations by selecting the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are both 1000, which is normally appropriate for running the computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.
Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
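Several of the Advanced settings above map directly to HiveQL clauses. As a sketch, with hypothetical table, column, and location names:

```sql
-- Like table: copy a table's definition without copying its data.
CREATE TABLE employees_schema_copy LIKE employees;

-- Create an external table: the raw data stays where it is in HDFS
-- (hypothetical location).
CREATE EXTERNAL TABLE ext_employees (id INT, name STRING)
LOCATION '/user/talend/ext_employees';

-- As select: create a table from the result of a Select statement.
CREATE TABLE us_employees AS SELECT * FROM employees WHERE country = 'US';

-- Set clustered_by or skewed_by statement: bucket the data of a table.
CREATE TABLE bucketed_employees (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
```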
Global Variables
QUERY: the query statement being processed. This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Die on error
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For a related scenario, see Creating a partitioned Hive table on page 1582.
tHiveInput
Extracts data from Hive and sends the data to the component that follows.
tHiveInput is the component dedicated to the Hive database (the Hive data warehouse system). It executes a given HiveQL query to extract data from Hive.
When ACID is enabled on the Hive side, a Spark Job cannot delete or update a table and, unless data is compacted, this Job cannot correctly read aggregated data from a Hive table either. This is a known limitation described in the Spark bug tracking system: https://issues.apache.org/jira/browse/SPARK-15348.
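For illustration, a query of the following shape is typical of what tHiveInput executes; the table and columns are hypothetical and follow the tHiveCreateTable scenario in this guide.

```sql
-- Sketch only (hypothetical table): extract the rows of the US partition.
SELECT id, firstname, lastname, reg_date
FROM employees
WHERE country = 'US';
```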
Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:
API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.
Cluster label Select the Cluster label check box and enter the name of the Qubole cluster to be used. If you leave this check box clear, the default cluster is used.
If you need details about your default cluster, ask the administrator of your Qubole service. You can also read this article from the Qubole documentation to find more information about configuring a default Qubole cluster.
Change API endpoint Select the Change API endpoint check box and select the region to be used. If you leave this check box clear, the default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.
If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.
Region From this drop-down list, select the Google Cloud region
to be used.
Google Storage staging bucket A Talend Job expects its dependent jar files to be available for execution, so specify the Google Storage directory to which these jar files are transferred so that your Job can access these files at execution.
The directory to be entered must end with a slash (/). If it does not exist, the directory is created on the fly, but the bucket to be used must already exist.
Access Key and Secret Key Enter the authentication information obtained from
Google for tHiveInput to read temporary data from
Google Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the
project from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
Provide Google Credentials in file Leave this check box clear when you launch your Job
from a machine on which the Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.
HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.
Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.
Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.
Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as Hortonworks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2.
Note:
This field is not available when you select Embedded
from the Connection mode list.
Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab.
4. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.option.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.
Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +
Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.
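For example, for a hypothetical table sales with columns id and amount, the generated query would take a form similar to:

```sql
SELECT sales.id, sales.amount FROM sales;
```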
Guess schema Click this button to retrieve the schema from the table.
This query uses Parquet objects When available, select this check box to indicate that the
table to be handled uses the PARQUET format and thus
make the component call the required jar file.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution that supports Tez but is not officially
supported by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.
Advanced settings
Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
Note:
Clear the Trim all the String/Char columns check box to
enable Trim column in this field.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on
http://hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties.
Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information about
Hive dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
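For reference, the table entry described above has the same effect as the following HiveQL statement issued in a Hive session (shown here only to illustrate the property being set):

```sql
SET hive.execution.engine=tez;
```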
Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce computations by selecting
the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, both values are
1000, which is normally appropriate for running the
computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.
Path separator in server Leave the default value of the Path separator in server as
it is, unless the separator used by your Hadoop distribution's
host machine for its PATH variable is not a colon (:). In that
situation, you must change this value to the separator used
on that host.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Note:
Available only when the Use an existing connection
check box is clear
Zookeeper quorum
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For a scenario about how an input component is used in a Job, see Writing columns from a MySQL
database to an output file using tMysqlInput on page 2440.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker,
when configuring this component since the component needs to connect to a Hadoop distribution.
tHiveLoad
Writes data of different formats into a given Hive table or exports data from a Hive table to a
directory.
tHiveLoad connects to a given Hive database and copies or moves data into an existing Hive table or
a directory you specify.
The tHiveLoad component first prepares the lines to be written to Hive before eventually writing
them to Hive. This approach is more efficient with regard to Hive than the line-by-line approach
typically employed by an output component. For this reason, there is no tHiveOutput component for a
Job designed in the Standard framework.
Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:
API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.
Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.
Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.
If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.
Region From this drop-down list, select the Google Cloud region
to be used.
Google Storage staging bucket Because a Talend Job expects its dependent jar files to be
available at execution, specify the Google Storage directory
to which these jar files are transferred so that your Job can
access them at execution.
The directory to be entered must end with a slash (/). If it
does not exist, the directory is created on the fly, but the
bucket to be used must already exist.
Provide Google Credentials in file Leave this check box clear when you launch your Job
from a machine on which the Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.
HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.
Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.
Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.
Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as Hortonworks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2.
Note:
This field is not available when you select Embedded
from the Connection mode list.
Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.
Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Load action Select the action you need to carry out for writing data into
the specified destination.
• When you select LOAD, you are moving or copying data
from a directory you specify.
• When you select INSERT, you are moving or copying
data based on queries.
Execution engine Select this check box and from the drop-down list, select
the framework you need to use to perform the INSERT
action.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution that supports Tez but is not officially
supported by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.
Target type This drop-down list appears only when you have selected
INSERT from the Load action list.
Select from this list the type of the location you need to
write data in.
• If you select Table as destination, you can still choose
to append data to or overwrite the contents in the
specified table.
• If you select Directory as destination, you are
overwriting the contents in the specified directory.
Table name Enter the name of the Hive table you need to write data in.
Note that with the INSERT action, this field is available only
when you have selected Table from the Target type list.
File path Enter the directory you need to read data from or write data
in, depending on the action you have selected from the
Load action list.
• If you have selected LOAD: this is the path to the data
you want to copy or move into the specified Hive table.
• If you have selected INSERT: this is the directory to
which you want to export data from a Hive table. With
this action, the File path field is available only when
you have selected Directory from the Target type list.
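For reference, the two actions correspond to HiveQL statements of the following forms; the table, directory, and column names are placeholders:

```sql
-- LOAD: move data from an HDFS directory into a Hive table
LOAD DATA INPATH '/user/talend/input/' INTO TABLE sales;

-- INSERT: export the result of a query to an HDFS directory
INSERT OVERWRITE DIRECTORY '/user/talend/export/'
SELECT id, amount FROM sales;
```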
The target table uses the Parquet format If the table in which you need to write data is a PARQUET
table, select this check box.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
Then from the Compression list that appears, select the
compression mode you need to use to handle the PARQUET
file. The default mode is Uncompressed.
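The Compression choice corresponds to Hive's parquet.compression table property. As a rough sketch (the table and column names below are hypothetical, not something the component generates), the matching HiveQL is assembled here as a Python string:

```python
# Hypothetical example: a PARQUET table whose files are Snappy-compressed.
# Hive reads the compression mode from the parquet.compression property.
stmt = (
    "CREATE TABLE sales_pq (id INT, amount DOUBLE) "
    "STORED AS PARQUET "
    "TBLPROPERTIES ('parquet.compression'='SNAPPY')"
)
print(stmt)
```

Snappy and Gzip are the usual alternatives to the Uncompressed default.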
Action on file Select the action to be carried out for writing data.
1629
tHiveLoad
Query This field appears when you have selected INSERT from the
Load action list.
Enter the appropriate query for selecting the data to be
exported to the specified Hive table or directory.
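In other words, when the Load action is INSERT and the target type is Directory, the query supplied here feeds an INSERT OVERWRITE DIRECTORY statement. The sketch below uses hypothetical values to show the shape of the resulting HiveQL:

```python
# Hypothetical values: 'directory' stands for the File path field and
# 'query' for the contents of the Query field.
directory = "/user/talend/export"
query = "SELECT id, name FROM employee WHERE country = 'US'"

# The INSERT action with a Directory target conceptually submits:
stmt = f"INSERT OVERWRITE DIRECTORY '{directory}' {query}"
print(stmt)
# INSERT OVERWRITE DIRECTORY '/user/talend/export' SELECT id, name FROM employee WHERE country = 'US'
```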
Local Select this check box to use the Hive LOCAL statement for
accessing a local directory. Note that this local directory is
actually in the machine in which the Job is run. Therefore,
when the connection mode to Hive is Standalone, the Job is
run in the machine where the Hive application is installed
and thus this local directory is in that machine.
This statement is used along with the directory you have
defined in the File path field. Therefore, this Local check
box is available only when the File path field is available.
• If you are using the LOAD action, tHiveLoad copies the
local data to the target table.
• If you are using the INSERT action, tHiveLoad copies
data to a local directory.
• If you leave this Local check box clear, the directory
defined in the File path field is assumed to be in the
HDFS system to be used and data will be moved to the
target location.
For further information about this LOCAL statement, see
Apache's documentation about Hive's Language.
Set partitions Select this check box to use the Hive Partition clause in
loading or inserting data in a Hive table. You need to enter
the partition keys and their values to be used in the field
that appears.
For example, enter country='US', state='CA'. This makes a
partition clause reading Partition (country='US',
state='CA'), that is to say, a US and CA partition.
Also, it is recommended to select the Create partition if not
exist check box that appears to ensure that you will not
create a duplicate partition.
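Taken together, the Load action, the Local check box, and the Set partitions table map to an ordinary HiveQL LOAD DATA statement. The helper below is a sketch with hypothetical names (it is not the component's own code) that assembles such a statement:

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a HiveQL LOAD DATA statement.

    local:     add the LOCAL keyword, so data is copied from the machine
               running the Job instead of being moved within HDFS.
    partition: partition keys and values, e.g. {"country": "US", "state": "CA"}.
    """
    stmt = "LOAD DATA "
    if local:
        stmt += "LOCAL "
    stmt += f"INPATH '{path}' "
    if overwrite:
        stmt += "OVERWRITE "
    stmt += f"INTO TABLE {table}"
    if partition:
        pairs = ", ".join(f"{key}='{value}'" for key, value in partition.items())
        stmt += f" PARTITION ({pairs})"
    return stmt

print(build_load_statement("/user/talend/in", "sales",
                           local=True,
                           partition={"country": "US", "state": "CA"}))
# LOAD DATA LOCAL INPATH '/user/talend/in' INTO TABLE sales PARTITION (country='US', state='CA')
```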
Die on error Select this check box to kill the Job when an error occurs.
Advanced settings
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.
Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all on Windows.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
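At runtime the entries of this table simply shadow the corresponding defaults, key by key. A minimal Python sketch of that override behavior (the default values shown are illustrative, not the Studio's actual defaults):

```python
# Illustrative defaults; the real ones depend on your distribution.
default_props = {
    "hive.execution.engine": "mr",          # MapReduce unless overridden
    "hive.exec.dynamic.partition": "true",
}

# Entry added in the Hive properties table to run the Job on Tez.
custom_props = {
    "hive.execution.engine": "tez",
}

# Custom properties override the defaults; untouched defaults survive.
effective = {**default_props, **custom_props}
print(effective["hive.execution.engine"])   # tez
```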
Mapred job map memory mb and Mapred job reduce memory mb
You can tune the map and reduce computations by selecting
the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000, which are normally appropriate for running the
computations.
Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
Related scenario
For a related scenario, see Creating a partitioned Hive table on page 1582.
tHiveRow
Acts on the actual DB structure or on the data without handling data itself, depending on the nature
of the query and the database.
tHiveRow executes the HiveQL query stated in the specified database. The row suffix means the
component implements a flow in the Job design although it does not provide output.
The SQLBuilder tool helps you write your HiveQL statements easily.
This component can also perform queries in a HBase database once the Store by HBase check box is
available and you have selected this check box.
Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:
API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.
Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.
Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.
If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.
Region From this drop-down list, select the Google Cloud region
to be used.
Google Storage staging bucket Because a Talend Job expects its dependent jar files to be
available at execution, specify the Google Storage directory
to which these jar files are transferred so that your Job can
access these files at execution.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.
Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine in which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.
HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.
Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.
Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used such as HortonWorks Data
Platform V1.2.0 (Bimota) supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections from multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.
Note:
This field is not available when you select Embedded
from the Connection mode list.
Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave the
Force MapR ticket authentication check box clear.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend JobServer.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.
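Because the executing user only needs read access to the keytab, a quick pre-flight check on the machine running the Job can save a failed execution. A small sketch (the keytab path is a hypothetical example):

```python
import os

# Hypothetical keytab location; substitute the value of your Keytab field.
keytab = "/etc/security/keytabs/guest.keytab"

# True only if the OS user running this check can read the keytab file,
# regardless of which Kerberos principal the keytab contains.
readable = os.access(keytab, os.R_OK)
print(readable)
```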
Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and click OK
to save the settings.
Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration for this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you have
chosen a machine called masternode as the NameNode, the
location is hdfs://masternode:portnumber.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.
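Conceptually, the generated query is just a SELECT listing every column of the schema against the table name. A hypothetical sketch of that derivation (not the Studio's own implementation):

```python
def guess_query(table, columns):
    # Build a SELECT that enumerates the schema columns explicitly.
    return f"SELECT {', '.join(columns)} FROM {table}"

print(guess_query("employee", ["id", "name", "salary"]))
# SELECT id, name, salary FROM employee
```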
This query uses Parquet objects When available, select this check box to indicate that the
table to be handled uses the PARQUET format and thus
make the component call the required jar file.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.
Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.
Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, which by
default is registered along with your Hive installation.
Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).
Advanced settings
Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all on Windows.
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.
Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
Mapred job map memory mb and Mapred job reduce memory mb
You can tune the map and reduce computations by selecting
the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000, which are normally appropriate for running the
computations.
Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.
• Ensure that you have installed the MapR client in the machine where the Studio is, and added the
MapR client library to the PATH variable of that machine. According to MapR's documentation,
the library or libraries of a MapR client corresponding to each OS version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following
link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-
development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no
MapRClient in java.library.path.
• This section explains only the authentication parameters to be used to connect to MapR. You still
need to define the other parameters required by your Job.
For further information, see the documentation about each component you are using.
Procedure
1. Select the Force MapR ticket authentication check box to display the related parameters to be
defined.
2. In the Username field, enter the username to be authenticated and in the Password field, specify
the password used by this user.
To enter the password, click the [...] button next to the password field, and then in the pop-up
dialog box enter the password between double quotes and click OK to save the settings.
A MapR security ticket is generated for this user by MapR and stored in the machine where the Job
you are configuring is executed.
3. If the Group field is available in this tab, you need to enter the name of the group to which the
user to be authenticated belongs.
4. In the Cluster name field, enter the name of the MapR cluster you want to use this username to
connect to.
This cluster name can be found in the mapr-clusters.conf file located in /opt/mapr/conf of the
cluster.
5. In the Ticket duration field, enter the length of time (in seconds) during which the ticket is valid.
Setting the environment variable for a custom MapR ticket location (optional)
If the default MapR ticket location, /tmp/maprticket_<uid>, has been changed, set the
MAPR_TICKETFILE_LOCATION environment variable accordingly in the machine in which your Job is
executed.
As MapR does not provide any API to specify a MapR ticket, setting the environment variable is the
only way to use a custom MapR ticket location in your Job. For further information about this issue,
see this post from the MapR forum.
This procedure is necessary only when you are storing the MapR tickets in a custom location. If you
use the default MapR ticket location, skip this procedure.
Setting the environment variable for a custom MapR ticket location on Mac (optional)
Procedure
1. In the machine in which your Job is executed, add these lines to ~/.bashrc:
Example
export MAPR_TICKETFILE_LOCATION=/Users/$USER/maprticket_$UID
launchctl setenv MAPR_TICKETFILE_LOCATION /Users/$USER/maprticket_$UID
2. Shut down your Studio if it is open. Each time you boot your Mac workstation, open a
terminal session before starting the Studio.
Setting the environment variable for a custom MapR ticket location on other operating systems
(optional)
Procedure
1. In the machine in which your Job is executed, run the following command in a command-line
terminal to set the MAPR_TICKETFILE_LOCATION variable in memory.
Example
set MAPR_TICKETFILE_LOCATION=<your_custom_location>
2. Shut down your Studio if it is open and use the same terminal to restart your Studio.
If you use a Talend JobServer to run your Job, use the same terminal to restart this JobServer.
This way, your Job retrieves this custom location from memory.
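Note that the set syntax above is that of a Windows Command Prompt; in a Linux or macOS shell the equivalent is export. A minimal sketch, assuming a hypothetical custom ticket path:

```shell
# Hypothetical custom ticket location; adjust to where your MapR tickets actually live
export MAPR_TICKETFILE_LOCATION=/data/mapr/tickets/maprticket_2000
# Verify the variable before starting the Studio or JobServer from this same terminal
echo "MAPR_TICKETFILE_LOCATION=$MAPR_TICKETFILE_LOCATION"
```

Because environment variables are inherited only by child processes, the Studio or JobServer must be started from the terminal session in which the variable was set.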
If the default security configuration of your MapR cluster has been changed, you need to configure the
Job to be executed to take this custom security configuration into account.
MapR specifies its security configuration in the mapr.login.conf file located in /opt/mapr/conf of the
cluster. For further information about this configuration file and the Java service it uses behind, see
mapr.login.conf and JAAS.
If no change has been made in the mapr.login.conf file, skip this procedure.
Procedure
1. Verify what has been changed in this mapr.login.conf file.
You should be able to obtain the related information from the administrator or the developer of
your MapR cluster.
2. If the location of the MapR configuration files has been changed to somewhere else in the
cluster, that is to say, the MapR Home directory has been changed, select the Set the MapR Home
directory check box and enter the new Home directory. Otherwise, leave this check box clear and
the default Home directory is used.
3. If the login module to be used in the mapr.login.conf file has been changed, select the Specify the
Hadoop login configuration check box and enter the module to be called from the mapr.login.conf
file. Otherwise, leave this check box clear and the default login module is used.
For example, enter kerberos to call the hadoop_kerberos module or hybrid to call the hadoop_hybrid
module.
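For reference, entries in mapr.login.conf follow the standard JAAS login configuration syntax: a named module entry containing one or more login module classes with a control flag and options. The sketch below only illustrates that syntax; the module name, classes, and options are examples, not the actual content of your cluster's file:

```
hadoop_hybrid {
    com.mapr.security.maprsasl.MaprSecurityLoginModule required
    checkUGI=true;
    com.sun.security.auth.module.Krb5LoginModule optional
    useTicketCache=true;
};
```

The value you enter in the Specify the Hadoop login configuration field is the part of such an entry name after the hadoop_ prefix, for example hybrid for hadoop_hybrid.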
Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
Keep in mind the parameters required by Hadoop, such as the NameNode and the JobTracker, when
configuring this component, because the component needs to connect to a Hadoop distribution.
tHSQLDbInput
Executes a DB query with a strictly defined order that must correspond to the schema definition, and
then passes the field list on to the next component via a Main row link.
tHSQLDbInput reads a database and extracts fields based on a query.
Basic settings
Running Mode Select from the list the Server Mode corresponding to your DB
setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, and HSQLDb In Memory.
Use TLS/SSL sockets Select this check box to enable the secured mode if
required.
Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view.
Db name Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
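For example, assuming a schema that defines the columns id, name, and city in that exact order (the table and column names here are hypothetical), the query must return the fields in the same order:

```sql
-- Field order matches the schema definition: id, name, city
SELECT id, name, city FROM customers;
```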
Advanced settings
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Note:
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.
Usage
Usage rule This component covers all possible SQL queries for
HSQLDb databases.
Related scenarios
For related topics, see:
tHSQLDbOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tHSQLDbOutput writes, updates, makes changes or suppresses entries in a database.
Basic settings
Running Mode Select from the list the Server Mode corresponding to your DB
setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, and HSQLDb In Memory.
Use TLS/SSL sockets Select this check box to enable the secured mode if
required.
Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view.
Db name Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, which are not
insert, update, or delete actions, or actions that require
particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table.
Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tHSQLDbRow
Acts on the actual DB structure or on the data (although without handling data), depending on the
nature of the query and the database.
The SQLBuilder tool helps you easily write your SQL statements.
tHSQLDbRow is the specific component for this database query. It executes the SQL query stated on
the specified database. The Row suffix means the component implements a flow in the Job design,
although it does not provide output.
Basic settings
Running Mode Select from the list the Server Mode corresponding to your DB
setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, and HSQLDb In Memory.
Use TLS/SSL sockets Select this check box to enable the secured mode if
required.
Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view.
Database Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Note:
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tHttpRequest
Sends an HTTP request to the server and outputs the response information locally.
tHttpRequest sends an HTTP request to the server end and gets the corresponding response
information from the server end.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.
Sync columns Click this button to retrieve the schema from the preceding
component.
Post parameters from file Browse to, or enter the path to the file that is used to
provide parameters (request body) to the POST method.
Write response content to file Select this check box to save the HTTP response to a local
file. You can either type in the file path in the input field or
click the three-dot button to browse to the file path.
Create directory if not exists Select this check box to create the directory defined in the
Write response content to file field if it does not exist.
This check box appears only when the Write response
content to file check box is selected and is cleared by
default.
Need authentication Select this check box to fill in a user name and a password
in the corresponding fields if authentication is needed:
user: Fill in the user name for the authentication.
password: Fill in the password for the authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
Advanced settings
Set timeout Select this check box to specify the connect and read
timeout values in the following two fields:
• Connect timeout(s): Enter the connect timeout value in
seconds. An exception will occur if the timeout expires
before the connection can be established. The value of
0 indicates an infinite time out. By default, the connect
timeout value is 30.
• Read timeout(s): Enter the read timeout value in
seconds. An exception will occur if the timeout expires
before there is data available for read. By default, the
read timeout value is 0, which indicates an infinite
time out.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level and at each component level.
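The Set timeout behavior above can be approximated outside the Studio with curl, which exposes similar limits (a hedged sketch; curl's --max-time caps the whole transfer rather than a per-read timeout, so it is only a rough equivalent):

```shell
# Hypothetical check: 2-second connect timeout against a local port where
# nothing is expected to listen; curl fails fast (connection refused or
# timeout) instead of hanging indefinitely
curl --connect-timeout 2 --max-time 5 "http://127.0.0.1:9/" || echo "request failed as expected"
```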
Global Variables
Usage
3. Connect the tHttpRequest component to the tLogRow component using a Row > Main connection.
2. Fill in the URI field with "http://192.168.0.63:8081/testHttpRequest/build.xml". Note that this URI is
for demonstration purposes only and it is not a live address.
3. From the Method list, select GET.
4. Select the Write response content to file check box and fill in the input field on the right with the
file path by manual entry, D:/test.txt for this use case.
5. Select the Need authentication check box and fill in the user and password, both tomcat in this
use case.
Procedure
1. If you want to configure how the result is presented by tLogRow, double-click the component to
open its Component view and in the Mode area, select the Table (print values in cells of a table)
check box.
2. Press F6 to run this Job.
Results
Once done, the response information from the server is saved and displayed.
{"echo":
[
{
"data":"e=hello"
}
]
}
From that file, tFileInputJSON reads the e parameter and its value hello and tHttpRequest sends
the pair to http://echo.itcuties.com/, a URL provided for demonstration by an online programming
community, www.itcuties.com.
Note that the e parameter is required by http://echo.itcuties.com/.
4. Click the [+] button to add one row and name it, for example, data.
5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
6. In the Filename field, browse to, or enter the path to the source JSON file in which the parameter to
be sent is stored.
7. In the Mapping table, the data column you defined in the previous step in the component schema
has been automatically added. In the JSONPath query column of this table, enter the JSON path,
in double quotation marks, to extract the parameter to be sent. In this scenario, the path is
echo[0].data.
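As a stand-alone illustration of what this path selects, here is a crude shell extraction from the response shown earlier (fixed-structure string surgery for illustration only, not a real JSON parser):

```shell
json='{"echo":[{"data":"e=hello"}]}'
# Strip everything up to and including "data":" then cut at the closing quote;
# this only works for this exact structure
value=${json#*\"data\":\"}
value=${value%%\"*}
echo "$value"
# → e=hello
```

This is the same e=hello pair that tFileInputJSON extracts with the JSONPath query echo[0].data.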
2. In the File name field, browse to, or enter the path to the flat file in which you want to write the
extracted parameter. This file will be created if it does not exist. In this example, it is C:/tmp/
postParamsFile.txt.
2. In the URI field, enter the server address to which the parameter is to be sent. In this scenario, it is
http://echo.itcuties.com/.
3. From the Method list, select POST.
4. In the Post parameters from file field, browse to, or enter the path to the flat file that contains the
parameter to be used. As defined earlier with the tFileOutputDelimited component, this path is C:/
tmp/postParamsFile.txt.
You can see that the site receiving the parameter returns answers.
tImpalaClose
Closes connection to an Impala database.
tImpalaClose closes an active connection to a given Impala database.
Basic settings
Component list If there is more than one connection used in the Job, select
tImpalaConnection from the list.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Global Variables
Usage
Usage rule This component is to be used along with other Impala
components, especially with tImpalaConnection, which
allows you to open a connection for the transaction that is
underway.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenarios
No scenario is available for the Standard version of this component yet.
tImpalaConnection
Establishes an Impala connection to be reused by other Impala components in your Job.
tImpalaConnection opens a connection to an Impala database.
Basic settings
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
Job.
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Global Variables
Usage
Related scenario
This component is used in a similar way to tHiveConnection. For further
information, see Creating a partitioned Hive table on page 1582.
tImpalaCreateTable
Creates Impala tables that fit a wide range of Impala data formats.
tImpalaCreateTable connects to the Impala database to be used and creates an Impala table that is
dedicated to data of the format you specify.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Built-In: You create and store the schema locally for this
component only.
Action on table Select the action to be carried out for creating a table.
Set partitions Select this check box to add partition columns to the table
to be created. Once you select it, you need to define the
schema of the partition columns you need to add.
Set file location If you want to create an Impala table in a directory other
than the default one, select this check box and enter the
directory in HDFS you want to use to hold the table content.
This is typically useful when you need to create an external
Impala table by selecting the Create an external table
check box in the Advanced settings tab.
Use S3 endpoint The Use S3 endpoint check box is displayed when you
have selected the Set file location check box to create an
external Impala table.
Once this Use S3 endpoint check box is selected, you need
to enter the following parameters in the fields that appear:
• S3 bucket: enter the name of the bucket in which you
need to create the table.
• Bucket name: enter the name of the bucket in which
you want to store the dependencies of your Job. This
bucket must already exist on S3.
• Temporary resource folder: enter the directory in
which you want to store the dependencies of your Job.
For example, enter temp_resources to write the
dependencies in the /temp_resources folder in the
bucket.
If this folder already exists at runtime, its contents are
overwritten by the upcoming dependencies; otherwise,
this folder is automatically created.
• Access key and Secret key: enter the authentication
information required to connect to the Amazon S3
bucket to be used.
To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog box
enter the password between double quotes and click
OK to save the settings.
Advanced settings
Like table Select this check box and enter the name of the Impala
table you want to copy. This allows you to copy the
definition of an existing table without copying its data.
For further information about the Like parameter, see
Cloudera's information about Impala's Data Definition
Language.
Create an external table Select this check box to make the table to be created an
external Impala table. This kind of Impala table leaves the
raw data where it is if the data is in HDFS.
An external table is usually the better choice for accessing
shared data existing in a file system.
For further information about an external Impala table, see
Cloudera's documentation about Impala.
Table comment Enter the description you want to use for the table to be cre
ated.
As select Select this check box and enter the As select statement
for creating an Impala table that is based on a Select
statement.
Table properties Add any custom Impala table properties you want to
override the default ones used by the Hadoop engine of the
Studio.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
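Taken together, the Create an external table, Set file location, and As select options correspond to Impala DDL along these lines (the table, column, and path names below are hypothetical):

```sql
-- Roughly the statement produced when "Create an external table" and
-- "Set file location" are selected; the table keeps its data at the given
-- HDFS location, and dropping the table leaves the raw files in place
CREATE EXTERNAL TABLE customers (id INT, name STRING)
LOCATION '/user/talend/external/customers';

-- With "As select", the table is created from the result of a query
CREATE TABLE customers_copy AS SELECT id, name FROM customers;
```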
Global Variables
Usage
Die on error
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
This component is used in a similar way to tHiveCreateTable. For further
information, see Creating a partitioned Hive table on page 1582.
tImpalaInput
Executes the select queries to extract the corresponding data and sends the data to the component
that follows.
tImpalaInput is the component dedicated to the Impala database (the Impala data warehouse system).
It executes the given Impala SQL query in order to extract the data of interest from Impala. It provides
the SQLBuilder tool to help you write your Impala SQL statements easily.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.
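Guess Query derives a SELECT statement from the schema column list. A minimal sketch of the kind of statement it produces (table and column names below are hypothetical):

```python
def guess_query(table, columns):
    """Build a simple SELECT covering every schema column, in schema order."""
    return "SELECT " + ", ".join(columns) + " FROM " + table

# Hypothetical table and schema:
print(guess_query("customers", ["id", "name", "city"]))
```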
Guess schema Click this button to retrieve the schema from the table.
Advanced settings
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
Note:
Clear the Trim all the String/Char columns check box to
enable Trim column in this field.
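Trimming removes leading and trailing whitespace from every String/Char value in a row, equivalent to this sketch (the row is represented as a dictionary for illustration):

```python
def trim_string_columns(row):
    """Strip leading and trailing whitespace from every string field in a row."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

print(trim_string_columns({"name": "  Alice ", "age": 30}))
```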
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenarios
For a scenario about how an input component is used in a Job, see Writing columns from a MySQL
database to an output file using tMysqlInput on page 2440.
tImpalaLoad
Writes data of different formats into a given Impala table or exports data from an Impala table to a
directory.
tImpalaLoad connects to a given Impala database and copies or moves data into an existing Impala
table or a directory you specify.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Load action Select the action you need to carry out for writing data into the
specified destination.
• When you select LOAD, you are moving or copying data
from a directory you specify.
• When you select INSERT, you are moving or copying
data based on queries.
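The two actions correspond roughly to the following Impala statements (the path, table, and source query here are invented for illustration):

```python
# Hypothetical sketches of the statements the two actions map to:
load_stmt = "LOAD DATA INPATH '/user/talend/in' INTO TABLE sales"  # LOAD: move/copy files from a directory
insert_stmt = "INSERT INTO sales SELECT * FROM staging_sales"      # INSERT: query-based copy
print(load_stmt)
print(insert_stmt)
```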
Target type This drop-down list appears only when you have selected
INSERT from the Load action list.
Select from this list the type of the location you need to
write data in.
• If you select Table as destination, you can still choose
to append data to or overwrite the contents in the
specified table. This is the only option in the current
release.
Table name Enter the name of the Impala table you need to write data in.
Note that with the INSERT action, this field is available only
when you have selected Table from the Target type list.
File path Enter the directory you need to read data from.
Query This field appears when you have selected INSERT from the
Load action list.
Enter the appropriate query for selecting the data to be
exported to the specified Impala table or directory.
Set partitions Select this check box to use the Impala Partition clause
in loading or inserting data in an Impala table. You need to
enter the partition keys and their values to be used in the
field that appears.
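The partition keys and values you enter are assembled into a PARTITION clause. A hedged sketch of that assembly (the helper and the key names are hypothetical):

```python
def with_partition(stmt, partition_keys):
    """Append an Impala PARTITION clause built from key/value pairs
    (hypothetical helper; key names are illustrative)."""
    spec = ", ".join("{}='{}'".format(k, v) for k, v in partition_keys.items())
    return stmt + " PARTITION (" + spec + ")"

print(with_partition("INSERT INTO sales", {"year": "2020", "month": "02"}))
```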
Die on error Select this check box to kill the Job when an error occurs.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
This component is used in a similar way to the tHiveLoad component. For further information, see
Creating a partitioned Hive table on page 1582.
tImpalaOutput
Executes the action defined on the data contained in the table, based on the flow incoming from the
preceding component in the Job.
tImpalaOutput connects to an Impala database (the Impala data warehouse system) and writes data in
an Impala table.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Table Name Name of the table you need to write data in.
Extended insert Select this check box to combine multiple rows of data
into one single INSERT action. This can speed up the insert
operation.
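Combining rows produces one multi-row INSERT instead of one statement per row. An illustrative sketch of the combined statement (table name and values are hypothetical):

```python
def extended_insert(table, rows):
    """Combine many rows into one multi-row INSERT statement (illustrative)."""
    values = ", ".join("(" + ", ".join(repr(v) for v in row) + ")" for row in rows)
    return "INSERT INTO " + table + " VALUES " + values

print(extended_insert("t", [(1, "a"), (2, "b")]))
```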
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
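The mechanism above can be pictured as a context variable selecting one of several prepared connection definitions, as the Code field does. This is an illustration only; the variable and connection names are hypothetical:

```python
# Illustrative: a context variable picks the connection to use at run time.
connections = {
    "dev":  {"host": "dev-db",  "port": 21050},
    "prod": {"host": "prod-db", "port": 21050},
}

def choose_connection(context):
    """Resolve the connection to use from a context variable."""
    return connections[context["db_env"]]

print(choose_connection({"db_env": "dev"})["host"])
```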
Related scenarios
For a scenario about how an output component is used in a Job, see Inserting a column and altering
data using tMysqlOutput on page 2466.
tImpalaRow
Acts on the actual DB structure or on the data (although without handling data).
The SQLBuilder tool helps you write your Impala SQL statements easily. tImpalaRow is the dedicated
component for this database. It executes the Impala SQL query stated in the specified database. The
Row suffix means the component implements a flow in the Job design although it does not provide
output.
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.
Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.
Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
tInfiniteLoop
Executes a task or a Job automatically, based on a loop.
tInfiniteLoop runs an infinite loop on a task.
Basic settings
Wait at each iteration (in milliseconds) Enter the time delay between iterations.
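The delay simply pauses between iterations of the loop. A bounded sketch of the behavior (the max_iterations parameter exists only to make the demo finite; a real tInfiniteLoop has no bound):

```python
import itertools
import time

def run_loop(task, wait_ms, max_iterations=None):
    """Run task repeatedly with a fixed delay between iterations;
    max_iterations bounds this demo only."""
    iterations = itertools.count() if max_iterations is None else range(max_iterations)
    for i in iterations:
        task(i)
        time.sleep(wait_ms / 1000.0)

seen = []
run_loop(seen.append, wait_ms=1, max_iterations=3)
print(seen)
```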
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Global Variables
Usage
Row: Iterate;
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error; Synchronize;
Parallelize.
Related scenario
For an example of the kind of scenario in which tInfiniteLoop might be used, see Procedure on page
1980, regarding the tLoop component.
tInformixBulkExec
Executes Insert operations in Informix databases.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
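The two-step process can be sketched as follows. This is an analogy only: sqlite3 stands in for Informix, and the actual Informix bulk-load mechanics (and file format) differ:

```python
import csv
import os
import sqlite3
import tempfile

# Step 1: generate the bulk output file (tInformixOutputBulk's role).
rows = [(1, "east"), (2, "west")]
path = os.path.join(tempfile.mkdtemp(), "bulk.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Step 2: feed the file to the table (tInformixBulkExec's role).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (id INTEGER, name TEXT)")
with open(path, newline="") as f:
    conn.executemany("INSERT INTO region VALUES (?, ?)", csv.reader(f))
count = conn.execute("SELECT COUNT(*) FROM region").fetchone()[0]
print(count)
```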
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
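The options above can be read as the DDL each one issues. The statements below are hypothetical sketches for a table named t; the real statements depend on the schema you define:

```python
# Hypothetical DDL issued by each Action on table option:
actions = {
    "None": [],
    "Drop and create a table": ["DROP TABLE t", "CREATE TABLE t (...)"],
    "Create a table": ["CREATE TABLE t (...)"],
    "Create a table if not exists": ["CREATE TABLE IF NOT EXISTS t (...)"],
    "Drop a table if exists and create": ["DROP TABLE IF EXISTS t", "CREATE TABLE t (...)"],
    "Clear a table": ["DELETE FROM t"],
}
print(actions["Clear a table"])
```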
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
1707
tInformixBulkExec
Action on data On the data of the table defined, you can perform the
following operations:
Insert: Add new data to the table. If duplicates are found,
the job stops.
Update: Update the existing table data.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Delete the entry data which corresponds to the
input flow.
Warning:
You must specify at least one key upon which the Update
and Delete operations are to be based. It is possible to
define the columns which should be used as the key from
the schema, from both the Basic Settings and the Advanced
Settings, to optimise these operations.
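The key-based actions can be pictured with a small sketch of "Insert or update", where a dictionary stands in for the database table and the key column drives the decision:

```python
def insert_or_update(table, key, row):
    """Sketch of 'Insert or update': insert the record, or update it when
    a record with the same key already exists (dict stands in for the table)."""
    if row[key] in table:
        table[row[key]].update(row)   # reference exists: update
    else:
        table[row[key]] = dict(row)   # reference absent: insert
    return table

t = {}
insert_or_update(t, "id", {"id": 1, "name": "a"})
insert_or_update(t, "id", {"id": 1, "name": "b"})
print(t[1]["name"])
```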
Advanced settings
Set DBMONEY Select this check box to define the decimal separator in the
Decimal separator field.
Set DBDATE Select the date format that you want to apply.
Rows Before Commit Enter the number of rows to be processed before the
commit.
Bad Rows Before Abort Enter the number of rows in error at which point the Job
should stop.
tStatCatcher Statistics Select this check box to collect the log data at component
level.
Global Variables
Usage
Usage rule This component offers database query flexibility and covers
all possible Informix queries which may be required.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job.
Related scenario
For a scenario in which tInformixBulkExec might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Truncating and inserting file data into an Oracle database on page 2681.
tInformixClose
Closes connection to Informix databases.
tInformixClose closes an active connection to a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Component list If there is more than one connection used in the Job, select
tInformixConnection from the list.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used
along with tInformixConnection as the latter allows you to open a connection for the transaction
which is underway.
To see a scenario in which tInformixClose might be used, see tMysqlConnection on page 2425.
tInformixCommit
Makes a global commit just once instead of committing every row or batch of rows separately.
This component improves performance and is closely related to tInformixConnection and
tInformixRollback. They are generally used to execute transactions together.
tInformixCommit validates data processed in a job from a connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Component list If there is more than one connection in the Job, select
tInformixConnection from the list.
Close connection This check box is selected by default. It means that the
database connection will be closed once the commit has
been made. Clear the check box to continue using the
connection once the component has completed its task.
Warning:
If you are using a Row > Main type connection to link
tInformixCommit to your Job, your data will be committed
row by row. If this is the case, do not select this check box;
otherwise, the connection will be closed before the commit
of your first row is finalized.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job.
Related scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used
along with tInformixConnection as the latter allows you to open a connection for the transaction
which is underway.
To see a scenario in which tInformixCommit might be used, see Inserting data in mother/daughter
tables on page 2426.
tInformixConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tInformixConnection is closely related to tInformixCommit and tInformixRollback. They are generally
used along with tInformixConnection, with tInformixConnection opening the connection for the
transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
Advanced settings
Use Transaction Clear this check box when the database is configured in
NO_LOG mode. If the check box is selected, you can choose
whether to activate the Auto Commit option.
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
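The difference can be sketched as follows. This is an analogy only: sqlite3 stands in for Informix, with autocommit making each statement its own transaction and the manual connection waiting for one explicit commit:

```python
import sqlite3

auto = sqlite3.connect(":memory:", isolation_level=None)  # autocommit on
auto.execute("CREATE TABLE t (x INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")                  # committed at once

manual = sqlite3.connect(":memory:")                      # explicit transactions
manual.execute("CREATE TABLE t (x INTEGER)")
manual.execute("INSERT INTO t VALUES (1)")
manual.commit()                                           # the single explicit commit
print(manual.execute("SELECT COUNT(*) FROM t").fetchone()[0])
```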
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Usage
Related scenario
For a scenario in which tInformixConnection might be used, see Inserting data in mother/
daughter tables on page 2426.
tInformixInput
Reads a database and extracts fields based on a query.
tInformixInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
Built-In: You create and store the schema locally for this
component only.
Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
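The requirement that the query's column order match the schema can be illustrated as follows; sqlite3 and the customers table are hypothetical stand-ins for the Informix source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'Paris')")

# Schema defined in the component: id, name, city. The SELECT must
# list the columns in exactly that order rather than rely on SELECT *.
schema = ["id", "name", "city"]
query = "SELECT {} FROM customers".format(", ".join(schema))
row = conn.execute(query).fetchone()
record = dict(zip(schema, row))  # fields line up with the schema definition
```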
Global Variables
Usage
Usage rule This component covers all possible SQL queries for
Informix databases.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job.
Related scenarios
For related topics, see:
• The scenario for tContextLoad: Reading data from different MySQL databases using dynamically
loaded connection parameters on page 497.
tInformixOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tInformixOutput inserts, updates, modifies, or deletes entries in a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Truncate table: Truncate the table.
Warning:
A commit operation will be carried out after the table is
truncated.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
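A minimal sketch of how the key columns drive an Update: the columns ticked as keys go to the WHERE clause and the rest to the SET clause. sqlite3 and the emp table are hypothetical stand-ins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO emp VALUES (1, 'old')")

# Key column(s) selected in the schema form the WHERE clause;
# non-key columns form the SET clause.
keys, updates = ["id"], ["name"]
sql = "UPDATE emp SET {} WHERE {}".format(
    ", ".join(c + " = ?" for c in updates),
    " AND ".join(c + " = ?" for c in keys),
)
conn.execute(sql, ("new", 1))
conn.commit()
```

Without at least one key column, there is nothing to put in the WHERE clause, which is why the warning above requires one.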
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for
error-free rows. If needed, you can retrieve the rows on
error via a Row > Rejects link.
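The behavior with Die on error cleared can be sketched as follows; sqlite3 and the table are stand-ins, and the rejects list plays the role of the Row > Rejects flow:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

rows, rejects = [1, 2, 2, 3], []
for r in rows:
    try:
        conn.execute("INSERT INTO t VALUES (?)", (r,))
    except sqlite3.IntegrityError as exc:
        # With Die on error cleared, the bad row is diverted to the
        # rejects flow and the remaining rows are still processed.
        rejects.append((r, str(exc)))
conn.commit()
```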
Advanced settings
Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, which are not
insert, update, or delete actions, or actions that require
particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use Batch Select this check box to activate the batch mode for data
processing.
This field appears only when the Use batch mode check box
is selected.
Optimize the batch insertion Ensure the check box is selected, to optimize the insertion
of batches of data.
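Batch mode can be sketched as grouped inserts; sqlite3's executemany stands in for the batched statements, and the batch size shown is an arbitrary illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, v TEXT)")

data = [(i, "row%d" % i) for i in range(1000)]
batch_size = 200  # illustrative value for the batch size setting

# Batch mode sends rows in groups instead of one round trip per row,
# which is where the performance gain comes from.
for start in range(0, len(data), batch_size):
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     data[start:start + batch_size])
conn.commit()
```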
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Informix database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenarios
For tInformixOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tInformixOutputBulk
Prepares the file to be used as a parameter in the INSERT query used to feed Informix databases.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.
Writes a file composed of columns, based on a defined delimiter and on Informix standards.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
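The two-step process described above can be sketched as follows, with sqlite3 and a pipe-delimited file standing in for the Informix table and bulk file (all names are illustrative):

```python
import csv
import os
import sqlite3
import tempfile

# Step 1 (the role of tInformixOutputBulk): write a delimited file.
rows = [(1, "Ada"), (2, "Grace")]
path = os.path.join(tempfile.mkdtemp(), "bulk.csv")
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter="|").writerows(rows)

# Step 2 (the role of tInformixBulkExec): feed the file to the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
with open(path, newline="") as f:
    conn.executemany("INSERT INTO people VALUES (?, ?)",
                     csv.reader(f, delimiter="|"))
conn.commit()
```

Because the file exists between the two steps, any transformation can be applied to it before the load, which is the advantage the text describes.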
Basic settings
Database Select a type of database from the list and click Apply.
Append Select this check box to append new rows to the end of the
file.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Set DBMONEY Select this box if you want to define the decimal separator
in the corresponding field.
Set DBDATE Select the date format that you want to apply.
Create directory if not exists This check box is selected automatically. The option allows
you to create a folder for the output file if it doesn't already
exist.
Custom the flush buffer size Select this box in order to customize the memory size used
to store the data temporarily. In the Row number field enter
the number of rows at which point the memory should be
freed.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For a scenario in which tInformixOutputBulk might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.
tInformixOutputBulkExec
Carries out Insert operations in Informix databases using the data provided.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time and the table must already exist for
the insert operation to be authorised.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Append Select this check box to add rows to the end of the file.
Advanced settings
Note:
You can press Ctrl+Space to access a list of predefined
global variables.
Set DBMONEY Select this check box to define the decimal separator used
in the corresponding field.
Rows Before Commit Enter the number of rows to be processed before the
commit.
Bad Rows Before Abort Enter the number of rows in error at which point the Job
should stop.
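How the two thresholds above might interact can be sketched as follows; sqlite3 stands in for Informix, and the counters mimic rather than reproduce the component's internal logic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

rows_before_commit, bad_rows_before_abort = 3, 2
errors, pending, aborted = 0, 0, False
for r in [1, 2, 3, 4, 1, 5, 2, 6]:  # two duplicates -> two errors
    try:
        conn.execute("INSERT INTO t VALUES (?)", (r,))
        pending += 1
        if pending == rows_before_commit:  # Rows Before Commit reached
            conn.commit()
            pending = 0
    except sqlite3.IntegrityError:
        errors += 1
        if errors >= bad_rows_before_abort:  # Bad Rows Before Abort reached
            aborted = True
            break
conn.commit()
```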
Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if required.
Custom the flush buffer size Select this box in order to customize the memory size used
to store the data temporarily. In the Row number field enter
the number of rows at which point the memory should be
freed.
Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For a scenario in which tInformixOutputBulkExec might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.
tInformixRollback
Prevents involuntary transaction commits by canceling transactions in connected databases.
tInformixRollback is closely related to tInformixCommit and tInformixConnection. They are generally
used together to execute transactions.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
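The rollback pattern these three components implement together can be sketched with sqlite3 standing in for the Informix connection (the accounts table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 1")
    conn.execute("INSERT INTO accounts VALUES (1, 0)")  # fails: duplicate key
    conn.commit()  # never reached
except sqlite3.IntegrityError:
    # The role of tInformixRollback: cancel the whole transaction so the
    # partial update is not involuntarily committed.
    conn.rollback()
```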
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box if you want to continue to use the
connection once the component has completed its task.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
Related scenario
For a scenario in which tInformixRollback might be used, see Rollback from inserting data in mother/
daughter tables on page 2429.
tInformixRow
Acts on the actual DB structure or on the data (although without handling data) thanks to the
SQLBuilder that helps you easily write your SQL statements.
tInformixRow executes the stated SQL query on the specified database. The Row suffix means the
component implements a flow in the Job design although it doesn't provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
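A parameterized query of this kind can be sketched with sqlite3, where each "?" placeholder is bound by position just as in the Set PreparedStatement Parameter table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# One prepared statement, executed twice with different parameter values:
# the "?" corresponds to Parameter Index 1 in the parameter table.
query = "SELECT v FROM t WHERE id = ?"
results = [conn.execute(query, (i,)).fetchone()[0] for i in (1, 3)]
```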
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tInformixSCD
Tracks and shows changes which have been made to Informix SCD dedicated tables.
tInformixSCD addresses Slowly Changing Dimension transformation needs, by regularly reading a data
source and listing the modifications in an SCD dedicated table.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
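A minimal Type 2 SCD sketch of the transformation described above; sqlite3 stands in for Informix, and the column names and upsert helper are hypothetical illustrations, not the component's actual implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer (
    sk INTEGER PRIMARY KEY AUTOINCREMENT,
    id INTEGER, city TEXT,
    start_date TEXT, end_date TEXT, active INTEGER)""")

def scd2_upsert(conn, cust_id, city, today):
    row = conn.execute(
        "SELECT sk, city FROM dim_customer WHERE id = ? AND active = 1",
        (cust_id,)).fetchone()
    if row and row[1] == city:
        return  # no change detected: nothing to record
    if row:
        # Close the current version instead of overwriting it.
        conn.execute(
            "UPDATE dim_customer SET end_date = ?, active = 0 WHERE sk = ?",
            (today, row[0]))
    conn.execute(
        "INSERT INTO dim_customer (id, city, start_date, end_date, active)"
        " VALUES (?, ?, ?, NULL, 1)", (cust_id, city, today))

scd2_upsert(conn, 1, "Paris", "2020-01-01")
scd2_upsert(conn, 1, "Lyon", "2020-06-01")  # change is tracked, not overwritten
conn.commit()
```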
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.
Use memory saving Mode Select this check box to improve system performance.
Source keys include Null Select this check box to allow the source key columns to
have Null values.
Warning:
Special attention should be paid to the uniqueness of the
source key(s) values when this option is selected.
Use Transaction Select this check box when the database is configured in
NO_LOG mode.
Die on error This check box is cleared by default, meaning that rows
on error are skipped and the process continues for
error-free rows.
Advanced settings
End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.
Debug mode Select this check box to display each step of the process by
which data is written in the database.
tStatCatcher Statistics Select this check box to collect the log data at a
component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Limitation This component does not support using SCD type 0 together
with other SCD types.
Related scenario
For a scenario in which tInformixSCD might be used, see tMysqlSCD on page 2508.
tInformixSP
Centralises and calls multiple and complex queries in a database.
tInformixSP calls procedures stored in a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Is Function / Return result in Select this check box if only one value must be returned.
From the list, select the schema column upon which the
value to be obtained is based.
Parameters Click the Plus button and select the various Schema
Columns that will be required by the procedures. Note
that the SP schema can hold more columns than there are
parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter to be returned as a value, likely
after modification through the procedure (function).
RECORDSET: Input parameter to be returned as a set of
values, rather than a single value.
Note:
Check Inserting data in mother/daughter tables on page
2426, if you want to analyze a set of records from a
database table or DB query and return single records.
Use Transaction Clear this check box if the database is configured in the
NO_LOG mode.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at a component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related scenarios, see:
tIngresBulkExec
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Warning:
This file should be located on the same machine as the
database server.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
Built-In: You create and store the schema locally for this
component only.
Delete Working Files After Use Select this check box to delete the files that are created
during the execution.
Advanced settings
Reject Row File Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.
Fill Factor Specify the percentage (from 1 to 100) of each primary data
page that must be filled with rows, under ideal conditions.
For example, if you specify a fillfactor of 40, the DBMS
Server fills 40% of each of the primary data pages in the
restructured table with rows.
Leaf Fill A bulk copy from can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf
page that must be filled with rows during the copy. This
clause can be used only on tables with a B-tree storage
structure.
Non Leaf Fill A bulk copy from can specify a nonleaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree non-
leaf index page that must be filled with rows during the
copy. This clause can be used only on tables with a B-tree
storage structure.
Trailing WhiteSpace Selected by default, this check box trims the trailing
white spaces and applies only to data types such as
VARCHAR, NVARCHAR, and TEXT.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenarios
For related topics, see:
• Loading data to a table in the Ingres DBMS on page 1772
tIngresClose
Closes the transaction committed in the connected Ingres database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
No scenario is available for the Standard version of this component yet.
tIngresCommit
Commits a global transaction in one go, using a unique connection, instead of
committing on every row or every batch, and thus provides a gain in performance.
tIngresCommit validates the data processed through the Job into the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tIngresCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresConnection and
tIngresRollback components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenario
For tIngresCommit related scenario, see Inserting data in mother/daughter tables on page 2426.
tIngresConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tIngresConnection opens a connection to the database for a current transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component commits
only after all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
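In Ingres SQL terms, the difference can be sketched as follows (the statements and table are illustrative only; the components manage the session for you):

```sql
-- Auto Commit selected: each statement is its own transaction.
SET AUTOCOMMIT ON;
INSERT INTO employee (id, name) VALUES (1, 'Ann');  -- committed immediately

-- Auto Commit cleared (the default): changes wait for an explicit
-- commit, which tIngresCommit issues once all statements have run.
SET AUTOCOMMIT OFF;
INSERT INTO employee (id, name) VALUES (2, 'Bob');
COMMIT;
```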
tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.
Usage
Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresCommit and
tIngresRollback components.
Related scenarios
For tIngresConnection related scenario, see Loading data to a table in the Ingres DBMS on page 1772.
tIngresInput
Reads an Ingres database and extracts fields based on a query.
tIngresInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
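For example, if the schema defines the columns id, name, and age in that order, the SELECT list must follow the same sequence (table and column names are hypothetical):

```sql
-- Matches a schema defined as: id, name, age
SELECT id, name, age FROM employee WHERE age > 30;
```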
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for Ingres
databases.
Related scenarios
For related topics, see:
See also the scenario for tContextLoad: Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497.
tIngresOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tIngresOutput writes, updates, makes changes or suppresses entries in a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
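As an illustration, Drop a table if exists and create roughly corresponds to DDL of the following shape (a hypothetical table; the component generates the actual statements for you):

```sql
DROP TABLE employee;  -- issued only if the table already exists
CREATE TABLE employee (
    id   INTEGER NOT NULL,
    name VARCHAR(50)
);
```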
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
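For instance, with id defined as the key, Insert or update behaves roughly like the following sequence (a hypothetical table; the component generates the actual statements):

```sql
-- Try to insert the record first:
INSERT INTO employee (id, name) VALUES (42, 'Ann');
-- If a row with that key already exists, update it instead:
UPDATE employee SET name = 'Ann' WHERE id = 42;
```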
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. It allows you to call SQL functions
to perform actions on columns that are not insert,
update, or delete actions, or actions that require
particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use Batch Select this check box to activate the batch mode for data
processing.
Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Ingres database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.
Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tIngresOutputBulk
Prepares the file whose data is inserted in bulk to the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two-step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to
feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
tIngresOutputBulk prepares a file with the schema defined and the data coming from the preceding
component.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Warning:
This file is generated on the local machine or a shared
folder on the LAN.
Append the File Select this check box to add the new rows at the end of the
file.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Advanced settings
Include Header Select this check box to include the column header in the
file.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Related scenarios
For related topics, see:
• Loading data to a table in the Ingres DBMS on page 1772.
tIngresOutputBulkExec
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two-step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to
feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
tIngresOutputBulkExec prepares an output file and uses it to feed a table in the Ingres DBMS.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Warning:
This file is generated on the machine specified by the
VNode field so it should be on the same machine as the
database server.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Delete Working Files After Use Select this check box to delete the files that are created
during the execution.
Advanced settings
Reject Row File Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.
Fill Factor Specify the percentage (from 1 to 100) of each primary data
page that must be filled with rows, under ideal conditions.
For example, if you specify a fillfactor of 40, the DBMS
Server fills 40% of each of the primary data pages in the
restructured table with rows.
Leaf Fill A bulk copy from can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf
page that must be filled with rows during the copy. This
clause can be used only on tables with a B-tree storage
structure.
Non Leaf Fill A bulk copy from can specify a nonleaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree non-
leaf index page that must be filled with rows during the
copy. This clause can be used only on tables with a B-tree
storage structure.
Trailing WhiteSpace Selected by default, this check box trims the trailing
white spaces and applies only to data types such as
VARCHAR, NVARCHAR, and TEXT.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
2. In the Server field, enter the address of the server where the Ingres DBMS resides, for example
"localhost".
6. Select the source file by clicking the [...] button next to the File name/Stream field.
7. Click the [...] button next to the Edit schema field to open the schema editor.
8. Click the [+] button to add four columns, for example name, age, job and dept, with the data type
as string, Integer, string and string respectively.
Click OK to close the schema editor.
Click Yes on the pop-up window that asks whether to propagate the changes to the subsequent
component.
Leave other default settings unchanged.
9. Double-click tIngresOutputBulkExec to open its Basic settings view in the Component tab.
10. In the Table field, enter the name of the table for data insertion.
11. In the VNode and Database fields, enter the names of the VNode and the database.
12. In the File Name field, enter the full path of the file that will hold the data of the source file.
As shown above, the employee data is written to the table employee in the database research on
the node talendbj. Meanwhile, the output file employee_research.csv has been generated at C:/
Users/talend/Desktop.
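A target table matching the four columns of this scenario could be created with DDL like the following (a sketch; the column sizes are assumptions):

```sql
CREATE TABLE employee (
    name VARCHAR(50),
    age  INTEGER,
    job  VARCHAR(50),
    dept VARCHAR(50)
);
```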
Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tIngresRollback
Avoids committing part of a transaction involuntarily by canceling the transaction
committed in the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresConnection and
tIngresCommit components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For tIngresRollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429.
tIngresRow
Acts on the actual DB structure or on the data (although without handling data), using the
SQLBuilder tool to write your SQL statements easily.
tIngresRow executes the stated SQL query on the specified database. The Row suffix means the
component implements a flow in the Job design, although it doesn't provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced Settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times, as performance levels are
increased.
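For example, the Query field could contain a statement with a placeholder, and the parameter table would then supply the value (the table, columns, and values are hypothetical):

```sql
-- Query field; "?" is filled from the Set PreparedStatement Parameter
-- table, e.g. Parameter Index 1, Parameter Type Int, Parameter Value 30:
SELECT name, age FROM employee WHERE age > ?;
```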
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tIngresSCD
Reflects and tracks changes in a dedicated Ingres SCD table.
tIngresSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and
logging the changes into a dedicated SCD table.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.
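As a sketch of what an SCD type 2 output amounts to (the table and column names are hypothetical; the SCD editor generates the actual logic): when a tracked attribute changes, the current version row is closed and a new version is inserted.

```sql
-- Close the currently active version of customer 42 ...
UPDATE dim_customer
   SET scd_end = DATE('today'), scd_active = 0
 WHERE customer_id = 42 AND scd_active = 1;
-- ... and insert the new version carrying the changed attribute.
INSERT INTO dim_customer
    (customer_id, city, scd_start, scd_end, scd_active, scd_version)
VALUES (42, 'Paris', DATE('today'), NULL, 1, 2);
```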
Use memory saving Mode Select this check box to maximize system performance.
Source keys include Null Select this check box to allow the source key columns to
have Null values.
Warning:
Special attention should be paid to the uniqueness of the
source key(s) values when this option is selected.
Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.
Debug mode Select this check box to display each step during
processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Limitation This component does not support using SCD type 0 together
with other SCD types.
Related scenario
For related scenarios, see tMysqlSCD on page 2508.
tInterbaseClose
Closes the transaction committed in the connected Interbase database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
No scenario is available for the Standard version of this component yet.
tInterbaseCommit
Commits a global transaction in one go instead of committing on every row or every batch,
and thus provides a gain in performance.
tInterbaseCommit validates the data processed through the Job into the connected DB.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Warning:
If you want to use a Row > Main connection to link
tInterbaseCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Related scenario
For tInterbaseCommit related scenario, see Inserting data in mother/daughter tables on page 2426.
tInterbaseConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tInterbaseConnection opens a connection to the database for a current transaction.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
Advanced settings
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component
commits only after all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
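The difference between the two commit modes can be sketched outside Talend. A generated Job uses Java and JDBC, but the following Python/sqlite3 sketch (the table name t is purely illustrative) shows the same transactional behavior: in auto commit mode every statement is its own transaction, while with an explicit commit all statements belong to one transaction until commit() is called.

```python
import sqlite3

# Auto commit mode: every statement is committed immediately,
# as its own transaction.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (id INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")  # already committed

# Explicit commit (the tInterbaseCommit-style behavior): statements
# accumulate in one transaction until commit() is called.
manual = sqlite3.connect(":memory:")
manual.execute("CREATE TABLE t (id INTEGER)")
manual.commit()  # persist the table itself
manual.execute("INSERT INTO t VALUES (1)")
manual.execute("INSERT INTO t VALUES (2)")
manual.rollback()  # commit() was never called, so both inserts are undone
count = manual.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # 0
```

This is why selecting Auto Commit makes the commit component redundant: there is never a pending transaction left for it to commit.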
tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.
Usage
Related scenarios
For tInterbaseConnection related scenario, see tMysqlConnection on page 2425.
tInterbaseInput
Reads an Interbase database and extracts fields based on a query.
tInterbaseInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Query type and Query Enter your DB query, paying particular attention to
properly sequencing the fields so that they match the
schema definition.
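As a runnable illustration of why the field order matters (Talend generates Java/JDBC; this Python/sqlite3 sketch and its employees table are hypothetical), building the SELECT list from the schema itself keeps the query and the schema in the same order:

```python
import sqlite3

# Hypothetical table used only to illustrate the ordering constraint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, hired TEXT)")
conn.execute("INSERT INTO employees VALUES (1, 'Ana', '2019-04-01')")

# The schema defined in the component, in order: id, name, hired.
schema = ["id", "name", "hired"]

# Spell out the columns in schema order instead of relying on SELECT *,
# so each returned field lands in the matching schema column.
query = "SELECT {} FROM employees".format(", ".join(schema))
row = conn.execute(query).fetchone()  # (1, 'Ana', '2019-04-01')
```

If the SELECT listed the columns in a different order than the schema, the values would silently be mapped to the wrong fields.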
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for
Interbase databases.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
Related scenarios
For a related topic, see the tContextLoad scenario Reading data from different MySQL databases
using dynamically loaded connection parameters on page 497.
tInterbaseOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tInterbaseOutput writes, updates, modifies or deletes entries in a database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
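The difference between the Insert or update and Update or insert actions comes down to which statement is tried first. Talend generates Java/JDBC code; the following Python/sqlite3 sketch, with a hypothetical customers table whose id column plays the role of the key column selected above, tries the insert first and falls back to an update, which is the Insert or update behavior (Update or insert simply reverses the order):

```python
import sqlite3

# Hypothetical table; id acts as the primary key the action relies on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")

def insert_or_update(conn, rec):
    """'Insert or update': try the insert first; on a key clash, update."""
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?)",
                     (rec["id"], rec["name"]))
    except sqlite3.IntegrityError:  # the key already exists
        conn.execute("UPDATE customers SET name = ? WHERE id = ?",
                     (rec["name"], rec["id"]))

insert_or_update(conn, {"id": 1, "name": "Anna"})   # key exists: update
insert_or_update(conn, {"id": 2, "name": "Bruno"})  # new key: insert
result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

Without a key column defined, neither fallback has a reference to match on, which is why the warning above requires at least one primary key for Update and Delete.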
Clear data in table Wipes out data from the selected table before action.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns other than
insert, update, or delete actions, or actions that require
particular preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
Use Batch Select this check box to activate the batch mode for data
processing.
Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in an Interbase database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tInterbaseRollback
Avoids committing part of a transaction involuntarily by canceling the current transaction in the
connected Interbase database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For tInterbaseRollback related scenario, see Rollback from inserting data in mother/daughter tables
on page 2429.
tInterbaseRow
Acts on the actual database structure or on the data (although without handling data) using the
SQLBuilder tool to write your SQL statements easily.
tInterbaseRow executes the stated SQL query on the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
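The Set PreparedStatement Parameter mechanism boils down to one parameterized statement executed repeatedly with different values. A minimal sketch, using Python/sqlite3 "?" placeholders instead of JDBC (the logs table and its contents are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (level TEXT, msg TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?)",
                 [("INFO", "started"), ("ERROR", "disk full"),
                  ("ERROR", "timeout")])

# One prepared statement; each "?" corresponds to a Parameter Index /
# Type / Value row, and the same statement is reused with new values.
query = "SELECT COUNT(*) FROM logs WHERE level = ?"
errors = conn.execute(query, ("ERROR",)).fetchone()[0]  # 2
infos = conn.execute(query, ("INFO",)).fetchone()[0]    # 1
```

Reusing one parsed statement with different bound values is what gives the performance gain the note above mentions, since the database does not have to re-parse the SQL each time.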
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related scenarios, see:
• Combining two flows for selective output on page 2503
• For tDBSQLRow related scenario: see Procedure on page 622
• For tMySQLRow related scenario: see Removing and regenerating a MySQL table index on page
2497.
tIntervalMatch
Returns a value based on a Join relation.
tIntervalMatch receives a main flow and joins it to a lookup flow. Then it matches
a specified value to a range of values and returns the related information.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Search Column Select the main flow column containing the values to be
matched with a range of values.
Column (LOOKUP) Select the lookup flow column containing the values to be
returned when the Join is ok.
Lookup Column (min) / Include the bound (min) Select the column containing the minimum value of the
range. Select the check box to include the minimum value
of the range in the match.
Lookup Column (max) / Include the bound (max) Select the column containing the maximum value of the
range. Select the check box to include the maximum value
of the range in the match.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Procedure
1. Drop the components onto the design workspace.
2. Connect the components using Row > Main connection.
Note that the connection from the second tFileInputDelimited component to the tIntervalMatch
component will appear as a Lookup connection.
3. Browse to the file to be used as the main input, which provides a list of servers and their IP
addresses:
Server;IP
Server1;057.010.010.010
Server2;001.010.010.100
Server3;057.030.030.030
Server4;053.010.010.100
4. Click the [...] button next to Edit schema to open the Schema dialog box and define the input
schema. According to the input file structure, the schema is made of two columns, respectively
Server and IP, both of type String. Then click OK to close the dialog box.
5. Define the number of header rows to be skipped, and keep the other settings as they are.
6. Define the properties of the second tFileInputDelimited component similarly.
The file to be used as the input to the lookup flow in this example lists some IP address ranges
and the corresponding countries:
StartIP;EndIP;Country
001.000.000.000;001.255.255.255;USA
002.006.190.056;002.006.190.063;UK
011.000.000.000;011.255.255.255;USA
057.000.000.000;057.255.255.255;France
012.063.178.060;012.063.178.063;Canada
053.000.000.000;053.255.255.255;Germany
Accordingly, the schema of the lookup flow should have the following structure:
7. From the Search Column list, select the main flow column containing the values to be matched
with the range values. In this example, we want to match the servers' IP addresses with the range
values from the lookup flow.
8. From the Column (LOOKUP) list, select the lookup column that holds the values to be returned. In
this example, we want to get the names of countries where the servers are hosted.
9. Set the min and max lookup columns corresponding to the range bounds defined in the lookup
schema, StartIP and EndIP respectively in this example.
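The matching logic the component applies can be sketched with the scenario's own data. This is a Python sketch of the behavior, not Talend's generated Java; note that the zero-padded IP notation makes plain string comparison order the addresses correctly:

```python
# Sample data from the scenario: the main flow (servers) and the
# lookup flow (IP ranges with the country to return).
servers = [("Server1", "057.010.010.010"), ("Server2", "001.010.010.100"),
           ("Server3", "057.030.030.030"), ("Server4", "053.010.010.100")]
ranges = [("001.000.000.000", "001.255.255.255", "USA"),
          ("002.006.190.056", "002.006.190.063", "UK"),
          ("011.000.000.000", "011.255.255.255", "USA"),
          ("057.000.000.000", "057.255.255.255", "France"),
          ("012.063.178.060", "012.063.178.063", "Canada"),
          ("053.000.000.000", "053.255.255.255", "Germany")]

def interval_match(value, ranges, include_min=True, include_max=True):
    """Return the lookup value whose [min, max] range contains value.

    include_min/include_max mirror the Include the bound check boxes."""
    for lo, hi, country in ranges:
        above = value >= lo if include_min else value > lo
        below = value <= hi if include_max else value < hi
        if above and below:
            return country
    return None  # no matching range: the join fails for this row

result = [(name, interval_match(ip, ranges)) for name, ip in servers]
```

Running this on the sample data associates Server1 and Server3 with France, Server2 with USA, and Server4 with Germany.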
tIterateToFlow
Transforms non-processable data into a processable flow. tIterateToFlow transforms a list into a data
flow that can be processed.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Advanced Settings
tStatCatcher Statistics Select this check box to collect the log data at a component
level.
Global Variables
Usage
• Drop the following components: tFileList, tIterateToFlow and tLogRow from the Palette to the
design workspace.
• Connect the tFileList to the tIterateToFlow using an Iterate link and connect the Job to the
tLogRow using a Row > Main connection.
• In the tFileList Component view, set the directory where the list of files is stored.
• In this example, the files are three simple .txt files held in one directory: Countries.
• No need to care about the case, hence clear the Case sensitive check box.
• Leave the Include Subdirectories check box unchecked.
• Then select the tIterateToFlow component and click Edit Schema to set the new schema.
• Add two new columns: Filename of String type and Date of date type. Make sure you define the
correct pattern in Java.
• Click OK to validate.
• Notice that the newly created schema shows on the Mapping table.
• In each cell of the Value field, press Ctrl+Space bar to access the list of global and user-specific
variables.
• For the Filename column, use the global variable tFileList_1.CURRENT_FILEPATH. It
retrieves the current filepath in order to catch the name of each file the Job iterates on.
• For the Date column, use the Talend routine TalendDate.getCurrentDate() (in Java).
• Then on the tLogRow component view, select the Print values in cells of a table check box.
• Save your Job and press F6 to execute it.
The filepath displays in the Filename column and the current date displays in the Date column.
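The iteration-to-flow conversion above can be sketched as follows. This Python sketch stands in for the generated Java; the temporary directory and its three files play the role of the Countries directory that tFileList iterates over:

```python
import os
import tempfile
from datetime import date

# Build a small directory standing in for the Countries directory.
workdir = tempfile.mkdtemp()
for name in ("France.txt", "Spain.txt", "Italy.txt"):
    open(os.path.join(workdir, name), "w").close()

# The Iterate link yields one filepath per iteration; tIterateToFlow
# turns each iteration into one row of the (Filename, Date) schema.
rows = []
for entry in sorted(os.listdir(workdir)):
    current_filepath = os.path.join(workdir, entry)  # ~ CURRENT_FILEPATH
    rows.append({"Filename": current_filepath,
                 "Date": date.today()})  # ~ TalendDate.getCurrentDate()
```

Each iteration becomes one row, which is exactly what lets the downstream tLogRow print a table of filepaths and dates.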
tJasperOutput
Creates a report in rich formats using Jaspersoft's iReport.
This component is closely related to Jaspersoft's report designer -- iReport. It reads and processes
data from an input flow to create a report against a .jrxml report template defined via iReport.
tJasperOutput reads and processes data from an input flow to create a report against a .jrxml report
template defined via iReport.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Specify Locale Select this check box to choose a locale from the Report
Locale list.
Note:
The first line of the Report Locale list is empty. You can
click it to customize a locale.
Encoding Select an encoding mode from this list. You can select
Custom from the list to enter an encoding method in the
field that appears.
Global Variables
Usage
Note:
You can select Repository from the Property Type drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored locally in the Repository. For more
information about Metadata, see the Talend Studio User Guide.
3. Fill in the File name/Stream field to give the path and name of the source file, e.g. "C:/Documents
and Settings/Andy ZHANG/nom.csv".
4. Keep the default settings for the Row Separator and Field Separator fields. You can also change
them as needed.
5. Set 1 in the Header field and 0 in the Footer field. Leave the Limit field empty. You can also
change them as needed.
6. Select Built-In from the Schema drop-down list and click Edit schema to define the data structure
of the input file. In this case, the input file has 2 columns: Nom and Prenom.
2. Enter the full path of the report template file created via Jaspersoft's iReport in the Jrxml file
field. You can click the three-dot button to browse.
Note:
The schema of the file, which is used to create a .jrxml template file via iReport, should be the
same as that of the source file that is used to create the report.
3. Enter the path for the temporary files generated during the job execution in the Temp path field.
You can click the three-dot button to browse.
4. Enter the path for the final report file generated during the job execution in the Destination path
field. You can click the three-dot button to browse.
5. Enter the name for the final report file generated during the job execution in the File name/
Stream field.
6. Select the format for the final report file generated during the job execution in the Report type
field.
7. Click Sync columns to retrieve the schema from the previous component.
8. Enter the path of execution file of Jaspersoft's iReport in the iReport field, e.g. replacing
__IREPORT_PATH__\ with E:\Program Files\Jaspersoft\iReport-4.1.1\bin\. You can click the Launch
button to run iReport.
Note:
This step is not mandatory. Yet, this helps you conveniently access the iReport software for
relevant operations, e.g. creating a report template, etc.
Job execution
Procedure
1. Press CTRL+S to save your Job.
2. Press F6 to execute it.
You can find the file out.pdf in the folder specified in the Destination path field.
tJasperOutputExec
Creates a report in rich formats using Jaspersoft's iReport and offers a performance gain as it functions
as a combination of an input component and a tJasperOutput component.
This component is closely related to Jaspersoft's report designer -- iReport. It reads and processes
data from a source file to create a report against a .jrxml report template defined via iReport.
tJasperOutputExec is used as a combination of an input component and a tJasperOutput component.
The advantage of using two separate components is that data can be transformed before being used
to generate a report and the input sources can be various and rich.
Basic settings
Use Default Output Name Select this check box to use the default name for the report
generated, which takes the source file's name.
Note:
This field does not appear if the Use Default Output
Name box has been selected.
Advanced settings
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Specify Locale Select this check box to choose a locale from the Report
Locale list.
Note:
The first line of the Report Locale list is empty. You can
click it to customize a locale.
Encoding Select an encoding mode from this list. You can select
Custom from the list to enter an encoding method in the
field that appears.
Global Variables
Usage
Related Scenario
For related scenarios, see Generating a report against a .jrxml template on page 1817.
tJava
Extends the functionalities of a Talend Job using custom Java commands.
tJava enables you to enter personalized code in order to integrate it in Talend program. You can
execute this code only once.
Basic settings
Code Type in the Java code you want to execute, according to the
task you need to perform. For further information about the
syntax of Java functions specific to Talend, see Talend Studio
Help Contents (Help > Developer Guide > API Reference).
For a complete Java reference, see http://docs.oracle.com/
javaee/6/api/
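As an illustration (the Job name and message below are invented for this sketch), the kind of one-shot logic a Code field might hold can be tried as a plain Java program, with main() standing in for the single execution:

```java
public class TJavaDemo {
    // Illustrative helper: what a one-shot Code entry might compute.
    static String greeting(String jobName) {
        return "Starting Job: " + jobName;
    }

    public static void main(String[] args) {
        // tJava runs its Code field only once; a plain main() plays that role here.
        System.out.println(greeting("DemoJob"));
    }
}
```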
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
Global Variables
Usage
2. Define the path to the input file in the File name field.
The input file used in this example is a simple text file made of two columns: Names and their
respective Emails.
3. Click the Edit Schema button, and set the two-column schema. Then click OK to close the dialog
box.
4. When prompted, click OK to accept the propagation, so that the tFileOutputExcel component gets
automatically set with the input schema.
In this example, the Sheet name is Email and the Include Header box is selected.
In this use case, we use the NB_Line variable. To access the global variable list, press Ctrl + Space
bar on your keyboard and select the relevant global parameter.
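Global variables such as NB_Line are read from Talend's globalMap at run time. As a rough sketch outside the Studio (the component name and line count below are made up), the lookup works like a plain map access:

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {
    // Mimics reading a component's After variable from Talend's globalMap.
    static int readNbLine(Map<String, Object> globalMap, String key) {
        return (Integer) globalMap.get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // In a real Job, the input component sets this entry after it finishes.
        globalMap.put("tFileInputDelimited_1_NB_LINE", 6);
        System.out.println("Number of lines processed: "
                + readNbLine(globalMap, "tFileInputDelimited_1_NB_LINE"));
    }
}
```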
Results
The content is passed on to the Excel file defined, and the number of lines processed is displayed
on the Run console.
tJavaDBInput
Reads a database and extracts fields based on a query.
tJavaDBInput executes a DB query with a strictly defined order of fields, which must correspond to
the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Query type and Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the
schema definition.
Advanced settings
Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.
tStatCatcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component covers all possible SQL database queries.
Related scenarios
For a related topic, see the scenario in tContextLoad: Reading data from different MySQL databases
using dynamically loaded connection parameters on page 497.
tJavaDBOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tJavaDBOutput writes, updates, modifies or deletes entries in a database.
Basic settings
Table Name of the table to be written. Note that only one table
can be written at a time.
Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Additional Columns This option is not offered if you create (with or without
drop) the DB table. It allows you to call SQL functions
to perform actions on columns that are not insert,
update, or delete actions, or actions that require particular
preprocessing.
Use field options Select this check box to customize a request, especially
when there is double action on data.
Debug query mode Select this check box to display each step during processing
entries in a database.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in a Java database. It also allows you to create a reject
flow using a Row > Rejects link to filter data in error. For an
example of tMysqlOutput in use, see Retrieving data in error
with a Reject link on page 2474.
Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tJavaDBRow
Acts on the actual database structure or on the data (although without handling data), using the
SQLBuilder tool to easily write your SQL statements.
tJavaDBRow executes the stated SQL query on the specified database. The Row suffix means the
component implements a flow in the Job design, although it does not provide output.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
Advanced settings
Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.
Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
Note:
This option is very useful if you need to execute the
same query several times, as it increases performance.
tStat Catcher Statistics Select this check box to collect log data at the component
level.
Global Variables
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
Usage
Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tJavaFlex
Provides a Java code editor that lets you enter personalized code and integrate it into a Talend
program.
tJavaFlex enables you to add Java code to the Start/Main/End code sections of the component itself.
With tJavaFlex, you can enter the three Java code parts (start, main, and end) that together act as a
kind of component dedicated to performing a desired operation.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Sync columns to retrieve the schema from the previous
component in the Job.
Built-In: You create and store the schema locally for this
component only.
Data Auto Propagate Select this check box to automatically propagate the data to
the component that follows.
Start code Enter the Java code that will be called during the
initialization phase.
Main code Enter the Java code to be applied for each line in the data
flow.
End code Enter the Java code that will be called during the closing
phase.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a job level as well as at each component level.
Global Variables
Usage
2. Click the three-dot button next to Edit schema to open the corresponding dialog box where you
can define the data structure to pass to the component that follows.
3. Click the [+] button to add two columns: key and value and then set their types to Integer and
String respectively.
4. Click OK to validate your changes and close the dialog box.
5. In the Basic settings view of tJavaFlex, select the Data Auto Propagate check box to automatically
propagate data to the component that follows.
In this example, we do not want to do any transformation on the retrieved data.
6. In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of tJavaFlex by displaying the START
message and sets up the loop and the variables to be used afterwards in the Java code:
System.out.println("## START\n#");
String [] valueArray = {"Miss", "Mrs", "Mr"};
for (int i = 0; i < valueArray.length; i++) {
7. In the Main code field, enter the code you want to apply on each of the data rows.
In this example, we want to display each key with its value:
row1.key = i;
row1.value = valueArray[i];
Warning:
In the Main code field, "row1" corresponds to the name of the link that comes out of tJavaFlex. If you
rename this link, you have to modify the code of this field accordingly.
8. In the End code field, enter the code that will be executed in the closing phase.
In this example, the brace (curly bracket) closes the loop and the code indicates the end of the
execution of tJavaFlex by displaying the END message:
}
System.out.println("#\n## END");
9. If needed, double-click tLogRow and in its Basic settings view, click the [...] button next to Edit
schema to make sure that the schema has been correctly propagated.
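Outside Talend, the way the three code sections combine at run time can be sketched as one plain Java program (the printed row format is an assumption; in the real Job, tLogRow formats the output):

```java
public class TitlesDemo {
    // Builds the console output that the Start/Main/End sections would produce together.
    static String run() {
        StringBuilder out = new StringBuilder();
        // Start code: executed once, in the initialization phase.
        out.append("## START\n#\n");
        String[] valueArray = {"Miss", "Mrs", "Mr"};
        for (int i = 0; i < valueArray.length; i++) {
            // Main code: executed once per row; key/value mirror the two-column schema.
            out.append(i).append(" | ").append(valueArray[i]).append('\n');
        }
        // End code: executed once, in the closing phase; the brace above closes the loop.
        out.append("#\n## END");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```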
The three personal titles are displayed on the console along with their corresponding keys.
2. Click the plus button to add four columns: number, txt, date and flag.
3. Define the schema and set the parameters to the four columns according to the above capture.
4. In the Functions column, select the three-dot function [...] for each of the defined columns.
5. In the Parameters column, enter 10 different parameters for each of the defined columns.
These 10 parameters correspond to the data that will be randomly generated when executing
tRowGenerator.
6. Click OK to validate your changes and close the editor.
2. Click Sync columns to retrieve the schema from the preceding component.
3. In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of the tJavaFlex component by displaying the
START message and defining the variable to be used afterwards in the Java code:
System.out.println("## START\n#");
int i = 0;
4. In the Main code field, enter the code to be applied on each line of data.
In this example, we want to show the number of each line starting from 0 and then the number
and the random text transformed to upper case and finally the random date set in the editor of
tRowGenerator. Then, we create a condition to show if the status is true or false and we increment
the number of the line:
i++;
Warning:
In the Main code field, "row1" corresponds to the name of the link that connects to tJavaFlex. If you
rename this link, you have to modify the code.
5. In the End code field, enter the code that will be executed in the closing phase.
In this example, the code indicates the end of the execution of tJavaFlex by displaying the END
message:
System.out.println("#\n## END");
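The Main code itself is not reproduced above. A plain-Java sketch of the behavior it describes (the column values and the separator are invented; in the Job the values come from tRowGenerator through row1) might look like:

```java
public class RowPrintDemo {
    static int i = 0; // line counter, as defined in the Start code

    // Main code equivalent: format one row, test the flag, then increment the counter.
    static String formatRow(int number, String txt, String date, boolean flag) {
        String line = i + " | " + number + " | " + txt.toUpperCase()
                + " | " + date + " | " + (flag ? "true" : "false");
        i++;
        return line;
    }

    public static void main(String[] args) {
        // Illustrative values standing in for one randomly generated row.
        System.out.println(formatRow(42, "hello", "2020-02-23", true));
    }
}
```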
The console displays the randomly generated data that was modified by the java command set
through tJavaFlex.
tJavaRow
Provides a code editor that lets you enter the Java code to be applied to each row of the flow.
tJavaRow allows you to enter customized code which you can integrate in a Talend program.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Built-In: You create and store the schema locally for this
component only.
Code Enter the Java code to be applied to each line of the data
flow.
Advanced settings
tStatCatcher Statistics Select this check box to collect the log data at the
component level.
Global Variables
Usage
2. In the File name/Stream field, type in the path to the input file in double quotation marks, or
browse to the path by clicking the [...] button, and define the first line of the file as the header.
In this example, the input file has the following content:
City;Population;LandArea;PopDensity
Beijing;10233000;1418;7620
Moscow;10452000;1081;9644
Seoul;10422000;605;17215
Tokyo;8731000;617;14151
New York;8310000;789;10452
3. Click the [...] button next to Edit schema to open the Schema dialog box, and define the data
structure of the input file. Then, click OK to validate the schema setting and close the dialog box.
4. Double-click the tJavaRow component to display its Basic settings view in the Component tab.
5. Click Sync columns to make sure that the schema is correctly retrieved from the preceding
component.
6. In the Code field, enter the code to be applied on each line of data based on the defined schema
columns.
In this example, we want to transform the city names to upper case, group digits of numbers
larger than 1000 using the thousands separator for ease of reading, and print the data on the
console:
Note:
In the Code field, input_row refers to the link that connects to tJavaRow.
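The exact Code entry is not shown here. A minimal plain-Java sketch of the transformation it describes, upper-casing the city name and grouping digits with a thousands separator (the US locale is an assumption), could be:

```java
import java.text.NumberFormat;
import java.util.Locale;

public class CityFormatDemo {
    // Upper-cases the city name and groups the population digits for readability.
    static String format(String city, int population) {
        NumberFormat nf = NumberFormat.getIntegerInstance(Locale.US);
        return city.toUpperCase() + ": " + nf.format(population);
    }

    public static void main(String[] args) {
        // One row from the sample input file above.
        System.out.println(format("Beijing", 10233000));
    }
}
```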
tJDBCClose
Closes an active JDBC connection to release the occupied resources.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Connection Component Select the component that opens the connection you need
to close from the drop-down list.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to
access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
No scenario is available for the Standard version of this component yet.
tJDBCColumnList
Lists all column names of a given JDBC table.
tJDBCColumnList iterates on all columns of a given table through a defined JDBC connection.
Basic settings
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
Advanced settings
tStatCatcher Select this check box to collect log data at the component level.
Statistics
Global Variables
Usage
Related scenario
For tJDBCColumnList related scenario, see Iterating on a DB table and listing its column names on
page 2419.
tJDBCCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus
provides a performance gain.
tJDBCCommit validates the data processed through the Job into the connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Connection Component Select the component that opens the database connection
to be reused by this component.
Close Connection Select this check box to close the database connection once
the component has performed its task.
Clear this check box to continue to use the selected
connection once the component has performed its task.
If this component is linked to your Job via a Row > Main
connection, your data will be committed row by row. In this
case, do not select the Close connection check box or your
connection will be closed before the end of the first row
commit.
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other tJDBC*
components, especially with the tJDBCConnection and
tJDBCRollback components.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to
access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenario
For tJDBCCommit related scenario, see Inserting data in mother/daughter tables on page 2426.
tJDBCConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.
Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.D
river.
Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
This check box is not available when the Specify a data
source alias check box is selected.
Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
This check box is not available when the Use or register a
shared DB Connection check box is selected.
Advanced settings
Use Auto-Commit Select this check box to activate the auto commit mode.
Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component
commits only once all of the statements have been executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Procedure
1. If the library to be imported isn't available on your machine, either download and install it using
the Modules view or download and store it in a local directory.
2. In the Drivers table, add one row to the table by clicking the [+] button.
3. Click the newly added row and click the [...] button to open the Module dialog box where you can
import the external library.
Note:
Changing the Maven URI for an external module will affect all the components and
metadata connections that use that module within the project.
When working on a remote project, your custom Maven URI settings will be automatically
synchronized to the Talend Artifact Repository and will be used when other users working
on the same project install the external module.
Note: You can replace or delete the imported library, or import new libraries if needed.
Related scenario
For tJDBCConnection related scenario, see tMysqlConnection on page 2425
tJDBCInput
Reads any database using a JDBC API connection and extracts fields based on a query.
tJDBCInput executes a database query with a strictly defined order of fields, which must correspond
to the schema definition. Then it passes on the field list to the next component via a Main row link.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the database connection
to be reused by this component.
JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.
Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.D
river.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Table Name The name of the table from which data will be retrieved.
Query Type and Query Specify the database query statement, paying particular
attention to the proper sequence of the fields, which must
correspond to the schema definition.
• Built-In: Fill in the query statement in the Query field
manually or click the [...] button next to the Query
field to build the statement graphically using the
SQLBuilder.
• Repository: Select the relevant query stored in the
Repository by clicking the [...] button next to it and
in the pop-up Repository Content dialog box, select
the query to be used, and the Query field will be
automatically filled in.
Guess Query Click this button to generate the query in the Query field
based on the defined table and schema.
Guess Schema Click this button to generate the schema columns based on
the query defined in the Query field.
Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Advanced settings
Use cursor Select this check box to specify the number of rows you
want to work with at any given time. This option optimises
performance.
Trim all the String/Char columns Select this check box to remove leading whitespace and
trailing whitespace from all String/Char columns.
Check column to trim Select the check box for the corresponding column to
remove leading and trailing whitespace from it.
This property is not available when the Trim all the String/
Char columns check box is selected.
Enable Mapping File for Dynamic Select this check box to use the specified metadata
mapping file when reading data from a dynamic type
column. This check box is cleared by default.
With this check box selected, you can specify the metadata
mapping file to use by selecting a type of database from the
Mapping File drop-down list.
For more information about metadata mapping files, see the
section on type conversion of Talend Studio User Guide.
Use PreparedStatement Select this check box if you want to query the database
using a prepared statement. In the Set PreparedStatem
ent Parameters table displayed, specify the value for each
parameter represented by a question mark ? in the SQL
statement defined in the Query field.
• Parameter Index: the position of the parameter in the
SQL statement.
• Parameter Type: the data type of the parameter.
• Parameter Value: the value of the parameter.
For a related use case of this property, see Using
PreparedStatement objects to query data on page 2498.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component covers all possible SQL queries for any
database using a JDBC connection.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to
access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
Related topic in tContextLoad: see Reading data from different MySQL databases using dynamically
loaded connection parameters on page 497.
tJDBCOutput
Executes the action defined on the data contained in the table, based on the flow incoming from the
preceding component in the Job.
tJDBCOutput writes, updates, makes changes or suppresses entries in any type of database connected
to a JDBC API.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the database connection
to be reused by this component.
JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.
Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.
Table Name The name of the table into which data will be written.
Warning:
It is necessary to specify at least one column as a
primary key on which the Update and Delete operations
are based. You can do that by clicking Edit Schema
and selecting the check box(es) next to the column(s)
you want to set as primary key(s). For an advanced
use, click the Advanced settings view where you can
simultaneously define primary keys for the Update
and Delete operations. To do that: Select the Use field
options check box and then in the Key in update column,
select the check boxes next to the column names you
want to use as a base for the Update operation. Do
the same in the Key in delete column for the Delete
operation.
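The key columns matter because the generated Update and Delete statements locate rows through a WHERE clause on those columns. A hedged sketch of that mechanism, using Python's `sqlite3` in place of the JDBC statements the component actually generates (table and column names are assumptions for illustration):

```python
import sqlite3

# The component's Update/Delete actions need a key column because the
# generated SQL targets rows via WHERE <key> = ?. Illustrated here with
# "id" as the key column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO person VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])

# Update keyed on the primary-key column "id"
con.execute("UPDATE person SET name = ? WHERE id = ?", ("Bobby", 2))
# Delete keyed on the same column
con.execute("DELETE FROM person WHERE id = ?", (1,))

names = [r[0] for r in con.execute("SELECT name FROM person").fetchall()]
print(names)  # ['Bobby']
```

Without a key column there is no WHERE clause to build, which is why the component refuses Update and Delete actions in that case.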
Clear data in table Select this check box to clear data in the table before
performing the action defined.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Guess Schema Click this button to generate schema columns based on the
settings of database table columns.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
When errors are skipped, you can collect the rows on error
using a Row > Reject connection.
Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Advanced settings
Additional Columns This option allows you to call SQL functions to perform
actions on columns other than insert, update, or delete
actions, or actions that require particular preprocessing. This
option is not available if you create (with or without drop)
the database table.
• Name: The name of the schema column to be inserted,
or the name of the schema column used to replace an
existing column.
• SQL expression: The SQL statement to be executed in
order to insert or replace relevant column.
• Position: Select Before, After, or Replace
according to the action to be performed on the
reference column.
• Reference column: The name of the reference column
that can be used to locate the new column to be
inserted or that will be replaced.
Use field options Select this check box and in the Fields options table
displayed, select the check box for the corresponding
column to customize a request, particularly if multiple
actions are being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which data is updated.
• Key in delete: Select the check box for the
corresponding column based on which data is deleted.
• Updatable: Select the check box if data in the
corresponding column can be updated.
• Insertable: Select the check box if data in the
corresponding column can be inserted.
Debug query mode Select this check box to display each step during the
processing of entries in a database.
Use Batch Select this check box to activate the batch mode for data
processing, and in the Batch Size field displayed, specify the
number of records to be processed in each batch.
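Batch mode groups rows so they are sent to the database in chunks of Batch Size rather than one round trip per row. A minimal sketch of the effect, assuming Python's `sqlite3` `executemany` as a stand-in for the JDBC batch the component enables (the table name and batch size are illustrative):

```python
import sqlite3

# Sketch of batch writes: rows are grouped into chunks of batch_size
# and each chunk is sent and committed as one unit, which is what
# Use Batch / Batch Size does for the JDBC output.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

records = [(i, float(i) * 10.0) for i in range(1, 1001)]
batch_size = 100  # analogous to the Batch Size field
for start in range(0, len(records), batch_size):
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    records[start:start + batch_size])
    con.commit()  # one commit per batch, not per row

count = con.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count)  # 1000
```

Larger batches mean fewer round trips but more rows lost or retried if a batch fails, so Batch Size is a throughput/recovery trade-off.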
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Enable parallel execution Select this check box to perform high-speed data processing
by treating multiple data flows simultaneously. This feature
depends on the database or the application ability to handle
multiple inserts in parallel as well as the number of CPU
affected. With this check box selected, you need to specify
the number of parallel executions desired in the Number of
parallel executions field displayed.
Global Variables
Usage
Usage rule This component offers the flexibility of the database
query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a JDBC database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For tJDBCOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.
tJDBCRollback
Avoids accidentally committing part of a transaction by canceling the pending transaction in the
connected database.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Connection Component Select the component that opens the database connection
to be reused by this component.
Close Connection Select this check box to close the database connection once
the component has performed its task.
Clear this check box to continue to use the selected
connection once the component has performed its task.
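What a rollback does to an open transaction can be sketched in a few lines; this uses Python's `sqlite3` `Connection.rollback()` as a hedged stand-in for the JDBC rollback the component issues (table name is illustrative):

```python
import sqlite3

# Sketch of a rollback on an open transaction: pending (uncommitted)
# changes are discarded instead of being written to the database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER, balance REAL)")
con.commit()

con.execute("INSERT INTO account VALUES (1, 500.0)")
con.rollback()  # cancel the uncommitted insert

remaining = con.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(remaining)  # 0
```

In a Job this is typically wired to an error branch: the connection component opens the transaction, the commit component ends it on success, and the rollback component cancels it on failure.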
Advanced settings
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component is more commonly used with other tJDBC* components, especially with the
tJDBCConnection and tJDBCCommit components.
Dynamic Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
settings your database connection dynamically from multiple connections planned in your Job. This feature
is useful when you need to access database tables having the same data structure but in different
databases, especially when you are working in an environment where you cannot change your Job
settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
For examples on using dynamic parameters, see Reading data from databases through context-
based dynamic connections on page 2446 and Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497. For more information on Dynamic settings
and context variables, see Talend Studio User Guide.
Related scenario
For a tJDBCRollback related scenario, see tMysqlRollback on page 2491.
tJDBCRow
Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder
tool to write your SQL statements easily.
tJDBCRow is the component for any type of database using a JDBC API. It executes the SQL query stated
on the specified database. The Row suffix means the component implements a flow in the Job
design although it doesn't provide output.
Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.
Basic settings
Database Select a type of database from the list and click Apply.
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.
Connection Component Select the component that opens the database connection
to be reused by this component.
JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.
Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.
Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.
Query Type and Query Specify the database query statement, paying particular
attention to the proper sequence of the fields, which must
correspond to the schema definition.
• Built-In: Fill in the query statement in the Query field
manually or click the [...] button next to the Query
field to build the statement graphically using the
SQLBuilder.
• Repository: Select the relevant query stored in the
Repository by clicking the [...] button next to it and
in the pop-up Repository Content dialog box, select
the query to be used, and the Query field will be
automatically filled in.
Guess Query Click this button to generate a query in the Query field
based on the defined table and schema.
Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
When errors are skipped, you can collect the rows on error
using a Row > Reject connection.
Advanced settings
Propagate QUERY's recordset Select this check box to propagate the result of the query
to the output flow. From the use column list displayed, you
need to select a column into which the query result will be
inserted.
This option allows the component to have a different
schema from that of the preceding component. Moreover,
the column that holds the query's recordset should be set to
the Object type and this component is usually followed by a
tParseRecordSet component.
Use PreparedStatement Select this check box if you want to query the database
using a prepared statement. In the Set PreparedStatem
ent Parameters table displayed, specify the value for each
parameter represented by a question mark ? in the SQL
statement defined in the Query field.
• Parameter Index: the position of the parameter in the
SQL statement.
• Parameter Type: the data type of the parameter.
• Parameter Value: the value of the parameter.
For a related use case of this property, see Using
PreparedStatement objects to query data on page 2498.
tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
Global Variables
Usage
Usage rule This component offers the flexibility of the DB query for any
database using a JDBC connection and covers all possible
SQL queries.
Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
tJDBCSCDELT
Tracks data changes in a source database table using SCD (Slowly Changing Dimensions) Type 1
method and/or Type 2 method and writes both the current and historical data into a specified SCD
dimension table.
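The SCD Type 2 bookkeeping the component automates can be illustrated in plain SQL: the current dimension row is "closed" and a new version is inserted, so both current and historical values survive. A hedged sketch using Python's `sqlite3`; the surrogate key, date, and active-flag column names are assumptions for illustration, not the component's actual generated schema:

```python
import sqlite3

# Illustrative SCD Type 2 change: close the old version of the row,
# insert the new version, keep both. Column names are assumed.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE dim_customer (
    sk INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER, city TEXT,
    start_date TEXT, end_date TEXT, active INTEGER)""")
con.execute("INSERT INTO dim_customer "
            "(customer_id, city, start_date, end_date, active) "
            "VALUES (42, 'Paris', '2020-01-01', NULL, 1)")

# Source now says customer 42 moved to Lyon: a Type 2 change.
con.execute("UPDATE dim_customer SET end_date = '2020-02-01', active = 0 "
            "WHERE customer_id = 42 AND active = 1")
con.execute("INSERT INTO dim_customer "
            "(customer_id, city, start_date, end_date, active) "
            "VALUES (42, 'Lyon', '2020-02-01', NULL, 1)")

history = con.execute("SELECT city, active FROM dim_customer "
                      "WHERE customer_id = 42 ORDER BY sk").fetchall()
print(history)  # [('Paris', 0), ('Lyon', 1)]
```

A Type 1 change would instead overwrite the city in place, keeping no history; the component lets you choose either method per column.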
Basic settings
Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.
Driver JAR Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar