Talend Components

Reference Guide

7.3.1
Last updated: 2020-02-23
Contents

Copyleft........................................................................................................................ 77

tAccessBulkExec.......................................................................................................... 79
tAccessBulkExec Standard properties.......................................................................................................................79
Related scenarios............................................................................................................................................................. 81

tAccessClose................................................................................................................ 82
tAccessClose Standard properties..............................................................................................................................82
Related scenarios............................................................................................................................................................. 83

tAccessCommit............................................................................................................ 84
tAccessCommit Standard properties......................................................................................................................... 84
Related scenario............................................................................................................................................................... 85

tAccessConnection...................................................................................................... 86
tAccessConnection Standard properties.................................................................................................................. 86
Inserting data in parent/child tables........................................................................................................................87

tAccessInput.................................................................................................................91
tAccessInput Standard properties.............................................................................................................................. 91
Related scenarios............................................................................................................................................................. 94

tAccessOutput..............................................................................................................95
tAccessOutput Standard properties...........................................................................................................................95
Related scenarios...........................................................................................................................................................100

tAccessOutputBulk....................................................................................................101
tAccessOutputBulk Standard properties................................................................................................................101
Related scenarios...........................................................................................................................................................103

tAccessOutputBulkExec............................................................................................104
tAccessOutputBulkExec Standard properties...................................................................................................... 104
Related scenarios...........................................................................................................................................................107

tAccessRollback.........................................................................................................108
tAccessRollback Standard properties..................................................................................................................... 108
Related scenarios...........................................................................................................................................................109

tAccessRow................................................................................................................ 110
tAccessRow Standard properties..............................................................................................................................110
Related scenarios...........................................................................................................................113

tAddCRCRow..............................................................................................114
tAddCRCRow Standard properties...........................................................................................................................114
Adding a surrogate key to a file..............................................................................................................................115

tAddLocationFromIP.................................................................................................118
tAddLocationFromIP Standard properties............................................................................................................ 118
Identifying the real-world geographic location of an IP address.................................................................... 119

tAdvancedFileOutputXML........................................................................................122
tAdvancedFileOutputXML Standard properties.................................................................................................. 122
Defining the XML tree.................................................................................................................................................125
Mapping XML data........................................................................................................................................................127
Defining the node status............................................................................................................................................ 127
Creating an XML file using a loop......................................................................................................................... 128

tAggregateRow..........................................................................................................133
tAggregateRow Standard properties...................................................................................................................... 133
Aggregating values and sorting data.....................................................................................................................135

tAggregateSortedRow.............................................................................................. 139
tAggregateSortedRow Standard properties......................................................................................................... 139
Sorting and aggregating the input data...............................................................................................................141

tAmazonAuroraClose................................................................................................ 146
tAmazonAuroraClose Standard properties........................................................................................................... 146
Related scenario.............................................................................................................................................................147

tAmazonAuroraCommit............................................................................................148
tAmazonAuroraCommit Standard properties.......................................................................................................148
Related scenario.............................................................................................................................................................149

tAmazonAuroraConnection......................................................................................150
tAmazonAuroraConnection Standard properties................................................................................................150
Related scenario.............................................................................................................................................................152

tAmazonAuroraInput................................................................................................ 153
tAmazonAuroraInput Standard properties............................................................................................................153
Handling data with Amazon Aurora....................................................................................................................... 156

tAmazonAuroraOutput............................................................................................. 163
tAmazonAuroraOutput Standard properties........................................................................................................ 163
Related scenario.............................................................................................................................................................169

tAmazonAuroraRollback.......................................................................................... 170
tAmazonAuroraRollback Standard properties..................................................................................................... 170
Related scenario............................................................................................................................ 171

tAmazonEMRListInstances.......................................................................................172
tAmazonEMRListInstances Standard properties.................................................................................................172
Related scenario.............................................................................................................................................................173

tAmazonEMRManage................................................................................................174
tAmazonEMRManage Standard properties...........................................................................................................174
Managing an Amazon EMR cluster.........................................................................................................................178

tAmazonEMRResize.................................................................................................. 182
tAmazonEMRResize Standard properties..............................................................................................................182
Related scenario.............................................................................................................................................................184

tAmazonMysqlClose................................................................................................. 185
tAmazonMysqlClose Standard properties............................................................................................................. 185
Related scenarios...........................................................................................................................................................186

tAmazonMysqlCommit............................................................................................. 187
tAmazonMysqlCommit Standard properties........................................................................................................ 187
Related scenario.............................................................................................................................................................188

tAmazonMysqlConnection....................................................................................... 189
tAmazonMysqlConnection Standard properties................................................................................................. 189
Related scenario.............................................................................................................................................................191

tAmazonMysqlInput..................................................................................................192
tAmazonMysqlInput Standard properties............................................................................................................. 192
Related scenarios...........................................................................................................................................................194

tAmazonMysqlOutput...............................................................................................195
tAmazonMysqlOutput Standard properties.......................................................................................................... 195
Related scenarios...........................................................................................................................................................200

tAmazonMysqlRollback............................................................................................201
tAmazonMysqlRollback Standard properties.......................................................................................................201
Related scenario.............................................................................................................................................................202

tAmazonMysqlRow................................................................................................... 203
tAmazonMysqlRow Standard properties............................................................................................................... 203
Related scenario.............................................................................................................................................................206

tAmazonOracleClose................................................................................................ 207
tAmazonOracleClose Standard properties............................................................................................................207
Related scenario.............................................................................................................................208

tAmazonOracleCommit............................................................................ 209
tAmazonOracleCommit Standard properties....................................................................................................... 209
Related scenario.............................................................................................................................................................210

tAmazonOracleConnection...................................................................................... 211
tAmazonOracleConnection Standard properties................................................................................................ 211
Related scenario.............................................................................................................................................................213

tAmazonOracleInput.................................................................................................214
tAmazonOracleInput Standard properties............................................................................................................ 214
Related scenarios...........................................................................................................................................................217

tAmazonOracleOutput..............................................................................................218
tAmazonOracleOutput Standard properties.........................................................................................................218
Related scenarios...........................................................................................................................................................223

tAmazonOracleRollback........................................................................................... 224
tAmazonOracleRollback Standard properties......................................................................................................224
Related scenario.............................................................................................................................................................225

tAmazonOracleRow.................................................................................................. 226
tAmazonOracleRow Standard properties..............................................................................................................226
Related scenarios...........................................................................................................................................................229

tAmazonRedshiftManage.........................................................................................230
tAmazonRedshiftManage Standard properties................................................................................................... 230
Related scenario.............................................................................................................................................................233

tApacheLogInput.......................................................................................................234
tApacheLogInput Standard properties...................................................................................................................234
Reading an Apache access-log file.........................................................................................................................235

tAS400Close.............................................................................................................. 237
tAS400Close Standard properties............................................................................................................................237
Related scenario.............................................................................................................................................................238

tAS400Commit.......................................................................................................... 239
tAS400Commit Standard properties....................................................................................................................... 239
Related scenario.............................................................................................................................................................240

tAS400Connection.................................................................................................... 241
tAS400Connection Standard properties................................................................................................................ 241
Related scenario.............................................................................................................................................................242

tAS400Input.............................................................................................................. 243
tAS400Input Standard properties............................................................................................................................ 243
Handling data with AS/400....................................................................................................................................... 245
Related scenarios...........................................................................................................................................................249

tAS400LastInsertId................................................................................................... 250
tAS400LastInsertId Standard properties............................................................................................................... 250
Related scenario.............................................................................................................................................................251

tAS400Output........................................................................................................... 252
tAS400Output Standard properties.........................................................................................................................252
Related scenarios...........................................................................................................................................................256

tAS400Rollback.........................................................................................................257
tAS400Rollback Standard properties..................................................................................................................... 257
Related scenarios...........................................................................................................................................................258

tAS400Row................................................................................................................ 259
tAS400Row Standard properties..............................................................................................................................259
Related scenarios...........................................................................................................................................................262

tAssert........................................................................................................................ 263
tAssert Standard properties....................................................................................................................................... 263
Viewing product order status (on a daily basis) against a benchmark number....................................264
Setting up the assertive condition for a Job execution.................................................................................. 267

tAssertCatcher........................................................................................................... 273
tAssertCatcher Standard properties........................................................................................................................ 273
Related scenarios...........................................................................................................................................................274

tAzureAdlsGen2Input............................................................................................... 275
tAzureAdlsGen2Input Standard properties...........................................................................................................275
Related scenario.............................................................................................................................................................277

tAzureAdlsGen2Output............................................................................................ 278
tAzureAdlsGen2Output Standard properties....................................................................................................... 278
Accessing Azure ADLS Gen2 storage..................................................................................................................... 280

tAzureStorageConnection........................................................................................ 283
tAzureStorageConnection Standard properties.................................................................................................. 283
Related scenario.............................................................................................................................................................284

tAzureStorageContainerCreate............................................................................... 285
tAzureStorageContainerCreate Standard properties.........................................................................................285
Creating a container in Azure Storage.................................................................................................................. 286

tAzureStorageContainerDelete............................................................................... 291
tAzureStorageContainerDelete Standard properties.........................................................................................291
Related scenarios...........................................................................................................................................................292

tAzureStorageContainerExist.................................................................................. 293
tAzureStorageContainerExist Standard properties............................................................................................293
Related scenario.............................................................................................................................................................294

tAzureStorageContainerList.................................................................................... 295
tAzureStorageContainerList Standard properties.............................................................................................. 295
Related scenario.............................................................................................................................................................297

tAzureStorageDelete................................................................................................ 298
tAzureStorageDelete Standard properties............................................................................................................298
Related scenarios...........................................................................................................................................................300

tAzureStorageGet......................................................................................................301
tAzureStorageGet Standard properties.................................................................................................................. 301
Retrieving files from an Azure Storage container............................................................................... 303

tAzureStorageInputTable.........................................................................................310
tAzureStorageInputTable Standard properties................................................................................................... 310
Handling data with Microsoft Azure Table storage..........................................................................................313

tAzureStorageList..................................................................................................... 320
tAzureStorageList Standard properties..................................................................................................................320
Related scenario.............................................................................................................................................................322

tAzureStorageOutputTable......................................................................................323
tAzureStorageOutputTable Standard properties................................................................................................323
Related scenario.............................................................................................................................................................326

tAzureStoragePut......................................................................................................327
tAzureStoragePut Standard properties.................................................................................................................. 327
Related scenario.............................................................................................................................................................329

tAzureStorageQueueCreate..................................................................................... 330
tAzureStorageQueueCreate Standard properties............................................................................................... 330
Related scenario.............................................................................................................................................................331

tAzureStorageQueueDelete..................................................................................... 332
tAzureStorageQueueDelete Standard properties...............................................................................................332
Related scenario.............................................................................................................................................................333

tAzureStorageQueueInput....................................................................................... 334
tAzureStorageQueueInput Standard properties................................................................................................. 334
Related scenario.............................................................................................................................336

tAzureStorageQueueInputLoop...............................................................337
tAzureStorageQueueInputLoop Standard properties........................................................................................337
Related scenario.............................................................................................................................................................339

tAzureStorageQueueList.......................................................................................... 340
tAzureStorageQueueList Standard properties.....................................................................................................340
Related scenario.............................................................................................................................................................342

tAzureStorageQueueOutput.................................................................................... 343
tAzureStorageQueueOutput Standard properties.............................................................................................. 343
Related scenario.............................................................................................................................................................345

tAzureStorageQueuePurge...................................................................................... 346
tAzureStorageQueuePurge Standard properties................................................................................................ 346
Related scenario.............................................................................................................................................................347

tBarChart....................................................................................................................348
tBarChart Standard properties.................................................................................................................................. 348
Creating a bar chart from the input data.............................................................................................................350

tBigQueryBulkExec................................................................................................... 357
tBigQueryBulkExec Standard properties............................................................................................................... 357
Related scenario............................................................................................................................ 360

tBigQueryInput..........................................................................................................361
tBigQueryInput Standard properties.......................................................................................................................361
Performing a query in Google BigQuery.............................................................................................................. 364

tBigQueryOutput....................................................................................................... 368
tBigQueryOutput Standard properties................................................................................................................... 368
Writing data in Google BigQuery............................................................................................................................ 371

tBigQueryOutputBulk............................................................................................... 379
tBigQueryOutputBulk Standard properties.......................................................................................................... 379
Related scenario............................................................................................................................ 381

tBigQuerySQLRow.....................................................................................................382
tBigQuerySQLRow Standard properties................................................................................................................ 382

tBonitaDeploy............................................................................................................385
tBonitaDeploy Standard properties.........................................................................................................................385
Related scenario............................................................................................................................ 386

tBonitaInstantiateProcess........................................................................................387
tBonitaInstantiateProcess Standard properties.................................................................................................. 387
Executing a Bonita process via a Talend Job..................................................................................................... 390
Outputting the process instance UUID over the Row > Main link.............................................................. 395

tBoxConnection.........................................................................................................398
tBoxConnection Standard properties..................................................................................................................... 398
Related scenario.............................................................................................................................................................399

tBoxCopy....................................................................................................................400
tBoxCopy Standard properties..................................................................................................................................400
Related scenarios...........................................................................................................................................................402

tBoxDelete................................................................................................................. 403
tBoxDelete Standard properties...............................................................................................................................403
Related scenarios...........................................................................................................................................................404

tBoxGet...................................................................................................................... 405
tBoxGet Standard properties..................................................................................................................................... 405
Related scenario.............................................................................................................................................................406

tBoxList...................................................................................................................... 407
tBoxList Standard properties.....................................................................................................................................407
Related scenarios...........................................................................................................................................................408

tBoxPut...................................................................................................................... 409
tBoxPut Standard properties..................................................................................................................................... 409
Uploading and downloading files from Box....................................................................................................... 411

tBufferInput............................................................................................................... 414
tBufferInput Standard properties.............................................................................................................................414
Retrieving buffered data..........................................................................................................415

tBufferOutput............................................................................................................ 417
tBufferOutput Standard properties......................................................................................................................... 417
Buffering data..................................................................................................................................................................418
Buffering data to be used as a source system...................................................................................................420
Buffering output data on the webapp server..................................................................................................... 421
Calling a Job with context variables from a browser...................................................................................... 424
Calling a Job exported as a Web service in another Job..................................................................... 426

tCassandraBulkExec..................................................................................................429
tCassandraBulkExec Standard properties............................................................................................................. 429
Related scenarios...........................................................................................................................................................430

tCassandraClose........................................................................................................ 431
tCassandraClose Standard properties.................................................................................................................... 431
Related scenario............................................................................................................................ 431

tCassandraConnection..............................................432
tCassandraConnection Standard properties.........................................................................................................432
Related scenario.............................................................................................................................................................433

tCassandraInput........................................................................................................ 434
Mapping tables between Cassandra types and Talend data types.................................................434
tCassandraInput Standard properties.....................................................................................................................435
Handling data with Cassandra..................................................................................................................................439

tCassandraOutput..................................................................................................... 445
tCassandraOutput Standard properties................................................................................................................. 445
Related scenario............................................................................................................................ 450

tCassandraOutputBulk..............................................................................................451
tCassandraOutputBulk Standard properties.........................................................................................................451
Related scenarios...........................................................................................................................................................454

tCassandraOutputBulkExec......................................................................................455
tCassandraOutputBulkExec Standard properties............................................................................................... 455
Related scenarios...........................................................................................................................................................458

tCassandraRow.......................................................................................................... 459
tCassandraRow Standard properties.......................................................................................................................459
Related scenario.............................................................................................................................................................460

tChangeFileEncoding................................................................................................462
tChangeFileEncoding Standard properties...........................................................................................................462
Transforming the character encoding of a file.................................................................................................. 463

tChronometerStart.................................................................................................... 465
tChronometerStart Standard properties................................................................................................................465
Related scenario.............................................................................................................................................................465

tChronometerStop.................................................................................................... 466
tChronometerStop Standard properties................................................................................................................ 466
Measuring the processing time of a subJob and part of a subJob.............................................................. 467

tCloudStart.................................................................................................................471
tCloudStart Standard properties.............................................................................................................................. 471
Related scenarios...........................................................................................................................................................473

tCloudStop................................................................................................................. 474
tCloudStop Standard properties...............................................................................................................................474
Related scenarios...........................................................................................................................475

tCombinedSQLAggregate.........................................................................476
tCombinedSQLAggregate Standard properties...................................................................................................476
Filtering and aggregating table columns directly on the DBMS................................................................. 478

tCombinedSQLFilter................................................................................................. 488
tCombinedSQLFilter Standard properties.............................................................................................................488
Related scenario............................................................................................................................ 489

tCombinedSQLInput................................................................................................. 490
tCombinedSQLInput Standard properties.............................................................................................................490
Related scenario.............................................................................................................................................................491

tCombinedSQLOutput...............................................................................................492
tCombinedSQLOutput Standard properties......................................................................................................... 492
Related scenario.............................................................................................................................................................493

tContextDump........................................................................................................... 494
tContextDump Standard properties........................................................................................................................494
Related scenarios...........................................................................................................................................................495

tContextLoad.............................................................................................................496
tContextLoad Standard properties.......................................................................................................................... 496
Reading data from different MySQL databases using dynamically loaded connection parameters..497

tConvertType............................................................................................................. 504
tConvertType Standard properties.......................................................................................................................... 504
Converting Java types.................................................................................................................. 505

tCosmosDBBulkLoad................................................................................................ 510
tCosmosDBBulkLoad Standard properties............................................................................................................510

tCosmosDBConnection............................................................................................. 513
tCosmosDBConnection Standard properties........................................................................................................513

tCosmosDBInput........................................................................................................515
tCosmosDBInput Standard properties....................................................................................................................515

tCosmosDBOutput.....................................................................................................519
tCosmosDBOutput Standard properties................................................................................................................ 519

tCosmosDBRow......................................................................................................... 524
tCosmosDBRow Standard properties......................................................................................................................524

tCouchbaseDCPInput................................................................................................ 527
tCouchbaseDCPInput Standard properties........................................................................................................... 527

tCouchbaseDCPOutput............................................................................................. 529
tCouchbaseDCPOutput Standard properties........................................................................................................529

tCouchbaseInput....................................................................................................... 532
tCouchbaseInput Standard properties................................................................................................................... 532

tCouchbaseOutput.................................................................................................... 537
tCouchbaseOutput Standard properties................................................................................................................ 537

tCreateTable.............................................................................................................. 540
tCreateTable Standard properties........................................................................................................................... 540
Creating a new table in a MySQL database............................................................................................. 544

tCreateTemporaryFile...............................................................................................546
tCreateTemporaryFile Standard properties..........................................................................................................546
Creating a temporary file and writing data into it........................................................................................... 547

tDB2BulkExec............................................................................................................553
tDB2BulkExec Standard properties.........................................................................................................................553
Related scenarios...........................................................................................................................................................558

tDB2Close.................................................................................................................. 559
tDB2Close Standard properties................................................................................................................................ 559
Related scenarios...........................................................................................................................................................560

tDB2Commit.............................................................................................................. 561
tDB2Commit Standard properties............................................................................................................................561
Related scenario.............................................................................................................................................................562

tDB2Connection........................................................................................................ 563
tDB2Connection Standard properties.....................................................................................................................563
Related scenarios...........................................................................................................................................................565

tDB2Input...................................................................................................................566
tDB2Input Standard properties.................................................................................................................................566
Related scenarios...........................................................................................................................................................569

tDB2Output................................................................................................................570
tDB2Output Standard properties............................................................................................................................. 570
Related scenarios...........................................................................................................................................................575

tDB2Rollback.............................................................................................................576
tDB2Rollback Standard properties.......................................................................................................................... 576
Related scenarios...........................................................................................................................577

tDB2Row.................................................................................................... 578
tDB2Row Standard properties.................................................................................................................................. 578
Related scenarios...........................................................................................................................................................581

tDB2SCD.....................................................................................................................582
tDB2SCD Standard properties...................................................................................................................................582
Related scenarios...........................................................................................................................................................585

tDB2SCDELT.............................................................................................................. 586
tDB2SCDELT Standard properties........................................................................................................................... 586
Related scenarios.......................................................................................................................... 590

tDB2SP....................................................................................................................... 591
tDB2SP Standard properties...................................................................................................................................... 591
Related scenarios...........................................................................................................................................................593

Dynamic database components.............................................................................. 595

tDBBulkExec.............................................................................................................. 596
tDBBulkExec Standard properties........................................................................................................................... 596

tDBClose.....................................................................................................................597
tDBClose Standard properties...................................................................................................................................597

tDBColumnList.......................................................................................................... 598
tDBColumnList Standard properties....................................................................................................................... 598

tDBCommit.................................................................................................................599
tDBCommit Standard properties.............................................................................................................................. 599

tDBConnection.......................................................................................................... 600
tDBConnection Standard properties....................................................................................................................... 600

tDBInput.....................................................................................................................601
tDBInput Standard properties................................................................................................................................... 601

tDBLastInsertId......................................................................................................... 603
tDBLastInsertId Standard properties...................................................................................................................... 603

tDBOutput.................................................................................................................. 604
tDBOutput Standard properties................................................................................................................................604

tDBOutputBulk.......................................................................................................... 606
tDBOutputBulk Standard properties....................................................................................................................... 606

tDBOutputBulkExec.................................................................................................. 607
tDBOutputBulkExec Standard properties..............................................................................................................607

tDBRollback............................................................................................................... 608
tDBRollback Standard properties.............................................................................................................................608

tDBRow...................................................................................................................... 609
tDBRow Standard properties.....................................................................................................................................609

tDBSCD....................................................................................................................... 610
tDBSCD Standard properties..................................................................................................................................... 610

tDBSCDELT................................................................................................................ 611
tDBSCDELT Standard properties.............................................................................................................................. 611

tDBSP..........................................................................................................................612
tDBSP Standard properties.........................................................................................................................................612

tDBTableList.............................................................................................................. 613
tDBTableList Standard properties........................................................................................................................... 613

tDBFSConnection...................................................................................................... 614
tDBFSConnection Standard properties.................................................................................................................. 614

tDBFSGet....................................................................................................................615
tDBFSGet Standard properties..................................................................................................................................615

tDBFSPut....................................................................................................................617
tDBFSPut Standard properties.................................................................................................................................. 617

tDBSQLRow................................................................................................................619
tDBSQLRow Standard properties.............................................................................................................................619
Resetting a DB auto-increment................................................................................................................................621

tDenormalize............................................................................................................. 623
tDenormalize Standard properties.......................................................................................................................... 623
Denormalizing on one column................................................................................................................................. 624
Denormalizing on multiple columns......................................................................................................................626

tDenormalizeSortedRow.......................................................................................... 629
tDenormalizeSortedRow Standard properties.....................................................................................................629
Regrouping sorted rows.............................................................................................................................................. 630

tDie............................................................................................................................. 634
tDie Standard properties.............................................................................................................................................634
Related scenarios...........................................................................................................................................................635

tDotNETInstantiate................................................................................................... 636
tDotNETInstantiate Standard properties...............................................................................................................636
Related scenario.............................................................................................................................................................637

tDotNETRow.............................................................................................................. 638
tDotNETRow Standard properties........................................................................................................................... 638
Integrating .Net into Talend Studio: Introduction............................................................................................ 640
Integrating .Net into Talend Studio: Prerequisites........................................................................................... 640
Integrating .Net into Talend Studio: configuring the Job...............................................................................641
Utilizing .NET in Talend..............................................................................................................................................643

tDropboxConnection.................................................................................................647
tDropboxConnection Standard properties............................................................................................................647
Related scenario.............................................................................................................................................................647

tDropboxDelete.........................................................................................................648
tDropboxDelete Standard properties..................................................................................................................... 648
Related scenarios...........................................................................................................................................................649

tDropboxGet.............................................................................................................. 650
tDropboxGet Standard properties............................................................................................................................650
Related scenarios...........................................................................................................................................................651

tDropboxList..............................................................................................................652
tDropboxList Standard properties........................................................................................................................... 652
Related scenarios...........................................................................................................................................................653

tDropboxPut.............................................................................................................. 654
tDropboxPut Standard properties............................................................................................................................654
Uploading files to Dropbox....................................................................................................................................... 655

tDTDValidator............................................................................................................661
tDTDValidator Standard properties.........................................................................................................................661
Validating XML files..................................................................................................... 662

tDynamoDBInput.......................................................................................................665
tDynamoDBInput Standard properties...................................................................................................................665
Writing and extracting JSON documents from DynamoDB............................................................................668

tDynamoDBOutput....................................................................................................675
tDynamoDBOutput Standard properties............................................................................................................... 675
Related scenarios...........................................................................................................................................................677

tEDIFACTtoXML.........................................................................................................678
tEDIFACTtoXML Standard properties..................................................................................................................... 678
Reading an EDIFACT message file and saving it to XML...............................................................................679

tELTGreenplumInput................................................................................................ 682
tELTGreenplumInput Standard properties............................................................................................................682
Related scenarios...........................................................................................................................................................683

tELTGreenplumMap.................................................................................................. 684
tELTGreenplumMap Standard properties..............................................................................................................684
Mapping data using a simple implicit join..........................................................................................................686
Related scenarios...........................................................................................................................................................693

tELTGreenplumOutput............................................................................................. 694
tELTGreenplumOutput Standard properties........................................................................................................ 694
Related scenarios...........................................................................................................................................................696

tELTHiveInput............................................................................................................697
tELTHiveInput Standard properties........................................................................................................................ 697
Related scenarios...........................................................................................................................................................698

tELTHiveMap............................................................................................................. 699
tELTHiveMap Standard properties.......................................................................................................................... 699
Joining table columns and writing them into Hive.......................................................................................... 710
Related scenarios...........................................................................................................................................................717

tELTHiveOutput.........................................................................................................718
tELTHiveOutput Standard properties..................................................................................................................... 718
Related scenarios...........................................................................................................................................................720

tELTInput................................................................................................................... 721
tELTInput Standard properties................................................................................................................................. 721
Related scenarios...........................................................................................................................................................722

tELTMap..................................................................................................................... 723
tELTMap Standard properties................................................................................................................................... 723
Aggregating Snowflake data using context variables as table and connection names.......................725
Related scenarios...........................................................................................................................................................729

tELTOutput................................................................................................................ 730
tELTOutput Standard properties.............................................................................................................................. 730
Related scenarios...........................................................................................................................................................732

tELTMSSqlInput........................................................................................................ 733
tELTMSSqlInput Standard properties.....................................................................................................................733
Related scenarios...........................................................................................................................................................734

tELTMSSqlMap.......................................................................................................... 735
tELTMSSqlMap Standard properties.......................................................................................................................735
Related scenarios...........................................................................................................................................................737

tELTMSSqlOutput......................................................................................................738
tELTMSSqlOutput Standard properties..................................................................................................................738
Related scenarios...........................................................................................................................................................740

tELTMysqlInput......................................................................................................... 741
tELTMysqlInput Standard properties......................................................................................................................741
Related scenarios...........................................................................................................................................................742

tELTMysqlMap...........................................................................................................743
tELTMysqlMap Standard properties........................................................................................................................743
Aggregating table columns and filtering............................................................................................................. 745
Mapping data using an Alias table............................................................................................ 749
Related scenarios...........................................................................................................................................................753

tELTMysqlOutput...................................................................................................... 754
tELTMysqlOutput Standard properties.................................................................................................................. 754
Related scenarios...........................................................................................................................................................756

tELTNetezzaInput..................................................................................................... 757
tELTNetezzaInput Standard properties..................................................................................................................757
Related scenarios...........................................................................................................................................................758

tELTNetezzaMap....................................................................................................... 759
tELTNetezzaMap Standard properties....................................................................................................................759
Related scenarios...........................................................................................................................................................761

tELTNetezzaOutput.................................................................................................. 762
tELTNetezzaOutput Standard properties.............................................................................................................. 762
Related scenarios...........................................................................................................................................................764

tELTOracleInput........................................................................................................ 765
tELTOracleInput Standard properties.....................................................................................................................765
Related scenarios...........................................................................................................................................................766

tELTOracleMap.......................................................................................................... 767
tELTOracleMap Standard properties.......................................................................................................................767
Updating Oracle database entries...........................................................................................................................769
Related scenario.............................................................................................................................................................772

tELTOracleOutput..................................................................................................... 773
tELTOracleOutput Standard properties................................................................................................................. 773
Managing data using the Oracle MERGE function............................................................................................775

tELTPostgresqlInput................................................................................................. 780
tELTPostgresqlInput Standard properties.............................................................................................................780
Related scenarios...........................................................................................................................................................781

tELTPostgresqlMap...................................................................................................782
tELTPostgresqlMap Standard properties...............................................................................................................782
Related scenarios...........................................................................................................................................................784

tELTPostgresqlOutput.............................................................................................. 785
tELTPostgresqlOutput Standard properties......................................................................................................... 785
Related scenarios...........................................................................................................................................................787

tELTSybaseInput....................................................................................................... 788
tELTSybaseInput Standard properties....................................................................................................................788
Related scenarios...........................................................................................................................................................789

tELTSybaseMap......................................................................................................... 790
tELTSybaseMap Standard properties......................................................................................................................790
Related scenarios...........................................................................................................................................................792

tELTSybaseOutput.................................................................................................... 793
tELTSybaseOutput Standard properties................................................................................................................ 793
Related scenarios...........................................................................................................................................................795

tELTTeradataInput.................................................................................................... 796
tELTTeradataInput Standard properties................................................................................................................796
Related scenarios...........................................................................................................................................................797

tELTTeradataMap......................................................................................................798
tELTTeradataMap Standard properties..................................................................................................................798
Mapping data using a subquery.............................................................................................................................. 800
Related scenarios...........................................................................................................................................................809

tELTTeradataOutput................................................................................................. 810
tELTTeradataOutput Standard properties.............................................................................................................810
Related scenarios...........................................................................................................................................................812

tELTVerticaInput....................................................................................................... 813
tELTVerticaInput Standard properties....................................................................................................................813
Related scenarios...........................................................................................................................................................814

tELTVerticaMap.........................................................................................................815
tELTVerticaMap Standard properties......................................................................................................................815
Related scenarios...........................................................................................................................................................817

tELTVerticaOutput.................................................................................................... 818
tELTVerticaOutput Standard properties................................................................................................................ 818
Related scenarios...........................................................................................................................................................820

tESBConsumer........................................................................................................... 821
tESBConsumer Standard properties........................................................................................................................821
Using tESBConsumer to retrieve the valid email..............................................................................................826
Using tESBConsumer with custom SOAP Headers............................................................................................833

tESBProviderFault.....................................................................................................844
tESBProviderFault Standard properties................................................................................................................. 844
Requesting airport names based on country codes......................................................................................... 845

tESBProviderRequest................................................................................................857
tESBProviderRequest Standard properties........................................................................................................... 857
Sending a message without expecting a response.......................................................................................... 859

tESBProviderResponse............................................................................................. 869
tESBProviderResponse Standard properties........................................................................................................ 869
Returning Hello world response..............................................................................................................................870

tEXABulkExec............................................................................................................ 881
tEXABulkExec Standard properties......................................................................................................................... 881
Settings for different sources of import data..................................................................................................... 886
Importing data into an EXASolution database table from a local CSV file..............................................889

tEXAClose...................................................................................................................895
tEXAClose Standard properties................................................................................................................................ 895
Related scenario.............................................................................................................................................................896

tEXACommit.............................................................................................................. 897
tEXACommit Standard properties............................................................................................................................897
Related scenario.............................................................................................................................................................898

tEXAConnection........................................................................................................ 899
tEXAConnection Standard properties.....................................................................................................................899
Related scenario.............................................................................................................................................................901

tEXAInput...................................................................................................................902
tEXAInput Standard properties.................................................................................................................................902
Related scenario.............................................................................................................................................................905

tEXAOutput................................................................................................................906
tEXAOutput Standard properties............................................................................................................................. 906
Related scenario.............................................................................................................................................................911

tEXARollback............................................................................................................. 912
tEXARollback Standard properties.......................................................................................................................... 912
Related Scenario............................................................................................................................................................ 913

tEXARow.................................................................................................................... 914
tEXARow Standard properties...................................................................................................................................914
Related Scenario............................................................................................................................................................ 917

tEXistConnection.......................................................................................................918
tEXistConnection Standard properties...................................................................................................................918
Related scenarios...........................................................................................................................................................919

tEXistDelete...............................................................................................................920
tEXistDelete Standard properties............................................................................................................................ 920
Related scenarios...........................................................................................................................................................921

tEXistGet.................................................................................................................... 922
tEXistGet Standard properties.................................................................................................................................. 922
Retrieving resources from a remote eXist DB server...................................................................................... 923

tEXistList....................................................................................................................926
tEXistList Standard properties.................................................................................................................................. 926
Related scenario.............................................................................................................................................................927

tEXistPut.................................................................................................................... 928
tEXistPut Standard properties...................................................................................................................................928
Related scenarios...........................................................................................................................................................929

tEXistXQuery............................................................................................................. 930
tEXistXQuery Standard properties...........................................................................................................................930
Related scenarios...........................................................................................................................................................931

tEXistXUpdate........................................................................................................... 932
tEXistXUpdate Standard properties........................................................................................................................ 932
Related scenarios...........................................................................................................................................................933

tExternalSortRow......................................................................................................934
tExternalSortRow Standard properties.................................................................................................................. 934
Related scenario.............................................................................................................................................................936

tExtractDelimitedFields........................................................................................... 937
tExtractDelimitedFields Standard properties...................................................................................................... 937
Extracting a delimited string column of a database table............................................................................ 939

tExtractJSONFields....................................................................................................945
tExtractJSONFields Standard properties............................................................................................................... 945
Retrieving error messages while extracting data from JSON fields........................................................... 947
Collecting data from your favorite online social network............................................................................. 952
Extracting data from a JSON file through looping........................................................................................... 956

tExtractPositionalFields........................................................................................... 963
tExtractPositionalFields Standard properties......................................................................................................963
Related scenario.............................................................................................................................................................965

tExtractRegexFields..................................................................................................966
tExtractRegexFields Standard properties............................................................................................................. 966
Extracting name, domain and TLD from e-mail addresses............................................................................967

tExtractXMLField...................................................................................................... 971
tExtractXMLField Standard properties...................................................................................................................971
Extracting XML data from a field in a database table....................................................................................973
Extracting correct and erroneous data from an XML field in a delimited file........................................975

tFileArchive................................................................................................................979
tFileArchive Standard properties............................................................................................................................. 979
Zipping files using a tFileArchive........................................................................................................................... 981

tFileCompare............................................................................................................. 984
tFileCompare Standard properties.......................................................................................................................... 984
Comparing unzipped files...........................................................................................................................................985

tFileCopy.................................................................................................................... 988
tFileCopy Standard properties.................................................................................................................................. 988
Restoring files from bin.............................................................................................................................................. 990

tFileDelete................................................................................................................. 992
tFileDelete Standard properties...............................................................................................................................992
Deleting files................................................................................................................................................................... 993

tFileExist.................................................................................................................... 995
tFileExist Standard properties.................................................................................................................................. 995
Checking for the presence of a file and creating it if it does not exist.................................................... 996

tFileFetch.................................................................................................................1000
tFileFetch Standard properties.............................................................................................................................. 1000
Fetching data through HTTP.................................................................................................................................. 1003
Reusing stored cookie to fetch files through HTTP...................................................................................... 1005
Related scenario.......................................................................................................................................................... 1009

tFileInputARFF........................................................................................................ 1010
tFileInputARFF Standard properties.....................................................................................................................1010
Displaying the content of an ARFF file................................................................................ 1011

tFileInputDelimited................................................................................................ 1015
tFileInputDelimited Standard properties............................................................................................................1015
Reading data from a Delimited file and displaying the output.......................................1018
Reading data from a remote file in streaming mode....................................................................................1020

tFileInputExcel........................................................................................................ 1024
tFileInputExcel Standard properties.................................................................................................................... 1024
Related scenarios........................................................................................................................................................ 1027

tFileInputFullRow................................................................................................... 1028
tFileInputFullRow Standard properties...............................................................................................................1028
Reading full rows in a delimited file.................................................................................................................. 1029

tFileInputJSON........................................................................................................ 1032
tFileInputJSON Standard properties.....................................................................................................................1032
Extracting JSON data from a file using JSONPath without setting a loop node..................................1034
Extracting JSON data from a file using JSONPath..........................................................................................1037
Extracting JSON data from a file using XPath.................................................................................................1039
Extracting JSON data from a URL.........................................................................................................................1040

tFileInputLDIF......................................................................................................... 1045
tFileInputLDIF Standard properties......................................................................................................................1045
Related scenario.......................................................................................................................................................... 1047

tFileInputMail..........................................................................................................1048
tFileInputMail Standard properties...................................................................................................................... 1048
Extracting key fields from an email.................................................................................................................... 1050

tFileInputMSDelimited...........................................................................................1052
tFileInputMSDelimited Standard properties..................................................................................................... 1052
The Multi Schema Editor......................................................................................................................................... 1053
Reading a multi structure delimited file............................................................................................................1054

tFileInputMSPositional.......................................................................................... 1061
tFileInputMSPositional Standard properties..................................................................................................... 1061
Reading data from a positional file.....................................................................................................................1063

tFileInputMSXML....................................................................................................1067
tFileInputMSXML Standard properties................................................................................................................1067
Reading a multi-structure XML file..................................................................................................................... 1068

tFileInputPositional................................................................................................1072
tFileInputPositional Standard properties........................................................................................................... 1072
Reading a Positional file and saving filtered results to XML.....................................................................1075

tFileInputProperties............................................................................................... 1079
tFileInputProperties Standard properties...........................................................................................................1079
Reading and matching the keys and the values of different .properties files and outputting the results in a glossary...................................................................................1080

tFileInputRaw..........................................................................................................1085
tFileInputRaw Standard properties.......................................................................................................................1085
Related Scenario..........................................................................................................................................................1086

tFileInputRegex...................................................................................................... 1087
tFileInputRegex Standard properties...................................................................................................................1087
Reading data using a Regex and outputting the result to Positional file............................................. 1089

tFileInputXML......................................................................................................... 1092
tFileInputXML Standard properties...................................................................................................................... 1092
Reading and extracting data from an XML structure....................................................................................1095
Extracting erroneous XML data via a reject flow...........................................................................................1096

tFileList.................................................................................................................... 1100
tFileList Standard properties.................................................................................................................................. 1100
Iterating on a file directory.....................................................................................................................................1102
Finding duplicate files between two folders....................................................................................................1104

tFileOutputARFF..................................................................................................... 1110
tFileOutputARFF Standard properties................................................................................................................. 1110
Related scenario.......................................................................................................................................................... 1112

tFileOutputDelimited............................................................................................. 1113
tFileOutputDelimited Standard properties........................................................................................................ 1113
Writing data in a delimited file.............................................................................................................................1116
Utilizing Output Stream to save filtered data to a local file......................................................................1120

tFileOutputExcel..................................................................................................... 1123
tFileOutputExcel Standard properties.................................................................................................................1123
Related scenario.......................................................................................................................................................... 1126

tFileOutputJSON..................................................................................................... 1127
tFileOutputJSON Standard properties..................................................................................................................1127
Writing a JSON structured file............................................................................................................................... 1128

tFileOutputLDIF...................................................................................................... 1131
tFileOutputLDIF Standard properties.................................................................................................................. 1131
Writing data from a database table into an LDIF file...................................................................................1133

tFileOutputMSDelimited........................................................................................1138
tFileOutputMSDelimited Standard properties.................................................................................................. 1138
Related scenarios........................................................................................................................................................ 1139

tFileOutputMSPositional....................................................................................... 1140
tFileOutputMSPositional Standard properties..................................................................................................1140
Related scenarios........................................................................................................................................................ 1141

tFileOutputMSXML................................................................................................. 1142
tFileOutputMSXML Standard properties.............................................................................................................1142
Defining the MultiSchema XML tree................................................................................................................... 1143
Mapping XML data from multiple schema sources....................................................................................... 1144
Defining the node status......................................................................................................................................... 1145
Related scenarios........................................................................................................................................................ 1146

tFileOutputPositional.............................................................................................1147
tFileOutputPositional Standard properties........................................................................................................1147
Related scenario.......................................................................................................................................................... 1150

tFileOutputProperties............................................................................................ 1151
tFileOutputProperties Standard properties....................................................................................................... 1151
Related scenarios........................................................................................................................................................ 1152

tFileOutputRaw.......................................................................................................1153
tFileOutputRaw Standard properties................................................................................................................... 1153

tFileOutputXML...................................................................................................... 1155
tFileOutputXML Standard properties...................................................................................................................1155
Related scenarios........................................................................................................................................................ 1157

tFileProperties........................................................................................................ 1158
tFileProperties Standard properties..................................................................................................................... 1158
Displaying the properties of a processed file.................................................................................................. 1159

tFileRowCount.........................................................................................................1161
tFileRowCount Standard properties..................................................................................................................... 1161
Writing a file to MySQL if the number of its records matches a reference value............................... 1162

tFileTouch................................................................................................................1166
tFileTouch Standard properties............................................................................................................................. 1166
Related scenarios........................................................................................................................................................ 1167

tFileUnarchive......................................................................................................... 1168
tFileUnarchive Standard properties......................................................................................................................1168
Related scenario.......................................................................................................................................................... 1169

tFilterColumns........................................................................................................ 1170
tFilterColumns Standard properties..................................................................................................................... 1170
Related scenario..........................................................................................1171

tFilterRow................................................................................................................ 1172
tFilterRow Standard properties..............................................................................................................................1172
Filtering a list of names using simple conditions.......................................................................................... 1173
Filtering a list of names through different logical operations.................................................................. 1177

tFirebirdClose..........................................................................................................1179
tFirebirdClose Standard properties.......................................................................................................................1179
Related scenarios........................................................................................................................................................ 1180

tFirebirdCommit......................................................................................................1181
tFirebirdCommit Standard properties..................................................................................................................1181
Related scenario.......................................................................................................................................................... 1182

tFirebirdConnection................................................................................................1183
tFirebirdConnection Standard properties...........................................................................................................1183
Related scenarios........................................................................................................................................................ 1184

tFirebirdInput.......................................................................................................... 1185
tFirebirdInput Standard properties.......................................................................................................................1185
Related scenarios........................................................................................................................................................ 1187

tFirebirdOutput....................................................................................................... 1189
tFirebirdOutput Standard properties....................................................................................................................1189
Related scenarios........................................................................................................................................................ 1193

tFirebirdRollback.................................................................................................... 1194
tFirebirdRollback Standard properties................................................................................................................ 1194
Related scenario.......................................................................................................................................................... 1195

tFirebirdRow............................................................................................................1196
tFirebirdRow Standard properties.........................................................................................................................1196
Related scenarios........................................................................................................................................................ 1199

tFixedFlowInput......................................................................................................1200
tFixedFlowInput Standard properties..................................................................................................................1200
Related scenarios........................................................................................................................................................ 1201

tFlowMeter.............................................................................................................. 1202
tFlowMeter Standard properties............................................................................................................................1202
Related scenario.......................................................................................................................................................... 1203

tFlowMeterCatcher................................................................................................. 1204
tFlowMeterCatcher Standard properties.............................................................................................................1204
Catching flow metrics from a Job.........................................................................................................................1205

tFlowToIterate........................................................................................................ 1209
tFlowToIterate Standard properties..................................................................................................................... 1209
Transforming data flow to a list...........................................................................................................................1210

tForeach................................................................................................................... 1214
tForeach Standard properties................................................................................................................................. 1214
Iterating on a list and retrieving the values.................................................................................................... 1214

tFTPClose.................................................................................................................1217
tFTPClose Standard properties.............................................................................................................................. 1217
Related scenarios........................................................................................................................................................ 1217

tFTPConnection...................................................................................................... 1218
tFTPConnection Standard properties...................................................................................................................1218
Related scenarios........................................................................................................................................................ 1220

tFTPDelete...............................................................................................................1221
tFTPDelete Standard properties............................................................................................................................ 1221
Related scenario.......................................................................................................................................................... 1224

tFTPFileExist........................................................................................................... 1225
tFTPFileExist Standard properties........................................................................................................................ 1225
Related scenario.......................................................................................................................................................... 1227

tFTPFileList............................................................................................................. 1228
tFTPFileList Standard properties...........................................................................................................................1228
Listing and getting files/folders on an FTP directory...................................................................................1230

tFTPFileProperties..................................................................................................1236
tFTPFileProperties Standard properties..............................................................................................................1236
Related scenario.......................................................................................................................................................... 1238

tFTPGet.................................................................................................................... 1239
tFTPGet Standard properties.................................................................................................................................. 1239
Related scenario.......................................................................................................................................................... 1242

tFTPPut.................................................................................................................... 1243
tFTPPut Standard properties...................................................................................................................................1243
Putting files onto an FTP server...........................................................................................................................1246

tFTPRename............................................................ 1250
tFTPRename Standard properties......................................................................................................................... 1250
Renaming a file located on an FTP server........................................................................................................1253

tFTPTruncate...........................................................................................................1256
tFTPTruncate Standard properties........................................................................................................................1256
Related scenario.......................................................................................................................................................... 1258

tFuzzyMatch............................................................................................................ 1259
tFuzzyMatch Standard properties......................................................................................................................... 1259
Checking the Levenshtein distance of 0 in first names............................................................................... 1260
Checking the Levenshtein distance of 1 or 2 in first names......................................................................1263
Checking the Metaphonic distance in first names......................................................... 1264

tGoogleDataprocManage....................................................................................... 1266
tGoogleDataprocManage Standard properties................................................................................................. 1266

tGoogleDriveConnection........................................................................................1268
tGoogleDriveConnection Standard properties..................................................................................................1268
OAuth methods for accessing Google Drive.....................................................................................................1270
Related scenario.......................................................................................................................................................... 1279

tGoogleDriveCopy...................................................................................................1280
tGoogleDriveCopy Standard properties...............................................................................................................1280
Related scenario.......................................................................................................................................................... 1282

tGoogleDriveCreate................................................................................................ 1283
tGoogleDriveCreate Standard properties............................................................................................................1283
Related scenario.......................................................................................................................................................... 1285

tGoogleDriveDelete................................................................................................ 1286
tGoogleDriveDelete Standard properties........................................................................................................... 1286
Related scenario.......................................................................................................................................................... 1288

tGoogleDriveGet..................................................................................................... 1289
tGoogleDriveGet Standard properties..................................................................................................................1289
Related scenario.......................................................................................................................................................... 1291

tGoogleDriveList..................................................................................................... 1292
tGoogleDriveList Standard properties................................................................................................................. 1292
Related scenario.......................................................................................................................................................... 1294

tGoogleDrivePut..................................................................................................... 1295
tGoogleDrivePut Standard properties..................................................................................................................1295
Managing files with Google Drive........................................................................................................................1297

tGPGDecrypt............................................................ 1306
tGPGDecrypt Standard properties......................................................................................................................... 1306
Decrypting a GnuPG-encrypted file and displaying its content..................................1307

tGreenplumBulkExec..............................................................................................1311
tGreenplumBulkExec Standard properties.........................................................................................................1311
Related scenarios........................................................................................................................................................ 1314

tGreenplumClose.................................................................................................... 1315
tGreenplumClose Standard properties................................................................................................................ 1315
Related scenarios........................................................................................................................................................ 1316

tGreenplumCommit................................................................................................ 1317
tGreenplumCommit Standard properties............................................................................................................1317
Related scenarios........................................................................................................................................................ 1318

tGreenplumConnection.......................................................................................... 1319
tGreenplumConnection Standard properties.................................................................................................... 1319
Related scenarios........................................................................................................................................................ 1320

tGreenplumGPLoad................................................................................................ 1321
tGreenplumGPLoad Standard properties............................................................................................................1321
Related scenario.......................................................................................................................................................... 1326

tGreenplumInput.....................................................................................................1327
tGreenplumInput Standard properties.................................................................................................................1327
Related scenarios........................................................................................................................................................ 1329

tGreenplumOutput..................................................................................................1330
tGreenplumOutput Standard properties............................................................................................................. 1330
Related scenarios........................................................................................................................................................ 1334

tGreenplumOutputBulk..........................................................................................1336
tGreenplumOutputBulk Standard properties.................................................................................................... 1336
Related scenarios........................................................................................................................................................ 1338

tGreenplumOutputBulkExec..................................................................................1339
tGreenplumOutputBulkExec Standard properties........................................................................................... 1339
Related scenarios........................................................................................................................................................ 1341

tGreenplumRollback...............................................................................................1342
tGreenplumRollback Standard properties..........................................................................................................1342
Related scenarios........................................................................................................................................................ 1343

tGreenplumRow...................................................................................................... 1344
tGreenplumRow Standard properties.................................................................................................................. 1344
Related scenarios........................................................................................................................................................ 1347

tGreenplumSCD.......................................................................................................1348
tGreenplumSCD Standard properties...................................................................................................................1348
Related scenario.......................................................................................................................................................... 1351

tGroovy.....................................................................................................................1352
tGroovy Standard properties................................................................................................................................... 1352
Related scenarios........................................................................................1353

tGroovyFile.............................................................................................................. 1354
tGroovyFile Standard properties............................................................................................................................1354
Calling a file which contains Groovy code........................................................................................................1355

tGSBucketCreate..................................................................................................... 1357
tGSBucketCreate Standard properties................................................................................................................. 1357
Related scenario.......................................................................................................................................................... 1358

tGSBucketDelete.....................................................................................................1359
tGSBucketDelete Standard properties................................................................................................................. 1359
Related scenarios........................................................................................................................................................ 1360

tGSBucketExist........................................................................................................1361
tGSBucketExist Standard properties.................................................................................................................... 1361
Related scenario.......................................................................................................................................................... 1362

tGSBucketList.......................................................................................................... 1363
tGSBucketList Standard properties.......................................................................................................................1363
Related scenario.......................................................................................................................................................... 1364

tGSClose...................................................................................................................1365
tGSClose Standard properties.................................................................................................................................1365
Related scenario.......................................................................................................................................................... 1365

tGSConnection.........................................................................................................1366
tGSConnection Standard properties..................................................................................................................... 1366
Related scenario.......................................................................................................................................................... 1367

tGSCopy....................................................................................................................1368
tGSCopy Standard properties..................................................................................................................................1368
Related scenario.......................................................................................................................................................... 1369

tGSDelete.................................................................................................................1370
tGSDelete Standard properties.............................................................................................................................. 1370
Related scenario.......................................................................................................................................................... 1371

tGSGet...................................................................... 1372
tGSGet Standard properties.....................................................................................................................................1372
Related scenarios........................................................................................................................................................ 1374

tGSList......................................................................................................................1375
tGSList Standard properties.................................................................................................................................... 1375
Related scenario.......................................................................................................................................................... 1376

tGSPut...................................................................................................................... 1377
tGSPut Standard properties.....................................................................................................................................1377
Managing files with Google Cloud Storage...................................................................................................... 1378

tHashInput............................................................................................................... 1386
tHashInput Standard properties............................................................................................................................ 1386
Reading data from the cache memory for high-speed data access......................................................... 1387
Clearing the memory before loading data to it in case an iterator exists in the same subJob....... 1391

tHashOutput............................................................................................................ 1395
tHashOutput Standard properties......................................................................................................................... 1395
Related scenarios........................................................................................................................................................ 1397

tHBaseClose.............................................................................................................1398
tHBaseClose Standard properties..........................................................................................................................1398
Related scenario.......................................................................................................................................................... 1399

tHBaseConnection.................................................................................................. 1400
tHBaseConnection Standard properties..............................................................................................................1400
Related scenario.......................................................................................................................................................... 1404

tHBaseInput.............................................................................................................1405
HBase filters.................................................................................................................................................................. 1405
tHBaseInput Standard properties..........................................................................................................................1406
Exchanging customer data with HBase..............................................................................................................1411

tHBaseOutput..........................................................................................................1419
tHBaseOutput Standard properties.......................................................................................................................1419
Related scenario.......................................................................................................................................................... 1424

tHCatalogInput........................................................................................................1425
tHCatalogInput Standard properties.................................................................................................................... 1425
Related scenario.......................................................................................................................................................... 1430

tHCatalogLoad........................................................................................................ 1431
tHCatalogLoad Standard properties.....................................................................................................................1431
Related scenario.......................................................................................................................................................... 1435

tHCatalogOperation................................................................1436
tHCatalogOperation Standard properties...........................................................................................................1436
Managing HCatalog tables on Hortonworks Data Platform........................................................................1444

tHCatalogOutput.....................................................................................................1453
tHCatalogOutput Standard properties.................................................................................................................1453
Related scenario.......................................................................................................................................................... 1459

tHDFSCompare........................................................................................................1460
tHDFSCompare Standard properties.................................................................................................................... 1460
Related scenarios........................................................................................................................................................ 1465

tHDFSConnection....................................................................................................1466
tHDFSConnection Standard properties............................................................................................................... 1466
Related scenarios........................................................................................................................................................ 1472

tHDFSCopy...............................................................................................................1473
tHDFSCopy Standard properties............................................................................................................................ 1473
Related scenario.......................................................................................................................................................... 1478

tHDFSDelete............................................................................................................1479
tHDFSDelete Standard properties.........................................................................................................................1479
Related scenarios........................................................................................................................................................ 1483

tHDFSExist...............................................................................................................1484
tHDFSExist Standard properties............................................................................................................................ 1484
Checking the existence of a file in HDFS......................................................................................................... 1489

tHDFSGet................................................................................................................. 1493
tHDFSGet Standard properties............................................................................................................................... 1493
Computing data with Hadoop distributed file system..................................................................................1498

tHDFSInput.............................................................................................................. 1505
tHDFSInput Standard properties........................................................................................................................... 1505
Using HDFS components to work with Azure Data Lake Storage (ADLS)..............................................1511

tHDFSList................................................................................................................. 1517
tHDFSList Standard properties...............................................................................................................................1517
Iterating on an HDFS directory.............................................................................. 1523

tHDFSOutput........................................................................................................... 1528
tHDFSOutput Standard properties........................................................................................................................1528
Related scenario.......................................................................................................................................................... 1534

tHDFSOutputRaw....................................................................................................1535
tHDFSOutputRaw Standard properties............................................................................................................... 1535
Related scenario..........................................................................................1541

tHDFSProperties..................................................................................................... 1542
tHDFSProperties Standard properties..................................................................................................................1542
Related scenario.......................................................................................................................................................... 1547

tHDFSPut................................................................................................................. 1548
tHDFSPut Standard properties............................................................................................................................... 1548
Related scenario.......................................................................................................................................................... 1553

tHDFSRename......................................................................................................... 1554
tHDFSRename Standard properties......................................................................................................................1554
Related scenario.......................................................................................................................................................... 1559

tHDFSRowCount..................................................................................................... 1560
tHDFSRowCount Standard properties................................................................................................................. 1560
Related scenarios........................................................................................................................................................ 1565

tHiveClose................................................................................................................1566
tHiveClose Standard properties............................................................................................................................. 1566
Related scenarios........................................................................................................................................................ 1567

tHiveConnection..................................................................................................... 1568
tHiveConnection Standard properties................................................................................................................. 1568
Connecting to a custom Hadoop distribution.................................................................................................. 1579
Creating a partitioned Hive table......................................................................................................................... 1582
Creating a JDBC Connection to Azure HDInsight Hive................................................................................. 1589

tHiveCreateTable.................................................................................................... 1596
tHiveCreateTable Standard properties................................................................................................................1596
Related scenario.......................................................................................................................................................... 1608

tHiveInput................................................................................................................1609
tHiveInput Standard properties............................................................................................................................. 1609
Related scenarios........................................................................................................................................................ 1621

tHiveLoad.................................................................................................................1622
tHiveLoad Standard properties.............................................................................................................................. 1622
Related scenario.......................................................................................................................................................... 1633

tHiveRow................................................................................................................. 1634
tHiveRow Standard properties............................................................................................................................... 1634
Connecting to a security-enabled MapR............................................................................................................1646
Related scenarios........................................................................................................................................................ 1649

tHSQLDbInput......................................................................... 1650
tHSQLDbInput Standard properties......................................................................................................................1650
Related scenarios........................................................................................................................................................ 1652

tHSQLDbOutput...................................................................................................... 1653
tHSQLDbOutput Standard properties.................................................................................................................. 1653
Related scenarios........................................................................................................................................................ 1657

tHSQLDbRow...........................................................................................................1658
tHSQLDbRow Standard properties....................................................................................................................... 1658
Related scenarios........................................................................................................................................................ 1661

tHttpRequest........................................................................................................... 1662
tHttpRequest Standard properties........................................................................................................................ 1662
Sending an HTTP request to the server and saving the response information to a local file........ 1664
Sending a POST request from a local JSON file............................................................................................. 1666

tImpalaClose........................................................................................................... 1670
tImpalaClose Standard properties........................................................................................................................ 1670
Related scenarios........................................................................................................................................................ 1671

tImpalaConnection................................................................................................. 1672
tImpalaConnection Standard properties.............................................................................................................1672
Related scenario.......................................................................................................................................................... 1675

tImpalaCreateTable................................................................................................1676
tImpalaCreateTable Standard properties........................................................................................................... 1676
Related scenario.......................................................................................................................................................... 1682

tImpalaInput............................................................................................................1683
tImpalaInput Standard properties.........................................................................................................................1683
Related scenarios........................................................................................................................................................ 1687

tImpalaLoad............................................................................................................ 1688
tImpalaLoad Standard properties..........................................................................................................................1688
Related scenario.......................................................................................................................................................... 1692

tImpalaOutput.........................................................................................................1693
tImpalaOutput Standard properties..................................................................................................................... 1693
Related scenarios........................................................................................................................................................ 1697

tImpalaRow............................................................................................................. 1698
tImpalaRow Standard properties...........................................................................................................................1698
Related scenarios........................................................................................................................................................ 1702

tInfiniteLoop............................................................1704
tInfiniteLoop Standard properties.........................................................................................................................1704
Related scenario.......................................................................................................................................................... 1705

tInformixBulkExec.................................................................................................. 1706
tInformixBulkExec Standard properties..............................................................................................................1706
Related scenario.......................................................................................................................................................... 1710

tInformixClose.........................................................................................................1711
tInformixClose Standard properties..................................................................................................................... 1711
Related scenario.......................................................................................................................................................... 1712

tInformixCommit.................................................................................................... 1713
tInformixCommit Standard properties................................................................................................................ 1713
Related scenario..........................................................................................1714

tInformixConnection.............................................................................................. 1715
tInformixConnection Standard properties......................................................................................................... 1715
Related scenario.......................................................................................................................................................... 1716

tInformixInput.........................................................................................................1717
tInformixInput Standard properties..................................................................................................................... 1717
Related scenarios........................................................................................................................................................ 1719

tInformixOutput......................................................................................................1720
tInformixOutput Standard properties.................................................................................................................. 1720
Related scenarios........................................................................................................................................................ 1725

tInformixOutputBulk.............................................................................................. 1726
tInformixOutputBulk Standard properties......................................................................................................... 1726
Related scenario.......................................................................................................................................................... 1728

tInformixOutputBulkExec...................................................................................... 1729
tInformixOutputBulkExec Standard properties................................................................................................ 1729
Related scenario.......................................................................................................................................................... 1732

tInformixRollback................................................................................................... 1733
tInformixRollback Standard properties...............................................................................................................1733
Related scenario..........................................................................................1734

tInformixRow.......................................................................................................... 1735
tInformixRow Standard properties....................................................................................................................... 1735
Related scenarios........................................................................................................................................................ 1738

tInformixSCD........................................................................................................... 1739
tInformixSCD Standard properties........................................................................................................................1739
Related scenario.......................................................................................................................................................... 1742

tInformixSP..............................................................................................................1743
tInformixSP Standard properties...........................................................................................................................1743
Related scenarios........................................................................................................................................................ 1745

tIngresBulkExec...................................................................................................... 1747
tIngresBulkExec Standard properties.................................................................................................................. 1747
Related scenarios........................................................................................................................................................ 1750

tIngresClose.............................................................................................................1751
tIngresClose Standard properties..........................................................................................................................1751
Related scenarios........................................................................................................................................................ 1752

tIngresCommit.........................................................................................................1753
tIngresCommit Standard properties..................................................................................................................... 1753
Related scenario.......................................................................................................................................................... 1754

tIngresConnection.................................................................................................. 1755
tIngresConnection Standard properties.............................................................................................................. 1755
Related scenarios........................................................................................................................................................ 1756

tIngresInput.............................................................................................................1757
tIngresInput Standard properties.......................................................................................................................... 1757
Related scenarios........................................................................................................................................................ 1759

tIngresOutput..........................................................................................................1761
tIngresOutput Standard properties.......................................................................................................................1761
Related scenarios........................................................................................................................................................ 1765

tIngresOutputBulk.................................................................................................. 1766
tIngresOutputBulk Standard properties..............................................................................................................1766
Related scenarios........................................................................................................................................................ 1768

tIngresOutputBulkExec.......................................................................................... 1769
tIngresOutputBulkExec Standard properties.....................................................................................................1769
Loading data to a table in the Ingres DBMS................................................................................................... 1772
Related scenarios........................................................................................................................................................ 1774

tIngresRollback....................................................................................................... 1775
tIngresRollback Standard properties................................................................................................................... 1775
Related scenarios........................................................................................................................................................ 1776

tIngresRow.............................................................................................................. 1777
tIngresRow Standard properties............................................................................................................................1777
Related scenarios........................................................................................................................................................ 1780

tIngresSCD............................................................................................................... 1781
tIngresSCD Standard properties............................................................................................................................ 1781
Related scenario.......................................................................................................................................................... 1783

tInterbaseClose....................................................................................................... 1784
tInterbaseClose Standard properties................................................................................................................... 1784
Related scenarios........................................................................................................................................................ 1785

tInterbaseCommit................................................................................................... 1786
tInterbaseCommit Standard properties...............................................................................................................1786
Related scenario.......................................................................................................................................................... 1787

tInterbaseConnection.............................................................................................1788
tInterbaseConnection Standard properties........................................................................................................1788
Related scenarios........................................................................................................................................................ 1789

tInterbaseInput....................................................................................................... 1790
tInterbaseInput Standard properties....................................................................................................................1790
Related scenarios........................................................................................................................................................ 1793

tInterbaseOutput.................................................................................................... 1794
tInterbaseOutput Standard properties................................................................................................................ 1794
Related scenarios........................................................................................................................................................ 1799

tInterbaseRollback..................................................................................................1800
tInterbaseRollback Standard properties............................................................................................................. 1800
Related scenarios........................................................................................................................................................ 1801

tInterbaseRow......................................................................................................... 1802
tInterbaseRow Standard properties......................................................................................................................1802
Related scenarios........................................................................................................................................................ 1805

tIntervalMatch.........................................................................................................1806
tIntervalMatch Standard properties..................................................................................................................... 1806
Identifying server locations based on their IP addresses............................................................................ 1807

tIterateToFlow........................................................................................................ 1811
tIterateToFlow Standard properties..................................................................................................................... 1811
Transforming a list of files as a data flow......................................................................................... 1812

tJasperOutput..........................................................................................................1815
tJasperOutput Standard properties.......................................................................................................................1815
Generating a report against a .jrxml template................................................................................ 1817

tJasperOutputExec..................................................................................................1820
tJasperOutputExec Standard properties..............................................................................................................1820
Related scenario..........................................................................................1821

tJava......................................................................................................................... 1822
tJava Standard properties.........................................................................................................................................1822
Printing out a variable's content............................................................................................................. 1823

tJavaDBInput........................................................................................................... 1827
tJavaDBInput Standard properties........................................................................................................................ 1827
Related scenarios........................................................................................................................................................ 1829

tJavaDBOutput........................................................................................................ 1830
tJavaDBOutput Standard properties..................................................................................................................... 1830
Related scenarios........................................................................................................................................................ 1833

tJavaDBRow.............................................................................................................1834
tJavaDBRow Standard properties.......................................................................................................................... 1834
Related scenarios........................................................................................................................................................ 1836

tJavaFlex.................................................................................................................. 1837
tJavaFlex Standard properties................................................................................................................................ 1837
Generating data flow................................................................................................................................................. 1838
Processing rows of data with tJavaFlex............................................................................................................. 1841

tJavaRow..................................................................................................................1845
tJavaRow Standard properties................................................................................................................................1845
Transforming data line by line using tJavaRow.............................................................................................. 1847

tJDBCClose...............................................................................................................1850
tJDBCClose Standard properties............................................................................................................................ 1850
Related scenarios........................................................................................................................................................ 1851

tJDBCColumnList.................................................................................................... 1852
tJDBCColumnList Standard properties.................................................................................................................1852
Related scenario.......................................................................................................................................................... 1853

tJDBCCommit...........................................................................................................1854
tJDBCCommit Standard properties........................................................................................................................1854
Related scenario.......................................................................................................................................................... 1855

tJDBCConnection.................................................................................................... 1856
tJDBCConnection Standard properties.................................................................................................................1856
Importing a database driver................................................................................................................................... 1858
Related scenario.......................................................................................................................................... 1860

tJDBCInput............................................................................................................... 1861
tJDBCInput Standard properties.............................................................................................................................1861
Related scenarios........................................................................................................................................................ 1864

tJDBCOutput............................................................................................................ 1865
tJDBCOutput Standard properties......................................................................................................................... 1865
Related scenarios........................................................................................................................................................ 1869

tJDBCRollback......................................................................................................... 1870
tJDBCRollback Standard properties...................................................................................................................... 1870
Related scenario.......................................................................................................................................................... 1871

tJDBCRow.................................................................................................................1872
tJDBCRow Standard properties.............................................................................................................................. 1872
Related scenarios........................................................................................................................................................ 1875

tJDBCSCDELT...........................................................................................................1876
tJDBCSCDELT Standard properties....................................................................................................................... 1876
Tracking data changes in a Snowflake table using the tJDBCSCDELT component............................ 1879

tJDBCSP....................................................................................................................1889
tJDBCSP Standard properties.................................................................................................................................. 1889
Related scenario.......................................................................................................................................................... 1891

tJDBCTableList........................................................................................................ 1893
tJDBCTableList Standard properties.....................................................................................................................1893
Related scenario.......................................................................................................................................................... 1894

tJIRAInput................................................................................................................ 1895
tJIRAInput Standard properties.............................................................................................................................. 1895
Retrieving the project information from the JIRA application.....................................................1896

tJIRAOutput............................................................................................................. 1899
tJIRAOutput Standard properties...........................................................................................................................1899
Creating an issue in the JIRA application.............................................................................................1900
Updating an issue in the JIRA application........................................................................................... 1903

tJMSInput.................................................................................................................1908
tJMSInput Standard properties...............................................................................................................................1908
Related scenarios........................................................................................................................................................ 1910

tJMSOutput..............................................................................................................1911
tJMSOutput Standard properties........................................................................................................................... 1911
Enqueuing/dequeuing a message on the ActiveMQ server.........................................................................1912
Related scenarios........................................................................................................................................ 1915

tJoin.......................................................................................................................... 1916
tJoin Standard properties......................................................................................................................................... 1916
Doing an exact match on two columns and outputting the main and rejected data........................ 1917

tKafkaCommit......................................................................................................... 1922
tKafkaCommit Standard properties...................................................................................................................... 1922
Related scenarios........................................................................................................................................................ 1922

tKafkaConnection................................................................................................... 1923
tKafkaConnection Standard properties............................................................................................................... 1923
Related scenarios........................................................................................................................................................ 1924
Kafka and AVRO in a Job......................................................................................................................................... 1924

tKafkaCreateTopic.................................................................................................. 1926
tKafkaCreateTopic Standard properties..............................................................................................................1926
Related scenarios........................................................................................................................................................ 1927

tKafkaInput..............................................................................................................1928
tKafkaInput Standard properties........................................................................................................................... 1928
Related scenarios........................................................................................................................................................ 1931

tKafkaOutput...........................................................................................................1932
tKafkaOutput Standard properties........................................................................................................................1932
Related scenarios........................................................................................................................................................ 1934

tLDAPAttributesInput.............................................................................................1935
tLDAPAttributesInput Standard properties........................................................................................................ 1935
Related scenario.......................................................................................................................................................... 1938

tLDAPClose..............................................................................................................1939
tLDAPClose Standard properties........................................................................................................................... 1939
Related scenarios........................................................................................................................................................ 1939

tLDAPConnection....................................................................................................1940
tLDAPConnection Standard properties................................................................................................................1940
Related scenarios........................................................................................................................................................ 1941

tLDAPInput.............................................................................................................. 1942
tLDAPInput Standard properties............................................................................................................................1942
Displaying an LDAP directory's filtered content................................................................................ 1944

tLDAPOutput........................................................................................................... 1947
tLDAPOutput Standard properties........................................................................................................................ 1947
Editing data in an LDAP directory.......................................................................................................... 1950

tLDAPRenameEntry................................................................................................ 1953
tLDAPRenameEntry Standard properties............................................................................................................1953
Related scenarios........................................................................................................................................................ 1955

tLibraryLoad............................................................................................................ 1956
tLibraryLoad Standard properties......................................................................................................................... 1956
Importing an external library................................................................................................................................. 1957
Checking the format of an e-mail address........................................................................................................1958

tLineChart................................................................................................................ 1961
tLineChart Standard properties..............................................................................................................................1961
Creating a line chart to ease trend analysis.................................................................................................... 1963

tLogCatcher............................................................................................................. 1970
tLogCatcher Standard properties.......................................................................................................................... 1970
Catching messages triggered by a tWarn component.................................................................................. 1971
Catching the message triggered by a tDie component................................................................................ 1973

tLogRow...................................................................................................................1977
tLogRow Standard properties.................................................................................................................................1977
Related scenarios........................................................................................................................................................ 1978

tLoop........................................................................................................................ 1979
tLoop Standard properties.......................................................................................................................................1979
Executing a Job multiple times using a loop...................................................................................................1980

tMap......................................................................................................................... 1983
tMap Standard properties........................................................................................................................................ 1983
Mapping data using a filter and a simple explicit join................................................................................ 1985
Advanced mapping with lookup reload at each row.....................................................................................2003
Mapping with join output tables.......................................................................................................................... 2010

tMapRDBClose........................................................................................................ 2015
tMapRDBClose Standard properties..................................................................................................................... 2015
Related scenario.......................................................................................................................................................... 2016

tMapRDBConnection.............................................................................................. 2017
tMapRDBConnection Standard properties......................................................................................................... 2017
Related scenario.......................................................................................................................................................... 2021

tMapRDBInput.........................................................................................................2022
tMapRDBInput Standard properties..................................................................................................................... 2022
Related scenario.......................................................................................................................................................... 2027

tMapRDBOutput......................................................................................................2028
tMapRDBOutput Standard properties.................................................................................................................. 2028
Related scenario.......................................................................................................................................................... 2032

tMapROjaiInput.......................................................................................................2033
tMapROjaiInput Standard properties................................................................................................................... 2033

tMapROjaiOutput....................................................................................................2036
tMapROjaiOutput Standard properties................................................................................................................2036
Writing candidate data in a MapR-DB OJAI database................................................................................... 2039

tMapRStreamsCommit........................................................................................... 2043
tMapRStreamsCommit Standard properties...................................................................................................... 2043
Related scenarios........................................................................................................................................................ 2043

tMapRStreamsConnection..................................................................................... 2044
tMapRStreamsConnection Standard properties............................................................................................... 2044
Related scenarios........................................................................................................................................................ 2046

tMapRStreamsCreateStream................................................................................. 2047
tMapRStreamsCreateStream Standard properties...........................................................................................2047
Related scenarios........................................................................................................................................................ 2049

tMapRStreamsInput................................................................................................2050
tMapRStreamsInput Standard properties........................................................................................................... 2050
Related scenarios........................................................................................................................................................ 2054

tMapRStreamsOutput.............................................................................................2055
tMapRStreamsOutput Standard properties........................................................................................................2055
Related scenarios........................................................................................................................................................ 2057

tMarketoBulkExec...................................................................................................2058
tMarketoBulkExec Standard properties.............................................................................................................. 2058
Related scenario.......................................................................................................................................................... 2060

tMarketoConnection...............................................................................................2061
tMarketoConnection Standard properties.......................................................................................................... 2061
Related scenario.......................................................................................................................................................... 2062

tMarketoCampaign................................................................................................. 2063
tMarketoCampaign Standard properties.............................................................................................................2063

tMarketoInput......................................................................................................... 2067
tMarketoInput Standard properties...................................................................................................................... 2067
Related scenario..........................................................................................2072

tMarketoListOperation...........................................................................................2073
tMarketoListOperation Standard properties......................................................................................................2073
Adding a lead record to a Marketo list using SOAP API.............................................................................. 2075

tMarketoOutput...................................................................................................... 2078
tMarketoOutput Standard properties...................................................................................................................2078
Transmitting data with Marketo using REST API........................................................................................... 2081

tMarkLogicBulkLoad...............................................................................................2087
tMarkLogicBulkLoad Standard properties..........................................................................................................2087
Related scenario.......................................................................................................................................................... 2089

tMarkLogicClose..................................................................................................... 2090
tMarkLogicClose Standard properties..................................................................................................................2090
Related scenario.......................................................................................................................................................... 2091

tMarkLogicConnection........................................................................................... 2092
tMarkLogicConnection Standard properties......................................................................................................2092
Related scenario.......................................................................................................................................................... 2093

tMarkLogicInput......................................................................................................2094
tMarkLogicInput Standard properties..................................................................................................................2094
Related scenario.......................................................................................................................................................... 2096

tMarkLogicOutput...................................................................................................2097
tMarkLogicOutput Standard properties.............................................................................................................. 2097
Related scenario.......................................................................................................................................................... 2099

tMaxDBInput........................................................................................................... 2100
tMaxDBInput Standard properties........................................................................................................................ 2100
Related scenario.......................................................................................................................................................... 2102

tMaxDBOutput........................................................................................................ 2103
tMaxDBOutput Standard properties.....................................................................................................................2103
Related scenario.......................................................................................................................................................... 2106

tMaxDBRow.............................................................................................................2107
tMaxDBRow Standard properties.......................................................................................................................... 2107
Related scenario.......................................................................................................................................................... 2109

tMDMBulkLoad....................................................................................................... 2110
tMDMBulkLoad Standard properties....................................................................................................................2110
Loading records into a business entity.............................................................................................................. 2113

tMDMClose.............................................................................................................. 2118
tMDMClose Standard properties............................................................................................................................2118
Related scenario.......................................................................................................................................... 2119

tMDMCommit.......................................................................................................... 2120
tMDMCommit Standard properties.......................................................................................................................2120
Related scenario.......................................................................................................................................................... 2121

tMDMConnection.................................................................................................... 2122
tMDMConnection Standard properties................................................................................................................2122
Related scenario.......................................................................................................................................................... 2123

tMDMDelete............................................................................................................ 2124
tMDMDelete Standard properties......................................................................................................................... 2124
Deleting master data from an MDM Hub.......................................................................................................... 2128

tMDMInput.............................................................................................................. 2135
tMDMInput Standard properties............................................................................................................................2135
Reading master data from an MDM hub........................................................................................................... 2139

tMDMOutput............................................................................................................2142
tMDMOutput Standard properties.........................................................................................................................2142
Examples of partial update operations using tMDMOutput....................................................................... 2147
Writing master data in an MDM hub...................................................................................................................2153
Removing master data partially from the MDM hub.....................................................................................2158

tMDMReceive.......................................................................................................... 2165
tMDMReceive Standard properties....................................................................................................................... 2165
Extracting information from an MDM record in XML................................................................................... 2167

tMDMRollback.........................................................................................................2171
tMDMRollback Standard properties..................................................................................................................... 2171
Related scenario.......................................................................................................................................................... 2172

tMDMRouteRecord................................................................................................. 2173
tMDMRouteRecord Standard properties............................................................................................................. 2173
Routing an update report record to Event Manager..................................................................................... 2175

tMDMSP................................................................................................................... 2179
tMDMSP Standard properties................................................................................................................................. 2179
Executing a stored procedure using tMDMSP..................................................................................................2180

tMDMTriggerInput..................................................................................................2186
tMDMTriggerInput Standard properties..............................................................................................................2186
Exchanging the event information about an MDM record..........................................................................2188

tMDMTriggerOutput...............................................................................................2197
tMDMTriggerOutput Standard properties.......................................................................................................... 2197
Related scenario.......................................................................................................................................... 2198

tMDMViewSearch................................................................................................... 2199
tMDMViewSearch Standard properties............................................................................................................... 2199
Retrieving records from an MDM hub via an existing view....................................................................... 2203

tMemorizeRows...................................................................................................... 2206
tMemorizeRows Standard properties...................................................................................................................2206
Retrieving the different ages and lowest age data....................................................................................... 2207

tMicrosoftCrmInput................................................................................................ 2213
tMicrosoftCrmInput Standard properties............................................................................................................2213
Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows.............2217

tMicrosoftCrmOutput............................................................................................. 2223
tMicrosoftCrmOutput Standard properties........................................................................................................ 2223
Related scenario..........................................................................................2226

tMicrosoftMQInput................................................................................................. 2227
tMicrosoftMQInput Standard properties.............................................................................................................2227
Writing and fetching queuing messages from Microsoft message queue............................................. 2228

tMicrosoftMQOutput.............................................................................................. 2233
tMicrosoftMQOutput Standard properties..........................................................................................................2233
Related scenario.......................................................................................................................................................... 2234

tMomCommit...........................................................................................................2235
tMomCommit Standard properties....................................................................................................................... 2235
Related scenario.......................................................................................................................................................... 2236

tMomConnection.................................................................................................... 2237
tMomConnection Standard properties................................................................................................................ 2237
Related scenario.......................................................................................................................................................... 2239

tMomInput...............................................................................................................2240
tMomInput Standard properties............................................................................................................................ 2240
Asynchronous communication via a MOM server...........................................................................................2246
Transmitting XML files via a MOM server.........................................................................................................2249

tMomMessageIdList............................................................................................... 2255
tMomMessageIdList Standard properties...........................................................................................................2255
Related scenario.......................................................................................................................................................... 2256

tMomOutput............................................................................................................ 2257
tMomOutput Standard properties......................................................................................................................... 2257
Related scenario.......................................................................................................................................... 2262

tMomRollback......................................................................................................... 2263
tMomRollback Standard properties......................................................................................................................2263
Related scenario.......................................................................................................................................................... 2264

tMondrianInput....................................................................................................... 2265
tMondrianInput Standard properties................................................................................................................... 2265
Extracting multi-dimensional datasets from a MySQL database (Cross-join tables)........................2267

tMongoDBBulkLoad............................................................................................... 2270
tMongoDBBulkLoad Standard properties...........................................................................................................2270
Importing data into a MongoDB database............................................................................................2273

tMongoDBClose...................................................................................................... 2281
tMongoDBClose Standard properties...................................................................................................................2281
Related scenario.......................................................................................................................................................... 2281

tMongoDBConnection............................................................................................ 2282
tMongoDBConnection Standard properties.......................................................................................................2282
Related scenario.......................................................................................................................................................... 2284

tMongoDBGridFSDelete.........................................................................................2285
tMongoDBGridFSDelete Standard properties................................................................................................... 2285
Related scenario.......................................................................................................................................................... 2287

tMongoDBGridFSGet.............................................................................................. 2288
tMongoDBGridFSGet Standard properties..........................................................................................................2288
Related scenario.......................................................................................................................................................... 2291

tMongoDBGridFSList.............................................................................................. 2292
tMongoDBGridFSList Standard properties......................................................................................................... 2292
Related scenario.......................................................................................................................................................... 2295

tMongoDBGridFSProperties.................................................................................. 2296
tMongoDBGridFSProperties Standard properties............................................................................................ 2296
Related scenario.......................................................................................................................................................... 2299

tMongoDBGridFSPut.............................................................................................. 2300
tMongoDBGridFSPut Standard properties..........................................................................................................2300
Managing files using MongoDB GridFS..............................................................................................................2302

tMongoDBInput.......................................................................................................2311
tMongoDBInput Standard properties...................................................................................................................2311
Retrieving data from a collection by advanced queries...............................................................................2315
Related scenarios........................................................................................................................................ 2318

tMongoDBOutput....................................................................................................2319
tMongoDBOutput Standard properties................................................................................................................2319
Creating a collection and writing data to it.....................................................................................................2323
Upserting records in a collection..........................................................................................................................2328

tMongoDBRow........................................................................................................ 2336
tMongoDBRow Standard properties.....................................................................................................................2336
Using MongoDB functions to create a collection and write data to it................................................... 2339

tMsgBox................................................................................................................... 2345
tMsgBox Standard properties................................................................................................................................. 2345
'Hello world!' type test............................................................................................................................................. 2346

tMSSqlBulkExec...................................................................................................... 2348
tMSSqlBulkExec Standard properties.................................................................................................................. 2348
Related scenarios........................................................................................................................................................ 2352

tMSSqlClose............................................................................................................ 2353
tMSSqlClose Standard properties..........................................................................................................................2353
Related scenarios........................................................................................................................................................ 2354

tMSSqlColumnList.................................................................................................. 2355
tMSSqlColumnList Standard properties..............................................................................................................2355
Related scenario.......................................................................................................................................................... 2357

tMSSqlCommit........................................................................................................ 2358
tMSSqlCommit Standard properties.....................................................................................................................2358
Related scenarios........................................................................................................................................................ 2359

tMSSqlConnection.................................................................................................. 2360
tMSSqlConnection Standard properties..............................................................................................................2360
Inserting data into a database table and extracting useful information from it.................................2362

tMSSqlInput.............................................................................................................2368
tMSSqlInput Standard properties..........................................................................................................................2368
Related scenarios........................................................................................................................................................ 2371

tMSSqlLastInsertId................................................................................................. 2372
tMSSqlLastInsertId Standard properties.............................................................................................................2372
Related scenario.......................................................................................................................................................... 2374

tMSSqlOutput..........................................................................................................2375
tMSSqlOutput Standard properties.......................................................................................................................2375
Related scenarios........................................................................................................................................ 2381

tMSSqlOutputBulk.................................................................................................. 2382
tMSSqlOutputBulk Standard properties..............................................................................................................2382
Related scenarios........................................................................................................................................................ 2384

tMSSqlOutputBulkExec..........................................................................................2385
tMSSqlOutputBulkExec Standard properties.................................................................................................... 2385
Related scenarios........................................................................................................................................................ 2389

tMSSqlRollback.......................................................................................................2390
tMSSqlRollback Standard properties................................................................................................................... 2390
Related scenario.......................................................................................................................................................... 2391

tMSSqlRow.............................................................................................................. 2392
tMSSqlRow Standard properties............................................................................................................................2392
Related scenarios........................................................................................................................................................ 2396

tMSSqlSCD...............................................................................................................2397
tMSSqlSCD Standard properties............................................................................................................................ 2397
Related scenario.......................................................................................................................................................... 2400

tMSSqlSP................................................................................................................. 2401
tMSSqlSP Standard properties............................................................................................................................... 2401
Retrieving personal information using a stored procedure........................................................................ 2404
Related scenarios........................................................................................................................................................ 2409

tMSSqlTableList......................................................................................................2410
tMSSqlTableList Standard properties.................................................................................................................. 2410
Related scenario.......................................................................................................................................................... 2411

tMysqlBulkExec.......................................................................................................2412
tMysqlBulkExec Standard properties................................................................................................... 2412
Related scenarios........................................................................................................................................................ 2415

tMysqlClose............................................................................................................. 2416
tMysqlClose Standard properties.......................................................................................................................... 2416
Related scenario.......................................................................................................................................................... 2417

tMysqlColumnList...................................................................................................2418
tMysqlColumnList Standard properties...............................................................................................................2418
Iterating on a DB table and listing its column names................................................................................. 2419

tMysqlCommit......................................................................................................... 2423
tMysqlCommit Standard properties......................................................................................................................2423
Related scenario.......................................................................................................................................... 2424

tMysqlConnection...................................................................................................2425
tMysqlConnection Standard properties...............................................................................................................2425
Inserting data in mother/daughter tables......................................................................................................... 2426
Sharing a database connection between a parent Job and child Job......................................................2430

tMysqlInput............................................................................................................. 2437
tMysqlInput Standard properties...........................................................................................................................2437
Writing columns from a MySQL database to an output file using tMysqlInput...................................2440
Using context parameters when reading a table from a database.......................................................... 2443
Reading data from databases through context-based dynamic connections....................................... 2446

tMysqlLastInsertId.................................................................................................. 2453
tMysqlLastInsertId Standard properties..............................................................................................................2453
Getting the ID for the last inserted record with tMysqlLastInsertId........................................................2455

tMysqlLookupInput................................................................................................ 2459

tMysqlOutput.......................................................................................................... 2460
tMysqlOutput Standard properties....................................................................................................................... 2460
Inserting a column and altering data using tMysqlOutput......................................................................... 2466
Updating data using tMysqlOutput...................................................................................................................... 2471
Retrieving data in error with a Reject link....................................................................................................... 2474

tMysqlOutputBulk...................................................................................................2480
tMysqlOutputBulk Standard properties...............................................................................................................2480
Inserting transformed data in MySQL database..............................................................................................2482

tMysqlOutputBulkExec...........................................................................................2486
tMysqlOutputBulkExec Standard properties..................................................................................................... 2486
Inserting data in bulk in MySQL database........................................................................................................2489

tMysqlRollback........................................................................................................2491
tMysqlRollback Standard properties.................................................................................................................... 2491

tMysqlRow............................................................................................................... 2493
tMysqlRow Standard properties.............................................................................................................................2493
Removing and regenerating a MySQL table index........................................................................................ 2497
Using PreparedStatement objects to query data............................................................................................ 2498
Combining two flows for selective output........................................................................................................2503

tMysqlSCD............................................................................................................... 2508
tMysqlSCD Standard properties............................................................................................................................. 2508
SCD management methodology............................................................................................................................2511
Tracking data changes using Slowly Changing Dimensions (type 0 through type 3)........................ 2514

tMysqlSCDELT......................................................................................................... 2522
tMysqlSCDELT Standard properties......................................................................................................................2522
Related scenarios........................................................................................................................................2525

tMysqlSP.................................................................................................................. 2526
tMysqlSP Standard properties................................................................................................................................ 2526
Using tMysqlSP to find a State Label using a stored procedure...............................................................2528
Related scenarios........................................................................................................................................................ 2531

tMysqlTableList...................................................................................................... 2532
tMysqlTableList Standard properties...................................................................................................................2532
Iterating on DB tables and deleting their content using a user-defined SQL template................... 2533
Related scenario.......................................................................................................................................................... 2537

tNamedPipeClose................................................................................................... 2538
tNamedPipeClose Standard properties............................................................................................................... 2538
Related scenario.......................................................................................................................................................... 2539

tNamedPipeOpen....................................................................................................2540
tNamedPipeOpen Standard properties............................................................................................................... 2540
Related scenario.......................................................................................................................................................... 2541

tNamedPipeOutput.................................................................................................2542
tNamedPipeOutput Standard properties............................................................................................................ 2542

tNeo4jBatchOutput.................................................................................................2545
tNeo4jBatchOutput Standard properties............................................................................................................ 2545

tNeo4jBatchOutputRelationship...........................................................................2548
tNeo4jBatchOutputRelationship Standard properties................................................................................... 2548
Writing information of actors and movies to Neo4j with hierarchical relationship using Neo4j Batch components...................................................................................................................................... 2550

tNeo4jBatchSchema............................................................................................... 2560
tNeo4jBatchSchema Standard properties.......................................................................................................... 2560

tNeo4jClose............................................................................................................. 2562
tNeo4jClose Standard properties.......................................................................................................................... 2562
Related scenarios........................................................................................................................................................ 2562

tNeo4jConnection...................................................................................................2564
tNeo4jConnection Standard properties...............................................................................................................2564
Related scenarios........................................................................................................................................................ 2565

tNeo4jImportTool................................................................................................... 2567
tNeo4jImportTool Standard properties...............................................................................................................2567

tNeo4jInput............................................................................................................. 2569
tNeo4jInput Standard properties...........................................................................................................................2569
Related scenarios........................................................................................................................................................ 2571

tNeo4jOutput.......................................................................................................... 2572
tNeo4jOutput Standard properties....................................................................................................................... 2572
Writing data to a Neo4j database and reading specific data from it...................................................... 2576
Writing family information to Neo4j and creating relationships.............................................................. 2580

tNeo4jOutputRelationship.................................................................................... 2586
tNeo4jOutputRelationship Standard properties.............................................................................................. 2586
Writing information of actors and movies to Neo4j with hierarchical relationship........................... 2589

tNeo4jRow............................................................................................................... 2599
tNeo4jRow Standard properties............................................................................................................................ 2599
Creating nodes with a label using a Cypher query........................................................................................2602
Importing data from a CSV file to Neo4j using a Cypher query................................................................2606
Importing data from a CSV file to Neo4j and creating relationships using a single Cypher query.. 2612

tNetezzaBulkExec................................................................................................... 2616
tNetezzaBulkExec Standard properties...............................................................................................................2616
Related scenarios........................................................................................................................................................ 2619

tNetezzaClose......................................................................................................... 2620
tNetezzaClose Standard properties...................................................................................................................... 2620
Related scenarios........................................................................................................................................................ 2621

tNetezzaCommit..................................................................................................... 2622
tNetezzaCommit Standard properties................................................................................................................. 2622
Related scenario.......................................................................................................................................................... 2623

tNetezzaConnection............................................................................................... 2624
tNetezzaConnection Standard properties.......................................................................................................... 2624
Related scenarios........................................................................................................................................................ 2625

tNetezzaInput..........................................................................................................2626
tNetezzaInput Standard properties...................................................................................................................... 2626
Related scenarios........................................................................................................................................................ 2629

tNetezzaNzLoad......................................................................................................2630
tNetezzaNzLoad Standard properties.................................................................................................................. 2630
Related scenario.......................................................................................................................................................... 2636

tNetezzaOutput.......................................................................................................2637
tNetezzaOutput Standard properties................................................................................................................... 2637
Related scenarios........................................................................................................................................................ 2642

tNetezzaRollback....................................................................................................2643
tNetezzaRollback Standard properties................................................................................................................2643
Related scenarios........................................................................................................................................................ 2644

tNetezzaRow........................................................................................................... 2645
tNetezzaRow Standard properties........................................................................................................................ 2645
Related scenarios........................................................................................................................................................ 2648

tNetezzaSCD............................................................................................................2649
tNetezzaSCD Standard properties.........................................................................................................................2649
Related scenario.......................................................................................................................................................... 2652

tNetsuiteConnection...............................................................................................2653
tNetsuiteConnection Standard properties..........................................................................................................2653
Related scenario.......................................................................................................................................................... 2654

tNetsuiteInput......................................................................................................... 2655
tNetsuiteInput Standard properties......................................................................................................................2655
Handling data with NetSuite..................................................................................................................................2657

tNetsuiteOutput...................................................................................................... 2663
tNetsuiteOutput Standard properties.................................................................................................................. 2663
Related scenario.......................................................................................................................................................... 2666

tNormalize............................................................................................................... 2667
tNormalize Standard properties.............................................................................................................................2667
Normalizing data......................................................................................................................................................... 2669

tOpenbravoERPInput..............................................................................................2672
tOpenbravoERPInput Standard properties.........................................................................................................2672
Related Scenario..........................................................................................................................................................2673

tOpenbravoERPOutput...........................................................................................2674
tOpenbravoERPOutput Standard properties..................................................................................................... 2674
Related scenario.......................................................................................................................................................... 2675

tOracleBulkExec......................................................................................................2676
tOracleBulkExec Standard properties..................................................................................................................2676
Truncating and inserting file data into an Oracle database.......................................................................2681

tOracleClose............................................................................................................ 2684
tOracleClose Standard properties......................................................................................................................... 2684
Related scenarios........................................................................................................................................ 2685

tOracleCommit........................................................................................................ 2686
tOracleCommit Standard properties.................................................................................................................... 2686
Related scenario.......................................................................................................................................................... 2687

tOracleConnection.................................................................................................. 2688
tOracleConnection Standard properties............................................................................................................. 2688
Related scenario.......................................................................................................................................................... 2691

tOracleInput............................................................................................................ 2692
tOracleInput Standard properties......................................................................................................................... 2692
Using context parameters when reading a table from an Oracle database..........................................2695

tOracleOutput......................................................................................................... 2699
tOracleOutput Standard properties...................................................................................................................... 2699
Related scenarios........................................................................................................................................................ 2705

tOracleOutputBulk..................................................................................................2706
tOracleOutputBulk Standard properties............................................................................................................. 2706
Related scenarios........................................................................................................................................................ 2708

tOracleOutputBulkExec..........................................................................................2709
tOracleOutputBulkExec Standard properties.................................................................................................... 2709
Related scenarios........................................................................................................................................................ 2714

tOracleRollback.......................................................................................................2715
tOracleRollback Standard properties...................................................................................................................2715
Related scenario.......................................................................................................................................................... 2716

tOracleRow.............................................................................................................. 2717
tOracleRow Standard properties........................................................................................................................... 2717
Related scenarios........................................................................................................................................................ 2721

tOracleSCD...............................................................................................................2722
tOracleSCD Standard properties............................................................................................................................2722
Related scenario.......................................................................................................................................................... 2725

tOracleSCDELT........................................................................................................ 2726
tOracleSCDELT Standard properties.................................................................................................................... 2726
Related Scenarios........................................................................................................................................................2730

tOracleSP................................................................................................................. 2731
tOracleSP Standard properties...............................................................................................................................2731
Checking number format using a stored procedure...................................................................................... 2735
Related scenarios........................................................................................................................................ 2738

tOracleTableList......................................................................................................2739
tOracleTableList Standard properties..................................................................................................................2739
Related scenarios........................................................................................................................................................ 2740

tPaloCheckElements...............................................................................................2741
tPaloCheckElements Standard properties..........................................................................................................2741
Related scenario.......................................................................................................................................................... 2743

tPaloClose................................................................................................................2744
tPaloClose Standard properties............................................................................................................................. 2744
Related scenarios........................................................................................................................................................ 2745

tPaloConnection..................................................................................................... 2746
tPaloConnection Standard properties..................................................................................................................2746
Related scenario.......................................................................................................................................................... 2747

tPaloCube................................................................................................................ 2748
tPaloCube Standard properties.............................................................................................................................. 2748
Creating a cube in an existing database........................................................................................................... 2750

tPaloCubeList.......................................................................................................... 2752
Discovering the read-only output schema of tPaloCubeList...................................................................... 2752
tPaloCubeList Standard properties.......................................................................................................................2752
Retrieving detailed cube information from a given database................................................................... 2754

tPaloDatabase......................................................................................................... 2756
tPaloDatabase Standard properties......................................................................................................................2756
Creating a database................................................................................................................................................... 2757

tPaloDatabaseList...................................................................................................2759
Discovering the read-only output schema of tPaloDatabaseList..............................................................2759
tPaloDatabaseList Standard properties...............................................................................................................2759
Retrieving detailed database information from a given Palo server.......................................................2761

tPaloDimension.......................................................................................................2763
tPaloDimension Standard properties...................................................................................................................2763
Creating a dimension with elements.................................................................................................................. 2766

tPaloDimensionList................................................................................................ 2771
Discovering the read-only output schema of tPaloDimensionList........................................................... 2771
tPaloDimensionList Standard properties............................................................................................................2771
Retrieving detailed dimension information from a given database........................................................ 2773

tPaloInputMulti.......................................................................................................2776
tPaloInputMulti Standard properties................................................................................................................... 2776
Retrieving dimension elements from a given cube....................................................................................... 2778

tPaloOutput............................................................................................................. 2782
tPaloOutput Standard properties.......................................................................................................................... 2782
Related scenario.......................................................................................................................................................... 2784

tPaloOutputMulti....................................................................................................2785
tPaloOutputMulti Standard properties................................................................................................................2785
Writing data into a given cube..............................................................................................................................2787
Rejecting inflow data when the elements to be written do not exist in a given cube..................... 2790

tPaloRule................................................................................................................. 2795
tPaloRule Standard properties............................................................................................................................... 2795
Creating a rule in a given cube............................................................................................................................ 2796

tPaloRuleList...........................................................................................................2799
Discovering the read-only output schema of tPaloRuleList....................................................................... 2799
tPaloRuleList Standard properties........................................................................................................................2799
Retrieving detailed rule information from a given cube............................................................................. 2801

tParAccelBulkExec.................................................................................................. 2803
tParAccelBulkExec Standard properties..............................................................................................................2803
Related scenarios........................................................................................................................................................ 2806

tParAccelClose........................................................................................................ 2807
tParAccelClose Standard properties.....................................................................................................................2807
Related scenarios........................................................................................................................................................ 2808

tParAccelCommit.................................................................................................... 2809
tParAccelCommit Standard properties................................................................................................................ 2809
Related scenario.......................................................................................................................................................... 2810

tParAccelConnection.............................................................................................. 2811
tParAccelConnection Standard properties......................................................................................................... 2811
Related scenario.......................................................................................................................................................... 2812

tParAccelInput.........................................................................................................2813
tParAccelInput Standard properties..................................................................................................................... 2813
Related scenarios........................................................................................................................................................ 2816

tParAccelOutput......................................................................................................2817
tParAccelOutput Standard properties..................................................................................................................2817
Related scenarios........................................................................................................................................................ 2822

tParAccelOutputBulk..............................................................................................2823
tParAccelOutputBulk Standard properties......................................................................................................... 2823
Related scenarios........................................................................................................................................................ 2825

tParAccelOutputBulkExec......................................................................................2826
tParAccelOutputBulkExec Standard properties................................................................................................2826
Related scenarios........................................................................................................................................................ 2829

tParAccelRollback...................................................................................................2830
tParAccelRollback Standard properties...............................................................................................................2830
Related scenario.......................................................................................................................................................... 2831

tParAccelRow.......................................................................................................... 2832
tParAccelRow Standard properties....................................................................................................................... 2832
Related scenarios........................................................................................................................................................ 2835

tParAccelSCD...........................................................................................................2836
tParAccelSCD Standard properties........................................................................................................................2836
Related scenario.......................................................................................................................................................... 2839

tParseRecordSet......................................................................................................2840
tParseRecordSet Standard properties..................................................................................................................2840
Related Scenario..........................................................................................................................................................2841

tPatternUnmasking.................................................................................................2842
tPatternUnmasking Standard properties............................................................................................................ 2842
Unmasking Australian phone numbers...............................................................................................................2845
tPatternUnmasking properties for Apache Spark Batch............................................................................... 2849
tPatternUnmasking properties for Apache Spark Streaming...................................................................... 2853

tPivotToColumnsDelimited................................................................................... 2857
tPivotToColumnsDelimited Standard properties.............................................................................................2857
Using a pivot column to aggregate data...........................................................................................................2858

tPOP......................................................................................................................... 2861
tPOP Standard properties........................................................................................................................................ 2861
Retrieving a selection of email messages from an email server.............................................................. 2863

tPostgresPlusBulkExec.......................................................................................... 2865
tPostgresPlusBulkExec Standard properties..................................................................................................... 2865
Related scenarios........................................................................................................................................................ 2868

tPostgresPlusClose.................................................................................................2869
tPostgresPlusClose Standard properties.............................................................................................................2869
Related scenarios........................................................................................................................................................ 2870

tPostgresPlusCommit.............................................................................................2871
tPostgresPlusCommit Standard properties........................................................................................................2871
Related scenario.......................................................................................................................................................... 2872

tPostgresPlusConnection.......................................................................................2873
tPostgresPlusConnection Standard properties.................................................................................................2873
Related scenario.......................................................................................................................................................... 2874

tPostgresPlusInput................................................................................................. 2875
tPostgresPlusInput Standard properties.............................................................................................................2875
Related scenarios........................................................................................................................................................ 2878

tPostgresPlusOutput.............................................................................................. 2879
tPostgresPlusOutput Standard properties..........................................................................................................2879
Related scenarios........................................................................................................................................................ 2884

tPostgresPlusOutputBulk...................................................................................... 2885
tPostgresPlusOutputBulk Standard properties.................................................................................................2885
Related scenarios........................................................................................................................................................ 2887

tPostgresPlusOutputBulkExec.............................................................................. 2888
tPostgresPlusOutputBulkExec Standard properties....................................................................................... 2888
Related scenarios........................................................................................................................................................ 2890

tPostgresPlusRollback........................................................................................... 2891
tPostgresPlusRollback Standard properties...................................................................................................... 2891
Related scenarios........................................................................................................................................................ 2892

tPostgresPlusRow...................................................................................................2893
tPostgresPlusRow Standard properties...............................................................................................................2893
Related scenarios........................................................................................................................................................ 2896

tPostgresPlusSCD................................................................................................... 2897
tPostgresPlusSCD Standard properties............................................................................................................... 2897
Related scenario.......................................................................................................................................................... 2900

tPostgresPlusSCDELT.............................................................................................2901
tPostgresPlusSCDELT Standard properties........................................................................................................2901
Related Scenarios........................................................................................................................................................2905

tPostgresqlBulkExec...............................................................................................2906
tPostgresqlBulkExec Standard properties..........................................................................................................2906
Related scenarios........................................................................................................................................................ 2909

tPostgresqlClose..................................................................................................... 2910
tPostgresqlClose Standard properties................................................................................................................. 2910
Related scenarios........................................................................................................................................ 2911

tPostgresqlCommit.................................................................................................2912
tPostgresqlCommit Standard properties............................................................................................................ 2912
Related scenario.......................................................................................................................................................... 2913

tPostgresqlConnection...........................................................................................2914
tPostgresqlConnection Standard properties..................................................................................................... 2914
Related scenario.......................................................................................................................................................... 2915

tPostgresqlInput..................................................................................................... 2916
tPostgresqlInput Standard properties................................................................................................................. 2916
Related scenarios........................................................................................................................................................ 2919

tPostgresqlOutput.................................................................................................. 2920
tPostgresqlOutput Standard properties.............................................................................................................. 2920
Related scenarios........................................................................................................................................................ 2926

tPostgresqlOutputBulk.......................................................................................... 2927
tPostgresqlOutputBulk Standard properties..................................................................................................... 2927
Related scenarios........................................................................................................................................................ 2929

tPostgresqlOutputBulkExec.................................................................................. 2930
tPostgresqlOutputBulkExec Standard properties............................................................................................ 2930
Related scenarios........................................................................................................................................................ 2933

tPostgresqlRollback............................................................................................... 2934
tPostgresqlRollback Standard properties...........................................................................................................2934
Related scenario.......................................................................................................................................................... 2935

tPostgresqlRow.......................................................................................................2936
tPostgresqlRow Standard properties................................................................................................................... 2936
Related scenarios........................................................................................................................................................ 2939

tPostgresqlSCD....................................................................................................... 2940
tPostgresqlSCD Standard properties....................................................................................................................2940
Related scenario.......................................................................................................................................................... 2943

tPostgresqlSCDELT.................................................................................................2944
tPostgresqlSCDELT Standard properties............................................................................................................ 2944
Tracking data changes in a PostgreSQL table using the tPostgresqlSCDELT component............. 2948
Related Scenario..........................................................................................................................................................2957

tPostjob....................................................................................................................2958
tPostjob Standard properties.................................................................................................................................. 2958
Related scenarios........................................................................................................................................ 2958

tPrejob......................................................................................................................2959
tPrejob Standard properties.................................................................................................................................... 2959
Handling files before and after the execution of a data Job..................................................................... 2959
Related scenario.......................................................................................................................................................... 2962

tPubSubOutput....................................................................................................... 2963

tRedshiftBulkExec.................................................................................................. 2964
tRedshiftBulkExec Standard properties.............................................................................................................. 2964
Loading/unloading data to/from Amazon S3................................................................................................... 2970

tRedshiftClose.........................................................................................................2980
tRedshiftClose Standard properties......................................................................................................................2980
Related scenario.......................................................................................................................................................... 2981

tRedshiftCommit.....................................................................................................2982
tRedshiftCommit Standard properties.................................................................................................................2982
Related scenario.......................................................................................................................................................... 2983

tRedshiftConnection...............................................................................................2984
tRedshiftConnection Standard properties..........................................................................................................2984
Related scenario.......................................................................................................................................................... 2986

tRedshiftInput......................................................................................................... 2987
tRedshiftInput Standard properties......................................................................................................................2987
Handling data with Redshift...................................................................................................................................2991

tRedshiftOutput...................................................................................................... 2996
tRedshiftOutput Standard properties...................................................................................................................2996
Related scenarios........................................................................................................................................................ 3001

tRedshiftOutputBulk.............................................................................................. 3002
tRedshiftOutputBulk Standard properties..........................................................................................................3002
Related scenario.......................................................................................................................................................... 3006

tRedshiftOutputBulkExec...................................................................................... 3007
tRedshiftOutputBulkExec Standard properties................................................................................................ 3007
Related scenario.......................................................................................................................................................... 3013

tRedshiftRollback................................................................................................... 3014
tRedshiftRollback Standard properties............................................................................................................... 3014
Related scenario.......................................................................................................................................................... 3015

tRedshiftRow...........................................................................................................3016
tRedshiftRow Standard properties........................................................................................................................3016
Related scenarios........................................................................................................................................................ 3020

tRedshiftUnload...................................................................................................... 3021
tRedshiftUnload Standard properties.................................................................................................................. 3021
Related scenario..........................................................................................................................................................3025

tReplace................................................................................................................... 3026
tReplace Standard properties................................................................................................................................. 3026
Cleaning up and filtering a CSV file....................................................................................................................3027

tReplaceList.............................................................................................................3031
tReplaceList Standard properties..........................................................................................................................3031
Replacing state names with their two-letter codes.......................................................................................3032

tReplicate.................................................................................................................3036
tReplicate Standard properties.............................................................................................................................. 3036
Replicating a flow and sorting two identical flows respectively..............................................................3037

tREST........................................................................................................................3041
tREST Standard properties.......................................................................................................................................3041
Creating and retrieving data by invoking a REST Web service.................................................................. 3042

tRESTClient............................................................................................................. 3045
tRESTClient Standard properties...........................................................................................................................3045
Getting user information by interacting with a RESTful service...............................................................3050
Updating user information by interacting with a RESTful service........................................................... 3056

tRESTRequest..........................................................................................................3063
tRESTRequest Standard properties.......................................................................................................................3063
Using a REST service to accept HTTP GET requests and send responses............................................. 3066
Using URI Query parameters to explore the data of a database.............................................................. 3072
Using a REST service to accept HTTP POST requests.................................................................................. 3080
Using a REST service to accept HTTP POST requests and send responses...........................................3085
Using a REST service to accept HTTP POST requests in an HTML form................................................3093

tRESTResponse....................................................................................................... 3100
tRESTResponse Standard properties....................................................................................................................3100
Related scenario.......................................................................................................................................................... 3101

tRiakBucketList....................................................................................................... 3102
tRiakBucketList Standard properties....................................................................................................................3102
Related scenarios........................................................................................................................................................ 3103

tRiakClose................................................................................................................3104
tRiakClose Standard properties............................................................................................................................. 3104
Related scenario..........................................................................................................................................................3104
tRiakConnection......................................................................................................3105
tRiakConnection Standard properties..................................................................................................................3105
Related scenario.......................................................................................................................................................... 3106

tRiakInput................................................................................................................ 3107
tRiakInput Standard properties..............................................................................................................................3107
Exporting data from a Riak bucket to a local file..........................................................................................3108

tRiakKeyList............................................................................................................ 3113
tRiakKeyList Standard properties..........................................................................................................................3113
Related scenarios........................................................................................................................................................ 3114

tRiakOutput............................................................................................................. 3115
tRiakOutput Standard properties.......................................................................................................................... 3115
Related scenarios........................................................................................................................................................ 3117

tRouteFault..............................................................................................................3118
tRouteFault Standard properties........................................................................................................................... 3118
Exchanging messages between a Job and a Route....................................................................................... 3119

tRouteInput............................................................................................................. 3126
tRouteInput Standard properties...........................................................................................................................3126
Exchanging messages between a Job and a Route....................................................................................... 3127

tRouteOutput.......................................................................................................... 3132
tRouteOutput Standard properties....................................................................................................................... 3132
Related scenario.......................................................................................................................................................... 3133

tRowGenerator........................................................................................................ 3134
tRowGenerator Standard properties.....................................................................................................................3134
Generating random Java data...................................................................................................................................3136

tRSSInput.................................................................................................................3138
tRSSInput Standard properties...............................................................................................................................3138
Fetching frequently updated blog entries.........................................................................................................3139

tRSSOutput..............................................................................................................3141
tRSSOutput Standard properties........................................................................................................................... 3141
Creating an RSS flow and storing files on an FTP server........................................................................... 3142
Creating an RSS flow that contains metadata.................................................................................................3147
Creating an ATOM feed XML file..........................................................................................................................3149

tRunJob.................................................................................................................... 3153
tRunJob Standard properties...................................................................................................................................3153
Calling a Job and passing the parameter needed to the called Job........................................................ 3156
Running a list of child Jobs dynamically........................................................................................................... 3160
Propagating the buffered output data from the child Job to the parent Job........................................3164

tS3BucketCreate..................................................................................................... 3169
tS3BucketCreate Standard properties..................................................................................................................3169
Related scenario.......................................................................................................................................................... 3171

tS3BucketDelete..................................................................................................... 3172
tS3BucketDelete Standard properties................................................................................................................. 3172
Related scenario.......................................................................................................................................................... 3173

tS3BucketExist........................................................................................................ 3174
tS3BucketExist Standard properties.....................................................................................................................3174
Verifying the absence of a bucket, creating it and listing all the S3 buckets..................................... 3176

tS3BucketList.......................................................................................................... 3180
tS3BucketList Standard properties....................................................................................................................... 3180
Related scenario.......................................................................................................................................................... 3181

tS3Close...................................................................................................................3182
tS3Close Standard properties................................................................................................................................. 3182
Related scenario.......................................................................................................................................................... 3183

tS3Connection.........................................................................................................3184
tS3Connection Standard properties..................................................................................................................... 3184
Creating an IAM role on AWS................................................................................................................................ 3187
Setting up SSE KMS for your EMR cluster........................................................................................................ 3187
Setting up SSE KMS for your S3 bucket............................................................................................................ 3189
Related scenario.......................................................................................................................................................... 3191

tS3Copy....................................................................................................................3192
tS3Copy Standard properties.................................................................................................................................. 3192
Copying an S3 object from one bucket to another........................................................................................3194

tS3Delete.................................................................................................................3199
tS3Delete Standard properties...............................................................................................................................3199
Related scenario.......................................................................................................................................................... 3201

tS3Get...................................................................................................................... 3202
tS3Get Standard properties..................................................................................................................................... 3202
Related scenario.......................................................................................................................................................... 3205

tS3List...................................................................................................................... 3206
tS3List Standard properties.....................................................................................................................................3206
Listing files with the same prefix from a bucket........................................................................................... 3208
Tagging S3 objects................................................................................................ 3212
Tagging S3 objects: linking the components...................................................................................................3212
Tagging S3 objects: configuring the components..........................................................................................3212
Tagging S3 objects: executing the Job...............................................................................................................3213

tS3Put...................................................................................................................... 3215
tS3Put Standard properties..................................................................................................................................... 3215
Exchanging files with Amazon S3..........................................................................................................................3218

tSalesforceBulkExec............................................................................................... 3222
tSalesforceBulkExec Standard properties.......................................................................................................... 3222
Related scenario.......................................................................................................................................................... 3226

tSalesforceConnection........................................................................................... 3227
tSalesforceConnection Standard properties......................................................................................................3227
Connecting to Salesforce using OAuth implicit flow to authenticate the user (deprecated).......... 3230
Related scenario.......................................................................................................................................................... 3234

tSalesforceGetDeleted........................................................................................... 3235
tSalesforceGetDeleted Standard properties...................................................................................................... 3235
Recovering deleted data from Salesforce..........................................................................................................3238

tSalesforceGetServerTimestamp...........................................................................3243
tSalesforceGetServerTimestamp Standard properties................................................................................... 3243
Related scenario.......................................................................................................................................................... 3246

tSalesforceGetUpdated.......................................................................................... 3247
tSalesforceGetUpdated Standard properties.....................................................................................................3247
Related scenario.......................................................................................................................................................... 3251

tSalesforceInput......................................................................................................3252
tSalesforceInput Standard properties..................................................................................................................3252
How to set schema for the guess query feature of tSalesforceInput...................................................... 3257
Related scenario.......................................................................................................................................................... 3262

tSalesforceOutput...................................................................................................3263
tSalesforceOutput Standard properties...............................................................................................................3263
Upserting Salesforce data based on external IDs.......................................................................................... 3268

tSalesforceOutputBulk........................................................................................... 3279
tSalesforceOutputBulk Standard properties......................................................................................................3279
Related scenario.......................................................................................................................................................... 3280

tSalesforceOutputBulkExec...................................................................................3281
tSalesforceOutputBulkExec Standard properties............................................................................................ 3281
Inserting bulk data into Salesforce......................................................................................................................3286

tSalesforceEinsteinBulkExec................................................................................. 3290
tSalesforceEinsteinBulkExec Standard properties.......................................................................................... 3290
Related scenario.......................................................................................................................................................... 3293

tSalesforceEinsteinOutputBulkExec..................................................................... 3294
tSalesforceEinsteinOutputBulkExec Standard properties............................................................................ 3294
Related scenario.......................................................................................................................................................... 3298

tSampleRow............................................................................................................ 3299
tSampleRow Standard properties......................................................................................................................... 3299
Filtering rows and groups of rows.......................................................................................................................3300

tSAPHanaClose........................................................................................................3303
tSAPHanaClose Standard properties.................................................................................................................... 3303
Related scenarios........................................................................................................................................................ 3303

tSAPHanaCommit................................................................................................... 3304
tSAPHanaCommit Standard properties............................................................................................................... 3304
Related scenario.......................................................................................................................................................... 3305

tSAPHanaConnection............................................................................................. 3306
tSAPHanaConnection Standard properties........................................................................................................ 3306
Related scenarios........................................................................................................................................................ 3307

tSAPHanaInput........................................................................................................3308
tSAPHanaInput Standard properties.................................................................................................................... 3308
Related scenarios........................................................................................................................................................ 3311

tSAPHanaOutput.....................................................................................................3312
tSAPHanaOutput Standard properties.................................................................................................................3312
Related scenarios........................................................................................................................................................ 3317

tSAPHanaRollback.................................................................................................. 3318
tSAPHanaRollback Standard properties..............................................................................................................3318
Related scenarios........................................................................................................................................................ 3318

tSAPHanaRow......................................................................................................... 3319
tSAPHanaRow Standard properties...................................................................................................................... 3319
Related scenarios........................................................................................................................................................ 3322

Exporting data using tSAPHanaUnload...............................................................3323
Creating the SAP HANA database connection................................................................................................. 3323
Creating and running the Job.................................................................................................................................3324
tSchemaComplianceCheck.....................................................................................3325

tSCPClose.................................................................................................................3326
tSCPClose Standard properties.............................................................................................................................. 3326
Related scenario.......................................................................................................................................................... 3327

tSCPConnection...................................................................................................... 3328
tSCPConnection Standard properties...................................................................................................................3328
Related scenarios........................................................................................................................................................ 3329

tSCPDelete...............................................................................................................3330
tSCPDelete Standard properties............................................................................................................................ 3330
Related scenarios........................................................................................................................................................ 3331

tSCPFileExists......................................................................................................... 3332
tSCPFileExists Standard properties...................................................................................................................... 3332
Handling a file using SCP........................................................................................................................................3333

tSCPFileList............................................................................................................. 3338
tSCPFileList Standard properties...........................................................................................................................3338
Related scenario.......................................................................................................................................................... 3339

tSCPGet.................................................................................................................... 3340
tSCPGet Standard properties.................................................................................................................................. 3340
Related scenario.......................................................................................................................................................... 3341

tSCPPut.................................................................................................................... 3342
tSCPPut Standard properties.................................................................................................................................. 3342
Related scenario.......................................................................................................................................................... 3343

tSCPRename............................................................................................................ 3344
tSCPRename Standard properties......................................................................................................................... 3344
Related scenario.......................................................................................................................................................... 3345

tSCPTruncate...........................................................................................................3346
tSCPTruncate Standard properties........................................................................................................................3346
Related scenarios........................................................................................................................................................ 3347

tSendMail.................................................................................................................3348
tSendMail Standard properties.............................................................................................................................. 3348
Sending an email on error...................................................................................................................................... 3350

tServerAlive............................................................................................................. 3352
tServerAlive Standard properties.......................................................................................................................... 3352
Validating the status of the connection to a remote host.......................................................................... 3353
tServiceNowConnection.........................................................................................3356
tServiceNowConnection Standard properties...................................................................................................3356
Related scenario.......................................................................................................................................................... 3357

tServiceNowInput................................................................................................... 3358
tServiceNowInput Standard properties............................................................................................................... 3358
Related scenario.......................................................................................................................................................... 3360

tServiceNowOutput................................................................................................ 3361
tServiceNowOutput Standard properties............................................................................................................3361
Related scenario.......................................................................................................................................................... 3363

tSetEnv.....................................................................................................................3364
tSetEnv Standard properties................................................................................................................................... 3364
Modifying a variable during a Job execution................................................................................................... 3365

tSetGlobalVar.......................................................................................................... 3368
tSetGlobalVar Standard properties....................................................................................................................... 3368
Printing out the content of a global variable..................................................................................................3369

tSetKerberosConfiguration....................................................................................3371
tSetKerberosConfiguration Standard properties..............................................................................................3371
Related scenarios........................................................................................................................................................ 3372

tSetKeystore............................................................................................................3373
tSetKeystore Standard properties......................................................................................................................... 3373
Extracting customer information from a private WSDL file........................................................................3374

tSetProxy................................................................................................................. 3379
tSetProxy Standard properties............................................................................................................................... 3379
Related scenarios........................................................................................................................................................ 3381

tSleep....................................................................................................................... 3382
tSleep Standard properties......................................................................................................................................3382
Related scenarios........................................................................................................................................................ 3383

tSnowflakeBulkExec...............................................................................................3384
tSnowflakeBulkExec Standard properties.......................................................................................................... 3384
Loading data in a Snowflake table using a custom stage path................................................................. 3390
Related scenarios........................................................................................................................................................ 3397

tSnowflakeClose..................................................................................................... 3398
tSnowflakeClose Standard properties................................................................................................................. 3398
Related scenario.......................................................................................................................................................... 3398
tSnowflakeCommit................................................................................................. 3399
tSnowflakeCommit Standard properties.............................................................................................................3399
Related scenario for tSnowflakeCommit............................................................................................................3400

tSnowflakeConnection........................................................................................... 3401
tSnowflakeConnection Standard properties......................................................................................................3401
Related scenario.......................................................................................................................................................... 3403

tSnowflakeInput......................................................................................................3404
tSnowflakeInput Standard properties..................................................................................................................3404
Writing data into and reading data from a Snowflake table......................................................................3407

tSnowflakeOutput...................................................................................................3412
tSnowflakeOutput Standard properties.............................................................................................................. 3412
Related scenario.......................................................................................................................................................... 3415

tSnowflakeOutputBulk...........................................................................................3416
tSnowflakeOutputBulk Standard properties......................................................................................................3416
Related scenarios........................................................................................................................................................ 3422

tSnowflakeOutputBulkExec...................................................................................3423
tSnowflakeOutputBulkExec Standard properties............................................................................................ 3423
Loading data using the COPY command.............................................................................................................3430
Related scenarios........................................................................................................................................................ 3437

tSnowflakeRollback................................................................................................3438
tSnowflakeRollback Standard properties........................................................................................................... 3438
Related scenario: tSnowflakeRollback................................................................................................................ 3439

tSnowflakeRow....................................................................................................... 3440
tSnowflakeRow Standard properties....................................................................................................................3440
Querying data in a cloud file through a Snowflake external table and a materialized view......... 3443
Related scenario.......................................................................................................................................................... 3449

tSOAP....................................................................................................................... 3450
tSOAP Standard properties......................................................................................................................................3450
Fetching the country name information using a Web service................................................................... 3452
Using a SOAP message from an XML file to get country name information and saving the information to an XML file..................................................................................................................................... 3454

tSocketInput............................................................................................................ 3458
tSocketInput Standard properties......................................................................................................................... 3458
Passing on data to the listening port................................................................................................................. 3460

tSocketOutput......................................................................................................... 3463
tSocketOutput Standard properties......................................................................................................................3463
Related scenario..........................................................................................................................................................3464

tSortRow.................................................................................................................. 3465
tSortRow Standard properties................................................................................................................................ 3465
Sorting entries.............................................................................................................................................................. 3466

tSplitRow................................................................................................................. 3469
tSplitRow Standard properties............................................................................................................................... 3469
Splitting one row into two rows...........................................................................................................................3470

tSplunkEventCollector........................................................................................... 3474
tSplunkEventCollector Standard properties...................................................................................................... 3474
Related scenario.......................................................................................................................................................... 3475

tSQLDWHBulkExec................................................................................................. 3476
tSQLDWHBulkExec Standard properties.............................................................................................................3476
Related scenario.......................................................................................................................................................... 3480

tSQLDWHClose........................................................................................................3481
tSQLDWHClose Standard properties.................................................................................................................... 3481
Related scenario.......................................................................................................................................................... 3482

tSQLDWHCommit....................................................................................................3483
tSQLDWHCommit Standard properties............................................................................................................... 3483
Related scenario.......................................................................................................................................................... 3484

tSQLDWHConnection............................................................................................. 3485
tSQLDWHConnection Standard properties........................................................................................................ 3485
Related scenario.......................................................................................................................................................... 3487

tSQLDWHInput........................................................................................................3488
tSQLDWHInput Standard properties.................................................................................................................... 3488
Related scenario.......................................................................................................................................................... 3491

tSQLDWHOutput.....................................................................................................3492
tSQLDWHOutput Standard properties.................................................................................................................3492
Related scenario.......................................................................................................................................................... 3497

tSQLDWHRollback.................................................................................................. 3498
tSQLDWHRollback Standard properties..............................................................................................................3498
Related scenario.......................................................................................................................................................... 3499

tSQLDWHRow......................................................................................................... 3500
tSQLDWHRow Standard properties...................................................................................................................... 3500
Related scenario.......................................................................................................................................................... 3503
tSQLiteClose............................................................................................................3504
tSQLiteClose Standard properties.........................................................................................................................3504
Related scenarios........................................................................................................................................................ 3505

tSQLiteCommit........................................................................................................3506
tSQLiteCommit Standard properties.................................................................................................................... 3506
Related scenario.......................................................................................................................................................... 3507

tSQLiteConnection..................................................................................................3508
tSQLiteConnection Standard properties............................................................................................................. 3508
Related scenarios........................................................................................................................................................ 3509

tSQLiteInput............................................................................................................ 3510
tSQLiteInput Standard properties......................................................................................................................... 3510
Filtering SQLite data...................................................................................................................................................3512

tSQLiteOutput......................................................................................................... 3515
tSQLiteOutput Standard properties......................................................................................................................3515
Related scenario..........................................................................................................................................................3519

tSQLiteRollback...................................................................................................... 3520
tSQLiteRollback Standard properties...................................................................................................................3520
Related scenarios........................................................................................................................................................ 3521

tSQLiteRow..............................................................................................................3522
tSQLiteRow Standard properties...........................................................................................................................3522
Updating SQLite rows............................................................................................................................................... 3525
Related scenarios........................................................................................................................................................ 3527

tSQLTemplate......................................................................................................... 3528
tSQLTemplate Standard properties...................................................................................................................... 3528
Related scenarios........................................................................................................................................................ 3530

tSQLTemplateAggregate....................................................................................... 3531
tSQLTemplateAggregate Standard properties..................................................................................................3531
Filtering and aggregating table columns directly on the DBMS...............................................................3533

tSQLTemplateCommit............................................................................................3537
tSQLTemplateCommit Standard properties...................................................................................................... 3537
Related scenario.......................................................................................................................................................... 3538

tSQLTemplateFilterColumns................................................................................. 3539
tSQLTemplateFilterColumns Standard properties.......................................................................................... 3539
Related scenario..........................................................................................................................................................3540
tSQLTemplateFilterRows.......................................................................................3541
tSQLTemplateFilterRows Standard properties.................................................................................................3541
Related scenario..........................................................................................................................................................3542

tSQLTemplateMerge.............................................................................................. 3543
tSQLTemplateMerge Standard properties..........................................................................................................3543
Merging data directly on the DBMS.................................................................................................................... 3545

tSQLTemplateRollback.......................................................................................... 3552
tSQLTemplateRollback Standard properties.....................................................................................................3552
Related scenarios........................................................................................................................................................ 3553

tSqoopExport.......................................................................................................... 3554
Additional arguments................................................................................................................................................ 3554
tSqoopExport Standard properties....................................................................................................................... 3555
Related scenarios........................................................................................................................................................ 3564

tSqoopImport.......................................................................................................... 3565
tSqoopImport Standard properties....................................................................................................................... 3565
Importing a MySQL table to HDFS.......................................................................................................................3574

tSqoopImportAllTables..........................................................................................3580
tSqoopImportAllTables Standard properties.....................................................................................................3580
Related scenarios........................................................................................................................................................ 3587

tSqoopMerge...........................................................................................................3588
tSqoopMerge Standard properties........................................................................................................................3588
Merging two datasets in HDFS..............................................................................................................................3595

tSQSConnection...................................................................................................... 3600
tSQSConnection Standard properties.................................................................................................................. 3600
Related scenarios........................................................................................................................................................ 3602

tSQSInput.................................................................................................................3603
tSQSInput Standard properties.............................................................................................................................. 3603
Retrieving messages from an Amazon SQS queue........................................................................................ 3606

tSQSMessageChangeVisibility...............................................................................3611
tSQSMessageChangeVisibility Standard properties........................................................................................3611
Related scenario.......................................................................................................................................................... 3613

tSQSMessageDelete............................................................................................... 3614
tSQSMessageDelete Standard properties...........................................................................................................3614
Related scenario.......................................................................................................................................................... 3616
tSQSOutput..............................................................................................................3617
tSQSOutput Standard properties...........................................................................................................................3617
Delivering messages to an Amazon SQS queue............................................................................................. 3620

tSQSQueueAttributes............................................................................................. 3626
tSQSQueueAttributes Standard properties........................................................................................................ 3626
Related scenario.......................................................................................................................................................... 3628

tSQSQueueCreate................................................................................................... 3629
tSQSQueueCreate Standard properties............................................................................................................... 3629
Related scenario.......................................................................................................................................................... 3631

tSQSQueueDelete................................................................................................... 3632
tSQSQueueDelete Standard properties...............................................................................................................3632
Related scenario.......................................................................................................................................................... 3634

tSQSQueueList........................................................................................................ 3635
tSQSQueueList Standard properties.....................................................................................................................3635
Listing Amazon SQS queues in an AWS region...............................................................................................3637

tSQSQueuePurge.................................................................................................... 3641
tSQSQueuePurge Standard properties................................................................................................................ 3641
Related scenario.......................................................................................................................................................... 3643

tSSH..........................................................................................................................3644
tSSH Standard properties.........................................................................................................................................3644
Displaying remote system information via SSH..............................................................................................3647

tStatCatcher.............................................................................................................3649
tStatCatcher Standard properties..........................................................................................................................3649
Displaying the statistics log of Job execution................................................................................................. 3650

tSVNLogInput..........................................................................................................3654
tSVNLogInput Standard properties.......................................................................................................................3654
Retrieving a log message from an SVN repository........................................................................................ 3655

tSybaseBulkExec.....................................................................................................3658
tSybaseBulkExec Standard properties.................................................................................................................3658
Related scenarios........................................................................................................................................................ 3662

tSybaseClose........................................................................................................... 3663
tSybaseClose Standard properties........................................................................................................................ 3663
Related scenario.......................................................................................................................................................... 3664

tSybaseCommit....................................................................................................... 3665
tSybaseCommit Standard properties....................................................................................................................3665
Related scenario.......................................................................................................................................................... 3666

tSybaseConnection................................................................................................. 3667
tSybaseConnection Standard properties.............................................................................................................3667
Related scenarios........................................................................................................................................................ 3668

tSybaseInput............................................................................................................3669
tSybaseInput Standard properties.........................................................................................................................3669
Related scenarios........................................................................................................................................................ 3672

tSybaseIQBulkExec................................................................................................. 3673
tSybaseIQBulkExec Standard properties............................................................................................................ 3673
Related scenarios........................................................................................................................................................ 3680

tSybaseIQOutputBulkExec.....................................................................................3681
tSybaseIQOutputBulkExec Standard properties...............................................................................................3681
Bulk-loading data to a Sybase IQ 12 database............................................................................................... 3685
Related scenarios........................................................................................................................................................ 3688

tSybaseOutput.........................................................................................................3689
tSybaseOutput Standard properties..................................................................................................................... 3689
Related scenarios........................................................................................................................................................ 3694

tSybaseOutputBulk.................................................................................................3695
tSybaseOutputBulk Standard properties............................................................................................................ 3695
Related scenarios........................................................................................................................................................ 3697

tSybaseOutputBulkExec.........................................................................................3698
tSybaseOutputBulkExec Standard properties................................................................................................... 3698
Related scenarios........................................................................................................................................................ 3702

tSybaseRollback......................................................................................................3703
tSybaseRollback Standard properties.................................................................................................................. 3703
Related scenarios........................................................................................................................................................ 3704

tSybaseRow............................................................................................................. 3705
tSybaseRow Standard properties.......................................................................................................................... 3705
Related scenarios........................................................................................................................................................ 3708

tSybaseSCD..............................................................................................................3709
tSybaseSCD Standard properties...........................................................................................................................3709
Related scenarios........................................................................................................................................................ 3712

tSybaseSCDELT....................................................................................................... 3713
tSybaseSCDELT Standard properties................................................................................................................... 3713
Related scenario.......................................................................................................................................................... 3717

tSybaseSP................................................................................................................ 3718
tSybaseSP Standard properties.............................................................................................................................. 3718
Related scenarios........................................................................................................................................................ 3720

tSystem.................................................................................................................... 3722
tSystem Standard properties...................................................................................................................................3722
Echoing 'Hello World!'...............................................................................................................................................3724

tTeradataClose........................................................................................................3726
tTeradataClose Standard properties.....................................................................................................................3726
Related scenarios........................................................................................................................................................ 3727

tTeradataCommit....................................................................................................3728
tTeradataCommit Standard properties................................................................................................................3728
Related scenario.......................................................................................................................................................... 3729

tTeradataConnection..............................................................................................3730
tTeradataConnection Standard properties.........................................................................................................3730
Related scenario.......................................................................................................................................................... 3732

tTeradataFastExport...............................................................................................3733
tTeradataFastExport Standard properties.......................................................................................................... 3733
Related scenarios........................................................................................................................................................ 3735

tTeradataFastLoad..................................................................................................3736
tTeradataFastLoad Standard properties............................................................................................................. 3736
Related scenarios........................................................................................................................................................ 3738

tTeradataFastLoadUtility....................................................................................... 3739
tTeradataFastLoadUtility Standard properties................................................................................................. 3739
Related scenario.......................................................................................................................................................... 3741

tTeradataInput........................................................................................................ 3742
tTeradataInput Standard properties.....................................................................................................................3742
Related scenarios........................................................................................................................................................ 3745

tTeradataMultiLoad................................................................................................3746
tTeradataMultiLoad Standard properties........................................................................................................... 3746
Related scenario.......................................................................................................................................................... 3748

tTeradataOutput..................................................................................................... 3749
tTeradataOutput Standard properties................................................................................................................. 3749
Related scenarios........................................................................................................................................................ 3754
tTeradataRollback.................................................................................................. 3755
tTeradataRollback Standard properties.............................................................................................................. 3755
Related scenario.......................................................................................................................................................... 3756

tTeradataRow..........................................................................................................3757
tTeradataRow Standard properties.......................................................................................................................3757
Related scenarios........................................................................................................................................................ 3761

tTeradataSCD.......................................................................................................... 3762
tTeradataSCD Standard properties....................................................................................................................... 3762
Related scenario.......................................................................................................................................................... 3765

tTeradataSCDELT....................................................................................................3766
tTeradataSCDELT Standard properties................................................................................................................3766
Related scenario.......................................................................................................................................................... 3770

tTeradataTPTExec.................................................................................................. 3771
tTeradataTPTExec Standard properties.............................................................................................................. 3771
Supported optional attributes for each consumer operator....................................................................... 3775
Loading data into a Teradata database............................................................................................................. 3776

tTeradataTPTUtility................................................................................................3783
tTeradataTPTUtility Standard properties........................................................................................................... 3783
Related scenario.......................................................................................................................................................... 3787

tTeradataTPump..................................................................................................... 3788
tTeradataTPump Standard properties................................................................................................................. 3788
Inserting data into a Teradata database table................................................................................................ 3790

tUniqRow................................................................................................................. 3794
tUniqRow Standard properties...............................................................................................................................3794
Deduplicating entries.................................................................................................................................................3795

tUnite....................................................................................................................... 3799
tUnite Standard properties...................................................................................................................................... 3799
Iterating on files and merge the content.......................................................................................................... 3800

tVectorWiseCommit................................................................................................3803
tVectorWiseCommit Standard properties........................................................................................................... 3803
Related scenario.......................................................................................................................................................... 3804

tVectorWiseConnection......................................................................................... 3805
tVectorWiseConnection Standard properties.................................................................................................... 3805
Related scenario.......................................................................................................................................................... 3806
tVectorWiseInput.................................................................................................... 3807
tVectorWiseInput Standard properties................................................................................................................ 3807
Related scenario.......................................................................................................................................................... 3810

tVectorWiseOutput................................................................................................. 3811
tVectorWiseOutput Standard properties.............................................................................................................3811
Related scenario.......................................................................................................................................................... 3815

tVectorWiseRollback.............................................................................................. 3816
tVectorWiseRollback Standard properties..........................................................................................................3816
Related scenario.......................................................................................................................................................... 3817

tVectorWiseRow......................................................................................................3818
tVectorWiseRow Standard properties.................................................................................................................. 3818
Related scenario.......................................................................................................................................................... 3821

tVerticaBulkExec.....................................................................................................3822
tVerticaBulkExec Standard properties.................................................................................................................3822
Related scenarios........................................................................................................................................................ 3827

tVerticaClose........................................................................................................... 3828
tVerticaClose Standard properties........................................................................................................................ 3828
Related scenarios........................................................................................................................................................ 3829

tVerticaCommit....................................................................................................... 3830
tVerticaCommit Standard properties....................................................................................................................3830
Related scenario.......................................................................................................................................................... 3831

tVerticaConnection................................................................................................. 3832
tVerticaConnection Standard properties.............................................................................................................3832
Related scenario.......................................................................................................................................................... 3833

tVerticaInput........................................................................................................... 3834
tVerticaInput Standard properties.........................................................................................................................3834
Related scenarios........................................................................................................................................................ 3837

tVerticaOutput........................................................................................................ 3838
tVerticaOutput Standard properties..................................................................................................................... 3838
Related scenarios........................................................................................................................................................ 3843

tVerticaOutputBulk.................................................................................................3844
tVerticaOutputBulk Standard properties............................................................................................................ 3844
Related scenarios........................................................................................................................................................ 3846

tVerticaOutputBulkExec.........................................................................................3847
tVerticaOutputBulkExec Standard properties................................................................................................... 3847
Related scenarios........................................................................................................................................................ 3851

tVerticaRollback......................................................................................................3852
tVerticaRollback Standard properties..................................................................................................................3852
Related scenario.......................................................................................................................................................... 3853

tVerticaRow............................................................................................................. 3854
tVerticaRow Standard properties.......................................................................................................................... 3854
Related scenario.......................................................................................................................................................... 3857

tVerticaSCD............................................................................................................. 3858
tVerticaSCD Standard properties...........................................................................................................................3858
Related scenarios........................................................................................................................................................ 3861

tVtigerCRMInput..................................................................................................... 3862
tVtigerCRMInput Standard properties................................................................................................................. 3862
Related scenarios........................................................................................................................................................ 3863

tVtigerCRMOutput.................................................................................................. 3864
tVtigerCRMOutput Standard properties.............................................................................................................. 3864
Related scenarios........................................................................................................................................................ 3866

tWaitForFile.............................................................................................................3867
tWaitForFile Standard properties.......................................................................................................................... 3867
Waiting for a file to be created and stopping the iteration loop after a message is triggered.......3869
Waiting for a file to be created and continuing the iteration loop after a message is triggered....3871

tWaitForSocket....................................................................................................... 3873
tWaitForSocket Standard properties.................................................................................................................... 3873
Related scenarios........................................................................................................................................................ 3874

tWaitForSqlData..................................................................................................... 3875
tWaitForSqlData Standard properties..................................................................................................................3875
Waiting for insertion of rows in a table............................................................................................................ 3876

tWarn........................................................................................................................3879
tWarn Standard properties.......................................................................................................................................3879
Related scenarios........................................................................................................................................................ 3880

tWebService.............................................................................................................3881
tWebService Standard properties..........................................................................................................................3881
Getting country names using tWebService....................................................................................................... 3883

tWebServiceInput................................................................................................... 3890
tWebServiceInput Standard properties............................................................................................................... 3890
Getting country names using tWebServiceInput.............................................................................................3892

tWorkdayInput........................................................................................................ 3895
tWorkdayInput Standard properties..................................................................................................................... 3895
Related scenario.......................................................................................................................................................... 3896

tWriteJSONField......................................................................................................3897
Configuring a JSON Tree.......................................................................................................................................... 3897
tWriteJSONField Standard properties.................................................................................................................. 3897
Writing flat data into JSON fields.........................................................................................................................3899
Related Scenarios........................................................................................................................................................3903

tWriteXMLField.......................................................................................................3904
tWriteXMLField Standard properties................................................................................................................... 3904
Extracting the structure of an XML file and inserting it into the fields of a database table...........3906

tXMLMap................................................................................................................. 3910
tXMLMap Standard properties............................................................................................................................... 3910
Mapping and transforming XML data................................................................................................................. 3911
Restructuring products data using multiple loop elements....................................................................... 3933

tXMLRPCInput.........................................................................................................3943
tXMLRPCInput Standard properties..................................................................................................................... 3943
Guessing the State name from an XMLRPC..................................................................................................... 3944

tXSDValidator......................................................................................................... 3946
tXSDValidator Standard properties...................................................................................................................... 3946
Validating data flows against an XSD file........................................................................................................ 3948

tXSLT........................................................................................................................3953
tXSLT Standard properties.......................................................................................................................................3953
Transforming XML to html using an XSL stylesheet.................................................................................... 3954

Copyleft
Adapted for 7.3.1. Supersedes previous releases.
The content of this document is correct at the time of publication.
However, more recent updates may be available in the online version that can be found on Talend
Help Center.
This documentation is provided under the terms of the Creative Commons Public License (CCPL).
For more information about what you can and cannot do with this documentation in accordance with
the CCPL, please read: http://creativecommons.org/licenses/by-nc-sa/2.0/.
Notices
Talend is a trademark of Talend, Inc.
All brands, product names, company names, trademarks and service marks are the properties of their
respective owners.
License Agreement
The software described in this documentation is licensed under the Apache License, Version 2.0 (the
"License"); you may not use this software except in compliance with the License. You may obtain
a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.html. Unless required by
applicable law or agreed to in writing, software distributed under the License is distributed on an "AS
IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under the License.
This product includes software developed at AOP Alliance (Java/J2EE AOP standards), ASM, Amazon,
AntlR, Apache ActiveMQ, Apache Ant, Apache Avro, Apache Axiom, Apache Axis, Apache Axis 2,
Apache Batik, Apache CXF, Apache Cassandra, Apache Chemistry, Apache Common Http Client, Apache
Common Http Core, Apache Commons, Apache Commons Bcel, Apache Commons JxPath, Apache
Commons Lang, Apache Datafu, Apache Derby Database Engine and Embedded JDBC Driver, Apache
Geronimo, Apache HCatalog, Apache Hadoop, Apache Hbase, Apache Hive, Apache HttpClient, Apache
HttpComponents Client, Apache JAMES, Apache Log4j, Apache Lucene Core, Apache Neethi, Apache
Oozie, Apache POI, Apache Parquet, Apache Pig, Apache PiggyBank, Apache ServiceMix, Apache
Sqoop, Apache Thrift, Apache Tomcat, Apache Velocity, Apache WSS4J, Apache WebServices Common
Utilities, Apache Xml-RPC, Apache Zookeeper, Box Java SDK (V2), CSV Tools, Cloudera HTrace,
ConcurrentLinkedHashMap for Java, Couchbase Client, DataNucleus, DataStax Java Driver for Apache
Cassandra, Ehcache, Ezmorph, Ganymed SSH-2 for Java, Google APIs Client Library for Java, Google
Gson, Groovy, Guava: Google Core Libraries for Java, H2 Embedded Database and JDBC Driver, Hector:
A high level Java client for Apache Cassandra, Hibernate BeanValidation API, Hibernate Validator,
HighScale Lib, HsqlDB, Ini4j, JClouds, JDO-API, JLine, JSON, JSR 305: Annotations for Software Defect
Detection in Java, JUnit, Jackson Java JSON-processor, Java API for RESTful Services, Java Agent for
Memory Measurements, Jaxb, Jaxen, JetS3T, Jettison, Jetty, Joda-Time, Json Simple, LZ4: Extremely
Fast Compression algorithm, LightCouch, MetaStuff, Metrics API, Metrics Reporter Config, Microsoft
Azure SDK for Java, Mondrian, MongoDB Java Driver, Netty, Ning Compression codec for LZF encoding,
OpenSAML, Paraccel JDBC Driver, Parboiled, PostgreSQL JDBC Driver, Protocol Buffers - Google's
data interchange format, Resty: A simple HTTP REST client for Java, Riak Client, Rocoto, SDSU Java
Library, SL4J: Simple Logging Facade for Java, SQLite JDBC Driver, Scala Lang, Simple API for CSS,
Snappy for Java a fast compressor/decompresser, SpyMemCached, SshJ, StAX API, StAXON - JSON via
StAX, Super SCV, The Castor Project, The Legion of the Bouncy Castle, Twitter4J, Uuid, W3C, Windows
Azure Storage libraries for Java, Woden, Woodstox: High-performance XML processor, Xalan-J, Xerces2,
XmlBeans, XmlSchema Core, Xmlsec - Apache Santuario, YAML parser and emitter for Java, Zip4J,
atinject, dropbox-sdk-java: Java library for the Dropbox Core API, google-guice. Licensed under their
respective license.


tAccessBulkExec
Offers gains in performance when carrying out Insert operations in an Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two-step
process. These two steps are fused together in the tAccessOutputBulkExec component, detailed in a
separate section. The advantage of using a two-step process is that it makes it possible to carry out
transformations on the data before loading it into the database.
This component executes an Insert action on the data provided.
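To make the two-step bulk approach more concrete, the sketch below shows roughly what the load step
amounts to in plain Java: reading a delimited file and inserting each row into an Access table over JDBC.
This is not the code generated by the component; it is a simplified illustration that assumes the
UCanAccess JDBC driver is on the classpath and uses hypothetical names (a Database1.accdb file, an
employees table with two text columns, and a semicolon-separated input file).

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class AccessBulkInsertSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical paths, table and columns: adjust to your environment.
            String url = "jdbc:ucanaccess://C:/data/Database1.accdb";
            String insert = "INSERT INTO employees (name, city) VALUES (?, ?)";
            try (Connection conn = DriverManager.getConnection(url);
                 PreparedStatement ps = conn.prepareStatement(insert);
                 BufferedReader in = new BufferedReader(new FileReader("C:/data/employees.csv"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // The delimited file written by tAccessOutputBulk uses a field separator,
                    // assumed here to be a semicolon.
                    String[] fields = line.split(";", -1);
                    ps.setString(1, fields[0]);
                    ps.setString(2, fields[1]);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }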

tAccessBulkExec Standard properties


These properties are used to configure tAccessBulkExec running in the Standard Job framework.
The Standard tAccessBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data is stored centrally.

  Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


DB version Select the version of your database.

Database Type in the directory where your database is stored.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Table Name of the table to be written. Note that only one table
can be written at a time and that the table must exist
already for the insert operation to succeed.

Local filename Browse to the delimited file to be loaded into your database.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB
connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Include header Select this check box to include the column header.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with the tAccessOutputBulk
component. Used together, they can offer gains in
performance while feeding an Access database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, make sure that your JVM
and ODBC versions match: either both 64-bit or both 32-bit.
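If you are unsure whether the JVM running your Job is 32-bit or 64-bit, a quick check from a standalone
class (or a tJava component) can help. The following is a minimal sketch; note that sun.arch.data.model
is a HotSpot-specific property and may be absent on other JVMs, which is why a default value is supplied.

    public class JvmBitnessCheck {
        public static void main(String[] args) {
            // Architecture the JVM was built for, for example amd64 or x86.
            System.out.println("os.arch = " + System.getProperty("os.arch"));
            // HotSpot JVMs usually report "32" or "64" here; other JVMs may not set this property.
            System.out.println("data model = " + System.getProperty("sun.arch.data.model", "unknown"));
        }
    }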

Related scenarios
For use cases in relation with tAccessBulkExec, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482
• Inserting data in bulk in MySQL database on page 2489


tAccessClose
Closes an active connection to the Access database so as to release occupied resources.

tAccessClose Standard properties


These properties are used to configure tAccessClose running in the Standard Job framework.
The Standard tAccessClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAccessConnection component in the list if more
than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component is to be used along with other Access
components, especially with tAccessConnection and
tAccessCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.


Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match:
either both 64-bit or both 32-bit.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAccessCommit
Using a unique connection, commits a global transaction in one go, instead of committing on every
row or every batch, and thus provides a gain in performance.
tAccessCommit validates the data processed through the Job into the connected database.
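In plain JDBC terms, the behavior described above corresponds to disabling auto-commit on the shared
connection and committing once at the end of the transaction. The sketch below is only an illustration
of that idea, not the component's generated code; it assumes a UCanAccess connection and a
hypothetical employees table.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class GlobalCommitSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical database path and table.
            String url = "jdbc:ucanaccess://C:/data/Database1.accdb";
            try (Connection conn = DriverManager.getConnection(url)) {
                conn.setAutoCommit(false); // do not commit on every row
                try (PreparedStatement ps =
                        conn.prepareStatement("INSERT INTO employees (name) VALUES (?)")) {
                    for (String name : new String[] {"Alice", "Bob", "Carol"}) {
                        ps.setString(1, name);
                        ps.executeUpdate();
                    }
                }
                conn.commit(); // one global commit for the whole transaction
            }
        }
    }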

tAccessCommit Standard properties


These properties are used to configure tAccessCommit running in the Standard Job framework.
The Standard tAccessCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAccessConnection component in the list if more
than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAccessCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAccess*
components, especially with the tAccessConnection and
tAccessRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match:
either both 64-bit or both 32-bit.

Related scenario
For a scenario related to tAccessCommit, see Inserting data in mother/daughter tables on page 2426.


tAccessConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAccessConnection opens a connection to the database for the current transaction.
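Other tAccess* components reuse the opened connection through their Use an existing connection
option. If you also need that connection in custom code, for example in a tJava component, it is
commonly exposed in the Job's globalMap under a key derived from the connection component's unique
name; the exact key can be verified in the generated code. A minimal sketch, assuming the component
is named tAccessConnection_1:

    // Inside a tJava component placed after tAccessConnection_1 (assumed unique name).
    // The "conn_<component name>" key is the usual convention; check the generated code to confirm it.
    java.sql.Connection conn =
            (java.sql.Connection) globalMap.get("conn_tAccessConnection_1");
    try (java.sql.Statement stmt = conn.createStatement();
         java.sql.ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM Table1")) {
        if (rs.next()) {
            System.out.println("Rows in Table1: " + rs.getInt(1));
        }
    }
    // Do not close the connection here; leave that to tAccessCommit or tAccessClose.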

tAccessConnection Standard properties


These properties are used to configure tAccessConnection running in the Standard Job framework.
The Standard tAccessConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data is stored centrally.

  Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version Access 2003 or later versions.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.


Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB
connection you are creating.

Usage

Usage rule This component is more commonly used with other
tAccess* components, especially with the tAccessCommit
and tAccessRollback components.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match:
either both 64-bit or both 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Inserting data in parent/child tables


The following Job is dedicated to advanced database users, who want to carry out multiple table
insertions using a parent table Table1 to generate two child tables: Name and Birthday.
• In Access 2007, create an Access database named Database1.
• Once the Access database is created, create a table named Table1 with two column headings:
Name and Birthday.
Back in the Integration perspective of Talend Studio, the Job requires twelve components,
including tAccessConnection, tAccessCommit, tAccessInput, tAccessOutput and tAccessClose.

• Drop the following components from the Palette to the design workspace: tFileList,
tFileInputDelimited, tMap, tAccessOutput (two), tAccessInput (two), tAccessCommit, tAccessClose and
tLogRow (two).
• Connect the tFileList component to the input file component using an Iterate link. Thus, the name
of the file to be processed will be dynamically filled in from the tFileList directory using a global
variable.
• Connect the tFileInputDelimited component to the tMap component and dispatch the flow
between the two output Access components. Use a Row link for each of these connections
representing the main data flow.
• Set the tFileList component properties, such as the directory where files will be fetched from.
• Add a tAccessConnection component and connect it to the starter component of this Job. In this
example, the tFileList component uses an OnComponentOk link to define the execution order.
• In the tAccessConnection Component view, set the connection details manually or fetch them
from the Repository if you centrally store them as a Metadata DB connection entry. For more
information about Metadata, see Talend Studio User Guide .
• In the tFileInputDelimited component's Basic settings view, press Ctrl+Space to access the
variable list. Set the File Name field to the global variable tFileList_1.CURRENT_FILEPATH (a sketch of
the resulting expression is shown after this procedure). For more information about using variables, see
Talend Studio User Guide.

• Set the rest of the fields as usual, defining the row and field separators according to your file
structure.
• Then set the schema manually through the Edit schema dialog box or select the schema from the
Repository . Make sure the data type is correctly set, in accordance with the nature of the data
processed.
• In the tMap Output area, add two output tables, one called Name for the Name table, the second
called Birthday, for the Birthday table. For more information about the tMap component, see
Talend Studio User Guide.
• Drag the Name column from the Input area, and drop it to the Name table.
• Drag the Birthday column from the Input area, and drop it to the Birthday table.

• Then connect the output row links to distribute the flow correctly to the relevant DB output
components.
• In each of the tAccessOutput components' Basic settings view, select the Use an existing
connection check box to retrieve the tAccessConnection details.

• Set the Table name making sure it corresponds to the correct table, in this example either Name
or Birthday.
• There is no action on the table as the tables are already created.
• Select Insert as Action on data for both output components.
• Click on Sync columns to retrieve the schema set in the tMap.
• Then connect the first tAccessOutput component to the first tAccessInput component using an
OnComponentOk link.
• In each of the tAccessInput components' Basic settings view, select the Use an existing
connection check box to retrieve the distributed data flow. Then set the schema manually through
Edit schema dialog box.
• Then set the Table Name accordingly. In tAccessInput_1, this will be Name.
• Click the Guess Query button.
• Connect each tAccessInput component to tLogRow component with a Row > Main link. In each of
the tLogRow components' basic settings view, select Table in the Mode field.
• Add the tAccessCommit component below the tFileList component in the design workspace and
connect them together using an OnComponentOk link in order to terminate the Job with the
transaction commit.
• In the basic settings view of tAccessCommit component and from the Component list, select the
connection to be used, tAccessConnection_1 in this scenario.
• Save your Job and press F6 to execute it.

The parent table Table1 is reused to generate the Name table and Birthday table.
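
As a point of reference, the global variable used in the File Name field above becomes a small Java expression once selected from the Ctrl+Space list. A minimal sketch of what that field typically contains when fed by tFileList_1 (the component name and the usual globalMap cast are shown for illustration only):

((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))

At each iteration of the tFileList loop, this expression resolves to the full path of the file currently being processed.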

tAccessInput
Reads a database and extracts fields based on a query.
tAccessInput executes a DB query with a strictly defined statement which must correspond to
the schema definition. Then it passes on the field list to the next component via a Row > Main
connection.

tAccessInput Standard properties


These properties are used to configure tAccessInput running in the Standard Job framework.
The Standard tAccessInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see the section describing how to
set up a DB connection of Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the version of Access that you are using.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
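
For instance, if the schema of tAccessInput defines the columns id, name and birthday in that order, the query entered in the Query field should return the same columns in the same order. A minimal illustration, assuming a table also named Name with those three columns (the table and column names are examples only):

"SELECT id, name, birthday FROM Name"

The quotation marks are part of the field content, since the query is entered as a Java string.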

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.

For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
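As an illustration of this mechanism, suppose the Job contains two connection components, tAccessConnection_1 and tAccessConnection_2, and a context variable (arbitrarily named connName here) that holds the name of the one to use. The Code cell of the Dynamic settings table would then simply reference that variable:

context.connName

At runtime, setting connName to "tAccessConnection_1" or "tAccessConnection_2", for example through a context group or the --context_param command-line option, decides which connection tAccessInput actually uses.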

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For related topics, see:
Related topic in description of tContextLoad on page 496.

tAccessOutput
Writes, updates, modifies or deletes entries in a database.
tAccessOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.

tAccessOutput Standard properties


These properties are used to configure tAccessOutput running in the Standard Job framework.
The Standard tAccessOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the version of Access that you are using.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if

you have selected the Use an existing connection check box


in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and, above all, better
performance at executions.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update or delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.

Note:
Make sure the Nullable check box is selected for the
corresponding columns in the schema.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.

NB_LINE_DELETED: the number of rows deleted. This is an


After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
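Once tAccessOutput has finished, these variables can be read from a subsequent component, for instance a tJava triggered by an OnComponentOk link. A minimal sketch, assuming the component is named tAccessOutput_1 (the casts follow the usual globalMap convention):

// tJava code: print the row counters exposed by tAccessOutput_1
System.out.println("inserted: " + ((Integer)globalMap.get("tAccessOutput_1_NB_LINE_INSERTED")));
System.out.println("updated: " + ((Integer)globalMap.get("tAccessOutput_1_NB_LINE_UPDATED")));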

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Access database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.

tAccessOutputBulk
Prepares the file which contains the data used to feed the Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two step
process. These two steps are fused together in the tAccessOutputBulkExec component, detailed in a
separate section. The advantage of using a two step process is that it makes it possible to carry out
transformations on the data before loading it in the database.
tAccessOutputBulk writes a delimited file.

tAccessOutputBulk Standard properties


These properties are used to configure tAccessOutputBulk running in the Standard Job framework.
The Standard tAccessOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Create directory if not exists Select this check box to create the as yet non-existent file
directory specified in the File name field.

Append Select this check box to add any new rows to the end of the
file.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Include header Select this check box to include the column header in the
file.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with the tAccessBulkExec
component. Used together, they offer gains in performance
while feeding an Access database.

Component family Databases/Access

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For use cases in relation with tAccessOutputBulk, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482
• Inserting data in bulk in MySQL database on page 2489

tAccessOutputBulkExec
Executes an Insert action on the data provided, in an Access database.
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data
to a delimited file and then to perform various actions on the file in an Access database, in a two step
process. These two steps are fused together in tAccessOutputBulkExec.
As a dedicated component, tAccessOutputBulkExec improves performance during Insert operations in
an Access database.

tAccessOutputBulkExec Standard properties


These properties are used to configure tAccessOutputBulkExec running in the Standard Job
framework.
The Standard tAccessOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the version of Access that you are using.

DB name Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
already exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Table Name of the table to be written.

Note:
Only one table can be written at a time and the table
must already exist for the insert operation to succeed.

FileName Name of the file to be processed.


Related topic: see Talend Studio User Guide.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Create directory if not exists Select this check box to create the as yet non-existent file
directory specified in the File name field.

Append Select this check box to append new rows to the end of the
file.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Include header Select this check box to include the column header in the
file.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Usage

Usage rule This component is mainly used when no particular


transformation is required on the data to be loaded in the
database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.

The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, make sure that your JVM
and ODBC versions match up: both 64-bit or 32-bit.

Related scenarios
For use cases in relation with tAccessOutputBulkExec, see the following scenarios:
• Inserting data in bulk in MySQL database on page 2489
• Inserting transformed data in MySQL database on page 2482

tAccessRollback
Cancels the transaction commit in the connected database and avoids committing part of a transaction
involuntarily.

tAccessRollback Standard properties


These properties are used to configure tAccessRollback running in the Standard Job framework.
The Standard tAccessRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAccessConnection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAccess*
components, especially with the tAccessConnection and
tAccessCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection

parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.

Related scenarios
No scenario is available for the Standard version of this component yet.

tAccessRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAccessRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements. tAccessRow is the specific component for this database query. The Row suffix means that the
component implements a flow in the Job design although it does not provide output.

tAccessRow Standard properties


These properties are used to configure tAccessRow running in the Standard Job framework.
The Standard tAccessRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

DB Version Select the Access database version that you are using.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Name of the source table where changes made to data
should be captured.

Query type The query can be Built-in for a particular Job or, for a
commonly used query, stored in the Repository to
ease its reuse.

  Built-in: Fill in the query statement manually or build it
graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-

free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
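
To make the mapping between the Query field and the Set PreparedStatement Parameter table concrete, here is roughly the plain JDBC equivalent of a configuration using the query "DELETE FROM Name WHERE name = ?" with one parameter (the table, value and variable names are examples only, and this is not the component's actual generated code):

// Assuming conn is an open java.sql.Connection to the Access database
java.sql.PreparedStatement ps = conn.prepareStatement("DELETE FROM Name WHERE name = ?");
ps.setString(1, "John"); // Parameter Index: 1, Parameter Type: String, Parameter Value: "John"
ps.executeUpdate();
ps.close();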

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation If you are using an ODBC driver, you need to work with Java
7, and make sure that your JVM and ODBC versions match
up: both 64-bit or 32-bit.
When working with Java 8, this component supports only
the General collation mode of Access.

Related scenarios
For related topics, see:
• Procedure on page 622
• Removing and regenerating a MySQL table index on page 2497.

tAddCRCRow
Provides a unique ID which helps improve the quality of processed data. CRC stands for Cyclical
Redundancy Checking.
tAddCRCRow calculates a surrogate key based on one or several columns and adds it to the defined
schema.
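To make the idea of a CRC-based surrogate key concrete, the sketch below computes a CRC32 checksum over the concatenation of two column values with the standard java.util.zip.CRC32 class. It only illustrates the principle; the column names, the separator and the concatenation order are arbitrary, and the component's own generated code may differ.

import java.util.zip.CRC32;

public class CrcRowExample {
    public static void main(String[] args) {
        String name = "John";
        String birthday = "1980-05-12";
        // Concatenate the selected column values and compute the checksum
        CRC32 crc = new CRC32();
        crc.update((name + ";" + birthday).getBytes());
        long surrogateKey = crc.getValue(); // value placed in the added CRC column
        System.out.println(surrogateKey);
    }
}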

tAddCRCRow Standard properties


These properties are used to configure tAddCRCRow running in the Standard Job framework.
The Standard tAddCRCRow component belongs to the Data Quality family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
In this component, a new CRC column is automatically
added.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Implication Select the check boxes next to the relevant columns to be used
for the surrogate key checksum.

Advanced Settings

CRC type Select a CRC type in the list. The longer the CRC, the less
overlap you will have.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary step. It requires an input


flow as well as an output.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Adding a surrogate key to a file


This scenario describes a Job adding a surrogate key to a delimited file schema.

Setting up the Job


Procedure
1. Drop the following components: tFileInputDelimited, tAddCRCRow and tLogRow.
2. Connect them using a Main row connection.

Configuring the input component


Procedure
1. In the tFileInputDelimited Component view, set the File Name path and all related properties in
case these are not stored in the Repository.

2. Create the schema through the Edit Schema button if the schema is not already stored in the
Repository. Remember to set the data type for each column; for more information on the Date pattern
to be filled in, visit http://docs.oracle.com/javase/6/docs/api/index.html.
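
For example, if the Birthday column of the input file holds values such as 12-05-1980 (an illustrative format), its type would be Date and its Date Pattern would be the corresponding Java pattern:

"dd-MM-yyyy"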

Configuring the tAddCRCRow component


Procedure
1. In the tAddCRCRow Component view, select the check boxes of the input flow columns to be used
to calculate the CRC.

Notice that a CRC column (read-only) has been added at the end of the schema.
2. Select CRC32 as CRC Type to get a longer surrogate key.

3. In the Basic settings view of tLogRow, select the Print values in cells of a table option to display
the output data in a table on the Console.

Job execution
Then save your Job and press F6 to execute it.

An additional CRC Column has been added to the schema calculated on all previously selected
columns (in this case all columns of the schema).

tAddLocationFromIP
Replaces IP addresses with geographical locations.
tAddLocationFromIP geolocates visitors through their IP addresses: this component identifies visitors'
geographical locations (country, region, city, latitude, longitude, ZIP code, etc.) using an IP address
lookup database file.
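For readers curious about what such a lookup amounts to, the sketch below uses the legacy MaxMind GeoIP Java API (the geoip.jar module mentioned in the Limitation section below) to resolve an IP address against a GeoIP.dat file. The class and method names are those of that legacy API as commonly documented, the file path and IP address are placeholders, and the component's own generated code may differ.

import com.maxmind.geoip.Country;
import com.maxmind.geoip.LookupService;

public class IpLookupExample {
    public static void main(String[] args) throws Exception {
        // Open the IP address lookup database file (path is an example)
        LookupService lookup = new LookupService("/data/GeoIP.dat", LookupService.GEOIP_MEMORY_CACHE);
        // Resolve a sample IP address to a country
        Country country = lookup.getCountry("8.8.8.8");
        System.out.println(country.getCode() + " - " + country.getName());
        lookup.close();
    }
}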

tAddLocationFromIP Standard properties


These properties are used to configure tAddLocationFromIP running in the Standard Job framework.
The Standard tAddLocationFromIP component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the fields to be
processed and passed on to the next component. The
schema of this component is read-only.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: Select the Repository file where Properties are


stored. When selected, the fields that follow are pre-defined
using fetched data.

Database Filepath The path to the IP address lookup database file.

Input parameters Input column: Select the input column from which the input
values are to be taken.

  input value is a hostname: Check if the input column holds


hostnames.

  input value is an IP address: Check if the input column holds


IP addresses.

Location type Country code: Check to replace IP with country code.

  Country name: Check to replace IP with country name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary step in the data flow
allowing you to replace IP addresses with geolocation information. It
cannot be a start component as it requires an input flow. It also
requires an output component.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
• geoip.jar

Identifying a real-world geographic location of an IP


The following scenario creates a three-component Job that associates an IP with a geographical
location. It obtains a site visitor's geographical location based on its IP.

Dropping and linking components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tAddLocationFromIP, and tLogRow.
2. Connect the three components using Row Main links.

Configuring the components


Procedure
1. In the design workspace, select tFixedFlowInput, and click the Component tab to define the basic
settings for tFixedFlowInput.
2. Click the [...] button next to Edit Schema to define the structure of the data you want to use as
input. In this scenario, the schema is made of one column that holds an IP address.

3. Click OK to close the dialog box, and accept propagating the changes when prompted by the
system. The defined column is displayed in the Values panel of the Basic settings view.
4. In the Number of rows field, enter the number of rows to be generated, and click in the Value cell
and set the value for the IP address.

5. In the design workspace, select tAddLocationFromIP and click the Component tab to define the
basic settings for tAddLocationFromIP.

6. Click the Sync columns button to synchronize the schema with the input schema set with
tFixedFlowInput.
7. Browse to the GeoIP.dat file to set its path in the Database filepath field.

Note:
Ensure that you download the latest version of the IP address lookup database file from the relevant
site as indicated in the Basic settings view of tAddLocationFromIP.

8. In the Input parameters panel, set your input parameters as needed. In this scenario, the input
column is the ip column defined earlier that holds an IP address.
9. In the Location type panel, set location type as needed. In this scenario, we want to display the
country name.
10. In the design workspace, select tLogRow and click the Component tab and define the basic
settings for tLogRow as needed. In this scenario, we want to display values in cells of a table.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run in the Run tab to execute the Job.

Results
One row is generated to display the country name that is associated with the set IP address.

tAdvancedFileOutputXML
Writes an XML file with separated data values according to an XML tree structure.
tAdvancedFileOutputXML outputs data to an XML type of file and offers an interface to deal with loop
and group by elements if needed.

tAdvancedFileOutputXML Standard properties


These properties are used to configure tAdvancedFileOutputXML running in the Standard Job
framework.
The Standard tAdvancedFileOutputXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Use Output Stream Select this check box to process the data flow of interest. Once
you have selected it, the Output Stream field is displayed and
you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid the inconvenience of writing it by hand, you
could select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field on condition that
this variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.
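As a rough illustration of this pattern, and mirroring the streaming scenario referenced above, you could open a stream in a tJava component executed before this one and register it in globalMap under a key of your choosing; the key name and file path below are purely illustrative.

// tJava code executed before tAdvancedFileOutputXML: create and register the stream
java.io.OutputStream outStream = new java.io.FileOutputStream("/tmp/out.xml");
globalMap.put("out_stream", outStream);

The Output Stream field of tAdvancedFileOutputXML then contains the matching expression, for example (java.io.OutputStream)globalMap.get("out_stream").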

File name Name or path to the output file and/or the variable to be
used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Configure XML tree Opens the dedicated interface to help you set the XML
mapping. For details about the interface, see Defining the
XML tree on page 125.

Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or stored remotely in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and job
designs. Related topic: see Talend Studio User Guide.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.

Append the source xml file Select this check box to add the new lines at the end of
your source XML file.

Generate compact file Select this check box to generate a file that does not have
any empty space or line separators. All elements are then
written on a single line, which considerably reduces the
file size.

Include DTD or XSL Select this check box to add the DOCTYPE declaration,
indicating the root element, the access path and the DTD
file, or to add the processing instruction, indicating the
type of stylesheet used (such as XSL types), along with the
access path and file name.
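
For illustration only, a minimal sketch of how a stream variable for the Use Output Stream option
could be defined in a tJava component placed upstream of this component (the variable name
out_stream and the file path are assumptions, not values imposed by the component):

  // tJava code: create an output stream and store it in the globalMap
  // so that it can later be referenced in the Output Stream field.
  new java.io.File("C:/myFolder").mkdirs();
  globalMap.put("out_stream", new java.io.FileOutputStream("C:/myFolder/out.xml", false));

The Output Stream field could then reference it as
(java.io.OutputStream)globalMap.get("out_stream"), provided the variable has been defined as above.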

Advanced settings

Split output in several files If the output XML file is large, you can split it into several
files, each containing a given number of rows.

Trim data This check box is activated when you are using the dom4j
generation mode. Select this check box to trim the leading
or trailing whitespace from the value of an XML element.

Create directory only if not exists This check box is selected by default. It creates a directory
to hold the output XML files if required.

Create empty element if needed This box is selected by default. If no column is associated
with an XML node, this option will create an open/close tag in
place of the expected tag.

Create attribute even if its value is NULL Select this check box to generate the XML attribute for the
associated input column whose value is null.

Create attribute even if it is unmapped Select this check box to generate the XML attribute for the
associated input column that is unmapped.

Create associated XSD file If one of the XML elements is defined as a Namespace
element, this option will create the corresponding XSD file.

Note:
To use this option, you must select Dom4J as the
generation mode.

Add Document type as node Select this check box to add column(s) of the Document
type as node(s) instead of escaped string(s) in the output
XML file.
This check box appears only when the generation mode
is set to Slow and memory-consuming (Dom4j) in the
Advanced settings tab.

Advanced separator (for number) Select this check box to change the expected data
separators.
Thousands separator: define the thousands separator
between inverted commas.
Decimal separator: define the decimal separator between
inverted commas.

Generation mode Select the appropriate generation mode according to your


memory availability. The available modes are:
• Slow and memory-consuming (Dom4j)

Note:
This option allows you to use dom4j to process the
XML files of high complexity.

• Fast with low memory consumption


Once you select Append the source xml file in the Basic
settings view, this field disappears because in this situation,
your generation mode is set automatically as dom4j.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select the check box to avoid the generation of an empty
file.

tStatCatcher Statistics Select the check box to collect the log data at a Job level as
well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
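
As an illustration, assuming the component's unique name in the Job is tAdvancedFileOutputXML_1,
the NB_LINE variable could be read in a downstream tJava component as follows:

  // Read the NB_LINE After variable of tAdvancedFileOutputXML_1
  // (the component name is an assumption for this sketch).
  Integer nbLine = (Integer) globalMap.get("tAdvancedFileOutputXML_1_NB_LINE");
  System.out.println("Rows written: " + nbLine);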

Usage

Usage rule Use this component to write an XML file with data passed
on from other components using a Row link.

Defining the XML tree


Double-click on the tAdvancedFileOutputXML component to open the dedicated interface or click on
the three-dot button on the Basic settings vertical tab of the Component Settings tab.

To the left of the mapping interface, under Schema List, all of the columns retrieved from the
incoming data flow are listed (only if an input flow is connected to the tAdvancedFileOutputXML
component).
To the right of the interface, define the XML structure you want to obtain as output.
You can easily import the XML structure or create it manually, then map the input schema columns
onto each corresponding element of the XML tree.

Importing the XML tree


The easiest and most common way to fill out the XML tree panel is to import a well-formed XML file.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Import XML tree.
4. Browse to the file to import and click OK.
• You can import an XML tree from files in XML, XSD and DTD formats.
• When importing an XML tree structure from an XSD file, you can choose an element as the
root of your XML tree.
The XML Tree column is hence automatically filled out with the correct elements.
5. If you need to add or remove an element or sub-elements, right-click the relevant element of the
tree to display the contextual menu.
6. Select Delete to remove the selection from the tree or select the relevant option among: Add sub-
element, Add attribute, Add namespace to enrich the tree.

Creating the XML tree manually


If you don't have any XML structure defined as yet, you can create it manually.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Add sub-element to create the first element of the structure.
4. If you need to add an attribute or a child element to any element, or remove an element, right-
click to the left of the corresponding element name to display the contextual menu.
5. On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace
or Delete.

Mapping XML data


Once your XML tree is ready, you can map each input column with the relevant XML tree element or
sub-element to fill out the Related Column.

Procedure
1. Click one of the Schema column names.
2. Drag it onto the relevant sub-element to the right.
3. Release to implement the actual mapping.
4. If you need to disconnect any mapping on any element of the XML tree, select the element and
right-click to the left of the element name to display the contextual menu.
5. Select Disconnect linker.

Defining the node status


Defining the XML tree and mapping the data is not sufficient. You also need to define the loop
element and if required the group element.

Define a loop element


The loop element allows you to define the iterating object. Generally the Loop element is also the
row generator.

About this task


To define an element as loop element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Loop Element.

Results
The Node Status column shows the newly added status.
There can only be one loop element at a time.

Define a group element


The group element is optional; it represents a constant element where the group-by operation can be
performed. A group element can be defined only if a loop element has been defined before.

About this task


When using a group element, the rows should be sorted so that they can be grouped by the selected
node.
To define an element as group element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Group Element.

Results
The Node Status column shows the newly added status, and any other group statuses required are
defined automatically.
Click OK once the mapping is complete to validate the definition and continue the Job configuration
where needed.

Creating an XML file using a loop


The following scenario describes the creation of an XML file from a sorted flat file gathering a video
collection.

Configuring the source file


Procedure
1. Drop a tFileInputDelimited and a tAdvancedFileOutputXML from the Palette onto the design
workspace.
2. Alternatively, if you configured a description for the input delimited file in the Metadata area of
the Repository, then you can directly drag & drop the metadata entry onto the editor, to set up
automatically the input flow.
3. Right-click on the input component and drag a row main link towards the tAdvancedFileO
utputXML component to implement a connection.
4. Select the tFileInputDelimited component and display the Component settings tab located in the
tab system at the bottom of the Studio.

5. Select the Property type according to whether you stored the file description in the Repository or
not. If you dragged & dropped the component directly from the Metadata, no changes to the
setting should be needed.
If you didn't set up the file description in the Repository, then select Built-in and manually fill out
the fields displayed on the Basic settings vertical tab.
The input file contains the following columns separated by semicolons: id, name, category,
year, language, director and cast.

In this simple use case, the Cast field gathers different values and the id increments each time
the movie changes.
6. If needed, define the tFileInputDelimited schema according to the file structure.

7. Once you have checked that the schema of the input file meets your expectations, click OK to
validate.

Configuring the XML output and mapping


Procedure
1. Then select the tAdvancedFileOutputXML component and click on the Component settings tab to
configure the basic settings as well as the mapping. Note that a double-click on the component
will directly open the mapping interface.

2. In the File Name field, browse to the file to be written if it exists or type in the path and file name
that needs to be created for the output.
By default, the schema (file description) is automatically propagated from the input flow. But you
can edit it if you need.
3. Then click on the three-dot button or double-click on the tAdvancedFileOutputXML component
on the design workspace to open the dedicated mapping editor.
To the left of the interface are listed the columns from the input file description.
4. To the right of the interface, set the XML tree panel to reflect the expected XML structure output.
You can create the structure node by node. For more information about the manual creation of an
XML tree, see Defining the XML tree on page 125.
In this example, an XML template is used to populate the XML tree automatically.
5. Right-click on the root tag displaying by default and select Import XML tree at the end of the
contextual menu options.
6. Browse to the XML file to be imported and click OK to validate the import operation.

Note:
You can import an XML tree from files in XML, XSD and DTD formats.

7. Then drag & drop each column name from the Schema List to the matching (or relevant) XML tree
elements as described in Mapping XML data on page 127.
The mapping is shown as blue links between the left and right panels.

Finally, define the node status where the loop should take place. In this use case, Cast is the
changing element on which the iteration should operate, so this element will be the loop
element.
Right-click the Cast element in the XML tree and select Set as Loop Element.
8. To group by movie, this use case also needs a group element to be defined.
Right-click the Movie parent node of the XML tree and select Set as Group Element.
The newly defined node statuses show on the corresponding element lines.
9. Click OK to validate the configuration.
10. Press F6 to execute the Job.

The output XML file shows the structure as defined.
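
For illustration only, assuming a root element named collection, a movie group element mapped to
the id, name, category, year, language and director columns, and a cast loop element, the generated
file could look like the following sketch (the actual element names depend on the imported XML
template):

  <?xml version="1.0" encoding="UTF-8"?>
  <collection>
    <movie id="1">
      <name>...</name>
      <director>...</director>
      <cast>...</cast>
      <cast>...</cast>
    </movie>
    <movie id="2">
      ...
    </movie>
  </collection>

Because Movie is the group element, one movie node is written per id value; because Cast is the loop
element, one cast node is written per input row of that movie.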

tAggregateRow
Receives a flow and aggregates it based on one or more columns.
For each output line, the aggregation key and the relevant results of the set operations (min,
max, sum...) are provided.
tAggregateRow helps to provide a set of metrics based on values or calculations.

tAggregateRow Standard properties


These properties are used to configure tAggregateRow running in the Standard Job framework.
The Standard tAggregateRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Group by Define the aggregation sets, the values of which will be


used for calculations.

  Output Column: Select the column label in the list offered


based on the schema structure you defined. You can add as
many output columns as you wish to make more precise
aggregations.
Ex: Select Country to calculate an average of values for each
country of a list, or select Country and Region if you want
to compare one country's regions with another country's
regions.

  Input Column: Match the input column label with your


output columns, in case the output label of the aggregation
set needs to be different.

Operations Select the type of operation along with the value to use for
the calculation and the output field.

  Output Column: Select the destination field in the list.

Function: Select the operator among:


• count: calculates the number of rows
• min: selects the minimum value
• max: selects the maximum value
• avg: calculates the average
• sum: calculates the sum
• first: returns the first value
• last: returns the last value
• list: lists values of an aggregation by multiple keys.
• list (object): lists Java values of an aggregation by
multiple keys
• count (distinct): counts the number of the distinct rows
• standard deviation: calculates the variability of a set of
values.
• union (geometry): makes the union of a set of
Geometry objects
• population standard deviation: calculates the spread of
a data distribution. Use this function if the data to be
calculated is considered a population on its own. This
calculation supports 39 decimal places.
• sample standard deviation: calculates the spread of
a data distribution. Use this function if the data to
be calculated is considered a sample from a larger
population. This calculation supports 39 decimal
places.

  Input column: Select the input column from which the


values are taken to be aggregated.

  Ignore null values: Select the check boxes corresponding


to the names of the columns for which you want the NULL
value to be ignored.

Advanced settings

Delimiter (only for list operation) Enter the delimiter you want to use to separate the values
produced by a list operation.

Use financial precision, this is the max precision for "sum" and "avg" operations, checked option heaps
more memory and slower than unchecked Select this check box to use a financial precision. This is a
max precision but consumes more memory and slows the processing.

Warning:
We advise you to use the BigDecimal type for the output in
order to obtain precise results.

Check type overflow (slower) Checks the type of data to ensure that the Job doesn't
crash.

Check ULP (Unit in the Last Place), ensure that a value will be incremented or decremented correctly,
only float and double types. (slower) Select this check box to ensure the most precise results
possible for the Float and Double types.

tStatCatcher Statistics Check this box to collect the log data at component level.
Note that this check box is not available in the Map/Reduce
version of the component.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data, therefore it requires
input and output, and hence is defined as an intermediary step.
tAggregateRow is usually combined with the tSortRow
component.

Aggregating values and sorting data


This example shows you how to use Talend components to aggregate the students' comprehensive
scores and then sort the aggregated scores based on the student names.

Creating a Job for aggregating and sorting data


Create a Job to aggregate the students' comprehensive scores using the tAggregateRow component,
then sort the aggregated data using the tSortRow component, finally display the aggregated and
sorted data on the console.

Procedure
1. Create a new Job and add a tFixedFlowInput component, a tAggregateRow component, a
tSortRow component, and a tLogRow component by typing their names in the design workspace
or dropping them from the Palette.
2. Link the tFixedFlowInput component to the tAggregateRow component using a Row > Main
connection.
3. Do the same to link the tAggregateRow component to the tSortRow component, and the tSortRow
component to the tLogRow component.

Configuring the Job for aggregating and sorting data


Configure the Job to aggregate the students' comprehensive scores using the tAggregateRow
component and then sort the aggregated data using the tSortRow component.

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view.
2. Click the button next to Edit schema to open the schema dialog box and define the schema by
adding two columns, name of String type and score of Double type. When done, click OK to save
the changes and close the schema dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and in the Content field displayed,
enter the following input data:

Peter;92
James;93
Thomas;91
Peter;94
James;96
Thomas;95
Peter;96
James;92
Thomas;98
Peter;95
James;96
Thomas;93
Peter;98
James;97
Thomas;95

4. Double-click the tAggregateRow component to open its Basic settings view.

5. Click the button next to Edit schema to open the schema dialog box and define the schema by
adding five columns, name of String type, and sum, average, max, and min of Double type.

When done, click OK to save the changes and close the schema dialog box.
6. Add one row in the Group by table by clicking the button below it, and select name from both
the Output column and Input column position column fields to group the input data by the name
column.
7. Add four rows in the Operations table and define the operations to be carried out. In this example,
the operations are sum, average, max, and min. Then select score from all four Input column
position column fields to aggregate the input data based on it.
8. Double-click the tSortRow component to open its Basic settings view.

9. Add one row in the Criteria table and specify the column based on which the sort operation is
performed. In this example, it is the name column. Then select alpha from the sort num or alpha?
column field and asc from the Order asc or desc? column field to sort the aggregated data in
ascending alphabetical order.
10. Double-click the tLogRow component to open its Basic settings view, and then select Table (print
values in cells of a table) in the Mode area for better readability of the result.

Executing the Job to aggregate and sort data


After setting up the Job and configuring the components used in the Job for aggregating and sorting
data, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

Results
As shown above, the students' comprehensive scores are aggregated and then sorted in ascending
alphabetical order based on the student names.
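
Based on the sample data above, and depending on the exact column layout, the console output of
tLogRow should resemble the following sketch:

  name   | sum   | average | max  | min
  -------+-------+---------+------+------
  James  | 474.0 | 94.8    | 97.0 | 92.0
  Peter  | 475.0 | 95.0    | 98.0 | 92.0
  Thomas | 472.0 | 94.4    | 98.0 | 91.0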

tAggregateSortedRow
Aggregates the sorted input data for each output column based on a set of operations. Each output column
is configured with as many rows as required, the operations to be carried out and the input column from
which the data will be taken for better data aggregation.

tAggregateSortedRow Standard properties


These properties are used to configure tAggregateSortedRow running in the Standard Job framework.
The Standard tAggregateSortedRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Input rows count Specify the number of rows that are sent to the
tAggregateSortedRow component.

Note:
If you specified a Limit for the number of rows to be
processed in the input component, you will have to use
that same limit in the Input rows count field.

Group by Define the aggregation sets, the values of which will be


used for calculations.

  Output Column: Select the column label in the list offered


based on the schema structure you defined. You can add
as many output columns as you wish to make more precise
aggregations.
Ex: Select Country to calculate an average of values for each
country of a list or select Country and Region if you want
to compare one country's regions with another country's
regions.

  Input Column: Match the input column label with your


output columns, in case the output label of the aggregation
set needs to be different.

Operations Select the type of operation along with the value to use for
the calculation and the output field.

  Output Column: Select the destination field in the list.

Function: Select the operator among:


• count: calculates the number of rows
• min: selects the minimum value
• max: selects the maximum value
• avg: calculates the average
• sum: calculates the sum
• first: returns the first value
• last: returns the last value
• list: lists values of an aggregation by multiple keys.
• list (object): lists Java values of an aggregation by
multiple keys
• count (distinct): counts the number of the distinct rows
• standard deviation: calculates the variability of a set of
values.
• union (geometry): makes the union of a set of
Geometry objects
• population standard deviation: calculates the spread of
a data distribution. Use this function if the data to be
calculated is considered a population on its own. This
calculation supports 39 decimal places.
• sample standard deviation: calculates the spread of
a data distribution. Use this function if the data to
be calculated is considered a sample from a larger
population. This calculation supports 39 decimal
places.

  Input column: Select the input column from which the


values are taken to be aggregated.

  Ignore null values: Select the check boxes corresponding


to the names of the columns for which you want the NULL
value to be ignored.

Advanced settings

tStatCatcher Statistics Check this box to collect the log data at component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data, therefore it requires
input and output, and hence is defined as an intermediary step.

Sorting and aggregating the input data


This scenario describes a Job that sorts the entries of the input data based on two columns and
displays the sorted data on the console, then aggregates the sorted data based on one column and
displays the aggregated data on the console.

Adding and linking components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tSortRow
component, a tAggregateSortedRow component, and two tLogRow components.
2. Link tFixedFlowInput to tSortRow using a Row > Main connection.
3. Do the same to link tSortRow to the first tLogRow, link the first tLogRow to tAggregateSort
edRow, and link tAggregateSortedRow to the second tLogRow.

Configuring the components


Sorting the input data

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: Id and Age of Integer type, and Name and Team of String type.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.

3. In the Mode area, select Use Inline Content (delimited file), and then in the Content field
displayed, enter the input data to be sorted and aggregated. In this example, the input data is as
follows:

1;Thomas;28;Component Team
2;Harry;32;Doc Team
3;John;26;Component Team
4;Nicolas;27;QA Team
5;George;24;Component Team
6;Peter;30;Doc Team
7;Teddy;23;QA Team
8;James;26;Component Team

4. Double-click tSortRow to open its Basic settings view.

5. Click the [+] button below the Criteria table to add as many rows as required and then specify
the sorting criteria in the table. In this example, two rows are added, and the input entries will be
sorted based on the column Team and then the column Age, both in ascending order.
6. Double-click the first tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the sorting
result.

Aggregating the sorted data

Procedure
1. Double-click tAggregateSortedRow to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
five columns: AggTeam of String type, AggCount, MinAge, MaxAge, and AvgAge of Integer type.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
3. In the Input rows count field, enter the exact number of rows of the input data. In this example, it
is 8.
4. Click the [+] button below the Group by table to add as many rows as required and specify the
aggregation set in the table. In this example, the data will be aggregated based on the input
column Team.
5. Click the [+] button below the Operations table to add as many rows as required and specify the
operation to be carried out and the corresponding input column from which the data will be taken
for each output column. In this example, we want to calculate the number of the input entries, the
minimum age, the maximum age, and the average age for each team.
6. Double-click the second tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the
aggregation result.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.

As shown above, the input entries are sorted based on the column Team and then the column Age,
both in ascending order, and the sorted entries are then aggregated based on the column Team.
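
Based on the sample data above, the first tLogRow should list the rows grouped by Team and ordered
by Age (for example George, John, James and Thomas for the Component Team), and the second
tLogRow should display an aggregation resembling the following sketch:

  AggTeam        | AggCount | MinAge | MaxAge | AvgAge
  ---------------+----------+--------+--------+-------
  Component Team | 4        | 24     | 28     | 26
  Doc Team       | 2        | 30     | 32     | 31
  QA Team        | 2        | 23     | 27     | 25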

tAmazonAuroraClose
Closes an active connection to an Amazon Aurora database instance to release the occupied
resources.

tAmazonAuroraClose Standard properties


These properties are used to configure tAmazonAuroraClose running in the Standard Job framework.
The Standard tAmazonAuroraClose component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tAmazonAuroraConnection component that


opens the connection you need to close from the list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the
tAmazonAuroraConnection and tAmazonAuroraCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.

tAmazonAuroraCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus
provides a gain in performance, using a unique connection.
tAmazonAuroraCommit validates the data processed through the Job into the connected Amazon
Aurora database.

tAmazonAuroraCommit Standard properties


These properties are used to configure tAmazonAuroraCommit running in the Standard Job framework.
The Standard tAmazonAuroraCommit component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tAmazonAuroraConnection component for which


you want the commit action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the commit is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAmazonAuroraCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
Connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the
tAmazonAuroraConnection and tAmazonAuroraRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.

tAmazonAuroraConnection
Opens a connection to an Amazon Aurora database instance that can then be reused by other Amazon
Aurora components.

tAmazonAuroraConnection Standard properties


These properties are used to configure tAmazonAuroraConnection running in the Standard Job
framework.
The Standard tAmazonAuroraConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the properties


are stored. The database connection fields that follow are
completed automatically using the data retrieved.

Host Type in the IP address or hostname of the Amazon Aurora


database.

Port Type in the listening port number of the Amazon Aurora


database.

Database Type in the name of the database you want to use.

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.

This option is incompatible with the Use dynamic job and


Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
This check box is not available when the Specify a data
source alias check box is selected.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box disappears when the Use or register a shared
DB Connection check box is selected.

Data source alias Type in the alias of the data source created on the Talend
Runtime side.
This field appears only when the Specify a data source alias
check box is selected.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.

For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the
tAmazonAuroraCommit and tAmazonAuroraRollback components.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.

tAmazonAuroraInput
Reads an Amazon Aurora database and extracts fields based on a query.
tAmazonAuroraInput executes a database query with a strictly defined order which must correspond
to the schema definition. Then it passes on the field list to the next component via a Row > Main link.

tAmazonAuroraInput Standard properties


These properties are used to configure tAmazonAuroraInput running in the Standard Job framework.
The Standard tAmazonAuroraInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the properties


are stored. The database connection fields that follow are
completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Type in the IP address or hostname of the Amazon Aurora


database.

Port Type in the listening port number of the Amazon Aurora


database.

Database Type in the name of the database you want to use.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Type in the name of the table to be read.

Query Type and Query Enter the database query, paying particular attention to
the proper sequence of the fields in order to match the
schema definition.

Guess Query Click the button to generate the query which corresponds to
the table schema in the Query field.

Guess schema Click the button to retrieve the schema from the table.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box disappears when the Use an existing
connection check box is selected.

Data source alias Type in the alias of the data source created on the Talend
Runtime side.

This field appears only when the Specify a data source alias
check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. When you need to handle
data of the time-stamp type 0000-00-00 00:00:00 using
this component, set the parameter to
noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.
This field disappears when the Use an existing connection
check box in the Basic settings view is selected.

Enable stream Select this check box to enable streaming over buffering
which allows the code to read from a large table without
consuming a large amount of memory in order to optimize
the performance.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Select the check box(es) in the Trim column to remove
leading and trailing whitespace from the corresponding
column(s).
This option disappears when the Trim all the String/Char
columns check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it needs an output link.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Handling data with Amazon Aurora


This scenario describes a Job that writes the user information into Amazon Aurora, and then reads the
information in Amazon Aurora and displays it on the console.

The scenario requires the following seven components:


• tAmazonAuroraConnection: opens a connection to Amazon Aurora.
• tFixedFlowInput: defines the user information data structure, and sends the data to the next
component.
• tAmazonAuroraOutput: writes the data it receives from the preceding component into Amazon
Aurora.
• tAmazonAuroraCommit: commits in one go the data processed to Amazon Aurora.
• tAmazonAuroraInput: reads the data from Amazon Aurora.
• tLogRow: displays the data it receives from the preceding component on the console.
• tAmazonAuroraClose: closes the connection to Amazon Aurora.

Adding and linking the components


Procedure
1. Create a new Job and add seven components listed previously by typing their names in the design
workspace or dropping them from the Palette.
2. Connect tFixedFlowInput to tAmazonAuroraOutput using a Row > Main connection.
3. Do the same to connect tAmazonAuroraInput to tLogRow.
4. Connect tAmazonAuroraConnection to tFixedFlowInput using a Trigger > OnSubjobOk connection.
5. Do the same to connect tFixedFlowInput to tAmazonAuroraCommit, tAmazonAuroraCommit to
tAmazonAuroraInput, and tAmazonAuroraInput to tAmazonAuroraClose.

Configuring the components


Opening a connection to Amazon Aurora

Procedure
1. Double-click tAmazonAuroraConnection to open its Basic settings view.

2. In the Host, Port, Database, Username and Password fields, enter the information required for the
connection to Amazon Aurora.
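
For illustration only, assuming an Aurora MySQL-compatible cluster, the values could look like the
following (the endpoint, database name and credentials are placeholders, not values required by the
component):

  Host:     "mycluster.cluster-abc123xyz456.us-east-1.rds.amazonaws.com"
  Port:     "3306"
  Database: "talend"
  Username: "talend_user"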

Writing the data into Amazon Aurora

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: id of Integer type, and name and city of String type.

Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and enter the following user
information in the Content field.

1;George;Bismarck
2;Abraham;Boise
3;Taylor;Nashville
4;William;Jefferson City
5;Alexander;Jackson
6;James;Boise
7;Gerald;Little Rock
8;Tony;Richmond
9;Thomas;Springfield
10;Andre;Nashville

4. Double-click tAmazonAuroraOutput to open its Basic settings view.

5. Select the Use an existing connection check box and in the Component List that appears, select
the connection component you have configured.
6. In the Table field, enter or browse to the table into which you want to write the data. In this
example, it is TalendUser.
7. Select Drop table if exists and create from the Action on table drop-down list, and select Insert
from the Action on data drop-down list.
8. Double-click tAmazonAuroraCommit to open its Basic settings view.

9. Clear the Close Connection check box if it is selected.

Retrieving the data from Amazon Aurora

Procedure
1. Double-click tAmazonAuroraInput to open its Basic settings view.

2. Select the Use an existing connection check box and in the Component List that appears, select
the connection component you have configured.

3. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: id of Integer type, and name and city of String type. The data structure is the same as
the structure you have defined for tFixedFlowInput.
4. In the Table Name field, enter or browse to the table from which you want to read the data. In this
example, it is TalendUser.
5. Click the Guess Query button to generate the query. The Query field will be filled with the
automatically generated query (see the example after this procedure).
6. Double-click tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the result.
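
For reference, the query generated in step 5 for the TalendUser table should resemble the following
(the exact quoting depends on the database type selected):

  SELECT TalendUser.id, TalendUser.name, TalendUser.city FROM TalendUser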

Closing the connection to Amazon Aurora

Procedure
1. Double-click tAmazonAuroraClose to open its Basic settings view.

2. In the Component List, select the connection component you have configured.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 or click Run on the Run tab to run the Job.

As shown above, the user information is written into Amazon Aurora, and then the data is retrieved
from Amazon Aurora and displayed on the console.
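
Based on the sample data above, the tLogRow table should list the ten inserted rows, starting with:

  id | name    | city
  ---+---------+----------
  1  | George  | Bismarck
  2  | Abraham | Boise
  3  | Taylor  | Nashville

and so on for the remaining rows.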

tAmazonAuroraOutput
Writes, updates, makes changes or suppresses entries in an Amazon Aurora database.
tAmazonAuroraOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tAmazonAuroraOutput Standard properties


These properties are used to configure tAmazonAuroraOutput running in the Standard Job framework.
The Standard tAmazonAuroraOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The database connection fields that
follow are completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Type in the IP address or hostname of the Amazon Aurora


database.


Port Type in the listening port number of the Amazon Aurora


database.

Database Type in the name of the database you want to use.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Type in the name of the table to be written. Note that only
one table can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets
created.
• Create table if not exists: The table is created if it does
not exist.
• Drop table if exists and create: The table is removed if
it already exists and created again.
• Clear table: The table content is deleted.
• Truncate table: The table content is quickly deleted.
However, you will not be able to rollback the
operation.

Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new entries to the table. If duplicates are
found, the job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record. If the record with
the given reference already exists, an update would be
made.
• Update or insert: Update the record with the given
reference. If the record does not exist, a new record
would be inserted.
• Delete: Remove entries corresponding to the input
flow.
• Replace: Add new entries to the table. If an old row
in the table has the same value as a new row for a
PRIMARY KEY or a UNIQUE index, the old row is
deleted before the new row is inserted.
• Insert or update on duplicate key or unique index: Add
entries if the inserted value does not exist or update
entries if the inserted value already exists and there is
a risk of violating a unique index or primary key.
• Insert Ignore: Add only new rows to prevent duplicate
key errors.


Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You
can do that by clicking Edit schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box disappears when the Use an existing
connection check box is selected.

Data source alias Type in the alias of the data source created on the Talend
Runtime side.


This field appears only when the Specify a data source alias
check box is selected.

Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows. If needed, you can retrieve the rows in error via a Row
> Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.
This field disappears when the Use an existing connection
check box in the Basic settings view is selected.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.
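For example, a value such as the following (a common MySQL-compatible driver option, given
here only as an illustration; check the options supported by your driver version) forces the
connection to use UTF-8:
useUnicode=true&characterEncoding=UTF-8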

Extend Insert Select this check box to carry out a bulk insert of a defined
set of lines instead of inserting lines one by one. The gain in
system performance is considerable.
This check box appears only when the Insert option is
selected from the Action on data list in the Basic settings
view.

Note:
This option is not compatible with the Reject link. You
should therefore clear the check box if you are using a
Row > Rejects link with this component.

Number of rows per insert Enter the number of rows to be inserted per operation. Note
that the higher the value specified, the lower the performance
level will be, due to the increase in memory demands.
This field appears only when the Extend Insert check box is
selected.

Use Batch Select this check box to activate the batch mode for data
processing.
This check box is available only when the Update or Delete
option is selected from the Action on data list in the Basic
settings view.

Batch Size Specify the number of records to be processed in each
batch.
This field appears only when the Use Batch check box is
selected.

Commit every Enter the number of rows to be included in a batch before


it is committed to the database. This option ensures
transaction quality (but not rollback) and, above all, a
higher performance level.

Additional columns This option allows you to call SQL functions to perform
actions on columns, provided that these are not insert,
update or delete actions, or actions that require pre-
processing. This option is not available if you have
just created the database table (even if you delete it
beforehand). Click the [+] button under the table to add
column(s), and set the following parameters for each
column.
• Name: Type in the name of the schema column to be
altered or inserted.
• SQL expression: Type in the SQL statement to be
executed in order to alter or insert the data in the
corresponding column.
• Position: Select Before, After or Replace depending on
the action to be performed on the reference column.
• Reference column: Type in a reference column that
tAmazonAuroraOutput can use to locate or replace the
new column or the column to be modified.
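For example, a purely illustrative configuration (the column name and expression below are
hypothetical) could populate an extra update_time column with the database server time and
place it right after an existing city column:
• Name: "update_time"
• SQL expression: "NOW()"
• Position: After
• Reference column: "city"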

Use field options Select the check box for the corresponding column to
customize a request, particularly if multiple actions are
being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which the data is
updated.
• Key in delete: Select the check box for the
corresponding column based on which the data is
deleted.
• Updatable: Select the check box if the data in the
corresponding column can be updated.
• Insertable: Select the check box if the data in the
corresponding column can be inserted.

Use Hint Options Select this check box to configure the hint(s) which can help
you optimize a query's execution.

Hint Options Click the [+] button under the table to add hint(s) and set
the following parameters for each hint. This table appears
only when the Use Hint Options check box is selected.
• HINT: Specify the hint you need, using the syntax /*+
*/.
• POSITION: Specify where you put the hint in an SQL
statement.
• SQL STMT*: Select the SQL statement (INSERT, UPDATE,
or DELETE) you need to use.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use duplicate key update mode insert Select this check box to activate the ON DUPLICATE KEY
UPDATE mode, and then click the [+] button under the
table displayed to add column(s) to be updated and specify
the update action to be performed on the corresponding
column.
• Column: Enter the name of the column to be updated.
• Value: Enter the action to be performed on the column.
This check box is available only when the Insert option is
selected from the Action on data list in the Basic settings
view.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
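
For example, after this component has run, a downstream tJava component could print some of
these variables as follows. This is only a sketch: it assumes the component instance is named
tAmazonAuroraOutput_1, so adjust the name to match your Job.

System.out.println("Rows inserted: " + ((Integer)globalMap.get("tAmazonAuroraOutput_1_NB_LINE_INSERTED")));
System.out.println("Rows updated: " + ((Integer)globalMap.get("tAmazonAuroraOutput_1_NB_LINE_UPDATED")));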

Usage

Usage rule This component must be used as an output component. It


allows you to carry out actions on a table or on the data of
a table in an Amazon Aurora database. It also allows you to
create a reject flow using a Row > Rejects link to filter data
in error. For a similar scenario, see .

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Handling data with Amazon Aurora on page 156.


tAmazonAuroraRollback
Rolls back any changes made in the Amazon Aurora database to prevent partial transaction commit if
an error occurs.

tAmazonAuroraRollback Standard properties


These properties are used to configure tAmazonAuroraRollback running in the Standard Job
framework.
The Standard tAmazonAuroraRollback component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tAmazonAuroraConnection component for which


you want the rollback action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the rollback is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other Amazon
Aurora components, especially with the tAmazonAuroraConnection
and tAmazonAuroraCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related Scenario
No scenario is available for the Standard version of this component yet.


tAmazonEMRListInstances
Lists the details about the instance groups in a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRListInstances Standard properties


These properties are used to configure tAmazonEMRListInstances running in the Standard Job
framework.
The Standard tAmazonEMRListInstances component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.

Filter master and core instances Select this check box to ignore the master and core instance
groups and list only the task instance groups.

Cluster id Enter the ID of the cluster for which you want to list the
instance groups.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are retrieved
from.
This check box is available only when the Assume role
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CURRENT_GROUP_ID: the ID of the current instance group.


This is an After variable and it returns a string.
CURRENT_GROUP_NAME: the name of the current instance
group. This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tAmazonEMRListInstances is usually used as a start


component of a Job or subJob.

Related scenario
No scenario is available for the Standard version of this component yet.


tAmazonEMRManage
Launches or terminates a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRManage Standard properties


These properties are used to configure tAmazonEMRManage running in the Standard Job framework.
The Standard tAmazonEMRManage component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. The credentials can be used on Amazon EC2
instances or AWS ECS, and are delivered through the
Amazon EC2 metadata service. To use this option, your Job
must be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Action Select an action to be performed from the list, either Start


or Stop.
• Start: launch an Amazon EMR cluster.
• Stop: terminate an Amazon EMR cluster.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.

Cluster name Enter the name of the cluster.

Cluster version Select the version of the cluster.


You can also select the Customize Version and Application
check box on the Advanced settings view to customize the
cluster version information.


This property is not available when the Customize Version


and Application check box is selected.

Application Select the applications to be installed on the cluster.


You can also select the Customize Version and Application
check box on the Advanced settings view to customize the
applications information.
This property is available when an EMR version is selected
from the Cluster version list and the Customize Version and
Application check box is cleared.

Service role Enter the IAM (Identity and Access Management) role for the
Amazon EMR service. The default role is EMR_DefaultRole.
To use this default role, you must have already created it.

Job flow role Enter the IAM role for the EC2 instances that Amazon EMR
manages. The default role is EMR_EC2_DefaultRole. To use
this default role, you must have already created it.

Enable log Select this check box to enable logging and in the field
displayed specify the path to a folder in an S3 bucket where
you want Amazon EMR to write the log data.

Use EC2 key pair Select this check box to associate an Amazon EC2 (Elastic
Compute Cloud) key pair with the cluster and in the field
displayed enter the name of your EC2 key pair.

Predicate Specify the cluster(s) that you want to stop:


• All running clusters: all running clusters will be
stopped.
• All running clusters with predefined name: the running
cluster with a given name will be stopped. In the
Cluster name field displayed, you need to specify the
name of the cluster to be stopped.
• Running cluster with predefined id: the running cluster
with a given ID will be stopped. In the Cluster id field
displayed, you need to specify the ID of the cluster to
be stopped.
This list is available only when Stop is selected from the
Action list.

Instance count Enter the number of Amazon EC2 instances to initialize.

Master instance type Select the type of the master instance to initialize.

Slave instance type Select the type of the slave instance to initialize.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.


Wait for cluster ready Select this check box to let your Job wait until the launch of
the cluster is completed.

Visible to all users Select this check box to make the cluster visible to all IAM
users.

Termination Protect Select this check box to enable termination protection to


prevent instances in the cluster from shutting down due to
errors or issues during processing.

Enable debug Select this check box to enable the debug mode.

Customize Version and Application Select this check box to customize the version of the cluster
and the applications to be installed on the cluster.
• Cluster version: enter the version of the cluster.
• Applications: click the [+] button below the table
to add as many rows as needed, each row for an
application, and specify the application by clicking
the right side of the cell and selecting the application
from the drop-down list displayed, or just entering the
application name in the cell if it is not in the list.

Subnet id Specify the identifier of the Amazon VPC (Virtual Private


Cloud) subnet where you want the job flow to launch.

Availability Zone Specify the availability zone for your cluster's EC2 instances.

Master security group Specify the security group for the master instance.

Additional master security groups Specify additional security groups for the master instance
and separate them with a comma, for example, gname1,
gname2, gname3.

Slave security group Specify the security group for the slave instances.

Additional slave security groups Specify additional security groups for the slave instances
and separate them with a comma, for example, gname1,
gname2, gname3.

Service Access Security Group Specify the identifier of the Amazon EC2 security group for
the Amazon EMR service to access clusters in VPC private
subnet.
For how to create a private subnet to enable service access
security group on Amazon EMR, see Scenario 2: VPC with
Public and Private Subnets (NAT).

Actions Specify the bootstrap actions associated with the cluster, by


clicking the [+] button below the table to add as many rows
as needed, each row for a bootstrap action, and setting the
following parameters for each action:
• Name: enter the name of the bootstrap action.
• Script location: specify the location of the script run
by the bootstrap action, for example, s3://ap-northeast-1.elasticmapreduce/bootstrap-actions/run-if.
• Arguments: enter the list of command line arguments
(separated by commas) passed to the bootstrap action
script, for example, "arg0","arg1","arg2".


For more information about the bootstrap actions, see


BootstrapActionConfig.

Steps Specify the job flow step(s) to be invoked on the cluster


after its launch, by clicking the [+] button below the table
to add as many rows as needed, each row for a step, and
setting the following parameters for each step:
• Name: enter the name of the job flow step.
• Action on Failure: click the cell and from the drop-
down list select the action to take if the job flow step
fails.
• Main Class: enter the name of the main class in the
specified Java file. If not specified, the JAR file should
specify a Main-Class in its manifest file.
• Jar: enter the path to the JAR file run during the step,
for example, "s3://inputjar/test.jar".
• Args: enter the list of command line arguments
(separated by commas) passed to the JAR file's main
function when executed, for example, "arg0","arg1",
"arg2".
For more information about the job flow steps, see
StepConfig.

Keep alive after steps complete Select this check box to keep the job flow alive after
completing all steps.

Wait for steps to complete Select this check box to let your Job wait until the job flow
steps are completed.
This check box is available only when the Wait for cluster
ready check box is selected.

Properties Specify the classification and property information supplied


to the configuration object of the EMR cluster to be created,
by clicking the [+] button below the table to add as many
rows as needed, each row for a property, and setting the
following parameters:
• Classification: specify the classification of the
configuration.
• Key: enter the key of the property.
• Value: enter the value of the property.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CLUSTER_FINAL_ID The ID of the cluster. This is an After variable and it returns


a string.

CLUSTER_FINAL_NAME The name of the cluster. This is an After variable and it


returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule tAmazonEMRManage is usually used as a standalone


component.

Managing an Amazon EMR cluster


Here's an example of using Talend components to manage an Amazon EMR cluster.

Creating an Amazon EMR cluster management Job


Create a Job to start a new Amazon EMR cluster, then resize the cluster, and finally list the ID and
name information of the instance groups in the cluster.

Procedure
1. Create a new Job and add a tAmazonEMRManage component, a tAmazonEMRResize component, a
tAmazonEMRListInstances component, and a tJava component by typing their names in the design
workspace or dropping them from the Palette.
2. Link the tAmazonEMRManage component to the tAmazonEMRResize component using a Trigger >
OnSubjobOk connection.
3. Link the tAmazonEMRResize component to the tAmazonEMRListInstances component using a
Trigger > OnSubjobOk connection.
4. Link the tAmazonEMRListInstances component to the tJava component using a Row > Iterate
connection.

Starting a new Amazon EMR cluster


Configure the tAmazonEMRManage component to start a new Amazon EMR cluster.

Procedure
1. Double-click the tAmazonEMRManage component to open its Basic settings view.


2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. From the Action list, select Start to start a cluster.
4. Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
5. In the Cluster name field, enter the name of the cluster to be started. In this example, it is
talend-doc-emr-cluster.
6. From the Cluster version and Application drop-down list, select the version of the cluster and the
application to be installed on the cluster.
7. Select the Enable log check box and in the field displayed, specify the path to a folder in an S3
bucket where you want Amazon EMR to write the log data. In this example, it is
s3://talend-doc-emr-bucket.

Resizing the Amazon EMR cluster by adding a new task instance group
Configure the tAmazonEMRResize component to resize a running Amazon EMR cluster by adding a
new task instance group.

Procedure
1. Double-click the tAmazonEMRResize component to open its Basic settings view.


2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. From the Action drop-down list, select Add task instance group to resize the cluster by adding a
new task instance group.
4. In the Cluster id field, enter the ID of the cluster to be resized. In this example, the returned value
of the global variable CLUSTER_FINAL_ID of the previous tAmazonEMRManage component is used, as
shown in the expression after this procedure. Note that you can retrieve the global variable by
pressing Ctrl + Space and selecting the relevant global variable from the list.
5. In the Group name field, enter the name of the task instance group to be added in the cluster. In
this example, it is talend-doc-instance-group.
6. In the Instance count field, specify the number of the instances to be created.
7. From the Task instance type drop-down list, select the type of the instances to be created.
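
The value entered in the Cluster id field in step 4 can be, for example, the following expression.
This is a sketch assuming the tAmazonEMRManage component instance is named tAmazonEMRManage_1;
adjust the name to match your Job.

((String)globalMap.get("tAmazonEMRManage_1_CLUSTER_FINAL_ID"))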

Listing the instance groups in the Amazon EMR cluster


Configure the tAmazonEMRListInstances component and the tJava component to retrieve and display
the ID and name information of all instance groups in a running cluster.

Procedure
1. Double-click the tAmazonEMRListInstances component to open its Basic settings view.

2. In the Access Key and Secret Key fields, enter the authentication credentials required to access
Amazon S3.
3. Select the AWS region from the Region drop-down list. In this example, it is Asia Pacific (Tokyo).
4. Clear the Filter master and core instances check box to list all instance groups, including the
Master, Core, and Task type instance groups.
5. In the Cluster id field, enter the ID of the cluster for which to list the instance groups. In
this example, the returned value of the global variable CLUSTER_FINAL_ID of the previous
tAmazonEMRManage component is used.
6. Double-click the tJava component to open its Basic settings view.


7. In the Code field, enter the following code to print the ID and Name information of each instance
group in the cluster.

System.out.println("\r\n===== Instance Group =====");
System.out.println("Instance Group ID: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_ID"));
System.out.println("Instance Group Name: " + (String)globalMap.get("tAmazonEMRListInstances_1_CURRENT_GROUP_NAME"));

Executing the Job to manage the Amazon EMR cluster


After setting up the Job and configuring the components used in the Job for managing Amazon EMR
cluster, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.

As shown above, the Job starts and resizes the Amazon EMR cluster, and then lists all instance
groups in the cluster.
2. View the cluster details on the Amazon EMR Cluster List page to validate the Job execution result.


tAmazonEMRResize
Adds or resizes a task instance group in a cluster on Amazon EMR (Elastic MapReduce).

tAmazonEMRResize Standard properties


These properties are used to configure tAmazonEMRResize running in the Standard Job framework.
The Standard tAmazonEMRResize component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Action Select an action to be performed from the drop-down list.


• Add task instance group: add a task instance group in a
cluster.
• Resize task instance group: resize a task instance group
in a cluster.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(for example "us-east-1"). For more information about how to
specify the AWS region, see Choose an AWS Region.

Cluster id Enter the ID of the cluster to be resized.

Group name Enter the name of the task instance group to be added.
This field is available only when Add task instance group is
selected from the Action drop-down list.


Group id Enter the ID of the task instance group to be resized.


This field is available only when Resize task instance group
is selected from the Action drop-down list.

Instance count Enter the number of instances for the task instance group.

Task instance type Select an instance type for all instances in the task instance
group to be added from the drop-down list.
This list is available only when Add task instance group is
selected from the Action drop-down list.

Request spot Select this check box to launch Spot instances, and in the
Bid price($) field displayed, enter the maximum hourly rate
(in dollars) you are willing to pay per instance.
This check box is available only when Add task instance
group is selected from the Action drop-down list.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables TASK_GROUP_ID: the ID of the task instance group. This is


an After variable and it returns a string.
TASK_GROUP_NAME: the name of the task instance group.
This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tAmazonEMRResize is usually used as a standalone


component.


Related scenario
No scenario is available for the Standard version of this component yet.


tAmazonMysqlClose
Closes the transaction committed in the connected DB.

tAmazonMysqlClose Standard properties


These properties are used to configure tAmazonMysqlClose running in the Standard Job framework.
The Standard tAmazonMysqlClose component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonMysqlConnection component in the list


if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with other tAmazonMysql*
components, especially with tAmazonMysqlConnection and
tAmazonMysqlCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAmazonMysqlCommit
Commits a global transaction in one go, instead of committing on every row or every batch, which
provides a gain in performance, using a unique connection.
tAmazonMysqlCommit validates the data processed through the Job into the connected database.

tAmazonMysqlCommit Standard properties


These properties are used to configure tAmazonMysqlCommit running in the Standard Job framework.
The Standard tAmazonMysqlCommit component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonMysqlConnection component in the list


if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAmazonMysqlCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonMysql* components, especially with the
tAmazonMysqlConnection and tAmazonMysqlRollback
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For tAmazonMysqlCommit related scenario, see Inserting data in mother/daughter tables on page
2426.


tAmazonMysqlConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAmazonMysqlConnection opens a connection to the database for a current transaction.

tAmazonMysqlConnection Standard properties


These properties are used to configure tAmazonMysqlConnection running in the Standard Job
framework.
The Standard tAmazonMysqlConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonMysql* components, especially with the
tAmazonMysqlCommit and tAmazonMysqlRollback
components.


Related scenario
For a related scenario using this component, see Inserting data in mother/daughter tables on page
2426


tAmazonMysqlInput
Reads a database and extracts fields based on a query.
tAmazonMysqlInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Row > Main link.

tAmazonMysqlInput Standard properties


These properties are used to configure tAmazonMysqlInput running in the Standard Job framework.
The Standard tAmazonMysqlInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.


Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be read.

Query type and Query Enter your DB query paying particular attention to
properly sequence the fields in order to match the schema
definition.
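For example, for a hypothetical table named employee whose schema contains the columns id and
name, in that order, the Query field could contain:
"SELECT id, name FROM employee"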

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
When you need to handle data of the time-stamp type
0000-00-00 00:00:00 using this component, set the
parameter as:
noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.

Enable stream Select this check box to enable streaming over buffering,
which allows the code to read from a large table without
consuming a large amount of memory, in order to optimize
performance.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Note: Deselect Trim all the String/Char columns to


enable Trim columns in this field.


tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for MySQL
databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For related scenarios, see tMysqlInput on page 2437.


tAmazonMysqlOutput
Writes, updates, makes changes or suppresses entries in a database.
tAmazonMysqlOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tAmazonMysqlOutput Standard properties


These properties are used to configure tAmazonMysqlOutput running in the Standard Job framework.
The Standard tAmazonMysqlOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.


Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is quickly deleted.
However, you will not be able to rollback the operation.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.
Replace: Add new entries to the table. If an old row in the
table has the same value as a new row for a PRIMARY KEY
or a UNIQUE index, the old row is deleted before the new
row is inserted.
Insert or update on duplicate key or unique index: Add
entries if the inserted value does not exist or update entries
if the inserted value already exists and there is a risk of
violating a unique index or primary key.
Insert Ignore: Add only new rows to prevent duplicate key
errors.


Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.
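
For reference, here is a minimal sketch, with hypothetical table and column names, of the MySQL statements that the Replace, Insert or update on duplicate key or unique index, and Insert Ignore actions described above roughly correspond to; the component builds the actual statements from your schema:

    // Hypothetical illustration only; the table and columns are examples, not code
    // generated by the component.
    public class ActionOnDataSketch {
        public static void main(String[] args) {
            String replaceRow =
                "REPLACE INTO employee (id, name) VALUES (1, 'Ada')";
            String insertOrUpdateOnDuplicateKey =
                "INSERT INTO employee (id, name) VALUES (1, 'Ada')"
                + " ON DUPLICATE KEY UPDATE name = VALUES(name)";
            String insertIgnore =
                "INSERT IGNORE INTO employee (id, name) VALUES (1, 'Ada')";
            System.out.println(replaceRow);
            System.out.println(insertOrUpdateOnDuplicateKey);
            System.out.println(insertIgnore);
        }
    }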

Schema and Edit schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows. If needed, you can retrieve the rows in error via a Row
> Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.
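
For example, assuming the MySQL Connector/J driver, encoding properties can be passed as key=value pairs separated by an ampersand; the sketch below (hypothetical host, port and database) only shows how such parameters end up on the JDBC URL:

    // Hypothetical sketch; the property names assume MySQL Connector/J and should be
    // checked against your driver documentation.
    public class AdditionalJdbcParametersSketch {
        public static void main(String[] args) {
            String baseUrl = "jdbc:mysql://localhost:3306/talend_demo";
            String additionalJdbcParameters = "useUnicode=true&characterEncoding=UTF-8";
            System.out.println(baseUrl + "?" + additionalJdbcParameters);
        }
    }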

Extend Insert Select this check box to carry out a bulk insert of a defined
set of lines instead of inserting lines one by one. The gain in
system performance is considerable.

Number of rows per insert: enter the number of rows to
be inserted per operation. Note that the higher the value
specified, the lower the performance, due to the increased
demand on memory.


Note:
This option is not compatible with the Reject link. You
should therefore clear the check box if you are using a
Row > Rejects link with this component.

Warning:
If you are using this component with tMysqlLastInsertID,
ensure that the Extend Insert check box in Advanced Settings
is not selected. Extend Insert allows for batch loading,
however, if the check box is selected, only the ID of the last
line of the last batch will be returned.
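
As a rough illustration with hypothetical table and values, an extended insert groups several rows into one statement instead of issuing one INSERT per row, which is where the performance gain comes from:

    // Hypothetical sketch of a row-by-row insert versus an extended (multi-row) insert.
    public class ExtendInsertSketch {
        public static void main(String[] args) {
            String rowByRow =
                "INSERT INTO employee (id, name) VALUES (1, 'Ada')";
            String extendedInsert =
                "INSERT INTO employee (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan')";
            System.out.println(rowByRow);
            System.out.println(extendedInsert);
        }
    }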

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Update or the Delete option in the Action on data
field.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.

Commit every Number of rows to be included in the batch before it is


committed to the DB. This option ensures transaction
quality (but not rollback) and, above all, a higher
performance level.

Additional Columns This option is not available if you have just created the DB
table (even if you delete it beforehand). This option allows
you to call SQL functions to perform actions on columns,
provided that these are not insert, update or delete actions,
or actions that require pre-processing.

  Name: Type in the name of the schema column to be


altered or inserted.

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the data in the corresponding
column.

  Position: Select Before, Replace or After, depending on the


action to be performed on the reference column.

  Reference column: Type in a reference column that
tAmazonMysqlOutput can use to locate or replace the new
column, or the column to be modified.
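
As a purely hypothetical illustration, an entry with Name set to created_at, SQL expression set to NOW(), Position set to After and Reference column set to name would make the generated insert look roughly as follows (table and column names are examples):

    // Hypothetical sketch of the effect of an Additional Columns entry; not the exact
    // statement generated by the component.
    public class AdditionalColumnsSketch {
        public static void main(String[] args) {
            String withoutAdditionalColumn =
                "INSERT INTO employee (id, name) VALUES (?, ?)";
            String withAdditionalColumn =
                "INSERT INTO employee (id, name, created_at) VALUES (?, ?, NOW())";
            System.out.println(withoutAdditionalColumn);
            System.out.println(withAdditionalColumn);
        }
    }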

Use field options Select this check box to customize a request, particularly if
multiple actions are being carried out on the data.

Use Hint Options Select this check box to activate the hint configuration area
which helps you optimize a query's execution. In this area,
parameters are:


- HINT: specify the hint you need, using the syntax

/*+ */.

- POSITION: specify where you put the hint in a SQL


statement.
- SQL STMT: select the SQL statement you need to use.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use duplicate key update mode insert Updates the values of the columns specified, in the event of
duplicate primary keys:
Column: Between double quotation marks, enter the name
of the column to be updated.
Value: Enter the action you want to carry out on the column.

Note:
To use this option you must first of all select the Insert
mode in the Action on data list found in the Basic
Settings view.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
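
For instance, a tJava component placed after this one could typically read these variables through the globalMap; the component label tAmazonMysqlOutput_1 below is hypothetical and must match the label used in your Job:

    // Hypothetical tJava code; adjust the component label to the one in your Job.
    Integer inserted = (Integer) globalMap.get("tAmazonMysqlOutput_1_NB_LINE_INSERTED");
    Integer rejected = (Integer) globalMap.get("tAmazonMysqlOutput_1_NB_LINE_REJECTED");
    String lastQuery = (String) globalMap.get("tAmazonMysqlOutput_1_QUERY");
    System.out.println("inserted=" + inserted + ", rejected=" + rejected);
    System.out.println("query=" + lastQuery);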


Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a MySQL database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tAmazonMysqlOutput in use, see .

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For related scenarios, see tMysqlSCD on page 2508.


tAmazonMysqlRollback
Cancels the transaction commit in the connected database and avoids committing part of a
transaction involuntarily.

tAmazonMysqlRollback Standard properties


These properties are used to configure tAmazonMysqlRollback running in the Standard Job
framework.
The Standard tAmazonMysqlRollback component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonMysqlConnection component in the list
if more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other


tAmazonMysql* components, especially with the
tAmazonMysqlConnection and tAmazonMysqlCommit
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see Rollback from inserting data in mother/daughter tables on page 2429.


tAmazonMysqlRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAmazonMysqlRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily write
your SQL statements. tAmazonMysqlRow is the specific component for this database query. The row
suffix means the component implements a flow in the Job design although it does not provide output.

tAmazonMysqlRow Standard properties


These properties are used to configure tAmazonMysqlRow running in the Standard Job framework.
The Standard tAmazonMysqlRow component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version MySQL 5 is available.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description, that is to say, it defines the
number of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in the query statement manually or build it
graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
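
For example, if the schema defines the columns id and name in that order, the Query field (which holds a Java string) could contain a statement such as the following, with a hypothetical table name:

    "SELECT employee.id, employee.name FROM employee"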

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB
connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Propagate QUERY's recordset Select this check box to insert the result of the query in a
COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
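
As a sketch, with a hypothetical query such as "SELECT name FROM employee WHERE id = ?" and one row in the Set PreparedStatement Parameter table (Parameter Index 1, Parameter Type Int, Parameter Value 42), the behaviour corresponds roughly to the following plain JDBC code; the connection details, table and column names are examples only:

    // Hypothetical plain-JDBC equivalent; not what the component generates verbatim.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PreparedStatementSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:mysql://localhost:3306/talend_demo";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "SELECT name FROM employee WHERE id = ?")) {
                ps.setInt(1, 42); // Parameter Index 1, Parameter Type Int, Parameter Value 42
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }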

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a related scenario, see:
• Combining two flows for selective output on page 2503


tAmazonOracleClose
Closes the transaction committed in the connected database.

tAmazonOracleClose Standard properties


These properties are used to configure tAmazonOracleClose running in the Standard Job framework.
The Standard tAmazonOracleClose component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonOracleConnection component in the list
if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with other tAmazonOracle*
components, especially with tAmazonOracleConnection and
tAmazonOracleCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
This component is to be used with tAmazonOracleConnection and tAmazonOracleRollback
components. It is generally used with a tAmazonOracleConnection to close a connection for the
ongoing transaction.
For a related scenario, see tMysqlConnection on page 2425.


tAmazonOracleCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus
provides a gain in performance, using a unique connection.
tAmazonOracleCommit validates the data processed through the Job into the connected database.

tAmazonOracleCommit Standard properties


These properties are used to configure tAmazonOracleCommit running in the Standard Job framework.
The Standard tAmazonOracleCommit component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonOracleConnection component in the list
if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAmazonOracleCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonOracle* components, especially with the
tAmazonOracleConnection and tAmazonOracleRollback
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For tAmazonOracleCommit related scenario, see Inserting data in mother/daughter tables on page
2426.


tAmazonOracleConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAmazonOracleConnection opens a connection to the database for a current transaction.

tAmazonOracleConnection Standard properties


These properties are used to configure tAmazonOracleConnection running in the Standard Job
framework.
The Standard tAmazonOracleConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Connection type Drop-down list of available drivers:


Oracle SID: Select this connection type to uniquely identify
a particular database on a system.

DB Version Oracle 11-5 is available.

Use tns file Select this check box to use the metadata of a context
included in a tns file.

Note:
One tns file may have many contexts.

TNS File: Enter the path to the tns file manually or browse
to the file by clicking the three-dot button next to the field.
Select a DB Connection in Tns File: Click the three-dot
button to display all the contexts held in the tns file and
select the desired one.

Host Database server IP address.

Port Listening port number of DB server.


Database Name of the database.

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
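
For reference, with the Oracle SID connection type, the Host, Port and Database (SID) values typically combine into a thin-driver URL of the following shape; all values below are hypothetical:

    // Hypothetical sketch of an Oracle SID-style thin JDBC URL; not the exact code
    // generated by the component.
    public class OracleSidUrlSketch {
        public static void main(String[] args) {
            String host = "10.0.0.12";
            String port = "1521";
            String sid = "ORCL";
            System.out.println("jdbc:oracle:thin:@" + host + ":" + port + ":" + sid);
        }
    }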

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements have been executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
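
In plain JDBC terms, the difference described above can be sketched as follows; the connection details and statements are hypothetical and this is not the code generated by the component:

    // Hypothetical sketch of auto commit versus an explicit, deferred commit.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AutoCommitSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@10.0.0.12:1521:ORCL"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                conn.setAutoCommit(false); // behaviour comparable to using a commit component
                stmt.executeUpdate("INSERT INTO employee (id, name) VALUES (1, 'Ada')");
                stmt.executeUpdate("INSERT INTO employee (id, name) VALUES (2, 'Grace')");
                conn.commit(); // both statements are committed together
                // With conn.setAutoCommit(true), each executeUpdate above would have been
                // committed immediately as its own transaction.
            }
        }
    }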

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with other


tAmazonOracle* components, especially with the
tAmazonOracleCommit and tAmazonOracleRollback
components.

Related scenario
For tAmazonOracleConnection related scenario, see tMysqlConnection on page 2425.


tAmazonOracleInput
Reads a database and extracts fields based on a query.
tAmazonOracleInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Row > Main link.

tAmazonOracleInput Standard properties


These properties are used to configure tAmazonOracleInput running in the Standard Job framework.
The Standard tAmazonOracleInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Connection type Drop-down list of available drivers:


Oracle SID: Select this connection type to uniquely identify
a particular database on a system.

DB Version Select the Oracle version in use.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Oracle schema Oracle schema name.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table name Database table name.

Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Use cursor When selected, lets you specify the number of rows to fetch
and work with at a time, and thus optimize performance.
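
In plain JDBC terms this is comparable to setting a fetch size, which limits how many rows are pulled from the server per round trip; the sketch below uses hypothetical connection details and values:

    // Hypothetical sketch only; a JDBC fetch size controls how many rows are
    // retrieved from the database at a time.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CursorSizeSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@10.0.0.12:1521:ORCL"; // hypothetical
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(1000); // work with 1000 rows at a time
                try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM employee")) {
                    while (rs.next()) {
                        System.out.println(rs.getInt("id") + " " + rs.getString("name"));
                    }
                }
            }
        }
    }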


Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Oracle
databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related scenarios, see:
• Reading data from different MySQL databases using dynamically loaded connection parameters
on page 497.


tAmazonOracleOutput
Writes, updates, makes changes or suppresses entries in a database.
tAmazonOracleOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.

tAmazonOracleOutput Standard properties


These properties are used to configure tAmazonOracleOutput running in the Standard Job framework.
The Standard tAmazonOracleOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Connection type Drop-down list of available drivers:


Oracle SID: Select this connection type to uniquely identify
a particular database on a system.


DB Version Select the Oracle version in use.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Oracle schema Name of the Oracle schema.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Warning:
If you select the Use an existing connection check box
and select an option other than None from the Action
on table list, a commit statement will be generated
automatically before the data update/insert/delete
operation.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.


Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Override any existing NLS_LANG environment variable Select this check box to override variables already set for a
NLS language environment.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, provided that these
are not insert, update or delete actions, or actions that
require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column.

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Use Hint Options Select this check box to activate the hint configuration area
which helps you optimize a query's execution. In this area,
parameters are:
- HINT: specify the hint you need, using the syntax

/*+ */.

- POSITION: specify where you put the hint in a SQL


statement.
- SQL STMT: select the SQL statement you need to use.
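
For example, with HINT set to /*+ FULL(e) */ and POSITION indicating that the hint goes right after the SELECT keyword, the statement sent to the database would look roughly like this (the table, alias and hint are hypothetical):

    // Hypothetical illustration of an Oracle optimizer hint embedded in a query.
    public class HintSketch {
        public static void main(String[] args) {
            String withoutHint = "SELECT e.id, e.name FROM employee e";
            String withHint    = "SELECT /*+ FULL(e) */ e.id, e.name FROM employee e";
            System.out.println(withoutHint);
            System.out.println(withHint);
        }
    }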

Convert columns and table to uppercase Select this check box to set the names of columns and table
in upper case.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Batch Size Specify the number of records to be processed in each
batch.
This field appears only when the Use batch mode check box
is selected.

Support null in "SQL WHERE" statement Select this check box to validate null in "SQL WHERE"
statement.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Oracle database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For such an example, see Retrieving data in error with a
Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.

The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For tAmazonOracleOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tAmazonOracleRollback
Cancels the transaction commit in the connected database and avoids committing part of a
transaction involuntarily.

tAmazonOracleRollback Standard properties


These properties are used to configure tAmazonOracleRollback running in the Standard Job
framework.
The Standard tAmazonOracleRollback component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAmazonOracleConnection component in the list
if more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other


tAmazonOracle* components, especially with the
tAmazonOracleConnection and tAmazonOracleCommit
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For tAmazonOracleRollback related scenario, see tMysqlRollback on page 2491.


tAmazonOracleRow
Executes the SQL query stated onto the specified database.
Depending on the nature of the query and the database, tAmazonOracleRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily write
your SQL statements. tAmazonOracleRow is the specific component for this database query. The row
suffix means the component implements a flow in the Job design although it does not provide output.

tAmazonOracleRow Standard properties


These properties are used to configure tAmazonOracleRow running in the Standard Job framework.
The Standard tAmazonOracleRow component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Connection type Drop-down list of available drivers.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description, i.e. it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Query type Either Built-in or Repository.

  Built-in: Fill in the query statement manually or build it
graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.

Use NB_LINE_ This option allows you to feed the variable with the number
of rows inserted/updated/deleted to the next component or
subJob. This field only applies if the query entered in the Query
field is an INSERT, UPDATE or DELETE query.
• NONE: does not feed the variable.
• INSERTED: feeds the variable with the number of rows
inserted.
• UPDATED: feeds the variable with the number of rows
updated.
• DELETED: feeds the variable with the number of rows
deleted.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
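As a minimal sketch (the table and values below are hypothetical), the Query field could contain

"SELECT * FROM ORDERS WHERE ORDER_ID = ?"

and the Set PreparedStatement Parameter table would then hold one row with Parameter Index set to
1, Parameter Type set to Int and Parameter Value set to 1001.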

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
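As a minimal sketch (the component label tAmazonOracleRow_1 and the use of a downstream tJava
component are assumptions), these variables can be read from the globalMap once the component
has run:

String query = (String) globalMap.get("tAmazonOracleRow_1_QUERY");
Integer inserted = (Integer) globalMap.get("tAmazonOracleRow_1_NB_LINE_INSERTED");
System.out.println("Executed: " + query + " - rows inserted: " + inserted);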

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.
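As an illustration (the variable name is an assumption), the Code field of a row in this table could be
set to

context.dbConnection

where dbConnection is a String context variable whose value at runtime is the name of the connection
component to be used, for example "tAmazonOracleConnection_1".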

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tAmazonRedshiftManage
Manages Amazon Redshift clusters and snapshots.
tAmazonRedshiftManage manages the work of creating a new Amazon Redshift cluster, creating a
snapshot of an Amazon Redshift cluster, resizing an existing Amazon Redshift cluster, and deleting an
existing cluster or snapshot.

tAmazonRedshiftManage Standard properties


These properties are used to configure tAmazonRedshiftManage running in the Standard Job
framework.
The Standard tAmazonRedshiftManage component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products.

Basic settings

Access Key and Secret Key Specify the access keys (the access key ID in the Access
Key field and the secret access key in the Secret Key field)
required to access the Amazon Web Services. For more
information on AWS access keys, see Access keys (access key
ID and secret access key).
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated
with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Action Select an action to be performed from the list.


• Create cluster: create a new Amazon Redshift cluster.
• Delete cluster: delete a previously provisioned Amazon
Redshift cluster.
• Resize cluster: resize an existing Amazon Redshift
cluster.
• Restore from snapshot: create a new Amazon Redshift
cluster from a snapshot.
• Delete snapshot: delete the specified manual snapshot.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
supported AWS regions where you can provision an Amazon
Redshift cluster, see Regions and Endpoints.

Create snapshot Select this check box to create a final snapshot of the
Amazon Redshift cluster before it is deleted.
This check box is available only when Delete cluster is
selected from the Action list.

Snapshot id Enter the identifier of the snapshot.


This field is available when:
• Delete cluster is selected from the Action list and the
Create snapshot check box is selected.
• Restore from snapshot or Delete snapshot is selected
from the Action list.

Cluster id Enter the ID of the cluster.


This field is available when Create cluster, Delete cluster,
Resize cluster, or Restore from snapshot is selected from the
Action list.

Database Enter the name of the first database to be created when the
cluster is created.
This field is available only when Create cluster is selected
from the Action list.

Port Enter the port number on which the cluster accepts


connections.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

Master username and Master password The user name and the password associated with the master
user account for the cluster to be created.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
The two fields are available only when Create cluster is
selected from the Action list.

Node type Select the node type for the cluster.


This list is available when Create cluster, Resize cluster, or
Restore from snapshot is selected from the Action list.

Node count Enter the number of compute nodes in the cluster.


This field is available only when Create cluster or Resize
cluster is selected from the Action list.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.


This check box is available only when the Assume role


check box is selected.

Wait for cluster ready Select this check box to let your Job wait until the launch of
the cluster is completed.
This check box is available when Create cluster or Restore
from snapshot is selected from the Action list.

Original cluster id of snapshot Enter the name of the cluster the source snapshot was
created from.
This field is available when Restore from snapshot or Delete
snapshot is selected from the Action list.

Parameter group name Enter the name of the parameter group to be associated
with the cluster.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

Subnet group name Enter the name of the subnet group where you want the
cluster to be restored.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

Publicly accessible Select this check box so that the cluster can be accessed
from a public network.
This check box is available when Create cluster or Restore
from snapshot is selected from the Action list.

Set public ip address Select this check box and in the field displayed enter the
Elastic IP (EIP) address for the cluster.
This check box is available only when the Publicly
accessible check box is selected.

Availability zone Enter the EC2 Availability Zone in which you want Amazon
Redshift to provision the cluster.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

VPC security group ids Enter Virtual Private Cloud (VPC) security groups to be
associated with the cluster and separate them with a
comma, for example, gname1, gname2, gname3.
This field is available when Create cluster or Restore from
snapshot is selected from the Action list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CLUSTER_FINAL_ID: the ID of the cluster. This is an After


variable and it returns a string.
ENDPOINT: the endpoint address of the cluster. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the

Die on error check box is cleared, if the component has this


check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tAmazonRedshiftManage is usually used as a standalone


component.

Related scenario
No scenario is available for the Standard version of this component yet.


tApacheLogInput
Reads the access-log file for an Apache HTTP server.
To effectively manage the Apache HTTP Server, it is necessary to get feedback about the activity and
performance of the server as well as any problems that may be occurring.

tApacheLogInput Standard properties


These properties are used to configure tApacheLogInput running in the Standard Job framework.
The Standard tApacheLogInput component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
In the context of tApacheLogInput usage, the schema is read-only.

  Built-in: You can create the schema and store it locally


for this component. Related topic: see Talend Studio User
Guide.

  Repository: You have already created and stored the


schema in the Repository. You can reuse it in various
projects and Job flowcharts. Related topic: see Talend Studio
User Guide.

File Name Name of the file and/or the variable to be processed.


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.


tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tApacheLogInput can be used with other components or as


a standalone component. It allows you to create a data flow
using a Row > Main connection, or to create a reject flow to
filter specified data using a Row > Reject connection. For an
example of how to use these two links, see Procedure on
page 975.

Reading an Apache access-log file


The following scenario creates a two-component Job, which aims at reading the access-log file for an
Apache HTTP server and displaying the output in the Run log console.

Procedure
1. Drop a tApacheLogInput component and a tLogRow component from the Palette onto the design
workspace.
2. Right-click on the tApacheLogInput component and connect it to the tLogRow component using
a Main Row link.

3. In the design workspace, select tApacheLogInput.


4. Click the Component tab to define the basic settings for tApacheLogInput.

5. If desired, click the Edit schema button to see the read-only columns.
6. In the File Name field, enter the file path or browse to the access-log file you want to read.
7. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977
8. Press F6 to execute the Job.

Results
The log lines of the defined file are displayed on the console.
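For reference, each line of such a file typically follows the Apache common or combined log format; a
sample line (illustrative values only) looks like:

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326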


tAS400Close
Closes the transaction committed in the connected database.

tAS400Close Standard properties


These properties are used to configure tAS400Close running in the Standard Job framework.
The Standard tAS400Close component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAS400Connection component in the list if more
than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with AS/400


components, especially with tAS400Connection and
tAS400Commit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.


Related scenario
No scenario is available for the Standard version of this component yet.


tAS400Commit
Commits a global transaction in one go, instead of committing on every row or every batch, and
provides a gain in performance, using a unique connection.
tAS400Commit validates the data processed through the Job into the connected database.

tAS400Commit Standard properties


These properties are used to configure tAS400Commit running in the Standard Job framework.
The Standard tAS400Commit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAS400Connection component in the list if more
than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tAS400Commit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Connection and
tAS400Rollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in


different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.


tAS400Connection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tAS400Connection opens a connection to the database for a current transaction.

tAS400Connection Standard properties


These properties are used to configure tAS400Connection running in the Standard Job framework.
The Standard tAS400Connection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version Select the AS/400 version in use

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together

with a tRunJob component with either of these two options


enabled will cause your Job to fail.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.
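For example (the property names below are an assumption based on the IBM Toolbox for Java (jt400)
JDBC driver; check the documentation of your driver), the field could contain a semicolon-separated
list such as:

"naming=system;libraries=MYLIB;prompt=false"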

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component
commits only after all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Commit and
tAS400Rollback components.

Related scenario
For similar scenarios using other databases, see tMysqlConnection on page 2425.


tAS400Input
Reads a database and extracts fields based on a query.
tAS400Input executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Row > Main link.

tAS400Input Standard properties


These properties are used to configure tAS400Input running in the Standard Job framework.
The Standard tAS400Input component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.


For more information about setting up and storing database


connection parameters, see Talend Studio User Guide.

DB Version Select the AS/400 version in use

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
properly sequencing the fields in order to match the schema
definition.
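For example, with a three-column schema (id, name, city) such as the one used in the scenario below,
the Query field could contain:

"SELECT id, name, city FROM doct1018"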

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.


Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Handling data with AS/400


This scenario describes a Job that writes the user information into AS/400, and then reads the
information in AS/400 and displays it on the console.


Adding and linking the components


Procedure
1. Create a new Job and add a tFixedFlowInput component, a tAS400Output component, a
tAS400Input component, and a tLogRow component by typing their names in the design
workspace or dropping them from the Palette.
2. Connect tFixedFlowInput to tAS400Output using a Row > Main connection.
3. Do the same to connect tAS400Input to tLogRow.
4. Connect tFixedFlowInput to tAS400Input using a Trigger > OnSubjobOk connection.

Configuring the components


Writing the data into AS/400

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding three columns: id of Integer type, and name and city of String type.


Click OK to close the Schema dialog box and accept the propagation prompted by the pop-up
dialog box.
3. In the Mode area, select Use Inline Content (delimited file) and enter the following user
information in the Content field.

1;George;Bismarck
2;Abraham;Boise
3;Taylor;Nashville
4;William;Jefferson City
5;Alexander;Jackson
6;James;Boise
7;Gerald;Little Rock
8;Tony;Richmond
9;Thomas;Springfield
10;Andre;Nashville

4. Double-click tAS400Output to open its Basic settings view.

5. In the Host, Database, Username and Password fields, enter the information required for the
connection to AS/400.
6. In the Table field, specify the table into which you want to write the data. In this example, it is
doct1018.
7. Select Drop table if exists and create from the Action on table drop-down list, and select Insert
from the Action on data drop-down list.


Retrieving the data from AS/400

Procedure
1. Double-click tAS400Input to open its Basic settings view.

2. In the Host, Database, Username and Password fields, enter the information required for the
connection to AS/400.
3. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding three columns: id of Integer type, and name and city of String type. The data structure is the
same as the structure you defined for tFixedFlowInput.
4. In the Table Name field, enter or browse to the table into which the data was written. In this
example, it is doct1018.
5. In the Query field, enter the SQL query statement to be used to retrieve the user data from AS/400.
In this example, it is SELECT * FROM doct1018.
6. Double-click tLogRow to open its Basic settings view.

7. In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.


2. Press F6 or click Run on the Run tab to run the Job.

As shown above, the user information is written into AS/400, and then the data is retrieved from
AS/400 and displayed on the console.

Related scenarios
For a similar scenario using another database, see the related topic in tContextLoad: Reading data
from different MySQL databases using dynamically loaded connection parameters on page 497.


tAS400LastInsertId
Obtains the primary key value of the record that was last inserted in an AS/400 table.
tAS400LastInsertId fetches the last inserted ID from a selected AS/400 Connection.

tAS400LastInsertId Standard properties


These properties are used to configure tAS400LastInsertId running in the Standard Job framework.
The Standard tAS400LastInsertId component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and job flow charts. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Component list Select the relevant tAS400Connection component in the list


if more than one connection is planned for the current job.


Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used as an intermediary


component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenario
For a similar scenario using another database, see Getting the ID for the last inserted record with
tMysqlLastInsertId on page 2455.


tAS400Output
Writes, updates, modifies or deletes entries in a database.
tAS400Output executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.

tAS400Output Standard properties


These properties are used to configure tAS400Output running in the Standard Job framework.
The Standard tAS400Output component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

DB Version Select the AS/400 version in use

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

 
Action on data Select the action to be performed on the data of the defined
table, for example Insert, Update or Delete.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.


When the schema to be reused has default values that are


integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use commit control Select this check box to have access to the Commit every
field where you can define the commit operation.

Commit every: Enter the number of rows to be completed


before committing batches of rows together into the DB.
This option ensures transaction quality (but not rollback)
and, above all, better performance at execution.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update or delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.


  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.
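As an illustration (the column name and SQL expression below are hypothetical), a row of the
Additional Columns table could be defined with Name set to creation_date, SQL expression set to
"CURRENT TIMESTAMP", Position set to After and Reference column set to price.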

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update or Delete option in the Action on data
field.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component offers the flexibility of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an AS/400 database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different

MySQL databases using dynamically loaded connection


parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For related scenario, see Handling data with AS/400 on page 245.
For similar scenarios using other databases, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tAS400Rollback
Cancels the transaction committed in the connected database and avoids committing part of a
transaction involuntarily.

tAS400Rollback Standard properties


These properties are used to configure tAS400Rollback running in the Standard Job framework.
The Standard tAS400Rollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tAS400Connection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tAS400*
components, especially with the tAS400Connection and
tAS400Commit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection

parameters on page 497. For more information on


Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For a similar scenario using another database, see Rollback from inserting data in mother/daughter
tables on page 2429.


tAS400Row
Executes the stated SQL query on the specified database.
Depending on the nature of the query and the database, tAS400Row acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements. tAS400Row is the specific component for this database query. The Row suffix means the
component implements a flow in the Job design although it does not provide output.

tAS400Row Standard properties


These properties are used to configure tAS400Row running in the Standard Job framework.
The Standard tAS400Row component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

DB Version Select the AS/400 version in use


Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequencing the fields in order to match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via
a Row > Rejects link.


Advanced settings

Additional JDBC Parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.


The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Related scenarios
For similar scenarios using other databases, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tAssert
Generates a boolean evaluation of the Job execution status and provides the Job status messages to
tAssertCatcher.
The status includes:
• Ok: the Job execution succeeds.
• Fail: the Job execution fails.
The tested Job's result does not match the expectation or an execution error occurred at runtime.
The tAssert component works alongside tAssertCatcher to evaluate the status of a Job execution. It
concludes with a boolean result based on an assertive statement related to the execution and feeds
the result to tAssertCatcher for proper Job status presentation.

tAssert Standard properties


These properties are used to configure tAssert running in the Standard Job framework.
The Standard tAssert component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Description Type in your descriptive message to help identify the


assertion of a tAssert.

Expression Type in the assertive statement you base the evaluation on.
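For example (this is the expression used in the scenario below), an assertion checking that a
tMysqlOutput component inserted at least 20 rows could be:

((Integer)globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED"))>=20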

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component follows the action to which the assertive condition is directly related. It can be the intermediate or end component of the main Job, or the start, intermediate, or end component of the secondary Job.


Limitation The evaluation of tAssert is captured only by tAssertCatcher.

Viewing product orders status (on a daily basis) against a benchmark number
This scenario allows you to insert the orders information into a database table and to evaluate the orders status (once scheduled to run every day) by using tAssert to compare the orders against a fixed number and tAssertCatcher to indicate the results. In this case, Ok is returned if the number of orders is greater than or equal to 20 and Failed is returned if it is less than 20.
In practice, this Job can be scheduled to run every day for the daily orders report and tFixedFlowInput
as well as tLogRow are replaced by input and output components in the Database/File families.

Linking the components


Procedure
1. Drop tFixedFlowInput, tMysqlOutput, tAssert, tAssertCatcher, and tLogRow onto the workspace.
2. Rename tFixedFlowInput as orders, tAssert as orders >=20, tAssertCatcher as catch comparison
result and tLogRow as ok or failed.
3. Link tFixedFlowInput to tMysqlOutput using a Row > Main connection.
4. Link tFixedFlowInput to tAssert using the Trigger > On Subjob OK connection.
5. Link tAssertCatcher to tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.


Select Use Inline Content (delimited file) in the Mode area.


In the Content field, enter the data to write to the Mysql database, for example:

AS2152;Washingto Berry Juice;2013-02-19 11:14:15;3.6


AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 13:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 14:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6
AS2152;Washingto Berry Juice;2013-02-19 12:14:15;3.6

Note that the orders listed are just for illustration of how tAssert functions and the number here is
less than 20.
2. Click the Edit schema button to open the schema editor.


3. Click the [+] button to add four columns, namely product_id, product_name, date and price, of the String, String, Date and Float types respectively.
Click OK to validate the setup and close the editor.
4. Double-click tMysqlOutput to display the Basic settings view.

5. In the Host, Port, Database, Username and Password fields, enter the connection details and the
authentication credentials.
6. In the Table field, enter the name of the table, for example order.
7. In the Action on table list, select the option Drop table if exists and create.
8. In the Action on data list, select the option Insert.
9. Double-click tAssert to display the Basic settings view.


10. In the description field, enter the descriptive information for the purpose of tAssert in this case.
11. In the expression field, enter the expression allowing you to compare the data to a fixed number:

((Integer)globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED"))>=20

12. Double-click tLogRow to display the Basic settings view.

13. In the Mode area, select Table (print values in cells of a table) for a better display.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to run the Job.

As shown above, the orders status indicates Failed as the number of orders is less than 20.

Setting up the assertive condition for a Job execution


This scenario describes how to set up an assertive condition in tAssert in order to evaluate that a Job
execution succeeds or not. Moreover, you can also find out how the two different evaluation results
display and the way to read them. Apart from tAssert, the scenario uses the following components as
well:
• tFileInputDelimited and tFileOutputDelimited. The two components compose the main Job of
which the execution status is evaluated. For the detailed information on the two components, see
tFileInputDelimited on page 1015 and tFileOutputDelimited on page 1113.
• tFileCompare. It realizes the comparison between the output file of the main Job and a standard
reference file. The comparative result is evaluated by tAssert against the assertive condition set


up in its settings. For more detailed information on tFileCompare, see tFileCompare on page
984.
• tAssertCatcher. It captures the evaluation generated by tAssert. For more information on
tAssertCatcher, see tAssertCatcher on page 273.
• tLogRow. It allows you to read the captured evaluation. For more information on tLogRow, see
tLogRow on page 1977.
First proceed as follows to design the main Job:
• Prepare a delimited .csv file as the source file read by your main Job.
• Edit two rows in the delimited file. The contents you edit are not important, so feel free to
simplify them.
• Name it source.csv.
• In Talend Studio, create a new Job named JobAssertion.
• Place tFileInputDelimited and tFileOutputDelimited on the workspace.
• Connect them with a Row Main link to create the main Job.

• Double-click tFileInputDelimited to open its Component view.


• In the File Name field of the Component view, fill in the path or browse to source.csv.

• Still in the Component view, set Property Type to Built-In and click the [...] button next to Edit schema to define the data to pass on to tFileOutputDelimited. In the scenario, define the data presented in source.csv you created.
For more information about schema types, see Talend Studio User Guide.
• Define the other parameters in the corresponding fields according to source.csv you created.
• Double-click tFileOutputDelimited to open its Component view.
• In the File Name field of the Component view, fill in or browse to specify the path to the output
file, leaving the other fields as they are by default.


• Press F6 to execute the main Job. It reads source.csv, passes the data to tFileOutputDelimited and outputs a delimited file, out.csv.
Then continue to edit the Job to see how tAssert evaluates the execution status of the main Job.
• Rename out.csv as reference.csv. This file is used as the expected result the main Job should output.
• Place tFileCompare, tAssert and tLogRow on the workspace.
• Connect them with Row Main link.
• Connect tFileInputDelimited to tFileCompare with OnSubjobOk link.

• Double-click tFileCompare to open its Component view.


• In the Component view, fill in the corresponding file paths in the File to compare field and the
Reference file field, leaving the other fields as default.


For more information on the tFileCompare component, see tFileCompare on page 984.
• Then click tAssert and click the Component tab on the lower side of the workspace.

• In the Component view, edit the assertion row2.differ==0 in the expression field and the
descriptive message of the assertion in description field.
In the expression field, row2 is the data flow transmitted from tFileCompare to tAssert, differ is one of the columns of the tFileCompare schema and presents whether the compared files are identical, and 0 means no difference is detected between out.csv and reference.csv by tFileCompare. Hence, when the compared files are identical, the assertive condition is fulfilled and tAssert concludes that the main Job succeeds; otherwise, it concludes failure.

Note:
The differ column is in the read-only tFileCompare schema. For more information on its schema, see
tFileCompare on page 984.

• Press F6 to execute the Job.


• Check the result presented in the Run view

The console shows the comparison result of tFileCompare: Files are identical. However, the evaluation result of tAssert is nowhere to be seen, so you need tAssertCatcher to capture the evaluation.
• Place tAssertCatcher and tLogRow on the workspace.
• Connect them with Row Main link.


• Use the default configuration in the Component view of tAssertCatcher.

• Press F6 to execute the Job.


• Check the result presented in the Run view. You will see the Job status information is added in:

2010-01-29 15:37:33|fAvAzH|TASSERT|JobAssertion|java|tAssert_1|Ok|--|
The output file should be identical with the reference file

The descriptive information on JobAssertion in the console is organized according to the tAssertCatcher schema. This schema includes, in the following order: the execution time, the process ID, the project name, the Job name, the code language, the evaluation origin, the evaluation result, the detailed information of the evaluation, and the descriptive message of the assertion. For more information on the schema of tAssertCatcher, see tAssertCatcher on page 273.
The console indicates that the execution status of Job JobAssertion is Ok. In addition to the evaluation, you can also see other descriptive information about JobAssertion, including the descriptive message you edited in the Basic settings of tAssert.
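As a minimal illustration (this code is not part of the Job), the pipe-delimited log line shown above can be split into the schema columns in plain Java; the seventh field is the evaluation status:

String line = "2010-01-29 15:37:33|fAvAzH|TASSERT|JobAssertion|java|tAssert_1|Ok|--|The output file should be identical with the reference file";
// Order: moment, pid, project, job, language, origin, status, substatus, description
String[] fields = line.split("\\|");
System.out.println(fields[6]); // prints: Ok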


Then you will perform operations to make the main Job fail to generate the expected file. To do so,
proceed as follows in the same Job you have executed:
• Delete a row in reference.csv.
• Press F6 to execute the Job again.
• Check the result presented in Run view.

2010-02-01 19:47:43|GeHJNO|TASSERT|JobAssertion|tAssert_1|Failed|Test
logically failed|The output file should be identical with the reference
file

The console shows that the execution status of the main Job is Failed. The detailed explanation for
this status is closely behind it, reading Test logically failed.
You can thus get a basic idea about your present Job status: it fails to generate the expected file
because of a logical failure. This logical failure could come from a logical mistake during the Job
design.
The status and its explanatory information are presented respectively in the status and the substatus
columns of the tAssertCatcher schema. For more information on the columns, see tAssertCatcher on
page 273.


tAssertCatcher
Generates a data flow consolidating the status information of a Job execution and transfers the data into defined output files.
Based on its pre-defined schema, tAssertCatcher fetches the execution status information from
repository, Job execution and tAssert.

tAssertCatcher Standard properties


These properties are used to configure tAssertCatcher running in the Standard Job framework.
The Standard tAssertCatcher component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the fields to be
processed and passed on to the next component. In this
particular case, the schema is read-only, as this component
gathers standard log information including:

  Moment: Processing time and date.

  Pid: Process ID.

  Project: Project which the job belongs to.

  Job: Job name.

  Language: Language used by the Job (Java)

  Origin: Status evaluation origin. The origin may be different tAssert components.

  Status: Evaluation fetched from tAssert. It may be:
- Ok: if the assertive statement of tAssert is evaluated as true at runtime.
- Failed: if the assertive statement of tAssert is evaluated as false or an execution error occurs at runtime. The tested Job's result does not match the expectation or an execution error occurred at runtime.

  Substatus: Detailed explanation for failed execution. The explanation can be:
- Test logically failed: the investigated Job does not produce the expected result.
- Execution error: an execution error occurred at runtime.

  Description: Descriptive message typed in Basic settings of tAssert (when Catch tAssert is selected) and/or the message of the exception captured (when Catch Java Exception is selected).


  Exception: The Exception object thrown by the Job, namely the original exception.
Available when Get original exception is selected.

Catch Java Exception Select this check box to capture Java exception errors and show the message in the Description column (when Get original exception is not selected) or in the Exception column (when Get original exception is selected).

Get original exception Select this check box to show the original exception object in the Exception column.
Available when Catch Java Exception is selected.

Catch tAssert Select this check box to capture the evaluations of tAssert.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is the start component of a secondary Job which fetches the execution status information from several sources. It generates a data flow to transfer the information to the component that follows.

Limitation This component must be used together with tAssert.

Related scenarios
For a use case involving tAssertCatcher, see the following tAssert scenario:
• Setting up the assertive condition for a Job execution on page 267


tAzureAdlsGen2Input
Retrieves data from an ADLS Gen2 file system of an Azure storage account and passes the data to the
subsequent component connected to it through a Main>Row link.

tAzureAdlsGen2Input Standard properties


These properties are used to configure tAzureAdlsGen2Input running in the Standard Job framework.
The Standard tAzureAdlsGen2Input component belongs to the Cloud family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically becomes built-in.

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Guess schema Click this button to retrieve the schema from the data object
specified.

Authentication method Select one of the following authentication methods from the drop-down list.
• Shared key, which requires an account access key. See
Manage a storage account for related information.
• Shared access signature, which requires a shared
access signature. See Constructing the Account SAS
URI for related information.

Account name Enter the name of the Data Lake Storage account you need
to access. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
account.

Endpoint suffix Enter the Azure Storage service endpoint.


The combination of the account name and the Azure
Storage service endpoint forms the endpoint of the storage
account.
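For example, assuming the account is named mystorageaccount and the endpoint suffix is dfs.core.windows.net (the usual suffix for Data Lake Storage Gen2 endpoints), the resulting storage account endpoint would be mystorageaccount.dfs.core.windows.net.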

Shared key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access. To know
how to get your key, read Manage a storage account.
This field is available if you select Shared key from
Authentication method drop-down list.

SAS token Enter your account SAS token. You can get the SAS token
for each allowed service on the Microsoft Azure portal
after generating SAS. The SAS token format is https://
<$storagename><$service>.core.windows.net/
<$sastoken>, where <$storagename> is the storage
account name, <$service> is the allowed service name
(blob, file, queue or table), and <$sastoken> is the SAS
token value. For more information, read Constructing the
Account SAS URI.
This field is available if you select Shared access signature
from Authentication method drop-down list.

Check connection Click this button to validate the connection parameters provided.

Filesystem Enter the name of the target Blob container.


You can also click the ... button to the right of this field and
select the desired Blob container from the list in the dialog
box.

Blobs Path Enter the path to the target blobs.

Format Set the format for the incoming data. Currently, the
following formats are supported: CSV, AVRO, JSON, and
Parquet.

Field Delimiter Set the field delimiter. You can select Semicolon, Comma,
Tabulation, and Space from the drop-down list; you can
also select Other and enter your own in the Custom field
delimiter field.


Record Separator Set the record separator. You can select LF, CR, and CRLF
from the drop-down list; you can also select Other and enter
your own in the Custom Record Separator field.

Text Enclosure Character Enter the character used to enclose text.

Escape character Enter the character of the row to be escaped.

Header Select this check box to insert a header row to the data
retrieved.

Note:
• Select this option if the data to be retrieved has a
header row. In this case, you need also to make sure
that the column names in the schema are consistent
with the column headers of the data.
• Clear this option if the data to be retrieved does not
have a header row. In this case, you need to name
the columns in the schema as field0, field1,
field2, and so on.

File Encoding Select the file encoding from the drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows successfully processed. This is an After variable and it returns an integer.

Usage

Usage rule This component is usually used as a start component of a Job or subJob and it always needs an output link.

Related scenario
For a related scenario, see Accessing Azure ADLS Gen2 storage on page 280.


tAzureAdlsGen2Output
Uploads incoming data to an ADLS Gen2 file system of an Azure storage account in the specified
format.

tAzureAdlsGen2Output Standard properties


These properties are used to configure tAzureAdlsGen2Output running in the Standard Job framework.
The Standard tAzureAdlsGen2Output component belongs to the Cloud family.
The component in this framework is available in all subscription-based Talend products with Big Data
and Talend Data Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically becomes built-in.

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Sync columns Click this button to retrieve the schema from the previous component connected in the Job.

Authentication method Select one of the following authentication methods from the drop-down list.
• Shared key, which requires an account access key. See
Manage a storage account for related information.
• Shared access signature, which requires a shared
access signature. See Constructing the Account SAS
URI for related information.

Account name Enter the name of the Data Lake Storage account you need
to access. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
account.

Endpoint suffix Enter the Azure Storage service endpoint.


The combination of the account name and the Azure
Storage service endpoint forms the endpoint of the storage
account.

Shared key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access. To know
how to get your key, read Manage a storage account.
This field is available if you select Shared key from
Authentication method drop-down list.

SAS token Enter your account SAS token. You can get the SAS token
for each allowed service on the Microsoft Azure portal
after generating SAS. The SAS token format is https://
<$storagename><$service>.core.windows.net/
<$sastoken>, where <$storagename> is the storage
account name, <$service> is the allowed service name
(blob, file, queue or table), and <$sastoken> is the SAS
token value. For more information, read Constructing the
Account SAS URI.
This field is available if you select Shared access signature
from Authentication method drop-down list.

Check connection Click this button to validate the connection parameters provided.

Filesystem Enter the name of the target Blob container.


You can also click the ... button to the right of this field and
select the desired Blob container from the list in the dialog
box.

Blobs Path Enter the path to the target blobs.

Format Set the format for the incoming data. Currently, the
following formats are supported: CSV, AVRO, JSON, and
Parquet.

Field Delimiter Set the field delimiter. You can select Semicolon, Comma,
Tabulation, and Space from the drop-down list; you can
also select Other and enter your own in the Custom field
delimiter field.


Record Separator Set the record separator. You can select LF, CR, and CRLF
from the drop-down list; you can also select Other and enter
your own in the Custom Record Separator field.

Text Enclosure Character Enter the character used to enclose text.

Escape character Enter the character of the row to be escaped.

Header Select this check box to insert a header row to the data. The
schema column names will be used as column headers.

File Encoding Select the file encoding from the drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Max batch size Set the maximum number of lines allowed in each batch.
Do not change the default value unless you are facing
performance issues. Increasing the batch size can improve
the performance but a value too high could cause Job
failures.

Blob Template Name Enter a string as the name prefix for the Blob files
generated. The name of a Blob file generated will be the
name prefix followed by another string.
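For illustration, with a Blob Template Name of data- (the prefix also used in the scenario Accessing Azure ADLS Gen2 storage below), every generated blob name starts with data- followed by a string generated at runtime.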

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows successfully processed. This is an After variable and it returns an integer.

Usage

Usage rule This component is usually used as an end component of a Job or subJob and it always needs an input link.

Accessing Azure ADLS Gen2 storage


This scenario demonstrates the use of the tAzureAdlsGen2Output and tAzureAdlsGen2Input
components. In the first subJob, a tFixedFlowInput component passes data to tAzureAdlsGen2Output,
which then uploads the data to Azure ADLS Gen2 storage; in the second subJob, tAzureAdlsGen2Input
reads the data and passes it to tLogRow.


In this scenario, the following data is uploaded and then retrieved.

1;James
2;Josephine
3;Donette
4;Simona
5;Mitsue
6;Leota

This scenario requires an Azure storage user account with permissions for reading and writing files.
Optionally, you can monitor the data using Microsoft Azure Storage Explorer, a utility for managing
your Azure storage resources. Check Azure Storage Explorer for related information.

Accessing Azure ADLS Gen2 storage: establishing the Job


Procedure
1. Create a standard Job and drop tFixedFlowInput, tAzureAdlsGen2Output, tAzureAdlsGen2Input,
and tLogRow onto the workspace.
2. Connect tFixedFlowInput and tAzureAdlsGen2Output using the Row > Main link.
3. Connect tAzureAdlsGen2Input and tLogRow using the Row > Main link.
4. Connect tFixedFlowInput and tAzureAdlsGen2Input using the Trigger > OnSubjobOk link.

Accessing Azure ADLS Gen2 storage: setting up the Job


Procedure
1. In the Basic settings view of tFixedFlowInput:
• Click the Edit schema button and add two columns: id (type Integer) and name (type String);
• Select Use Inline Content(delimited file) and enter the following into the Content field.

1;James
2;Josephine
3;Donette
4;Simona
5;Mitsue
6;Leota

• Leave other options as they are.


2. In the Basic settings view of tAzureAdlsGen2Output:
• Click the Edit schema button and add two columns: id (type Integer) and name (type String);
• Provide your Azure storage user account credentials in the Authentication method, Account name, Endpoint suffix, and Shared key fields.
• Validate your Azure storage user account by clicking Check connection.


• Enter the name of an existing Blob container in Filesystem. You can also click ... to the right of
this field and select the Blob container from the list in the dialog box.
• In Blobs Path, enter the name of the directory where you want to put the data.
• Select CSV for Format; Semicolon for Field Delimiter; and CRLF for Record Separator. Select
the Header option.
• Leave other options as they are.
3. In the Advanced settings view of tAzureAdlsGen2Output, enter the prefix for the Blob files generated in the Blob Template Name field (data- in this example).
4. Do exactly the same as described in step 2 for the tAzureAdlsGen2Input component. Be sure to propagate the schema to the subsequent component when prompted.
5. In the Basic settings view of tLogRow:
• Select Table (print values in cells of a table).
• Leave other options as they are.

Accessing Azure ADLS Gen2 storage: executing the Job


Procedure
1. Press F6 to run the Job.
2. Check the result in the Run console.

3. (Optional) Check the Blob file generated using Microsoft Azure Storage Explorer. See Get started with Storage Explorer for related information.


tAzureStorageConnection
Uses authentication and the protocol information to create a connection to the Microsoft Azure
Storage system that can then be reused by other Azure Storage components.

tAzureStorageConnection Standard properties


These properties are used to configure tAzureStorageConnection running in the Standard Job
framework.
The Standard tAzureStorageConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<
$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.


Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is generally used with other Azure Storage
components.
Knowledge about Microsoft Azure Storage is required.

Related scenario
For related scenarios, see:
• Retrieving files from an Azure Storage container on page 303
• Creating a container in Azure Storage on page 286
• Handling data with Microsoft Azure Table storage on page 313


tAzureStorageContainerCreate
Creates a new storage container used to hold Azure blobs (Binary Large Object) for a given Azure
storage account.

tAzureStorageContainerCreate Standard properties


These properties are used to configure tAzureStorageContainerCreate running in the Standard Job
framework.
The Standard tAzureStorageContainerCreate component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The

SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the blob container you need to create.

Access control Select the access restriction level you need to apply on the
container to be created.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Creating a container in Azure Storage


In this scenario, a four-component Job uses Azure Storage components to create a container in a given
Azure Storage system and check whether this container is successfully created.


Before replicating this scenario, you must have appropriate rights and permissions to read and write
files in the Azure storage account to be used. For further information, see Microsoft's documentation
for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job, named azureTalend for
example, from the Job Designs node in the Repository tree view.
2. Drop tAzureStorageConnection, tAzureStorageContainerCreate, tAzureStorageContainerExist and
tJava onto the workspace.
3. Connect them using the Trigger > OnSubjobOk link.

Connecting to an Azure storage account


Procedure
1. Double-click tAzureStorageConnection to open its Component view.

2. In the Account name field, enter the name of the storage account to be connected to. In this example, it is talendstorage, an account that has been created for demonstration purposes.
3. In the Account key field, paste the primary or the secondary key associated with the storage account to be used. These keys can be found in the Manage Access Key dashboard in the Azure Storage system to be connected to.
4. From the Protocol list, select the protocol for the endpoint of the storage account to be used. In
this example, it is HTTPS.


Creating a container
Procedure
1. Double-click tAzureStorageContainerCreate to open its Component view.

2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container you need to create. If a container
using the same name exists, that container will be overwritten at runtime.
4. From the Access control list, select the access restriction level for the container to be created. In
this example, select Private.

Verifying the creation


Procedure
1. Double-click tAzureStorageContainerExist to open its Component view.

2. Select the component whose connection details will be used to set up the Azure storage connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container whose existence you need to check.
4. Double-click tJava to open its Component view.

5. In the Code field, enter System.out.println();


6. In the Outline panel, which, by default, is found to the left side of the Component view, expand
the tAzureStorageContainerExist node.


7. From the Outline panel, drop the CONTAINER_EXIST global variable into the parentheses in the code in the Component view in order to make the code read: System.out.println(((Boolean)globalMap.get("tAzureStorageContainerExist_1_CONTAINER_EXIST")));

Executing the Job


Procedure
1. Press F6 to run this Job.
2. Check the execution result on the Run console.

You can read that the Job returns true as the verification result, that is to say, the
talendcontainer container has been created in the storage account being used.
3. Double-check the result in the web console of the Azure storage account.


You can read as well that the talendcontainer container has been created.


tAzureStorageContainerDelete
Automates the removal of a given blob container from the space of a specific storage account.

tAzureStorageContainerDelete Standard properties


These properties are used to configure tAzureStorageContainerDelete running in the Standard Job
framework.
The Standard tAzureStorageContainerDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the blob container to be removed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAzureStorageContainerExist
Automates the verification of whether a given blob container exists or not within a storage account.

tAzureStorageContainerExist Standard properties


These properties are used to configure tAzureStorageContainerExist running in the Standard Job
framework.
The Standard tAzureStorageContainerExist component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the blob container whose existence you need to verify.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable and it returns a string.

CONTAINER_EXIST The result of whether the given container exists or not. This is an After variable and it returns a boolean.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
For a related scenario, see Creating a container in Azure Storage on page 286


tAzureStorageContainerList
Lists all containers in a given Azure storage account.

tAzureStorageContainerList Standard properties


These properties are used to configure tAzureStorageContainerList running in the Standard Job
framework.
The Standard tAzureStorageContainerList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with a single
column ContainerName of String type, which indicates
the name of each container to be listed.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and it returns an integer.

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is usually used as a start component of a Job or subJob and it always needs an output link.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
No scenario is available for this component yet.


tAzureStorageDelete
Deletes blobs from a given container for an Azure storage account according to the specified blob
filters.

tAzureStorageDelete Standard properties


These properties are used to configure tAzureStorageDelete running in the Standard Job framework.
The Standard tAzureStorageDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when other connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be used to set up the connection to Azure storage from the drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on Microsoft Azure portal after generating SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>, where <$storagename> is the storage account name, <$service> is the allowed service name (blob, file, queue or table), and <$sastoken> is the SAS token value. For more information, see Constructing the Account SAS URI.
Note that the SAS has a valid period; you can set the start time at which the SAS becomes valid and the expiry time after which the SAS is no longer valid when generating it, and you need to make sure your SAS is still valid when running your Job.

Container name Enter the name of the container from which you need to
delete blobs.

Blob filter Complete this table to select the blobs to be deleted. The
parameters to be provided are:
• Blob prefix: enter the common prefix of the names of
the blobs you need to delete. This prefix allows you to
filter the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
• Include subdirectories: select this check box to select
all of the sub-folders and the blobs in those folders
beneath the designated directory level. If you leave
this check box clear, tAzureStorageDelete deletes only
the blobs directly beneath that directory level.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
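These After variables can be read in a downstream component such as tJava through the globalMap, as in the fragment below; the instance name tAzureStorageDelete_1 is assumed and should be replaced with the label of your own component.

    // tJava code fragment: read the After variables once the subJob has finished.
    String container = (String) globalMap.get("tAzureStorageDelete_1_CONTAINER");
    String error = (String) globalMap.get("tAzureStorageDelete_1_ERROR_MESSAGE");
    System.out.println("Processed container: " + container + (error != null ? ", error: " + error : ""));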


Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tAzureStorageGet
Retrieves blobs from a given container for an Azure storage account according to the specified filters
applied on the virtual hierarchy of the blobs, and then writes the selected blobs to a local folder.

tAzureStorageGet Standard properties


These properties are used to configure tAzureStorageGet running in the Standard Job framework.
The Standard tAzureStorageGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Container Enter the name of the container you need to retrieve blobs
from.

Local folder Enter the path, or browse to the folder in which you need to
store the retrieved blobs.

Blobs Complete this table to select the blobs to be retrieved. The


parameters to be provided are:
• Prefix: enter the common prefix of the names of the
blobs you need to retrieve. This prefix allows you to
filter the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
If you want to select the blobs stored directly beneath
the container level, that is to say, the blobs without
virtual path in their names, remove quotation marks
and enter null.
• Include sub-directories: select this check box to
retrieve all of the sub-folders and the blobs in those
folders beneath the directory level designated in
the Blob prefix column. If you leave this check box
clear, tAzureStorageGet returns only the blobs directly
beneath that directory level.
• Create parent directories: select this check box to
replicate the virtual directory of the retrieved blobs in
the local folder.
Note that if you leave this check box clear, the local
folder must already contain the same directory structure
as the retrieved blobs have in the container; otherwise,
those blobs cannot be retrieved.
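By way of illustration, the following sketch reproduces this retrieval logic with the legacy Azure Storage SDK for Java. It is an assumption about the underlying behaviour, not the code generated by tAzureStorageGet; the container, prefix and local folder values are the ones used in the scenario later in this section, and the account name and key are placeholders.

    import java.io.File;
    import com.microsoft.azure.storage.CloudStorageAccount;
    import com.microsoft.azure.storage.blob.CloudBlobContainer;
    import com.microsoft.azure.storage.blob.CloudBlockBlob;
    import com.microsoft.azure.storage.blob.ListBlobItem;

    public class GetBlobsByPrefix {
        public static void main(String[] args) throws Exception {
            CloudStorageAccount account = CloudStorageAccount.parse(
                    "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=<your key>");
            CloudBlobContainer container =
                    account.createCloudBlobClient().getContainerReference("talendcontainer");

            String localFolder = "E:/screenshots";
            // Flat listing under the prefix, as when Include sub-directories is selected.
            for (ListBlobItem item : container.listBlobs("photos/mongodb/", true)) {
                if (item instanceof CloudBlockBlob) {
                    CloudBlockBlob blob = (CloudBlockBlob) item;
                    File target = new File(localFolder, blob.getName());
                    // Equivalent of Create parent directories: replicate the blob's virtual path locally.
                    target.getParentFile().mkdirs();
                    blob.downloadToFile(target.getAbsolutePath());
                }
            }
        }
    }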

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

LOCAL_FOLDER The local directory used in this component. This is an After


variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Retrieving files from an Azure Storage container


In this scenario, a five-component Job uses Azure Storage components to write files in a given Azure
Storage system and then retrieve selected files (blobs in terms of Azure Storage) from that system.

Before replicating this scenario, you must have appropriate rights and permissions to read and write
files in the Azure storage account to be used. For further information, see Microsoft's documentation
for Azure Storage: http://azure.microsoft.com/en-us/documentation/services/storage/.


The talendcontainer container used in this scenario was created using tAzureStorageContainerCreate
in the scenario Creating a container in Azure Storage on page 286.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job, named azureTalend for
example, from the Job Designs node in the Repository tree view.
2. Drop tAzureStorageConnection, tAzureStoragePut, tAzureStorageList, tJava and tAzureStorageGet
onto the workspace.
3. Connect the Azure Storage components using Trigger > OnSubjobOk links, and connect
tAzureStorageList to tJava using a Row > Iterate link.

Connecting to an Azure storage account


Procedure
1. Double-click tAzureStorageConnection to open its Component view.

2. In the Account name field, enter the name of the storage account to be connected to. In this
example, it is talendstorage, an account that has been created for demonstration purposes.
3. In the Account key field, paste the primary or the secondary key associated with the storage
account to be used. These keys can be found in the Manage Access Key dashboard in the Azure
Storage system to be connected to.
4. From the Protocol list, select the protocol for the endpoint of the storage account to be used. In
this example, it is HTTPS.

Writing files in Azure Storage


Procedure
1. Double-click tAzureStoragePut to open its Component view.


2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container you need to write files in. In this
example, it is talendcontainer, a container created in the scenario Creating a container in
Azure Storage on page 286.
4. In the Local folder field, enter the path, or browse, to the directory where the files to be used are
stored. In this scenario, they are some pictures showing a technical process and are stored locally in
E:/photos. Therefore, enter E:/photos; this allows tAzureStoragePut to upload all the files of
this folder and its sub-folders into the talendcontainer container.
For demonstration purposes, the example photos are organized as follows in the E:/photos
folder.
• Directly beneath the E:/photos level:

components-use_case_triakinput_1.png
components-use_case_triakinput_2.png
components-use_case_triakinput_3.png
components-use_case_triakinput_4.png

• In the E:/photos/mongodb/step1 directory:

components-use_case_tmongodbbulkload_1.png
components-use_case_tmongodbbulkload_2.png
components-use_case_tmongodbbulkload_3.png
components-use_case_tmongodbbulkload_4.png

• In the E:/photos/mongodb/step2 directory:

components-use_case_tmongodbbulkload_5.png
components-use_case_tmongodbbulkload_6.png
components-use_case_tmongodbbulkload_7.png
components-use_case_tmongodbbulkload_8.png

5. In the Azure Storage folder field, enter the directory where you want to write files. This directory
will be created in the container to be used if it does not exist. In this example, enter photos.
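As a reading aid, the fragment below shows the blob name you can expect for one of the example files once it is uploaded with this configuration; it is an illustration of the virtual-hierarchy naming described in this guide, not the component's internal code.

    // Could be pasted into a tJava component to check the expected naming.
    String localFolder = "E:/photos";
    String azureStorageFolder = "photos";
    String localFile = "E:/photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png";

    // Path of the file relative to the local folder, with forward slashes.
    String relativePath = localFile.substring(localFolder.length() + 1).replace('\\', '/');
    String blobName = azureStorageFolder + "/" + relativePath;
    System.out.println(blobName);
    // -> photos/mongodb/step1/components-use_case_tmongodbbulkload_1.png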

Verifying the file transfer


Configuring tAzureStorageList

Procedure
1. Double-click tAzureStorageList to open its Component view.


2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container in which you need to check whether
the given files exist. In this scenario, it is talendcontainer.
4. Under the Blob filter table, click the [+] button to add one row in the table.
5. In the Prefix column, enter the common prefix of the names of the files (blobs) to be checked.
This prefix represents a virtual directory level you designate as the starting point down from
which files (blobs) are checked. In this example, it is photos/.
For further information about blob names, see http://msdn.microsoft.com/en-us/library/dd135715.aspx.
6. In the Include sub-directories column, select the check box in the newly added row. This allows
tAzureStorageList to check all the files at any hierarchical level beneath the designated starting
point.

Configuring tJava

Procedure
1. Double-click tJava to open its Component view.

2. In the Code field, enter System.out.println();


3. In the Outline panel, which by default is found on the left side of the Component view, expand
the tAzureStorageList node.


4. From the Outline panel, drop the CURRENT_BLOB global variable into the parentheses in the
code in the Component view so that the code reads: System.out.println(((String)
globalMap.get("tAzureStorageList_1_CURRENT_BLOB")));

Retrieving selected files


Procedure
1. Double-click tAzureStorageGet to open its Component view.

2. Select the component whose connection details will be used to set up the Azure storage
connection. In this example, it is tAzureStorageConnection_1.
3. In the Container name field, enter the name of the container from which you need to retrieve files.
In this scenario, it is talendcontainer.
4. In the Local folder field, enter the path, or browse, to the directory where you want to put the
retrieved files. In this example, it is E:/screenshots.
5. Under the Blob table, click the [+] button to add one row in the table.
6. In the Prefix column, enter the common name prefix of the files (blobs) to be retrieved. In this
example, it is photos/mongodb/.
7. In the Include sub-directories column, select the check box in the newly added row. This allows
tAzureStorageGet to retrieve all the files (blobs) beneath the photos/mongodb/ level.
8. In the Create parent directories column, select the check box in the newly added row to create the
same directory in the specified local folder as the retrieved blobs have in the container.


Note that having this same directory is necessary for successfully retrieving blobs. If you leave
this check box clear, then you need to create the same directory yourself in the target local folder.

Executing the Job


Procedure
1. Press F6 to run this Job.
2. Check the execution result on the Run console.

You can read that the Job returns the list of the blobs with the photos/ prefix in the container.
3. Double-check the result in the web console of the Azure storage account.

4. Check the retrieved files in the specified local folder.


You can see the blobs with the photos/mongodb/ prefix have been retrieved and their prefix
transformed to directories.


tAzureStorageInputTable
Retrieves a set of entities that satisfy the specified filter criteria from an Azure storage table.

tAzureStorageInputTable Standard properties


These properties are used to configure tAzureStorageInputTable running in the Standard Job
framework.
The Standard tAzureStorageInputTable component belongs to the Cloud family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Table name Specify the name of the table from which the entities will
be retrieved.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns that describe the three system properties
of each entity:
• PartitionKey: the partition key for the partition that the
entity belongs to.
• RowKey: the row key for the entity within the partition.
PartitionKey and RowKey are string type values that
uniquely identify every entity in a table, and the user
must include them in every insert, update, and delete
operation.
• Timestamp: the time that the entity was last modified.
This DateTime value is maintained by the Azure server
and it cannot be modified by the user.
For more information about these system properties, see
Understanding the Table Service Data Model.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Use filter expression Select this check box and complete the Filter expressions
table displayed to specify the conditions used to filter the
entities to be retrieved. Click the [+] button to add as many
rows as needed, one row per condition, and set the value of
the following parameters for each condition.
• Column: specify the name of the property to which you
want to apply the condition.
• Function: click the cell and select the comparison
operator you want to use from the drop-down list.
• Value: specify the value used to compare the property
to.
• Predicate: select the predicate used to combine the
conditions.
• Field type: click the cell and select the type of the
column from the drop-down list.
The generated filter expression will be displayed in the
read-only Effective filter field.
For more information about the filter expressions, see
Querying Tables and Entities.
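For reference, the sketch below shows how two such conditions combined with an AND predicate translate into an OData filter expression, which is roughly what the Effective filter field reflects. It relies on the legacy Azure Storage SDK for Java as an assumption about tooling, not on the code generated by the component, and the property names and values are only examples.

    // Requires: import com.microsoft.azure.storage.table.TableQuery;
    // One condition per row of the Filter expressions table, combined with the predicate.
    String bySite = TableQuery.generateFilterCondition(
            "PartitionKey", TableQuery.QueryComparisons.EQUAL, "Beijing");
    String byJob = TableQuery.generateFilterCondition(
            "Job", TableQuery.QueryComparisons.EQUAL, "Software Tester");
    String effectiveFilter = TableQuery.combineFilters(bySite, TableQuery.Operators.AND, byJob);
    // effectiveFilter now holds the OData expression:
    // (PartitionKey eq 'Beijing') and (Job eq 'Software Tester')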

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

Name mappings Complete this table to map the column name of the
component schema with the property name of the Azure
table entity if they are different.
• Schema column name: enter the column name of the
component schema between double quotation marks.
• Entity property name: enter the property name of the
Azure table entity between double quotation marks.
For example, suppose there are three schema columns,
CompanyID, EmployeeID, and EmployeeName, that
are used to feed the values for the PartitionKey,
RowKey, and Name entity properties respectively. Since
the PartitionKey and RowKey columns are added to the
schema automatically, you do not need to specify the
mapping relationship for them; you only need to add one
row, setting the Schema column name cell to "EmployeeName"
and the Entity property name cell to "Name", to specify the
mapping relationship for the EmployeeName column when
retrieving data from the Azure table.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Handling data with Microsoft Azure Table storage


Here is an example of using Talend components to connect to a Microsoft Azure storage account that
gives you access to Azure storage table service, write some employee data into an Azure storage table,
and then retrieve the employee data from the table and display it on the console.
The employee data used in this example is as follows:
#Id;Name;Site;Job;Date;Salary
12000;Gerald Roosevelt;Beijing;Software Developer;2008-01-01;15000.01
12001;Benjamin Harrison;Paris;Software Developer;2008-11-22;13000.11
12002;Bob Clinton;Beijing;Software Tester;2008-05-12;12000.22
12003;James Quincy;Paris;Technical Writer;2009-03-10;12000.33
12004;Gerald Harrison;Beijing;Software Tester;2009-06-20;12500.44
12005;Harry Madison;Paris;Software Developer;2009-10-15;14000.55
12006;Helen Roosevelt;Beijing;Software Tester;2009-03-25;13500.66
12007;Mary Clinton;Beijing;Software Developer;2010-02-20;16000.77
12008;Cathey Quincy;Paris;Software Developer;2010-07-15;14000.88
12009;John Smith;Beijing;Technical Writer;2011-02-10;12500.99

Creating a Job for handling data with Azure Table storage


Create a Job to connect to an Azure storage account, write some employee data into an Azure storage
table, and then retrieve that information from the table and display it on the console.


Procedure
1. Create a new Job and add a tAzureStorageConnection component, a tFixedFlowInput component,
a tAzureStorageOutputTable component, a tAzureStorageInputTable component, and a tLogRow
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFixedFlowInput component to the tAzureStorageOutputTable component using a Row >
Main connection.
3. Do the same to link the tAzureStorageInputTable component to the tLogRow component.
4. Link the tAzureStorageConnection component to the tFixedFlowInput component using a Trigger
> OnSubjobOk connection.
5. Do the same to link the tFixedFlowInput component to the tAzureStorageInputTable component.

Connecting to an Azure Storage account


Configure the tAzureStorageConnection component to open the connection to an Azure Storage
account.

Before you begin


The Azure Storage account, which allows you to access the Azure Table storage service and store
the provided employee data, has already been created. For more information about how to create an
Azure Storage account, see About Azure storage accounts.

Procedure
1. Double-click the tAzureStorageConnection component to open its Basic settings view on the
Component tab.

2. In the Account Name field, specify the name of the storage account you need to access.
3. In the Account Key field, specify the key associated with the storage account you need to access.

Writing data into an Azure Storage table


Configure the tFixedFlowInput component and the tAzureStorageOutputTable component to write
the employee data into an Azure Storage table.

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view on the Component
tab.


2. Click the [...] button next to Edit schema to open the schema dialog box and define the schema by adding
six columns: Id, Name, Site, and Job of String type, Date of Date type, and Salary of Double
type. Then click OK to save the changes and accept the propagation prompted by the pop-up
dialog box.

Note that in this example, the Site and Id columns are used to feed the values of the
PartitionKey and RowKey system properties of each entity and they should be of String type,
and the Name column is used to feed the value of the EmployeeName property of each entity.
3. In the Mode area, select Use Inline Content(delimited file) and in the Content field displayed,
enter the employee data that will be written into the Azure Storage table.
4. Double-click the tAzureStorageOutputTable component to open its Basic settings view on the
Component tab


5. From the connection component drop-down list, select the component whose connection
details will be used to set up the connection to the Azure Storage service, tAzureStorageConnection_1
in this example.
6. In the Table name field, enter the name of the table into which the employee data will be written,
employee in this example.
7. From the Action on table drop-down list, select the operation to be performed on the specified
table, Drop table if exist and create in this example.
8. Click Advanced settings to open its view.

9. Click the [+] button under the Name mappings table to add three rows and map the schema column
names with the property names of the entities in the Azure table. In this example:
• the Site column is used to feed the value of the PartitionKey system property, so in the
first row set the Schema column name cell to the value "Site" and the Entity property name
cell to the value "PartitionKey".
• the Id column is used to feed the value of the RowKey system property, so in the second row
set the Schema column name cell to the value "Id" and the Entity property name cell to the
value "RowKey".
• the Name column is used to feed the value of the EmployeeName property, so in the third row
set the Schema column name cell to the value "Name" and the Entity property name cell to the
value "EmployeeName".
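To picture the result of these mappings, the sketch below writes the first employee row as an Azure table entity with the legacy Azure Storage SDK for Java. It is only an assumption about what the stored entity looks like, not the code generated by the Job; cloudTable stands for a CloudTable reference to the employee table obtained elsewhere, and exception handling is omitted.

    // Requires: import com.microsoft.azure.storage.table.DynamicTableEntity;
    //           import com.microsoft.azure.storage.table.EntityProperty;
    //           import com.microsoft.azure.storage.table.TableOperation;
    // First employee row: Site -> PartitionKey, Id -> RowKey, Name -> EmployeeName.
    DynamicTableEntity entity = new DynamicTableEntity("Beijing", "12000");
    entity.getProperties().put("EmployeeName", new EntityProperty("Gerald Roosevelt"));
    entity.getProperties().put("Job", new EntityProperty("Software Developer"));
    entity.getProperties().put("Salary", new EntityProperty(15000.01));
    cloudTable.execute(TableOperation.insertOrReplace(entity));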

Retrieving data from the Azure Storage table


Configure the tAzureStorageInputTable component and the tLogRow component to retrieve the
employee data from the Azure Storage table.

Procedure
1. Double-click the tAzureStorageInputTable component to open its Basic settings view.


2. From the connection component drop-down list, select the component whose connection
details will be used to set up the connection to the Azure Storage service, tAzureStorageConnection_1
in this example.
3. In the Table name field, enter the name of the table from which the employee data will be
retrieved, employee in this example.
4. Click the [...] button next to Edit schema to open the schema dialog box.

Note that the schema has already been predefined with two read-only columns RowKey and
PartitionKey of String type, and another column Timestamp of Date type. The RowKey
and PartitionKey columns correspond to the Id and Site columns of the tAzureStorageOutputTable
schema.
5. Define the schema by adding another four columns that hold other employee data, Name and Job
of String type, Date of Date type, and Salary of Double type. Then click OK to save the changes
and accept the propagation prompted by the pop-up dialog box.
6. Click Advanced settings to open its view.


7. Click the [+] button under the Name mappings table to add one row and set the Schema column name cell
with the value "Name" and the Entity property name cell with the value "EmployeeName" to
map the schema column name with the property name of each entity in the Azure table.
Note that for the tAzureStorageInputTable component, the PartitionKey and RowKey
columns have already been added automatically to the schema and you do not need to specify the
mapping relationship for them.
8. Double-click the tLogRow component to open its Basic settings view and in the Mode area, select
Table (print values in cells of a table) for a better display of the result.

Executing the Job to handle data with Azure Table storage


After setting up the Job and configuring the components used in the Job for handling data with Azure
Table storage, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the Job is executed successfully and the employee data is displayed on the
console, with the timestamp value that indicates when each entity was inserted.
3. Double-check the employee data that has been written into the Azure Storage table employee
using Microsoft Azure Storage Explorer if you want.


tAzureStorageList
Lists blobs in a given container according to the specified blob filters.

tAzureStorageList Standard properties


These properties are used to configure tAzureStorageList running in the Standard Job framework.
The Standard tAzureStorageList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Container name Enter the name of the container from which you need to
select blobs to be listed.

Blob filter Complete this table to select the blobs to be listed. The
parameters to be provided are:
• Prefix: enter the common prefix of the names of the
blobs you need to list. This prefix allows you to filter
the blobs which have the specified prefix in their
names in the given container.
A blob name contains the virtual hierarchy of the blob
itself. This hierarchy is a virtual path to that blob and is
relative to the container where that blob is stored. For
example, in a container named photos, the name of a
photo blob might be 2014/US/Oakland/Talend.jpg.
For this reason, when you define a prefix, you are
actually designating a directory level as the blob filter,
for example, 2014/ or 2014/US/.
If you want to select the blobs stored directly beneath
the container level, that is to say, the blobs without
virtual path in their names, remove quotation marks
and enter null.
• Include sub-directories: select this check box to select
all of the sub-folders and the blobs in those folders
beneath the designated directory level. If you leave
this check box clear, tAzureStorageList returns only
the blobs, if any, directly beneath that directory level.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with a single
column BlobName of String type, which indicates the name
of each blob to be listed.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and


decide whether to propagate the changes to all the


Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

CURRENT_BLOB The blob name being processed by this component. This is


an After variable and it returns a string.

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
For a related scenario, see Retrieving files from an Azure Storage container on page 303.


tAzureStorageOutputTable
Performs the defined action on a given Azure storage table and inserts, replaces, merges or deletes
entities in the table based on the incoming data from the preceding component.

tAzureStorageOutputTable Standard properties


These properties are used to configure tAzureStorageOutputTable running in the Standard Job
framework.
The Standard tAzureStorageOutputTable component belongs to the Cloud family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Table name Specify the name of the table into which the entities will be
written.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Partition Key Select the schema column that holds the partition key value
from the drop-down list.

Row Key Select the schema column that holds the row key value
from the drop-down list.

Action on data Select an action to be performed on data of the table


defined.
• Insert: insert a new entity into the table.
• Insert or replace: replace an existing entity or insert
a new entity if it does not exist. When replacing an
entity, any properties from the previous entity will be
removed if the new entity does not define them.
• Insert or merge: merge an existing entity or insert a
new entity if it does not exist. When merging an entity,
any properties from the previous entity will be retained
if the new entity does not define or include them.


• Merge: update an existing entity without removing the


property value of the previous entity if the new entity
does not define its value.
• Replace: update an existing entity and remove the
property value of the previous entity if the new entity
does not define its value.
• Delete: delete an existing entity.
For performance reasons, the incoming data is processed
in parallel and in random order. Therefore, it is not
recommended to perform any order-sensitive data operation
(for example, insert or replace) if there are duplicated rows
in your data.

Action on table Select an operation to be performed on the table defined.


• Default: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exist and create: The table is removed if it
already exists and created again.

Process in batch Select this check box to process the input entities in batch.
Note that the entities to be processed in batch should
belong to the same partition group, which means they
should have the same partition key value.
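The sketch below illustrates this constraint with the legacy Azure Storage SDK for Java: a table batch operation only accepts entities that share one partition key. It is an illustration of the underlying service behaviour, not the component's generated code; cloudTable is an assumed CloudTable reference and exception handling is omitted.

    // Requires: import com.microsoft.azure.storage.table.DynamicTableEntity;
    //           import com.microsoft.azure.storage.table.TableBatchOperation;
    // All entities in one batch must share the same PartitionKey ("Beijing" here).
    TableBatchOperation batch = new TableBatchOperation();
    batch.insert(new DynamicTableEntity("Beijing", "12000"));
    batch.insert(new DynamicTableEntity("Beijing", "12002"));
    cloudTable.execute(batch);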

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

Name mappings Complete this table to map the column name of the
component schema with the property name of the Azure
table entity if they are different.
• Schema column name: enter the column name of the
component schema between double quotation marks.
• Entity property name: enter the property name of the
Azure table entity between double quotation marks.
For example, if there are three schema columns
CompanyID, EmployeeID, and EmployeeName that are
used to feed the values for the PartitionKey, RowKey,
and Name entity properties respectively, then you need to
add the following rows for the mapping when writing data
into the Azure table.
• the Schema column name cell with the value
"CompanyID" and the Entity property name cell with
the value "PartitionKey".
• the Schema column name cell with the value
"EmployeeID" and the Entity property name cell
with the value "RowKey".
• the Schema column name cell with the value
"EmployeeName" and the Entity property name cell
with the value "Name".


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

NB_SUCCESS The number of rows successfully processed. This is an After


variable and it returns an integer.

NB_REJECT The number of rows rejected. This is an After variable and it


returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as an end component of a


Job or subJob and it always needs an input link.

Related scenario
For a related scenario, see Handling data with Microsoft Azure Table storage on page 313.


tAzureStoragePut
Uploads local files into a given container for an Azure storage account.

tAzureStoragePut Standard properties


These properties are used to configure tAzureStoragePut running in the Standard Job framework.
The Standard tAzureStoragePut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Container name Enter the name of the container you need to write files in.
This container must exist in the Azure Storage system you
are using.

Local folder Enter the path, or browse to the folder from which you need
to upload files.

Azure storage folder Enter the path to the virtual blob folder in the remote Azure
storage system you want to upload files into.
If you do not put any value in this field but leave this
field as it is with only its default quotation marks,
tAzureStoragePut writes files directly beneath the
container level.
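For illustration, the fragment below uploads one local file beneath such a virtual folder using the legacy Azure Storage SDK for Java. It is an assumption about the resulting blob name, not the code generated by tAzureStoragePut; container stands for a CloudBlobContainer obtained as in the earlier sketches, the file path and folder name are placeholders, and exception handling is omitted.

    // Requires the com.microsoft.azure.storage.blob classes used in the earlier sketches.
    String azureStorageFolder = "photos";
    java.io.File localFile = new java.io.File("E:/photos/components-use_case_triakinput_1.png");

    // The blob name is the Azure storage folder followed by the file name.
    String blobName = azureStorageFolder + "/" + localFile.getName();
    container.getBlockBlobReference(blobName).uploadFromFile(localFile.getAbsolutePath());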

Use file list Select this check box to be able to define file filtering
conditions. Once it is selected, the Files table is displayed.

Files Complete this table to select the files to be uploaded into


Azure. The parameters to be provided are:
• Filemask: file names or path to the files to be
uploaded.
• New name: name to give to the files after they are
uploaded.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

CONTAINER The name of the blob container. This is an After variable


and it returns a string.

LOCAL_FOLDER The local directory used in this component. This is an After


variable and it returns a string.

REMOTE_FOLDER The remote directory used in this component. This is an


After variable and it returns a string.


ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Prerequisites Knowledge about Microsoft Azure Storage is required.

Related scenario
For a related scenario, see Retrieving files from an Azure Storage container on page 303.


tAzureStorageQueueCreate
Creates a new queue under a given Azure storage account.

tAzureStorageQueueCreate Standard properties


These properties are used to configure tAzureStorageQueueCreate running in the Standard Job
framework.
The Standard tAzureStorageQueueCreate component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Queue name Specify the name of the Azure queue to be created. For
more information about the queue naming rules, see
Naming Queues and Metadata.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Related scenario
No scenario is available for this component yet.


tAzureStorageQueueDelete
Deletes a specified queue permanently under a given Azure storage account.

tAzureStorageQueueDelete Standard properties


These properties are used to configure tAzureStorageQueueDelete running in the Standard Job
framework.
The Standard tAzureStorageQueueDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without need for the
account key. For more information, see Using Shared Access
Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period: when generating it,
you can set the start time at which the SAS becomes valid
and the expiry time after which it is no longer valid, and
you need to make sure your SAS is still valid when running
your Job.

Queue name Specify the name of the Azure queue to be deleted.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Related scenario
No scenario is available for this component yet.


tAzureStorageQueueInput
Retrieves one or more messages from the front of an Azure queue.

tAzureStorageQueueInput Standard properties


These properties are used to configure tAzureStorageQueueInput running in the Standard Job
framework.
The Standard tAzureStorageQueueInput component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue from which the
messages will be retrieved.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns:
• MessageId: the id of the message.
• MessageContent: the body of the message.
• InsertionTime: the time when the message was added
to the queue.
• ExpirationTime: the time when the message will expire.
• NextVisibleTime: the time when the message will next become visible.
• DequeueCount: the number of times that the message
has been dequeued. This value is incremented each
time the message is dequeued, but it will not be
incremented when the message is peeked.
• PopReceipt: the pop receipt value that is required to
delete the message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Number of messages Enter the number of messages to be retrieved from the


specified queue at a time, up to a maximum of 32.


Peek messages Select this check box to retrieve messages without
removing them from the queue or altering their visibility.
The messages remain available to other consumers.

Delete the message while streaming Select this check box to delete the messages while
retrieving them from the queue.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

Visibility timeout in seconds Enter the visibility timeout value (in seconds) relative
to the server time. This timeout value is added to the
time at which the message is retrieved to determine its
NextVisibleTime value. The message will not be visible
to other consumers for this time interval after it has been
retrieved.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
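These variables can be read from any downstream component through the globalMap, as in any Talend Job. For example, assuming the component is labelled tAzureStorageQueueInput_1 (the label depends on your Job), a tJava component could print the number of retrieved messages with:

System.out.println(((Integer)globalMap.get("tAzureStorageQueueInput_1_NB_LINE")));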

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
No scenario is available for this component yet.
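As no scenario is provided, here is a rough, standalone illustration of what the component does (this is not the code generated by the Studio). It assumes the legacy Azure Storage SDK for Java (com.microsoft.azure.storage) is on the classpath; the account name, key placeholder and queue name are made up:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueClient;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public class QueueInputSketch {
    public static void main(String[] args) throws Exception {
        // Connection string built from the Account Name, Account Key and Protocol settings.
        String conn = "DefaultEndpointsProtocol=https;"
                + "AccountName=mystorageaccount;"   // hypothetical account
                + "AccountKey=<account-key>";
        CloudQueueClient client =
                CloudStorageAccount.parse(conn).createCloudQueueClient();
        CloudQueue queue = client.getQueueReference("myqueue"); // Queue name setting

        // Peek messages: read without removing them or altering their visibility.
        for (CloudQueueMessage m : queue.peekMessages(32)) {
            System.out.println(m.getId() + " -> " + m.getMessageContentAsString());
        }

        // Normal retrieval: up to 32 messages (Number of messages), hidden for
        // 30 seconds (Visibility timeout in seconds), then deleted, which is what
        // "Delete the message while streaming" does.
        for (CloudQueueMessage m : queue.retrieveMessages(32, 30, null, null)) {
            System.out.println(m.getMessageContentAsString());
            queue.deleteMessage(m);
        }
    }
}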


tAzureStorageQueueInputLoop
Runs an endless loop to retrieve messages from the front of an Azure queue.

tAzureStorageQueueInputLoop Standard properties


These properties are used to configure tAzureStorageQueueInputLoop running in the Standard Job
framework.
The Standard tAzureStorageQueueInputLoop component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue from which the
messages will be retrieved.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with the
following columns:
• MessageId: the id of the message.
• MessageContent: the body of the message.
• InsertionTime: the time when the message was added
to the queue.
• ExpirationTime: the time when the message will expire.
• NextVisibleTime: the time when the message will next become visible.
• DequeueCount: the number of times that the message
has been dequeued. This value is incremented each
time the message is dequeued, but it will not be
incremented when the message is peeked.
• PopReceipt: the pop receipt value that is required to
delete the message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Number of messages Enter the number of messages to be retrieved from the


specified queue at a time, up to a maximum of 32.


Loop wait time Specify the duration (in seconds) for which the loop will
wait for the message to arrive in the queue before returning.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

QUEUE_NAME The name of the Azure queue. This is an After variable and
it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
No scenario is available for this component yet.


tAzureStorageQueueList
Returns all queues associated with the given Azure storage account.

tAzureStorageQueueList Standard properties


These properties are used to configure tAzureStorageQueueList running in the Standard Job
framework.
The Standard tAzureStorageQueueList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with one single
column QueueName that stores the name of each queue to
be returned.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NUMBER_OF_QUEUES The number of queues returned. This is an After variable


and it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
No scenario is available for this component yet.
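For reference only, listing the queues of an account amounts to the following sketch with the legacy Azure Storage SDK for Java (an assumed library; the connection details are placeholders). The loop counter plays the role of the NUMBER_OF_QUEUES variable:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueClient;

public class QueueListSketch {
    public static void main(String[] args) throws Exception {
        String conn = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
                + "AccountKey=<account-key>";
        CloudQueueClient client =
                CloudStorageAccount.parse(conn).createCloudQueueClient();
        int count = 0;
        for (CloudQueue q : client.listQueues()) {
            System.out.println(q.getName()); // one QueueName row per queue
            count++;
        }
        System.out.println(count + " queue(s) found");
    }
}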


tAzureStorageQueueOutput
Adds messages to the back of an Azure queue.
Note that this component can only be used with Java 8.

tAzureStorageQueueOutput Standard properties


These properties are used to configure tAzureStorageQueueOutput running in the Standard Job
framework.
The Standard tAzureStorageQueueOutput component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue to which the messages
will be added.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
The schema of this component is predefined with one single
column MessageContent that stores the body of each
message.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NB_LINE The number of messages processed. This is an After variable


and it returns an integer.


NB_SUCCESS The number of messages successfully enqueued. This is an


After variable and it returns an integer.

NB_REJECT The number of messages rejected. This is an After variable


and it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as an end component of a


Job or subJob and it always needs an input link.

Related scenario
No scenario is available for this component yet.
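As an illustration only, enqueuing one message with the legacy Azure Storage SDK for Java (assumed library, placeholder connection details) is a single call per incoming row, roughly:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public class QueueOutputSketch {
    public static void main(String[] args) throws Exception {
        String conn = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
                + "AccountKey=<account-key>";
        CloudQueue queue = CloudStorageAccount.parse(conn)
                .createCloudQueueClient()
                .getQueueReference("myqueue");
        // Each incoming row's MessageContent column becomes one message.
        queue.addMessage(new CloudQueueMessage("hello from the Job"));
    }
}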


tAzureStorageQueuePurge
Purges messages in an Azure queue.

tAzureStorageQueuePurge Standard properties


These properties are used to configure tAzureStorageQueuePurge running in the Standard Job
framework.
The Standard tAzureStorageQueuePurge component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component whose connection details will be


used to set up the connection to Azure storage from the
drop-down list.

Account Name Enter the name of the storage account you need to access. A
storage account name can be found in the Storage accounts
dashboard of the Microsoft Azure Storage system to be
used. Ensure that the administrator of the system has
granted you the appropriate access permissions to this
storage account.

Account Key Enter the key associated with the storage account you need
to access. Two keys are available for each account and by
default, either of them can be used for this access.

Protocol Select the protocol for this connection to be created.

Use Azure Shared Access Signature Select this check box to use a shared access signature
(SAS) to access the storage resources without the need for
the account key. For more information, see Using Shared
Access Signatures (SAS).
In the Azure Shared Access Signature field displayed,
enter your account SAS URL between double quotation
marks. You can get the SAS URL for each allowed service
on the Microsoft Azure portal after generating the SAS. The
SAS URL format is https://<$storagename>.<$service>.core.windows.net/<$sastoken>,
where <$storagename> is the storage account name,
<$service> is the allowed service name (blob, file,
queue or table), and <$sastoken> is the SAS token
value. For more information, see Constructing the Account
SAS URI.
Note that the SAS has a validity period. When generating
it, you can set the start time at which the SAS becomes
valid and the expiry time after which it is no longer valid,
and you need to make sure your SAS is still valid when
running your Job.

Queue name Specify the name of the Azure queue in which the messages
will be purged.

Die on error Select the check box to stop the execution of the Job when
an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Related scenario
No scenario is available for this component yet.
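For illustration, purging a queue corresponds to a single clear() call in the legacy Azure Storage SDK for Java (assumed library, placeholder connection details):

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;

public class QueuePurgeSketch {
    public static void main(String[] args) throws Exception {
        String conn = "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
                + "AccountKey=<account-key>";
        CloudQueue queue = CloudStorageAccount.parse(conn)
                .createCloudQueueClient()
                .getQueueReference("myqueue");
        queue.clear(); // removes all messages from the queue
    }
}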


tBarChart
Generates a bar chart from the input data to ease technical analysis.
tBarChart reads data from an input flow and transforms the data into a bar chart in a PNG image file.

tBarChart Standard properties


These properties are used to configure tBarChart running in the Standard Job framework.
The Standard tBarChart component belongs to the Business Intelligence family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Note:
The schema of tBarChart contains three read-only
columns named series (string), category (string), and
value (integer), in a fixed order. The data in any extra
columns will only be passed to the next component, if
any, without being presented in the bar chart.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.


Generated image path Name and path of the output image file.

Chart title Enter the title of the bar chart to be generated.

Include legend Select this check box if you want the bar chart to include a
legend, indicating all series in different colors.

3Dimensions Select this check box to create an image with 3D effect. By


default, this check box is selected and the bars representing
the series of each category will be stacked one over
another. If this check box is cleared, a 2D image will be
created, with the bars displayed one beside another along
the category axis.

Image width and Image height Enter the width and height of the image file, in pixels.

Category axis name and Value axis name Enter the category axis name and value axis name.

Foreground alpha Enter an integer in the range of 0 to 100 to define the


transparency of the image. The smaller the number you
enter, the more transparent the image will be.

Plot orientation Select the plot orientation of the bar chart: VERTICAL or
HORIZONTAL.
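The settings above map closely onto a plain charting program. Purely as an illustration, and assuming the JFreeChart library (the component's internal implementation is not documented here), a minimal equivalent of the series/category/value model could look like this:

import java.io.File;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.data.category.DefaultCategoryDataset;

public class BarChartSketch {
    public static void main(String[] args) throws Exception {
        DefaultCategoryDataset dataset = new DefaultCategoryDataset();
        // addValue(value, series, category): the same three columns as the tBarChart schema.
        dataset.addValue(10233, "Population (x1000 people)", "Beijing");
        dataset.addValue(10452, "Population (x1000 people)", "Moscow");

        JFreeChart chart = ChartFactory.createBarChart3D(
                "Large cities",           // Chart title
                "City",                   // Category axis name
                "Value",                  // Value axis name
                dataset,
                PlotOrientation.VERTICAL, // Plot orientation
                true,                     // Include legend
                false, false);
        // Generated image path, Image width and Image height.
        ChartUtilities.saveChartAsPNG(new File("barchart.png"), chart, 760, 400);
    }
}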

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is mainly used as Output component. It


requires an Input component and Row main link as input.


Creating a bar chart from the input data


This scenario describes a Job that reads source data from a CSV file and transforms the data into a bar
chart showing a comparison of several large cities. The input file is shown below:

City;Population(x1000);LandArea(km2);PopulationDensity(people/km2)
Beijing;10233;1418;7620
Moscow;10452;1081;9644
Seoul;10422;605;17215
Tokyo;8731;617;14151
Jakarta;8490;664;12738
New York;8310;789;10452

Because the input file has a different structure than the one required by the tBarChart component,
this use case uses the tMap component to adapt the source data to the three-column schema of
tBarChart so that a temporary CSV file can be created as the input to the tBarChart component.

Note:
You will usually use the tMap component to adjust the input schema in accordance with the
schema structure of the tBarChart component. For more information about how to use the tMap
component, see Talend Studio User Guide and tMap on page 1983.

To ensure correct generation of the temporary input file, a pre-treatment subJob is used to delete the
temporary file in case it already exists before the main Job is executed; as this temporary file serves
this specific Job only, a post-treatment subJob is used to delete it after the main Job is executed.

Dropping and linking components


Procedure
1. Drop the following components from the Palette to the design workspace: a tPrejob, a tPostjob,
two tFileDelete components, two tFileInputDelimited components, a tMap, three tFileOutputDel
imited components, and a tBarChart.
2. Connect the tPrejob component to one tFileDelete component using a Trigger > On Component
Ok connection, and connect the tPostjob component to the other tFileDelete component using the
same type of connection.
3. Connect the first tFileInputDelimited to the tMap component using a Row > Main connection.
4. Connect the tMap component to the first tFileOutputDelimited component using a Row > Main
connection, and name the connection Population.
5. Repeat the step above to connect the tMap component to the other two tFileOutputDelimited
components using Row > Main connections, and name the connections Area and Density
respectively.
6. Connect the second tFileInputDelimited component to the tBarChart component using a Row > Main
connection.
7. Connect the first tFileInputDelimited component to the second tFileInputDelimited component
using a Trigger > On Subjob Ok connection.
8. Relabel the components to best describe their functionality.


Results

Reading the source data


Procedure
1. Double-click the first tFileInputDelimited component, which is labelled Large_Cities, to display its
Basic settings view.

2. Fill in the File name field by browsing to the input file.


3. In the Header field, specify the number of header rows. In this use case, you have only one header
row.
4. Click Edit schema to describe the data structure of the input file. In this use case, the input
schema is made of four columns: City, Population, Area, and Density. Upon defining the column
names and data types, click OK to close the schema dialog box.


Adapting the source data to the tBarChart schema


Procedure
1. Double-click the tMap to open the Map Editor.
You can see an input table on the input panel, row1 in this example, and three empty output
tables, named Population, Area, and Density on the output panel.
2. Use the Schema editor to add three columns to each output table: series (string), category (string),
and value (integer).
3. In the relevant Expression field of the output tables, enter the text to be presented in the
legend area of the bar chart, "Population (x1000 people)", "Land area (km2)", and
"Population density (people/km2)" respectively in this example.
4. Drop the City column of the input table onto the category column of each output table.
5. Drop the Population column of the input table onto the value column of the Population table.
6. Drop the Area column of the input table onto the value column of the Area table.
7. Drop the Density column of the input table onto the value column of the Density table.


8. Click OK to save the mappings and close the Map Editor and propagate the output schemas to the
output components.

Generating the temporary input file


Procedure
1. Double-click the first tFileOutputDelimited component to display its Basic settings view.

2. In the File Name field, define a temporary CSV file to send the mapped data flows to. In this use
case, we name this file Temp.csv. This file will be used as the input to the tBarChart component.
3. Select the Append check box.
4. Repeat the steps above to define the properties of the other two tFileOutputDelimited
components, using exactly the same settings as in the first tFileOutputDelimited component.


Note:
Note that the order of output flows from the tMap component is not necessarily the actual
order of writing data to the target file. To ensure the target file is correctly generated, delete
the file by the same name if it already exists before Job execution and select the Append check
box in all the tFileOutputDelimited components in this step.
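Assuming the mappings described above and the default semicolon field separator of tFileOutputDelimited, the accumulated Temp.csv contains one line per series/city pair, for example (excerpt; the order of the lines may vary because three output flows append to the same file):

Population (x1000 people);Beijing;10233
Population (x1000 people);Moscow;10452
Land area (km2);Beijing;1418
Population density (people/km2);Beijing;7620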

Configuring bar chart generation


Procedure
1. Double-click the second tFileInputDelimited component, which is labelled Temp_Input, to display
its Basic settings view.

2. Fill in the File name field with the path to the temporary input file generated by the
tFileOutputDelimited components. In this use case, the temporary input file to the tBarChart is
Temp.csv.
3. Double-click the tBarChart component to display its Basic settings view.

4. In the Generated image path field, define the file path of the image file to be generated.
5. In the Chart title field, define a title for the bar chart.
6. Define the category and series axis names.


7. Define the size and transparency degree of the image if needed. In this use case, we simply use
the default settings.
8. Click Edit schema to open the schema dialog box.

9. Copy all the columns from the output schema to the input schema by clicking the left-pointing
double arrow button. Then, click OK to close the schema dialog box.

Deleting the temporary file


About this task
As the tPrejob and tPostjob components simply trigger the connected subJobs and do not have any
settings to define, all you need to do is to define the properties of the two tFileDelete components.

Procedure
1. Double-click the first tFileDelete component to display its Basic settings view.

2. Fill in the File name field with the path to the temporary input file.
If the Fail on error check box is selected and the pre-treatment subJob fails because of errors, such as
the file to delete not existing, this failure will prevent the main subJob from being launched. In this
situation, you can clear the Fail on error check box to avoid this interruption.


3. Specify the same file path in the other tFileDelete component.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to launch it.
A bar chart is generated, showing a graphical comparison of the specified large cities.


tBigQueryBulkExec
Transfers given data to Google BigQuery.
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as part
of a two-step process. In the first step, an output file is generated. In the second step, this file is used
to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed
in a separate section. The advantage of using two separate components is that the data can be
transformed before it is loaded in the dataset.
This component transfers a given file from Google Cloud Storage to Google BigQuery, or uploads a
given file into Google Cloud Storage and then transfers it to Google BigQuery.

tBigQueryBulkExec Standard properties


These properties are used to configure tBigQueryBulkExec running in the Standard Job framework.
The Standard tBigQueryBulkExec component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is
formatted as String data.
• The numeric data of BigQuery is converted to
BigDecimal.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.
• Service account: authenticate using a Google account
that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery


service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using
this component; when the Job pauses execution to print out
a URL, navigate to that address and copy the authorization
code displayed.

Dataset Enter the name of the dataset you need to transfer data to.

Table Enter the name of the table you need to transfer data to.
If this table does not exist, select the Create the table if it
doesn't exist check box.

Action on data Select the action to be performed from the drop-down list
when transferring data to the target table. The action may
be:
• Truncate: it empties the contents of the table and
repopulates it with the transferred data.
• Append: it adds rows to the existing data in the table.
• Empty: it populates the empty table.


Bulk file already exists in Google storage Select this check box to reuse the authentication
information for the Google Cloud Storage connection, then
complete the File and the Header fields.

Access key and Secret key Paste the authentication information obtained from Google
for making requests to Google Cloud Storage.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project.

File to upload When the data to be transferred to Google BigQuery is not


stored on Google Cloud Storage, browse to, or enter the
path to it.

Bucket Enter the name of the bucket, the Google Cloud Storage
container, which holds the data to be transferred to Google
BigQuery.

File Enter the directory of the data stored on Google Cloud


Storage and to be transferred to Google BigQuery. This data
must be stored directly under the bucket root. For example,
enter gs://my_bucket/my_file.csv.
If the data is not on Google Cloud Storage, this directory
is used as the intermediate destination before the data is
transferred to Google BigQuery.

Header Set the number of header rows to ignore in the transferred
data. For example, enter 0 if the data has no header rows to
skip.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.

Set the field delimiter Enter a character, string, or regular expression to separate
fields in the transferred data.


Drop table if exists Select the Drop table if exists check box to remove the
table specified in the Table field, if this table already exists.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This is a standalone component.


This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.

Related Scenario
For a related topic, see Writing data in Google BigQuery on page 371.


tBigQueryInput
Performs the queries supported by Google BigQuery.
This component connects to Google BigQuery and performs queries in it.

tBigQueryInput Standard properties


These properties are used to configure tBigQueryInput running in the Standard Job framework.
The Standard tBigQueryInput component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is
formatted as String data.
• The numeric data of BigQuery is converted to
BigDecimal.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.


• Service account: authenticate using a Google account


that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery


service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using
this component; when the Job pauses execution to print out
a URL, navigate to that address and copy the authorization
code displayed.

Use legacy SQL and Query Enter the query you need to use.
If the query to be used is the legacy SQL of BigQuery, select
this Use legacy SQL check box. For further information
about this legacy SQL, see Legacy SQL query reference from
the Google BigQuery documentation.

Result size Select the option depending on the volume of the query
result.
By default, the Small option is used, but when the query
result is larger than the maximum response size, you need
to select the Large option.
If the volume of the result is not certain, select Auto.

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.

Advanced Separator (for number) Select this check box to change the separator used for the
numbers.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Use custom temporary Dataset name Select this check box to use an existing dataset to which
you have access, instead of creating one, and in the field
that is displayed, enter the name of this dataset. This way,
you avoid rights and permissions issues related to dataset
creation.
This check box is available only when you have selected
Large from the Result size drop-down list in the Basic
settings tab.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This is an input component. It sends the extracted data to


the component that follows it.
This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.


Performing a query in Google BigQuery


This scenario uses two components to perform the SELECT query in BigQuery and present the result
in the Studio.

The following figure shows the schema of the UScustomer table that we use as an example to perform
the SELECT query.

We will select the State records and count the occurrence of each State among those records.

Linking the components


Procedure
1. In the Integration perspective of Studio, create an empty Job, named BigQueryInput for example,
from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tBigQueryInput and tLogRow onto the workspace.
3. Connect them using the Row > Main link.


Creating the query


Building access to BigQuery

Procedure
1. Double-click tBigQueryInput to open its Component view.

2. Click Edit schema to open the schema editor.

3. Click the button twice to add two rows and enter the names of your choice for each of them in
the Column column. In this scenario, they are: States and Count.
4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
5. In the Authentication area, add the authentication information. In most cases, the Service account
mode is more straightforward and easier to handle.
Authentication mode Description

Service account Authenticate using a Google account that is


associated with your Google Cloud Platform
project.
When selecting this mode, the Service account
credentials file field is displayed. In this field,
enter the path to the credentials file created
for the service account to be used. This file
must be stored in the machine in which your
Talend Job is actually launched and executed.
For further information about how to
create a Google service account and obtain
the credentials file, see Getting Started
with Authentication from the Google
documentation.

OAuth 2.0 Authenticate the access using OAuth


credentials. When selecting this mode,
the parameters to be defined in the Basic
settings view are Client ID, Client secret and
Authorization code.
1. Navigate to the Google APIs Console in
your web browser to access the Google
project hosting the BigQuery and the
Cloud Storage services you need to use.
2. Click the API Access tab to open its view,
and copy the Client ID, Client secret, and
Project ID.
3. In the Component view of the Studio,
paste Client ID, Client secret and Project
ID from the API Access tab view to the
corresponding fields, respectively.
4. In the Run view of the Studio, click Run
to execute this Job. The execution will
pause at a given moment to print out in
the console the URL address used to get
the authorization code.
5. Navigate to this address in your web
browser and copy the authorization code
displayed.
6. In the Component view of tBigQueryInput,
paste the authorization code in the
Authorization Code field.

Writing the query

Procedure
In the Query field, enter select States, count(*) as Count from documentation.UScustomer group by States.
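Note that the way the table is referenced depends on the Use legacy SQL check box. As a rough guideline (the exact quoting requirements depend on your project and table names), the same query could be written as:

select States, count(*) as Count from [documentation.UScustomer] group by States -- legacy SQL
select States, count(*) as Count from `documentation.UScustomer` group by States -- standard SQL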


Executing the Job


About this task
The tLogRow component presents the execution result of the Job. You can configure the presentation
mode on its Component view.
To do this, double-click tLogRow to open the Component view and in the Mode area, select the Table
(print values in cells of a table) option.

Procedure
To execute this Job, press F6.

Results
Once done, the Run view is opened automatically, where you can check the execution result.


tBigQueryOutput
Transfers the data provided by its preceding component to Google BigQuery.
This component writes the data it receives in a user-specified directory and transfers the data to
Google BigQuery via Google Cloud Storage.

tBigQueryOutput Standard properties


These properties are used to configure tBigQueryOutput running in the Standard Job framework.
The Standard tBigQueryOutput component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is
formatted as String data.
• The numeric data of BigQuery is converted to
BigDecimal.

Property type Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Local filename Browse to, or enter the path to the file you want to write the
received data in.

Append Select this check box to add rows to the existing data in the
file specified in Local filename.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.
• Service account: authenticate using a Google account
that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.
For further information about how to create a Google
service account and obtain the credentials file, see
Getting Started with Authentication from the Google
documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using this component; when the Job pauses and prints out a URL address, navigate to this address and copy the authorization code displayed.

Dataset Enter the name of the dataset you need to transfer data to.

Table Enter the name of the table you need to transfer data to.
If this table does not exist, select the Create the table if it
doesn't exist check box.


Action on data Select the action to be performed from the drop-down list
when transferring data to the target table. The action may
be:
• Truncate: it empties the contents of the table and
repopulates it with the transferred data.
• Append: it adds rows to the existing data in the table.
• Empty: it populates the empty table.

Access key and Secret key Paste the authentication information obtained from Google
for making requests to Google Cloud Storage.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project.

Bucket Enter the name of the bucket, the Google Cloud Storage
container, which holds the data to be transferred to Google
BigQuery.

File Enter the directory of the data stored on Google Cloud Storage and to be transferred to Google BigQuery. This data
must be stored directly under the bucket root. For example,
enter gs://my_bucket/my_file.csv.
If the data is not on Google Cloud Storage, this directory
is used as the intermediate destination before the data is
transferred to Google BigQuery.
Note that this file name must be identical to the name of the file specified in the Local filename field.

Header Set the number of header rows to ignore in the transferred data. For example, enter 0 if the data has no header, or 1 if the first row of the data is a header.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.


Field Separator Enter a character, string, or regular expression to separate fields in the transferred data.

Drop table if exists Select the Drop table if exists check box to remove the
table specified in the Table field, if this table already exists.

Create directory if not exists Select this check box to create the directory you defined in
the File field for Google Cloud Storage, if it does not exist.

Custom the flush buffer size Enter the number of rows to be processed before the
memory is freed.

Check disk space Select this check box to throw an exception during
execution if the disk is full.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://docs.oracle.com.
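If you are unsure which encodings your JVM supports, a short Java check (independent of the Studio) lists them:

import java.nio.charset.Charset;

public class ListEncodings {
    public static void main(String[] args) {
        // Prints every charset name supported by the running JVM;
        // the Encoding list of the component ultimately depends on this set.
        Charset.availableCharsets().keySet().forEach(System.out::println);
    }
}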

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
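In the Java code generated for a Job, these variables are read from the globalMap of the Job. For example, assuming a component instance named tBigQueryOutput_1 (the instance name depends on your Job), the ERROR_MESSAGE variable can be read in a tJava component as follows:

// tJava snippet (illustrative instance name): read the After variable once the component has run.
String bqError = (String) globalMap.get("tBigQueryOutput_1_ERROR_MESSAGE");
if (bqError != null) {
    System.err.println("tBigQueryOutput_1 reported: " + bqError);
}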

Usage

Usage rule This is an output component used at the end of a Job.


It receives data from its preceding component such as
tFileInputDelimited, tMap or tMysqlInput.
This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.

Writing data in Google BigQuery


This scenario uses two components to write data in Google BigQuery.


Linking the components


Procedure
1. In the Integration perspective of Talend Studio, create an empty Job, named WriteBigQuery for example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tRowGenerator and tBigQueryOutput onto the workspace.
The tRowGenerator component generates the data to be transferred to Google BigQuery in this
scenario. In the real-world case, you can use other components such as tMysqlInput or tMap in the
place of tRowGenerator to design a sophisticated process to prepare your data to be transferred.
3. Connect them using the Row > Main link.

Preparing the data to be transferred


Procedure
1. Double-click tRowGenerator to open its Component view.


2. Click RowGenerator Editor to open the editor.


3. Click the plus button three times to add three rows in the Schema table.
4. In the Column column, enter the name of your choice for each of the new rows. For example,
fname, lname and States.
5. In the Functions column, select TalendDataGenerator.getFirstName for the fname row,
TalendDataGenerator.getLastName for the lname row and TalendDataGenerator.getUsState for the
States row.
6. In the Number of Rows for RowGenerator field, enter, for example, 100 to define the number of
rows to be generated.
7. Click OK to validate these changes.
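As a point of reference, the three functions selected above are plain Java routines that can also be called directly, for example in a tJava component; the following lines are illustrative only and print one sample value per function used in this scenario.

// Illustrative tJava snippet: the same system routines used by tRowGenerator.
String fname = TalendDataGenerator.getFirstName();
String lname = TalendDataGenerator.getLastName();
String state = TalendDataGenerator.getUsState();
System.out.println(fname + ";" + lname + ";" + state);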

Configuring the access to BigQuery and Cloud Storage


Building access to Cloud Storage

Procedure
1. Double-click tBigQueryOutput to open its Component view.


2. Click Sync columns to retrieve the schema from its preceding component.
3. In the Local filename field, enter the directory where you need to create the file to be transferred
to BigQuery.
4. Navigate to the Google APIs Console in your web browser to access the Google project hosting
the BigQuery and the Cloud Storage services you need to use.
5. Click Google Cloud Storage > Interoperable Access to open its view.
6. In the Google storage configuration area of the Component view, paste the Access key and Access secret from the Interoperable Access tab view into the corresponding fields, respectively.
7. In the Bucket field, enter the path to the bucket you want to store the transferred data in. In this example, it is talend/documentation.
This bucket must already exist in Cloud Storage.

8. In the File field, enter the directory in Google Cloud Storage where the file to be transferred to BigQuery is received and created. In this example, it is gs://talend/documentation/biquery_UScustomer.csv. The file name must be the same as the one you defined in the Local filename field.


Troubleshooting: if you encounter issues such as Unable to read source URI of the file stored in
Google Cloud Storage, check whether you put the same file name in these two fields.
9. Enter 0 in the Header field to ignore no rows in the transferred data.

Building access to BigQuery

Procedure
1. In the Dataset field of the Component view, enter the dataset you need to transfer data to. In this scenario, it is documentation.
This dataset must exist in BigQuery. The following figure shows the dataset used by this scenario.

2. In the Table field, enter the name of the table you need to write data in, for example, UScustomer.
3. In the Action on data field, select the action. In this example, select Truncate to empty the contents of the target table, if there are any, and to repopulate it with the transferred data.
4. In the Authentication area, add the authentication information. In most cases, the Service account mode is more straightforward and easier to handle.
Authentication mode Description

Service account Authenticate using a Google account that is associated with your Google Cloud Platform project.
When selecting this mode, the Service account
credentials file field is displayed. In this field,
enter the path to the credentials file created
for the service account to be used. This file
must be stored in the machine in which your
Talend Job is actually launched and executed.
For further information about how to
create a Google service account and obtain
the credentials file, see Getting Started
with Authentication from the Google
documentation.

OAuth 2.0 Authenticate the access using OAuth credentials. When selecting this mode, the parameters to be defined in the Basic settings view are Client ID, Client secret and Authorization code.
1. Navigate to the Google APIs Console in
your web browser to access the Google

project hosting the BigQuery and the
Cloud Storage services you need to use.
2. Click the API Access tab to open its view.
3. In the Component view of the Studio,
paste Client ID, Client secret and Project
ID from the API Access tab view to the
corresponding fields, respectively.
In the Advanced settings tab, see the file
path in the token properties File Name
field. The Studio automatically generates
this file during the first successful login
and stores all future successful logins in it.
4. In the Run view of the Studio, click Run
to execute this Job. The execution will
pause at a given moment to print out in
the console the URL address used to get
the authorization code.
5. Navigate to this address in your web
browser and copy the authorization code
displayed.
6. In the Component view of tBigQueryOutput, paste the authorization code in the Authorization Code field.

5. If you have been using the OAuth 2.0 authentication mode, in the Action on data field, select the action to be performed on your data. In this example, select Truncate to empty the contents of the target table, if there are any, and to repopulate it with the transferred data. If you are using Service account, ignore this step.
If the table to be used does not exist in BigQuery, select Create the table if it doesn't exist.

Executing the Job


Procedure
Press F6.

Results
Once done, the Run view is opened automatically, where you can check the execution result.


The data is transferred to Google BigQuery.


tBigQueryOutputBulk
Creates a .txt or .csv file for large volumes of data so that you can process it according to your needs before transferring it to Google BigQuery.
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used
to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed
in a separate section. The advantage of using two separate components is that the data can be
transformed before it is loaded in the dataset.
This component writes given data into a .txt or .csv file, ready to be transferred to Google
BigQuery.

tBigQueryOutputBulk Standard properties


These properties are used to configure tBigQueryOutputBulk running in the Standard Job framework.
The Standard tBigQueryOutputBulk component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically becomes built-in.

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

• The Record type of BigQuery is not supported.


• The columns for table metadata such as the
Description column or the Mode column cannot be
retrieved.
• The Timestamp data from your BigQuery system is formatted as String data.


• The numeric data of BigQuery is converted to BigDecimal.

File name Browse to, or enter the path to the .txt or .csv file you need to generate.

Append Select the check box to write new data at the end of
the existing data. Otherwise, the existing data will be
overwritten.

Advanced settings

Field Separator Enter a character, string, or regular expression to separate fields in the transferred data.

Create directory if not exists Select this check box to create the directory you defined in
the File field for Google Cloud Storage, if it does not exist.

Custom the flush buffer size Enter the number of rows to be processed before the
memory is freed.

Check disk space Select this check box to throw an exception during execution if the disk is full.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

tStatCatcher Statistics Select this check box to collect the log data at the component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This is an output component which needs the data provided
by its preceding component.


This component automatically detects and supports both multi-regional locations and regional locations. When using the regional locations, the buckets and the datasets to be used must be in the same locations.

Related Scenario
For related topic, see Writing data in Google BigQuery on page 371


tBigQuerySQLRow
Connects to Google BigQuery and performs queries to select data from tables row by row, or to create or delete tables in Google BigQuery.

tBigQuerySQLRow Standard properties


These properties are used to configure tBigQuerySQLRow running in the Standard Job framework.
The Standard tBigQuerySQLRow component belongs to the Big Data family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Authentication mode Select the mode to be used to authenticate to your project.


• OAuth 2.0: authenticate the access using OAuth
credentials. When selecting this mode, the parameters
to be defined in the Basic settings view are Client ID,
Client secret and Authorization code.
• Service account: authenticate using a Google account
that is associated with your Google Cloud Platform
project. When selecting this mode, the parameter to be
defined in the Basic settings view is Service account
credentials file.

Service account credentials file Enter the path to the credentials file created for the service
account to be used. This file must be stored in the machine
in which your Talend Job is actually launched and executed.


For further information about how to create a Google service account and obtain the credentials file, see Getting Started with Authentication from the Google documentation.

Client ID and Client secret Paste the client ID and the client secret, both created and
viewable on the API Access tab view of the project hosting
the Google BigQuery service and the Cloud Storage service
you need to use.
To enter the client secret, click the [...] button next to the
client secret field, and then in the pop-up dialog box enter
the client secret between double quotes and click OK to
save the settings.

Project ID Paste the ID of the project hosting the Google BigQuery service you need to use.
The ID of your project can be found in the URL of the
Google API Console, or by hovering your mouse pointer over
the name of the project in the BigQuery Browser Tool.

Authorization code Paste the authorization code provided by Google for the
access you are building.
To obtain the authorization code, execute the Job using this component; when the Job pauses and prints out a URL address, navigate to this address and copy the authorization code displayed.

Use legacy SQL and Query Enter the query you need to use.
If the query to be used is written in the legacy SQL of BigQuery, select the Use legacy SQL check box. For further information
about this legacy SQL, see Legacy SQL query reference from
the Google BigQuery documentation.
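The Query field holds a Java string expression. Assuming a hypothetical project my_project with the dataset documentation and the table UScustomer, the same query reads as follows in legacy SQL (check box selected) and in standard SQL (check box cleared):

// Legacy SQL: the table is referenced as [project:dataset.table].
String legacyQuery = "SELECT fname, lname, states FROM [my_project:documentation.UScustomer]";

// Standard SQL: the table is referenced as `project.dataset.table`.
String standardQuery = "SELECT fname, lname, states FROM `my_project.documentation.UScustomer`";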

Advanced settings

token properties File Name Enter the path to, or browse to the refresh token file you
need to use.
At the first Job execution using the Authorization code you
have obtained from Google BigQuery, the value in this field
is the directory and the name of that refresh token file to
be created and used; if that token file has been created and
you need to reuse it, you have to specify its directory and
file name in this field.
With only the token file name entered, Talend Studio
considers the directory of that token file to be the root of
the Studio folder.
For further information about the refresh token, see the
manual of Google BigQuery.

Advanced Separator (for number) Select this check box to change the separator used for the
numbers.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.


Result size Select the option depending on the volume of the query
result.
By default, the Small option is used, but when the query
result is larger than the maximum response size, you need
to select the Large option.
If the volume of the result is not certain, select Auto.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule It can be a starting or an end component. When starting a Job, it sends the extracted data to the component that follows it; when ending a Job, it deletes a given table.
This component automatically detects and supports both
multi-regional locations and regional locations. When using
the regional locations, the buckets and the datasets to be
used must be in the same locations.


tBonitaDeploy
Deploys a specific Bonita process to a Bonita Runtime.
This component configures any Bonita Runtime engine and deploys a specific Bonita process (a .bar
file exported from the Bonita solution) to this engine.

tBonitaDeploy Standard properties


These properties are used to configure tBonitaDeploy running in the Standard Job framework.
The Standard tBonitaDeploy component belongs to the Business family.
The component in this framework is available in all Talend products.

Basic settings

Bonita version Select a version number for the Bonita Runtime engine.

Bonita Runtime Environment File Browse to, or enter the path to the Bonita Runtime
environment file.

Note:
This field is displayed only when you select Bonita
version 5.3.1 from the Bonita version list.

Bonita Runtime Home Browse to, or enter the path to the Bonita Runtime
environment directory.

Note:
This field is displayed only when you select Bonita
version 5.6.1 from the Bonita version list.

Bonita Runtime Jaas File Browse to, or enter the path to the Bonita Runtime jaas file.

Bonita Runtime logging file Browse to, or enter the path to the Bonita Runtime logging file.

Login Module Type in the name of the login module used to log in to the Bonita Runtime engine, as defined in the Bonita Runtime jaas file.

Business Archive Browse to, or enter the path to the Bonita process .bar file
you want to use.

User name Type in your user name used to log in to Bonita Studio.

Password Type in your password used to log in to Bonita Studio.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.


Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ProcessDefinitionUUID: the identifier number of the process being deployed. This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Usually used as a stand-alone component.


To use this component, you have to manually download the
Bonita solution you need to use.

Connections Outgoing links (from this component to another):


Trigger: Run if; On Component Ok; On Component Error, On
Subjob Ok, On Subjob Error.

Incoming links (from one component to this one):


Trigger: Run if, On Component Ok, On Component Error, On
Subjob Ok, On Subjob Error

For further information regarding connections, see


Connection types in Talend Studio User Guide.

Limitation The Bonita Runtime environment file, the Bonita Runtime jaas file, and the Bonita Runtime logging file must all be stored on the execution server of the Job using this component.

Related Scenario
For related topic, see Executing a Bonita process via a Talend Job on page 390.


tBonitaInstantiateProcess
Starts an instance for a specific process deployed in a Bonita Runtime engine.
This component instantiates a process already deployed in a Bonita Runtime engine.

tBonitaInstantiateProcess Standard properties


These properties are used to configure tBonitaInstantiateProcess running in the Standard Job
framework.
The Standard tBonitaInstantiateProcess component belongs to the Business family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
In this component the schema is related to the Module
selected.

Note:
The ProcessInstanceUUID column is pre-defined in the
schema of this component, reserved for the identifier
number of the process instance being created.

Bonita Client Mode Select the client mode you want to use to instantiate a
Bonita process.
For more information about all the Bonita client modes, see
Bonita's manuals.

URL Enter the URL of the Bonita Web application server you
need to access for the process instantiation.
This field is available only in the HTTP client mode.

Auth Username and Auth Password Enter the authentication details used to connect to the
Bonita Web application server as technical user.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
The default authentication information is provided in these
fields. For further information about them, see Bonita's
manuals.
These fields are available only in the HTTP client mode.

Bonita version Select the version number of the Bonita Runtime engine to
be used.


This field is available only in the Java client mode.

Bonita Runtime Environment File Browse to, or enter the path to the Bonita Runtime
environment file.
This field is available only in the Java client mode.

Note:
This field is displayed only when you select Bonita
version 5.3.1 from the Bonita version list.

Bonita Runtime Home Browse to, or enter the path to the Bonita Runtime
environment directory.

Note:
This field is displayed only when you select Bonita
version 5.6.1 from the Bonita version list.

Bonita Runtime Jaas File Browse to, or enter the path to the Bonita Runtime jaas file.
This field is available only in the Java client mode.

Bonita Runtime logging file Browse to, or enter the path to the Bonita Runtime logging
file.
This field is available only in the Java client mode.

Use Process ID Select this check box to instantiate an existing process. Once selected, the Process definition ID field is activated, in which you can enter the definition ID of this process.
This field is available only in the Java client mode.

Note:
The process definition ID is created when the process is
deployed into the Bonita Runtime engine.

Process Name and Process Version Enter the ID information of a specific process you want
to instantiate. This information is used to automatically
generate the ID of this process.
This field is available in both the Java client mode and the HTTP client mode.

User name Type in your user name used to instantiate this process.
This field is available in both the Java client mode and the HTTP client mode.

Password Type in your password used to instantiate this process.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This field is available only in the Java client mode.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ProcessInstanceUUID: the identifier number of the process instance being created. This is a Flow variable and it returns a string. It can also be retrieved over the Row > Main output link.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Usually used as a stand-alone component or as an output component.
To use this component, you have to manually download the Bonita solution you need to use.

Connections Outgoing links (from this component to another):


Row: Main (providing the output parameters from this
process)
Trigger: Run if; On Component Ok; On Component Error, On
Subjob Ok, On Subjob Error.

Incoming links (from one component to this one):


Row: Main (providing the input parameters to this process)
Trigger: Run if, On Component Ok, On Component Error, On
Subjob Ok, On Subjob Error

For further information regarding connections, see


Connection types in Talend Studio User Guide.

Limitation The Bonita Runtime environment file, the Bonita Runtime jaas file, and the Bonita Runtime logging file must all be stored on the execution server of the Job using this component.


Executing a Bonita process via a Talend Job


This scenario describes a Job that deploys a Bonita process into the Bonita Runtime and executes this
process, in which a personnel request is treated.

The Job in this scenario uses three components.


• tBonitaDeploy: this component deploys a Bonita process into the Bonita Runtime.
• tFixedFlowInput: this component generates the schema used as execution parameters of this
deployed process.
• tBonitaInstantiateProcess: this component executes this deployed process.
Before beginning to replicate this schema, prepare your Bonita.bar file. You need to manually export
this file from the Bonita system and then deploy it into the Bonita Runtime engine, using, for example,
tBonitaDeploy as presented later in this scenario. In this scenario, this file is TEST--4.0.bar. Once
deployed, this process can be checked via the Bonita interface.

Setting up the Job


Procedure
1. Drop tBonitaDeploy, tFixedFlowInput and tBonitaInstantiateProcess onto the design workspace.
2. Right-click tBonitaDeploy and connect it to tFixedFlowInput using a Trigger > On Subjob Ok connection.
3. Right-click tFixedFlowInput and connect this component to tBonitaInstantiateProcess using a
Row > Main connection.


Configuring the deployment of the process


About this task
To replicate this scenario, proceed as follows:

Procedure
1. Double-click tBonitaDeploy to open its Basic settings view.

2. Select Bonita version 5.3.1 from the Bonita version list. The version you select should be in sync
with the version number of the Bonita Runtime engine you are using.
3. In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files. In the
Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita
Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File
field, browse to the logging.properties file.
For users based on Bonita version 5.2.3, only the Bonita Runtime Jaas File field and the Bonita
Runtime Logging File field need to be filled.

For users based on Bonita version 5.6.1, in the Bonita Runtime Home field, browse to the Bonita
Runtime environment directory.


4. In the Business Archive field, browse to the Bonita .bar file that is the process exported from your
Bonita system and will be deployed into the Bonita Runtime engine.
5. In the Username and the Password fields, type in your authentication information to connect to
your Bonita.

Configuring the input flow


Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the three-dot button next to Edit schema to open the schema editor.


3. Click the plus button to add one row and rename it as Name.
This name is identical to the parameter set in Bonita to execute the same process. This way, Bonita can recognize this column as a valid parameter and read its value to instantiate this process.
4. Click OK.
5. In the Mode area of the Basic settings view, select the Use inline table option and click the plus
button to add one row in the table.
6. In the inline table, click the added row and type in, between the quotation marks, the name of the person from your personnel whose request will be treated by this deployed process: ychen.

Configuring the Basic settings of tBonitaInstantiateProcess


Procedure
1. Double-click tBonitaInstantiateProcess to open its Basic settings view.

2. Select Bonita version 5.3.1 from the Bonita version list. The version you select should be in sync
with the version number of the Bonita Runtime engine you are using.
3. In the Bonita Runtime Configuration area, browse to the Bonita Runtime variable files. In the
Bonita Runtime Environment file field, browse to the bonita-environnement.xml file; in the Bonita
Runtime Jaas File field, browse to the jaas-standard.cfg file; in the Bonita Runtime Logging File
field, browse to the logging.properties file.
For users based on Bonita version 5.2.3, only the Bonita Runtime Jaas File field and the Bonita
Runtime Logging File field need to be filled.


For users based on Bonita version 5.6.1, in the Bonita Runtime Home field, browse to the Bonita
Runtime environment directory.

4. Select the Use Process ID check box to activate the Process Definition Id field.
5. In the Process Definition Id field, click between the quotation marks and press Ctrl+space to open
the auto-completion drop-down list containing the available global variables for this Job.
6. Double-click the variable you need to use to add it between the quotation marks. In this scenario, double-click tBonitaDeploy_1_ProcessDefinitionUUID, which retrieves the process definition ID of the process being deployed by tBonitaDeploy (see the sketch after the note below).

Note:
You can also clear the Use Process ID check box to activate the Process name and Process version fields and enter the corresponding information in the two fields. tBonitaInstantiateProcess concatenates the process name and the process version you type in to construct the process definition ID.
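Once inserted, the Process Definition Id field contains an expression of the following form; the instance name tBonitaDeploy_1 is only an example and depends on how the component is labeled in your Job:

// Expression placed in the Process Definition Id field (illustrative instance name).
((String) globalMap.get("tBonitaDeploy_1_ProcessDefinitionUUID"))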

7. In the Username and Password fields, enter the username and password to connect to your Bonita.


Executing the Job


Procedure
Press F6 to run the Job.

Results

This process is deployed into the Bonita Runtime and an instance is created for the personnel
requests.

Outputting the process instance UUID over the Row > Main
link
This scenario deploys a Bonita process into the Bonita Runtime, starts an instance and outputs the
process instance UUID via the Row > Main link.

Linking the components


Procedure
1. Drop tBonitaDeploy, tBonitaInstantiateProcess and tLogRow onto the workspace.
2. Rename tBonitaDeploy as deploy_process, tBonitaInstantiateProcess as start_instance and
tLogRow as show_instance_uuid.
3. Link tBonitaDeploy to tBonitaInstantiateProcess using the OnSubjobOk trigger.
4. Link tBonitaInstantiateProcess to tLogRow using a Row > Main connection.


Configuring the components


Procedure
1. Double-click tBonitaDeploy to open its Basic settings view.

2. In the Bonita Runtime Jaas File field, specify the path and name of the jaas file.
In the Bonita Runtime Logging File field, specify the path and name of the logging file.
In the Business Archive field, specify the path and name of the Bonita process.
3. In the Username and Password fields, enter the user authentication credentials.
4. Double-click tBonitaInstantiateProcess to open its Basic settings view.

5. In the Bonita Runtime Jaas File field, specify the path and name of the jaas file.
In the Bonita Runtime Logging File field, specify the path and name of the logging file.
6. In the Process Name and Process Version fields, enter the process information.
7. In the Username and Password fields, enter the user authentication credentials.
8. Double-click tLogRow to open its Basic settings view.
9. In the Mode area, select Table (print values in cells of a table) for better display.


Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the instance is created and the UUID is output.


tBoxConnection
Creates a Box connection that the other Box components can reuse.
This component creates the connection to a given Box account.

tBoxConnection Standard properties


These properties are used to configure tBoxConnection running in the Standard Job framework.
The Standard tBoxConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Access token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.
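Outside the Studio, the same four credentials are what the official Box Java SDK (com.box.sdk) needs to open a self-refreshing connection. The sketch below is illustrative only, does not describe how the component is implemented, and uses placeholder values for all credentials.

import com.box.sdk.BoxAPIConnection;

public class BoxConnectionSketch {
    public static void main(String[] args) {
        // Placeholder values; in the Studio these correspond to the
        // Client Key, Client Secret, Access token and Refresh Token fields.
        String clientId = "YOUR_CLIENT_ID";
        String clientSecret = "YOUR_CLIENT_SECRET";
        String accessToken = "YOUR_ACCESS_TOKEN";
        String refreshToken = "YOUR_REFRESH_TOKEN";

        // A connection able to refresh its access token automatically.
        BoxAPIConnection api = new BoxAPIConnection(clientId, clientSecret, accessToken, refreshToken);
        System.out.println("Connection can auto-refresh: " + api.canRefresh());
    }
}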

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used standalone as a subJob to create the Box connection to be used. In a Job design, it is often connected to the other Box components using Trigger links such as OnSubjobOk.

Related scenario
For a related scenario, see Uploading and downloading files from Box on page 411.


tBoxCopy
Copies or moves a given folder or file from Box.

tBoxCopy Standard properties


These properties are used to configure tBoxCopy running in the Standard Job framework.
The Standard tBoxCopy component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Move Directory Select this check box to move a directory in Box.

Copy Directory Select this check box to copy a directory in Box.

File Name Enter the name, with its path in Box, of the file you want to copy.

Source Directory This option appears when the Move Directory or Copy
Directory check box is selected. Enter the source directory
in Box to be moved or copied.


Destination Directory Enter the destination directory in Box where the specified
file or directory will be copied or moved.

Rename Select this check box to rename the file or directory to be copied. When copying a file, specify the new file name in
the Destination File Name field. When copying a directory,
enter the new directory name in the New Directory Name
field.

Remove Source File Select this check box to remove the source file during the
copy action.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
four columns named destinationFilePath, destinationFil
eName, sourceDirectory, and destinationDirectory.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
DESTINATION_FILENAME: the destination file name. This is
an After variable and it returns a string.
DESTINATION_FILEPATH: the destination file path. This is
an After variable and it returns a string.
SOURCE_DIRECTORY: the source directory. This is an After
variable and it returns a string.
DESTINATION_DIRECTORY: the destination directory. This is
an After variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used standalone in a subJob to copy or move data from Box.


Related scenarios
No scenario is available for the Standard version of this component yet.


tBoxDelete
Removes a given folder or file from Box.
This component connects to a given Box account and removes a specified file or folder.

tBoxDelete Standard properties


These properties are used to configure tBoxDelete running in the Standard Job framework.
The Standard tBoxDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Box pointing to the folder or the file you
need to remove.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
one column named filepath.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
REMOVED_PATH: the path of the folder or file being deleted
on Box. This is a Flow variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used standalone in a subJob to remove data from Box.

Related scenarios
No scenario is available for the Standard version of this component yet.


tBoxGet
Downloads a selected file from a Box account.
This component connects to a given Box account and downloads files to a specified local directory.

tBoxGet Standard properties


These properties are used to configure tBoxGet running in the Standard Job framework.
The Standard tBoxGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Box pointing to the file you need to
download.

Save as file Select this check box to display the Save To field and browse to, or enter the local directory where you want to store the downloaded file. The existing file, if any, is replaced.
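Because the Box Java SDK addresses items by ID rather than by path, downloading a file by name outside the Studio typically means walking the folder tree first. The following sketch is an illustration only, not the component's internal code; the developer token, file name, and local path are placeholders.

import java.io.FileOutputStream;
import java.io.OutputStream;
import com.box.sdk.BoxAPIConnection;
import com.box.sdk.BoxFile;
import com.box.sdk.BoxFolder;
import com.box.sdk.BoxItem;

public class BoxGetSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder developer token; tBoxGet itself uses the full credential set.
        BoxAPIConnection api = new BoxAPIConnection("DEVELOPER_TOKEN");
        BoxFolder root = BoxFolder.getRootFolder(api);
        for (BoxItem.Info itemInfo : root) {
            // Look for a file by name directly under the root folder.
            if (itemInfo instanceof BoxFile.Info && "report.csv".equals(itemInfo.getName())) {
                try (OutputStream out = new FileOutputStream("/tmp/report.csv")) {
                    new BoxFile(api, itemInfo.getID()).download(out);
                }
            }
        }
    }
}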

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.
Note that the schema of this component is read-only with
two columns named fileName and content.
The Schema field is not available when you have selected
the Save as file check box.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
FILE_NAME: the name of the file being processed. This is a
Flow variable and it returns a string.
INPUT_STREAM: the content of the file being fetched. This
is a Flow variable and it returns an InputStream.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used alone or along with other
components via the Iterate link or a trigger link such as
OnSubjobOk.

Related scenario
For a related scenario, see Uploading and downloading files from Box on page 411.


tBoxList
Lists the files stored in a specified directory in Box.
This component reads the file(s) in Box held in the directory you specify and lists the metadata and
the contents of that file or those files.

tBoxList Standard properties


These properties are used to configure tBoxList running in the Standard Job framework.
The Standard tBoxList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path pointing to the folder you need to list the
files from, or enter the path pointing to the exact file you
need to read.

List type Select the type of data you need to list from the specified
path, Files, Folders, or Both.

Include subdirectories Select this check box to list files from any existing sub-
folders in addition to the files in the directory defined in
the Path field.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.
Note that the schema of this component is read-only with
six columns named name, path, lastModified, size, id, and
type.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NAME: the name of the remote file being processed. This is
a Flow variable and it returns a string.
FILE_PATH: the path pointing to the folder or the file being
processed on Box. This is a Flow variable and it returns a
string.
FILE_DIRECTORY: the directory of the folder or the file
being processed on Box. This is a Flow variable and it
returns a string.
LAST_MODIFIED: the timestamp of the last modification
of the file being processed. This is a Flow variable and it
returns a long.
SIZE: the volume of the file being processed. This is a Flow
variable and it returns a long.
ID: the ID of the folder or the file being processed on Box.
This is a Flow variable and it returns a string.
TYPE: the type of the objects being processed on Box, file or
folder. This is a Flow variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
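In the generated Java code of a Job, these variables are read from the globalMap object, for example in a tJava component connected to tBoxList through an Iterate link. The sketch below is a minimal, self-contained illustration of that cast-and-get pattern; the instance name tBoxList_1 and the sample values are hypothetical stand-ins for what the component publishes at runtime.

    import java.util.HashMap;
    import java.util.Map;

    public class GlobalMapSketch {
        public static void main(String[] args) {
            // Stand-in for the globalMap that a running Job maintains.
            Map<String, Object> globalMap = new HashMap<>();
            globalMap.put("tBoxList_1_NAME", "report.csv");            // hypothetical values
            globalMap.put("tBoxList_1_FILE_PATH", "/demo/report.csv");
            globalMap.put("tBoxList_1_SIZE", 1024L);

            // The usual pattern in Talend expressions: cast the stored Object to the documented type.
            String name = (String) globalMap.get("tBoxList_1_NAME");
            String path = (String) globalMap.get("tBoxList_1_FILE_PATH");
            long size = (Long) globalMap.get("tBoxList_1_SIZE");

            System.out.println(name + " (" + size + " bytes) at " + path);
        }
    }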

Usage

Usage rule This component is typically used standalone.

Related scenarios
No scenario is available for the Standard version of this component yet.

tBoxPut
Uploads files to a Box account.
This component uploads data to Box from either a local file or a given data flow.

tBoxPut Standard properties


These properties are used to configure tBoxPut running in the Standard Job framework.
The Standard tBoxPut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Connection/Client Key Enter the client key required by Box to access the Box API.
To obtain the client key and client secret you need to create
an account at https://developers.box.com/ and then create
a Box App under the Box account to be used. The client
key and client secret can be obtained from the account
application settings.

Connection/Client Secret Enter the client secret required by Box to access the Box
API. To obtain the client key and client secret you need
to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used.
The client key and client secret can be obtained from the
account application settings.

Connection/Access Token Enter the access token required by Box to access a Box
account and operate it. For how to get the access token and
refresh token, check the Box documentation you can access
from https://developers.box.com/.

Connection/Refresh Token Enter the refresh token required by Box to refresh the
access token automatically. For how to get the access token
and refresh token, check the Box documentation you can
access from https://developers.box.com/.

Connection/Use HTTP proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Remote Path Enter the path pointing to the file you need to write
contents in. This file will be created on the fly if it does not
exist.

Replace if Existing Select this check box to use the uploaded file to replace the
existing one.

Upload mode Select the upload mode to be used:

• Upload incoming content as file: Select this radio
button to read data directly from the input flow of the
preceding component and write the data into the file
specified in the Remote Path field.
• Upload local file: Select this radio button to upload
a locally stored file to Box. In the File field that is
displayed, you need to enter the path or browse to
this file.
• Expose as OutputStream: Select this check box to
expose the output stream of this component, which
can be used by the other components to write the file
content. For example, you can use the Use output
stream feature of the tFileOutputDelimited component
to feed a given tBoxPut's exposed output stream. For
further information, see tFileOutputDelimited on page
1113.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
a single column named content and it receives data from
the content column of its input schema only. This means
that you must use a content column in the input data flow
to carry the data to be uploaded. This type of column is
typically provided by the tFileInputRaw component. For further
information, see tFileInputRaw on page 1085.
The Schema field is not available when you have selected
the Expose as OutputStream or the Upload local file upload
mode.
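As a rough, standalone illustration of the kind of payload the content column carries, the snippet below reads a local file into a byte[] in the way tFileInputRaw would before the row reaches tBoxPut. The file path is an assumption used only for the example.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ContentColumnSketch {
        public static void main(String[] args) throws IOException {
            // Hypothetical local file; tFileInputRaw produces an equivalent byte[] column.
            byte[] content = Files.readAllBytes(Paths.get("D:/Input/hello.txt"));
            System.out.println("content column payload: " + content.length + " bytes");
        }
    }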

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used either standalone in a subJob to


directly upload a local file to Box or as an end component
of a Job flow to upload given data being handled in this
flow.

Uploading and downloading files from Box


In this scenario, a three-component Job consisting of three subJobs is created to upload a file to Box
and then download a file from Box to the local file system.

Before replicating this scenario, you need to create an account at https://developers.box.com/ and
then create a Box App under the Box account to be used. For more information about Box App, see
https://app.box.com/developers/services/edit/. The client key and client secret can be obtained from
the account application settings. For how to get the access token and refresh token, check the Box
documentation you can access from https://developers.box.com/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job from the Job Designs node in
the Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2. In the workspace, enter the name of the component to be used and select this component from
the list that opens. In this scenario, the components are tBoxConnection, tBoxPut and tBoxGet.
3. Connect tBoxConnection to tBoxPut using the Trigger > OnSubjobOk link.
4. Connect tBoxPut to tBoxGet using the Trigger > OnSubjobOk link.

Configuring the components


Procedure
1. Double-click tBoxConnection to open its Component view.

2. Enter the client key, client secret, access token and refresh token in double quotation marks in
the relevant fields for accessing the Box account.
3. Double-click tBoxPut to open its Component view.

4. Select the Use Existing Connection check box to reuse the connection created by tBoxConnection.
In the Remote Path field, enter the destination path where you want to upload the file.
In the Upload mode area, select Upload Local File. In the File field, enter the file path or browse to
the file you want to upload.
5. Double-click tBoxGet to open its Component view.

6. Select the Use Existing Connection check box to reuse the connection created by tBoxConnection.
In the Path field, enter the path of the file that you want to download.
Select the Save As File check box. In the Save To field, enter the path where the downloaded file is
to be saved on the local file system.
7. Save the Job.

Executing the Job


Execute the Job by pressing F6 or clicking the Run button on the Run tab.
The local file, hello.txt in this example, is uploaded to your Box account.

The file box.txt from Box is downloaded to the local file system.

tBufferInput
Retrieves data bufferized via a tBufferOutput component, for example, to process it in another
subJob.
This component retrieves bufferized data in order to process it in a second subJob.

tBufferInput Standard properties


These properties are used to configure tBufferInput running in the Standard Job framework.
The Standard tBufferInput component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
In the case of tBufferInput, the column position is more
important than the column label, as it is the position that is
taken into account.

  Built-in: You create the schema and store it locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and stored


it in the Repository, hence can be reused in various projects
and Job designs. Related topic: see Talend Studio User Guide.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is the start component of a secondary Job


which is triggered automatically at the end of the main Job.

Retrieving bufferized data


This scenario describes a Job that retrieves bufferized data from a subJob and displays it on the
console.

• Drop the following components from the Palette onto the design workspace: tFileInputDelimited
and tBufferOutput.
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the
access parameters to the input file.

• In the File Name field, browse to the delimited file holding the data to be bufferized.
• Define the Row and Field separators, as well as the Header.

• Click [...] next to the Edit schema field to describe the structure of the file.

• Describe the Schema of the data to be passed on to the tBufferOutput component.


• Select the tBufferOutput component and set the parameters on the Basic Settings tab of the
Component view.

Note:
Generally speaking, the schema is propagated from the input component and automatically fed into
the tBufferOutput schema. But you can also set part of the schema to be bufferized if you want to.

• Drop the tBufferInput and tLogRow components from the Palette onto the design workspace
below the subJob you just created.
• Connect tFileInputDelimited and tBufferInput via a Trigger > OnSubjobOk link and connect
tBufferInput and tLogRow via a Row > Main link.
• Double-click tBufferInput to set its Basic settings in the Component view.
• In the Basic settings view, click [...] next to the Edit Schema field to describe the structure of the
file.

• Use the schema defined for the tFileInputDelimited component and click OK.
• The schema of the tBufferInput component is automatically propagated to the tLogRow.
Otherwise, double-click tLogRow to display the Component view and click Sync column.
• Save your Job and press F6 to execute it.

The standard console returns the data retrieved from the buffer memory.
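Conceptually, tBufferOutput appends each incoming row to an in-memory buffer that tBufferInput replays in the next subJob. The sketch below is only a plain-Java analogy of that hand-off, not the code the Studio generates; the sample rows are hypothetical.

    import java.util.ArrayList;
    import java.util.List;

    public class BufferHandOffSketch {
        // Stand-in for the shared buffer that tBufferOutput fills and tBufferInput drains.
        private static final List<Object[]> BUFFER = new ArrayList<>();

        // What tBufferOutput conceptually does with each incoming row of the first subJob.
        static void bufferize(Object... row) {
            BUFFER.add(row);
        }

        // What tBufferInput conceptually does before handing the rows to tLogRow.
        static void replay() {
            for (Object[] row : BUFFER) {
                StringBuilder line = new StringBuilder();
                for (Object field : row) {
                    line.append(field).append('|');
                }
                System.out.println(line);
            }
        }

        public static void main(String[] args) {
            bufferize(1, "Alex");   // hypothetical rows standing in for the delimited file
            bufferize(2, "Peter");
            replay();
        }
    }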

tBufferOutput
Collects data in a buffer in order to access it later, via a Webservice for example.
tBufferOutput has been designed to be exported as a Webservice in order to access data on the web
application server directly. For more information, see Talend Studio User Guide.

tBufferOutput Standard properties


These properties are used to configure tBufferOutput running in the Standard Job framework.
The Standard tBufferOutput component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
In the case of tBufferOutput, the column position is
more important than the column label, as it is the position
that is taken into account.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Buffering data
This scenario describes an intentionally basic Job that bufferizes data in a child job while a parent Job
simply displays the bufferized data onto the standard output console. For an example of how to use
tBufferOutput to access output data directly on the Web application server, see Buffering output data
on the webapp server on page 421.

• Create two Jobs: a first Job (BufferFatherJob) runs the second Job and displays its content onto the
Run console. The second Job (BufferChildJob) stores the defined data into a buffer memory.
• On the first Job, drop the following components: tRunJob and tLogRow from the Palette to the
design workspace.
• On the second Job, drop the following components: tFileInputDelimited and tBufferOutput the
same way.
Let's set the parameters of the second Job first:
• Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the
access parameters to the input file.

• In File Name, browse to the delimited file whose data are to be bufferized.
• Define the Row and Field separators, as well as the Header.

• Describe the Schema of the data to be passed on to the tBufferOutput component.


• Select the tBufferOutput component and set the parameters on the Basic Settings tab of the
Component view.

• Generally the schema is propagated from the input component and automatically fed into the
tBufferOutput schema. But you could also set part of the schema to be bufferized if you want to.
• Now on the other Job (BufferFatherJob) Design, define the parameters of the tRunJob component.

• Edit the Schema if relevant and select the column to be displayed. The schema can be identical to
the bufferized schema or different.
• You could also define context parameters to be used for this particular execution. To keep it
simple, the default context with no particular setting is used for this use case.
Press F6 to execute the parent Job. The tRunJob component takes care of executing the child Job and returns the
data to the standard console:

Buffering data to be used as a source system


This scenario describes a Job that buffers data to be used as a source system by MDM.
An MDM process will invoke this Job to retrieve data by looking up the defined elements (agent region
values) from the buffered data. The process can then display the retrieved data in the Talend MDM
Web User Interface without really saving them in the MDM hub.

Creating a data buffer Job


Procedure
1. Create a Job named DetermineRegion.
2. Drop the following components from the Palette onto the design workspace: tJava,
tFixedFlowInput, and tBufferOutput.
3. Connect tJava to tFixedFlowInput using a Trigger > On Component Ok link.
4. Connect tFixedFlowInput to tBufferOutput using a Row > Main link.

Configuring the Job to buffer data


Procedure
1. In the Contexts view, add a new context variable with the Name of xmlInput and the Type of
String.
In this example, the context variable xmlInput of the Job will be specified in the MDM process
which wants to invoke this Job.
You can search for further information about MDM processes on Talend Help Center (https://
help.talend.com).

If you cannot find the Contexts view, go to Window > Show view > Talend, and select Contexts.
For more information about how to define context variables, see Talend Studio User Guide.
You can search for further information about how to define context variables on Talend Help
Center (https://help.talend.com).
2. Double-click the tJava component to open its Component view, and in the Code area, enter the
code according to your needs.
In this example, enter System.out.println("#############################"+context.xmlInput);.
3. Double-click the tFixedFlowInput component to open its Component view.
4. Click the [...] button next to Edit schema to open the dialog box and define the schema for the
data to be used by the source system.
In this example, add one new column col0 of the type String.
5. After the schema is defined, click Yes in the Propagate dialog box to propagate the schema
changes to the following component tBufferOutput.
6. In the Number of rows field, enter 1.
7. In the Mode area, select Use Single Table and enter "Paris" in the Value column that
corresponds to the column col0 you have defined.
In this example, the value of the col0 provides the agent region information to be retrieved by
MDM.
8. Double-click the tBufferOutput component to open its Component view, and then make sure its
schema is synchronized with the previous component tFixedFlowInput.
9. Run the Job and make sure the execution succeeds.

Buffering output data on the webapp server


This scenario describes a Job that is called as a Webservice and stores the output data in a buffer
directly on the server of the Web application. This scenario first creates a Webservice-oriented Job
with context variables, and then exports the Job as a Webservice.

Creating a Job
Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput and
tBufferOutput.
2. Connect tFixedFlowInput to tBufferOutput using a Row Main link.

Creating a context variable


About this task
For this scenario, you will define two context variables: nb_lines and lastname. The first variable will
set the number of lines the tFixedFlowInput component will generate, and the second one will set
the last name to display in the output list. For more information about how to create and use context
variables, see Talend Studio User Guide.
To define the two context variables:

Procedure
1. Select the Contexts tab view of your Job, and click the [+] button at the bottom of the view to add
two variables, respectively nb_lines of type Integer and lastname of type String.
2. In the Value field for the variables, set the last name to be displayed and the number of lines to
be generated, respectively Ford and 3 in this example.

Configuring the input data


Procedure
1. In the design workspace, select tFixedFlowInput.
2. Click the Component tab to define the basic settings for tFixedFlowInput.
3. Click the three-dot [...] button next to Edit Schema to describe the data structure you want to
create from internal variables. In this scenario, the schema is made of three columns, now of type
Date, firstname of type String, and lastname of type String.

4. Click OK to close the dialog box and accept propagating the changes when prompted by the
system. The three defined columns display in the Values panel of the Basic settings view of
tFixedFlowInput.

5. Click in the Value cell of each of the first two defined columns and press Ctrl+Space to access the
global variable list.
6. From the global variable list, select TalendDate.getCurrentDate() and TalendDataGenerator.getFirstName(),
for the now and firstname columns respectively.
7. Click in the Value cell of lastname column and press Ctrl+Space to access the global variable list.
8. From the global variable list, select context.lastname, the context variable you created for the last
name column.

Building your Job as a Webservice


About this task
Before building your Job as a Web service, see Talend Studio User Guide for more information.

Procedure
1. In the Repository tree view, right-click on the above created Job and select Build Job. The Build
Job dialog box appears.

2. Click the Browse... button to select a directory to archive your Job in.
3. In the Build type panel, select the build type you want to use in the Tomcat webapp directory
(WAR in this example) and click Finish. The Build Job dialog box disappears.
4. Copy the War folder and paste it in a Tomcat webapp directory.

Calling a Job with context variables from a browser


This scenario describes how to call the Job you created in Buffering output data on the webapp server
on page 421 from your browser with/without modifying the values of the context variables.
Type the following URL into your browser: http://localhost:8080//export_job/services/export_job3?
method=runJob where "export_job" is the name of the webapp directory deployed in Tomcat and
"export_job3" is the name of the Job.

Press Enter to execute your Job from your browser.

The Job uses the default values of the context variables: nb_lines and lastname, that is it generates
three lines with the current date, first name and Ford as a last name.
You can modify the values of the context variables directly from your browser. To call the Job from
your browser and modify the values of the two context variables, type the following URL:
http://localhost:8080//export_job/services/export_job3?method=runJob&arg1=--context_param%20lastna
me=MASSY&arg2=--context_param%20nb_lines=2.
%20 stands for a blank space in URL encoding. In the first argument "arg1", you set the value
of the context variable to display "MASSY" as last name. In the second argument "arg2", you set the
value of the context variable to "2" to generate only two lines.
Press Enter to execute your Job from your browser.

The Job generates two lines with MASSY as last name.
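The same call can be made from code rather than from a browser. The sketch below uses the standard java.net.http client (Java 11 or later) to invoke the deployed Job with the two context parameters of this scenario; %20 is the encoded space discussed above, and the host, port and directory names are the ones assumed in this example.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CallExportedJob {
        public static void main(String[] args) throws Exception {
            // URL of the Job deployed in Tomcat, with the two context parameters overridden.
            String url = "http://localhost:8080/export_job/services/export_job3?method=runJob"
                    + "&arg1=--context_param%20lastname=MASSY"
                    + "&arg2=--context_param%20nb_lines=2";

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        }
    }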

Calling a Job exported as Webservice in another Job


This scenario describes a Job that calls another Job exported as a Webservice using the
tWebServiceInput. This scenario will call the Job created in Buffering output data on the webapp
server on page 421.
• Drop the following components from the Palette onto the design workspace: tWebServiceInput
and tLogRow.
• Connect tWebServiceInput to tLogRow using a Row Main link.

• In the design workspace, select tWebServiceInput.


• Click the Component tab to define the basic settings for tWebServiceInput.

• Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to
describe the data structure you want to call from the exported Job. In this scenario, the schema is
made of three columns, now, firstname, and lastname.

• Click the plus button to add the three parameter lines and define your variables. Click OK to close
the dialog box.
• In the WSDL field of the Basic settings view of tWebServiceInput, enter the URL
http://localhost:8080/export_job/services/export_job3?WSDL where "export_job" is the name of the webapp
directory where the Job to call is stored and "export_job3" is the name of the Job itself.

• In the Method name field, enter runJob.


• In the Parameters panel, click the plus button to add two parameter lines to define your context
variables.
• Click in the first Value cell to enter the parameter to set the number of generated lines using the
following syntax: --context_param nb_lines=3.
• Click in the second Value cell to enter the parameter to set the last name to display using the
following syntax: --context_param lastname=Ford.
• Select tLogRow and click the Component tab to display the component view.
• Set the Basic settings for the tLogRow component to display the output data in a tabular mode.
For more information, see tLogRow on page 1977.
• Save your Job and press F6 to execute it.

The system generates three columns with the current date, first name, and last name and displays
them onto the log console in a tabular mode.

tCassandraBulkExec
Improves performance during Insert operations to a Cassandra column family.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts
of a two-step process. In the first step, an SSTable is generated. In the second step, this SSTable
is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that
the data can be transformed before it is loaded into Cassandra.
tCassandraBulkExec writes data from an SSTable into Cassandra.

tCassandraBulkExec Standard properties


These properties are used to configure tCassandraBulkExec running in the Standard Job framework.
The Standard tCassandraBulkExec component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

DB Version Select the Cassandra version you are using. Cassandra 2.0.0
only works with JVM 1.7.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use configuration file Select this check box and in the field that is displayed,
enter the path, or browse to cassandra.yaml, the main
configuration file for Cassandra.
This way, this component can import and directly use the
configuration from cassandra.yaml, which can contain many
advanced Cassandra properties, such as the properties for
SSL encryption.
When you need to run your Job in different Cassandra
environments, this feature allows your Job to easily switch
between the configurations.

For further information about this cassandra.yaml file, see
Cassandra configuration.

Keyspace Type in the name of the keyspace into which you want to
write the SSTable.

Column family Type in the name of the column family into which you want
to write the SSTable.

SSTable directory Specify the local directory of the SSTable to be loaded into
Cassandra. Note that the complete path to the SSTable will
be the local directory appended by the specified keyspace
name and column family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.
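A minimal sketch of how that complete path is composed from the local directory, keyspace name and column family name, using the values of the example above:

    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SstablePathSketch {
        public static void main(String[] args) {
            // Local directory, keyspace and column family from the example above.
            Path sstablePath = Paths.get("/home/talend/sstable", "testk", "testc");
            System.out.println(sstablePath);  // prints /home/talend/sstable/testk/testc
        }
    }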

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Limitation Currently, the execution of this component ends the entire


Job.

Related scenarios
No scenario is available for the Standard version of this component yet.

tCassandraClose
Disconnects a connection to a Cassandra server so as to release occupied resources.

tCassandraClose Standard properties


These properties are used to configure tCassandraClose running in the Standard Job framework.
The Standard tCassandraClose component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Component List Select an active Cassandra connection to be closed.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Cassandra


components, particularly tCassandraConnection.

Related Scenario
For a scenario in which tCassandraClose is used, see Handling data with Cassandra on page 439.

tCassandraConnection
Enables the reuse of the connection it creates to a Cassandra server.
tCassandraConnection opens a connection to a Cassandra server.

tCassandraConnection Standard properties


These properties are used to configure tCassandraConnection running in the Standard Job framework.
The Standard tCassandraConnection component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

Server Type in the IP address or hostname of the Cassandra server.

Port Type in the listening port number of the Cassandra server.

Required authentication Select this check box to enable the database authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use SSL connection Select this check box to enable the SSL or TLS encrypted
connection.
Then you need to use the tSetKeystore component in the
same Job to specify the encryption information.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Cassandra


components, particularly tCassandraClose.

Related scenario
For a scenario in which tCassandraConnection is used, see Handling data with Cassandra on page
439.

tCassandraInput
Extracts the desired data from a standard or super column family of a Cassandra keyspace so as to
apply changes to the data.
tCassandraInput allows you to read data from a Cassandra keyspace and send data in the Talend flow.

Mapping tables between Cassandra type and Talend data type

The first of the following two tables presents the mapping relationships between Cassandra types, with
the Datastax Cassandra API, and Talend data types.

Cassandra 2.0 or later versions

Cassandra Type Talend Data Type

Ascii String; Character

BigInt Long

Blob Byte[]

Boolean Boolean

Counter Long

Inet Object

Int Integer; Short; Byte

List List

Map Object

Set Object

Text String; Character

Timestamp Date

UUID String

TimeUUID String

VarChar String; Character

VarInt Object

Boolean Boolean

Float Float

Double Double

Decimal BigDecimal

Cassandra Hector API (for Cassandra versions older than 2.0)

The following table presents the mapping relationships between Cassandra types, with the Hector API,
and Talend data types.

Cassandra Type Talend Data Type

BytesType byte[]

AsciiType String

UTF8Type String

IntegerType Object

Int32Type Integer

LongType Long

UUIDType String

TimeUUIDType String

DateType Date

BooleanType Boolean

FloatType Float

DoubleType Double

DecimalType BigDecimal

tCassandraInput Standard properties


These properties are used to configure tCassandraInput running in the Standard Job framework.
The Standard tCassandraInput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

API type This drop-down list is displayed only when you have
selected the 2.0 version (deprecated) of Cassandra from the
DB version list. From this API type list, you can either select
Datastax to use CQL 3 (Cassandra Query Language) with
Cassandra, or select Hector (deprecated) to use CQL 2.
Note that the Hector API is deprecated along with the
support for Cassandra V2.0.
Along with the evolution of the CQL commands, the
parameters to be set in the Basic settings view varies.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Keyspace Type in the name of the keyspace from which you want to
read data.

Column family Type in the name of the column family from which you want
to read data.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query Enter the query statements to be used to read data from the
Cassandra database.
By default, the query is not case-sensitive. This means that
at runtime, the column names you put in the query are
always taken in lower case. If you need to make the query
case-sensitive, put the column names in double quotation
marks.
The [...] button next to this field allows you to generate the
sample code that shows what the pre-defined variables are
for the data to be read and how these variables can be used.
This feature is available only for the Datastax API of
Cassandra 2.0 (deprecated) or a later version.

Column family type Standard: Column family is of standard type.


Super: Column family is of super type.

Include key in output columns Select this check box to include the key of the column
family in output columns.
• Key column: select the key column from the list.

Row key type Select the appropriate Talend data type for the row key
from the list.

Row key Cassandra type Select the corresponding Cassandra type for the row key
from the list.

Warning:
The value of the Default option varies with the selected
row key type. For example, if you select String from the
Row key type list, the value of the Default option will be
UTF8.

For more information about the mapping table between


Cassandra type and Talend data type, see Mapping tables
between Cassandra type and Talend data type on page
434.

Include super key output columns Select this check box to include the super key of the column
family in output columns.
• Super key column: select the desired super key column
from the list.
This check box appears only if you select Super from the
Column family type drop-down list.

Super column type Select the type of the super column from the list.

Super column Cassandra type Select the corresponding Cassandra type for the super
column from the list.
For more information about the mapping table between
Cassandra type and Talend data type, see Mapping tables
between Cassandra type and Talend data type on page
434.

Specify row keys Select this check box to specify the row keys of the column
family directly.

Row Keys Type in the specific row keys of the column family in the
correct format depending on the row key type.
This field appears only if you select the Specify row keys
check box.

Key start Type in the start row key of the correct data type.

Key end Type in the end row key of the correct data type.

Key limit Type in the number of rows to be read between the start
row key and the end row key.

Specify columns Select this check box to specify the column names of the
column family directly.

Columns Type in the specific column names of the column family in


the correct format depending on the column type.
This field appears only if you select the Specify columns
check box.

Columns range start Type in the start column name of the correct data type.

Columns range end Type in the end column name of the correct data type.

Columns range limit Type in the number of columns to be read between the start
column and the end column.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component always needs an output link.

Handling data with Cassandra


This scenario applies only to Talend products with Big Data.
This scenario describes a simple Job that reads the employee data from a CSV file, writes the data to
a Cassandra keyspace, then extracts the personal information of some employees and displays the
information on the console.

This scenario requires six components, which are:


• tCassandraConnection: opens a connection to the Cassandra server.
• tFileInputDelimited: reads the input file, defines the data structure and sends it to the next
component.
• tCassandraOutput: writes the data it receives from the preceding component into a Cassandra
keyspace.
• tCassandraInput: reads the data from the Cassandra keyspace.
• tLogRow: displays the data it receives from the preceding component on the console.
• tCassandraClose: closes the connection to the Cassandra server.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tCassandraConn
ection, tFileInputDelimited, tCassandraOutput, tCassandraInput, tLogRow and tCassandraClose.
2. Connect tFileInputDelimited to tCassandraOutput using a Row > Main link.
3. Do the same to connect tCassandraInput to tLogRow.
4. Connect tCassandraConnection to tFileInputDelimited using a Trigger > OnSubjobOk link.
5. Do the same to connect tFileInputDelimited to tCassandraInput and tCassandraInput to
tCassandraClose.
6. Label the components to better identify their functions.

Configuring the components


Opening a Cassandra connection

Procedure
1. Double-click the tCassandraConnection component to open its Basic settings view in
the Component tab.

2. Select the Cassandra version that you are using from the DB Version list. In this example, it is
Cassandra 1.1.2.
3. In the Server field, type in the hostname or IP address of the Cassandra server. In this example, it
is localhost.
4. In the Port field, type in the listening port number of the Cassandra server.
5. If required, type in the authentication information for the Cassandra connection: Username and
Password.

Reading the input data

Procedure
1. Double-click the tFileInputDelimited component to open its Component view.

2. Click the [...] button next to the File Name/Stream field to browse to the file that you want to
read data from. In this scenario, the directory is D:/Input/Employees.csv. The CSV file contains four
columns: id, age, name and ManagerID:

    id;age;name;ManagerID
    1;20;Alex;1
    2;40;Peter;1
    3;25;Mark;1
    4;26;Michael;1
    5;30;Christophe;2
    6;26;Stephane;3
    7;37;Cedric;3
    8;52;Bill;4
    9;43;Jack;2
    10;28;Andrews;4
3. In the Header field, enter 1 so that the first row in the CSV file will be skipped.
4. Click Edit schema to define the data to pass on to the tCassandraOutput component.

Writing data to a Cassandra keyspace

Procedure
1. Double-click the tCassandraOutput component to open its Basic settings view in the Component
tab.

2. Type in required information for the connection or use the existing connection you have
configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example,
and select Drop keyspace if exists and create from the Action on keyspace list.
4. In the Column family configuration area, type in the name of the column family: Employee_Info in
this example, and select Drop column family if exists and create from the Action on column family
list.
The Define column family structure check box appears. In this example, clear this check box.
5. In the Action on data list, select the action you want to carry on, Upsert in this example.
6. Click Sync columns to retrieve the schema from the preceding component.
7. Select the key column of the column family from the Key column list. In this example, it is id.
If needed, select the Include key in columns check box.

Reading data from the Cassandra keyspace

Procedure
1. Double-click the tCassandraInput component to open its Component view.

2. Type in required information for the connection or use the existing connection you have
configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example.

4. In the Column family configuration area, type in the name of the column family: Employee_Info in
this example.
5. Select Edit schema to define the data structure to be read from the Cassandra keyspace. In this
example, three columns id, name and age are defined.

6. If needed, select the Include key in output columns check box, and then select the key column of
the column family you want to include from the Key column list.
7. From the Row key type list, select Integer because id is of integer type in this example.
Keep the Default option for the row key Cassandra type because its value will become the
corresponding Cassandra type Int32 automatically.
8. In the Query configuration area, select the Specify row keys check box and specify the row keys
directly. In this example, three rows will be read. Next, select the Specify columns check box and
specify the column names of the column family directly. This scenario will read three columns
from the keyspace: id, name and age.
9. If needed, the Key start and the Key end fields allow you to define the range of rows, and the
Key limit field allows you to specify the number of rows within the range of rows to be read.
Similarly, the Columns range start and the Columns range end fields allow you to define the range
of columns of the column family, and the Columns range limit field allows you to specify the
number of columns within the range of columns to be read.

Displaying the information of interest

Procedure
1. Double-click the tLogRow component to open its Component view.
2. In the Mode area, select Table (print values in cells of a table).

Closing the Cassandra connection

Procedure
1. Double-click the tCassandraClose component to open its Component view.

2. Select the connection to be closed from the Component List.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.
The personal information of three employees is displayed on the console.
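Outside the Studio, a roughly equivalent read could be written directly against the DataStax Java driver (3.x). The sketch below is an illustration under assumptions, not the code the Job generates: it presumes a recent Cassandra version listening on the default native port 9042, that id is the partition key of Employee_Info, and that the keyspace and table names keep the case used in this scenario.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class ReadEmployeesSketch {
        public static void main(String[] args) {
            // Connection details mirror the tCassandraConnection settings of this scenario.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("localhost")
                    .withPort(9042)
                    .build();
                 Session session = cluster.connect("Employee")) {

                // Same selection as the tCassandraInput configuration: three employees, three columns.
                ResultSet rows = session.execute(
                        "SELECT id, name, age FROM Employee_Info WHERE id IN (1, 2, 3)");
                for (Row row : rows) {
                    System.out.println(row.getInt("id") + " | " + row.getString("name")
                            + " | " + row.getInt("age"));
                }
            }
        }
    }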

tCassandraOutput
Writes data into or deletes data from a column family of a Cassandra keyspace.
tCassandraOutput receives data from the preceding component, and writes data into Cassandra.

tCassandraOutput Standard properties


These properties are used to configure tCassandraOutput running in the Standard Job framework.
The Standard tCassandraOutput component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

API type This drop-down list is displayed only when you have
selected the 2.0 version (deprecated) of Cassandra from the
DB version list. From this API type list, you can either select
Datastax to use CQL 3 (Cassandra Query Language) with
Cassandra, or select Hector (deprecated) to use CQL 2.
Note that the Hector API is deprecated along with the
support for Cassandra V2.0.
Along with the evolution of the CQL commands, the
parameters to be set in the Basic settings view varies.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.

To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use SSL Select this check box to enable the SSL or TLS encrypted
connection.
Then you need to use the tSetKeystore component in the
same Job to specify the encryption information.

Keyspace Type in the name of the keyspace into which you want to
write data.

Action on keyspace Select the operation you want to perform on the keyspace
to be used:
• None: No operation is carried out.
• Drop and create keyspace: The keyspace is removed
and created again.
• Create keyspace: The keyspace does not exist and gets
created.
• Create keyspace if not exists: A keyspace gets created if
it does not exist.
• Drop keyspace if exists and create: The keyspace is
removed if it already exists and created again.

Column family Type in the name of the column family into which you want
to write data.

Action on column family Select the operation you want to perform on the column
family to be used:
• None: no operation is carried out.
• Drop and create column family: the column family is
removed and created again.
• Create column family: the column family does not exist
and gets created.
• Create column family if not exists: a column family gets
created if it does not exist.
• Drop column family if exists and create: the column
family is removed if it already exists and created again.

Action on data On the data of the table defined, you can perform:
• Upsert: insert the columns if they do not exist or
update the existing columns.
• Insert: insert the columns if they do not exist. This
action also updates the existing ones.
• Update: update the existing columns or add the
columns that do not exist. This action does not support
the Counter Cassandra data type.
• Delete: remove columns corresponding to the input
flow.
Note that the action list varies depending on the Hector
(deprecated) or Datastax API you are using. When the API is
Datastax, more actions become available.
For more advanced actions, use the Advanced settings view.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Sync columns Click this button to retrieve schema from the previous
component connected in the Job.

Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Features available only with the Hector API (deprecated)

Row key column Select the row key column from the list.

Include row key in columns Select this check box to include row key in columns.

Super columns Select the super column from the list.


This drop-down list appears only if you select Super from
the Column family type drop-down list.

Include super columns in standard columns Select this check box to include the super columns in
standard columns.

Delete row Select this check box to delete the row.


This check box appears only if you select Delete from the
Action on data drop-down list.

Delete columns Customize the columns you want to delete.

Delete super columns Select this check box to delete super columns.
This check box appears only if you select the Delete Row
check box.

Advanced settings

Batch Size Number of lines in each processed batch.


When you are using the Datastax API, this feature is
displayed only when you have selected the Use unlogged
batch check box.

Use unlogged batch Select this check box to handle data in batch but with
Cassandra's UNLOGGED approach. This feature is available
to the following three actions: Insert, Update and Delete.
Then you need to configure how the batch mode works:
• Batch size: enter the number of lines in each batch to
be processed.
• Group batch method: select how to group rows into
batches:
1. Partition: rows sharing the same partition keys are
grouped.
2. Replica: rows to be written to the same replica are
grouped.
3. None: rows are grouped randomly. This option is
suitable for a single node Cassandra.
• Cache batch group: select this check box to load rows
into memory before grouping them. This way, grouping
is not impacted by the order of the rows.
If you leave this check box clear, only successive rows
that meet the same criteria are grouped.
• Async execute: select this check box if you want
tCassandraOutput to send batches in parallel. If you
leave it clear, tCassandraOutput waits for the result of
a batch before sending another batch to Cassandra.
• Maximum number of batches executed in parallel: once
you have selected Async execute, enter the number of
batches to be sent in parallel to Cassandra.
This number must be greater than 0, and it is recommended
not to use too large a value.
The ideal situation to use batches with Cassandra is when
a small number of tables must synchronize the data to be
inserted or updated.
In this UNLOGGED approach, the Job does not write batches
into Cassandra's batchlog system and thus avoids the
performance issue incurred by this writing. For further
information about Cassandra BATCH statement and
UNLOGGED approach, see Batches.
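For illustration only, an unlogged batch corresponds to CQL of the following shape; the ks.tb table and its columns are hypothetical, and the component builds and sends such batches for you at runtime:

begin unlogged batch
insert into ks.tb (id, name, birthday) values (1, 'Ann', '1985-02-14');
insert into ks.tb (id, name, birthday) values (2, 'Bob', '1987-11-02');
apply batch;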

Insert if not exists Select this check box to insert rows. This row insertion takes
place only when they do not exist in the target table.
This feature is available to the Insert action only.

Delete if exists Select this check box to remove from the target table only
the rows that have the same records in the incoming flow.


This feature is available only to the Delete action.
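As a rough CQL equivalent, these two check boxes add lightweight-transaction conditions to the generated statements, for example (the ks.tb table is hypothetical):

insert into ks.tb (id, name) values (1, 'Ann') if not exists;
delete from ks.tb where id = 1 if exists;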

Use TTL Select this check box to write the TTL data in the target
table. In the column list that is displayed, you need to select
the column to be used as the TTL column. The DB type of
this column must be Int.
This feature is available to the Insert action and the Update
action only.

Use Timestamp Select this check box to write the timestamp data in the
target table. In the column list that is displayed, you need to
select the column to be used to store the timestamp data.
The DB type of this column must be BigInt.
This feature is available to the following actions: Insert,
Update and Delete.
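Assuming a hypothetical ks.tb table, the generated statement roughly takes the following form, with the TTL and timestamp values bound from the selected schema columns at runtime:

insert into ks.tb (id, name) values (?, ?) using ttl ? and timestamp ?;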

IF condition Add the condition to be met for the Update or the Delete
action to take place. This condition allows you to be more
precise about the columns to be updated or deleted.

Special assignment operation Complete this table to construct advanced SET commands
of Cassandra to make the Update action more specific.
For example, add a record to the beginning or a particular
position of a given column.
In the Update column column of this table, you need
to select the column to be updated and then select the
operations to be used from the Operation column. The
following operations are available:
• Append: it adds incoming records to the end of the
column to be updated. The Cassandra data types it can
handle are Counter, List, Set and Map.
• Prepend: it adds incoming records to the beginning of
the column to be updated. The only Cassandra data
type it can handle is List.
• Remove: it removes records from the target table
when the same records exist in the incoming flow. The
Cassandra data types it can handle are Counter, List,
Set and Map.
• Assign based on position/key: it adds records to a
particular position of the column to be updated. The
Cassandra data types it can handle are List and Map.
Once you select this operation, the Map key/list
position column becomes editable. From this column,
you need to select the column to be used as reference
to locate the position to be updated.
For more details about these operations, see Datastax's
related documentation in http://docs.datastax.com/en/cql/3.1/cql/cql_reference/update_r.html?scroll=reference_ds_g4h_qzq_xj__description_unique_34.
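For reference, these operations correspond roughly to the following CQL UPDATE forms; the ks.profiles table and its tags list column are hypothetical:

update ks.profiles set tags = tags + ['cql'] where id = 1;   -- Append
update ks.profiles set tags = ['cql'] + tags where id = 1;   -- Prepend
update ks.profiles set tags = tags - ['cql'] where id = 1;   -- Remove
update ks.profiles set tags[2] = 'cql' where id = 1;         -- Assign based on position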

Row key in the List type Select the column to be used to construct the WHERE clause
of Cassandra to perform the Update or the Delete action on
only selected rows. The column(s) to be used in this table
should be from the set of the Primary key columns of the
Cassandra table.

Delete collection column based on position/key Select the column to be used as reference to locate the
particular row(s) to be removed.


This feature is available only to the Delete action.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as an output component and it


always needs an incoming link.

Related Scenario
For a scenario in which tCassandraOutput is used, see Handling data with Cassandra on page 439.


tCassandraOutputBulk
Prepares an SSTable of large size and processes it according to your needs before loading this
SSTable into a column family of a Cassandra keyspace.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts
of a two step process. In the first step, an SSTable is generated. In the second step, this SSTable
is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that
the data can be transformed before it is loaded into Cassandra.
tCassandraOutputBulk receives data from the preceding component, and creates an SSTable locally.

tCassandraOutputBulk Standard properties


These properties are used to configure tCassandraOutputBulk running in the Standard Job framework.
The Standard tCassandraOutputBulk component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.


You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Table type Select the type of the data model to be used for the table
to be created. It can be CQL (actually CQL3) or non-CQL (the
legacy thrift-based API of Cassandra before CQL3).
This drop-down list is available only when the DB version
you are using is Cassandra 2.0.0 (deprecated). For the
Cassandra versions later than 2.0.0, CQL becomes the only
model used by this component and so this list is no longer
available.

DB Version Select the Cassandra version you are using.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use configuration file Select this check box and in the field that is displayed,
enter the path, or browse to cassandra.yaml, the main
configuration file for Cassandra.
This way, this component can import and directly use the
configuration from cassandra.yaml, which can contain many
advanced Cassandra properties, such as the properties for
SSL encryption.
When you need to run your Job in different Cassandra
environments, this feature allows your Job to easily switch
between the configurations.
For further information about this cassandra.yaml file, see
Cassandra configuration.

Keyspace Type in the name of the keyspace into which you want to
write the SSTable.

Column family Type in the name of the column family into which you want
to write the SSTable.

Partitioner Select the partitioner which determines how data is


distributed across the Cassandra cluster.
• Random
• Murmur3
• Order preserving: not recommended because it
assumes keys are UTF8 strings.


For more information about the partitioner, see http://wiki.apache.org/cassandra/Partitioners.

Schema statement Enter the statement to define the schema of the column
family to be used or to be created on the fly.
• This statement is a Cassandra prepared statement,
which stores query results locally in the SSTable
directory you define with this component before
sending them to the server. For further information
about the prepared statements, see Prepared
statements.
• A Cassandra column family is a container for a
collection of rows of records that have a similar kind.
Its schema must contain strictly the same columns as
the component schema you have defined, that is to
say, the column names and the order of the columns in
both the schemas must be identical.
An example of this schema statement is provided in the
Schema statement field:

create table ks.tb (id int, name text, birthday timestamp,
primary key(id, birthday)) with clustering order by (birthday desc)
It will create a column family called tb containing the id, the
name and the birthday columns under the keyspace ks.
For further information about a column family, see Standard
column family.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.

Insert statement Enter the statement to instruct how to write the data from
the input flow into the columns of the column family to be
used.
This statement is a Cassandra prepared statement, which
stores query results locally in the SSTable directory you
define with this component before sending them to
the server. For further information about the prepared
statements, see Prepared statements.
An example of this insert statement is provided in the Insert
statement field:

insert into ks.tb (id, name, birthday) values (?, ?, ?)

It will write data into the id, the name and the birthday
columns, respectively, of a column family called tb in the
keyspace ks. The question marks in the statement are the
bind variable markers for the three columns. For further
information about bind variables and their usage, see Bound
parameters.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.


Column name comparator Select the data type for the column names, which is used to
sort columns. This list is not available when the data model
to be used is CQL3.
For more information about the comparators, see http://
www.datastax.com/docs/1.1/ddl/column_family#about-
data-types-comparators-and-validators.

SSTable directory Specify the local directory for the SSTable. Note that the
complete path to the SSTable will be the local directory
appended by the specified keyspace name and column
family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.

Buffer size Specify what size the SSTable must reach before it is
written into Cassandra.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component always needs an incoming link.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCassandraOutputBulkExec
Improves performance during Insert operations to a column family of a Cassandra keyspace.
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together to
output data to an SSTable and then to write the SSTable into Cassandra, in a two step process. These
two steps are fused together in the tCassandraOutputBulkExec component.
tCassandraOutputBulkExec receives data from the preceding component, creates an SSTable and then
writes the SSTable into Cassandra.

tCassandraOutputBulkExec Standard properties


These properties are used to configure tCassandraOutputBulkExec running in the Standard Job
framework.
The Standard tCassandraOutputBulkExec component belongs to the Big Data and the Databases
NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.


You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Table type Select the type of the data model to be used for the table
to be created. It can be CQL (actually CQL3) or non-CQL (the
legacy thrift-based API of Cassandra before CQL3).
This drop-down list is available only when the DB version
you are using is Cassandra 2.0.0 (deprecated). For the
Cassandra versions later than 2.0.0, CQL becomes the only
model used by this component and so this list is no longer
available.

DB Version Select the Cassandra version you are using.

Warning:
• Cassandra 2.0.0 (deprecated) only works with
JVM1.7.

Host Hostname or IP address of the Cassandra server.

Port Listening port number of the Cassandra server.

Required authentication Select this check box to provide credentials for the
Cassandra authentication.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Keyspace Type in the name of the keyspace into which you want to
write the SSTable.

Column family Type in the name of the column family into which you want
to write the SSTable.

Partitioner Select the partitioner which determines how the data is


distributed across the Cassandra cluster.
• Random
• Murmur3
• Order preserving: not recommended because it
assumes keys are UTF8 strings.
For more information about the partitioner, see http://
wiki.apache.org/cassandra/Partitioners.

Schema statement Enter the statement to define the schema of the column
family to be used or to be created on the fly.
• This statement is a Cassandra prepared statement,
which stores query results locally in the SSTable
directory you define with this component before
sending them to the server. For further information


about the prepared statements, see Prepared


statements.
• A Cassandra column family is a container for a
collection of rows of records that have a similar kind.
Its schema must contain strictly the same columns as
the component schema you have defined, that is to
say, the column names and the order of the columns in
both the schemas must be identical.
An example of this schema statement is provided in the
Schema statement field:

create table ks.tb (id int, name text, birthday timestamp,
primary key(id, birthday)) with clustering order by (birthday desc)
It will create a column family called tb containing the id, the
name and the birthday columns under the keyspace ks.
For further information about a column family, see Standard
column family.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.

Insert statement Enter the statement to instruct how to write the data from
the input flow into the columns of the column family to be
used.
This statement is a Cassandra prepared statement, which
stores query results locally in the SSTable directory you
define with this component before sending them to
the server. For further information about the prepared
statements, see Prepared statements.
An example of this insert statement is provided in the Insert
statement field:

insert into ks.tb (id, name, birthday) values (?, ?, ?)

It will write data into the id, the name and the birthday
columns, respectively, of a column family called tb in the
keyspace ks. The question marks in the statement are the
bind variable markers for the three columns. For further
information about bind variables and their usage, see Bound
parameters.
This field is available only when the version of your
Cassandra database is later than 2.0.0. When it is 2.0.0
(deprecated), it is available only when you have selected
CQL from the Table type drop-down list.

Column name comparator Select the data type for the column names, which is used to
sort columns.
For more information about the comparators, see http://
www.datastax.com/docs/1.1/ddl/column_family#about-
data-types-comparators-and-validators.

SSTable directory Specify the local directory for the SSTable. Note that the
complete path to the SSTable will be the local directory


appended by the specified keyspace name and column


family name.
For example, if you set the local directory to /home/talend/
sstable, and specify testk as the keyspace name and testc as
the column family name, the complete path to the SSTable
will be /home/talend/sstable/testk/testc/.

Buffer size Specify what size the SSTable must reach before it is
written into Cassandra.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is mainly used when no particular


transformation is required on the data to be loaded into the
database.

Limitation Currently, the execution of this component ends the entire


Job.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCassandraRow
Acts on the actual DB structure or on the data, depending on the nature of the query and the
database.
tCassandraRow is the specific component for this database query. It executes the Cassandra Query
Language (CQL) query stated in the specified database. The row suffix means the component
implements a flow in the Job design although it does not provide output.

tCassandraRow Standard properties


These properties are used to configure tCassandraRow running in the Standard Job framework.
The Standard tCassandraRow component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

DB Version Select the Cassandra version you are using.

Host Type in the IP address or hostname of the Cassandra server.

Port Type in the listening port number of the Cassandra server.

Required Authentication Select this check box to provide credentials for the
Cassandra authentication.
This check box appears only if you do not select the Use
existing connection check box.

Username Fill in this field with the username for the Cassandra
authentication.

Password Fill in this field with the password for the Cassandra
authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Keyspace Type in the name of the keyspace on which you want to


execute the CQL commands.

Column family Name of the column family.


Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query Type in the CQL command to be executed.


By default, the query is not case-sensitive. This means that
at runtime, the column names you put in the query are
always taken in lower case. If you need to make the query
case-sensitive, put the column names in double quotation
marks.
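For example, with a hypothetical ks.tb table:

select name from ks.tb where id = 1
select "Name" from ks.tb where "Id" = 1

The first query reads the columns as name and id; in the second one, the double quotation marks preserve the exact case of Name and Id.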

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Related scenario
For related topics, see


• Removing and regenerating a MySQL table index on page 2497.


• Using PreparedStatement objects to query data on page 2498.


tChangeFileEncoding
Transforms the character encoding of a given file and generates a new file with the transformed
character encoding.
tChangeFileEncoding changes the encoding of a given file.

tChangeFileEncoding Standard properties


These properties are used to configure tChangeFileEncoding running in the Standard Job framework.
The Standard tChangeFileEncoding component belongs to the Data Quality and the File families.
The component in this framework is available in all Talend products.

Basic settings

Use Custom Input Encoding Select this check box to customize input encoding type.
When it is selected, a list of input encoding types appears,
allowing you to select an input encoding type or specify an
input encoding type by selecting CUSTOM.

Encoding From this list of character encoding types, you can select
one of the offered options or customize the character
encoding by selecting CUSTOM and specifying a character
encoding type.

Input File Name Path of the input file.

Output File Name Path of the output file.

Advanced settings

Create directory if does not exist This check box is selected by default. It creates the directory for the output file if it does not already exist.

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Transforming the character encoding of a file


This Java scenario describes a very simple Job that transforms the character encoding of a text file and
generates a new file with the new character encoding.

Procedure
1. Drop a tChangeFileEncoding component onto the design workspace.

2. Double-click the tChangeFileEncoding component to display its Basic settings view.

3. Select the Use Custom Input Encoding check box. Set the Encoding type to GB2312.
4. In the Input File Name field, enter the file path or browse to the input file.
5. In the Output File Name field, enter the file path or browse to the output file.
6. Select CUSTOM from the second Encoding list and enter UTF-16 in the text field.


7. Press F6 to execute the Job.

Results
The encoding type of the file in.txt is transformed and out.txt is generated with the UTF-16 encoding
type.


tChronometerStart
Operates as a chronometer device that starts calculating the processing time of one or more subJobs
in the main Job, or that starts calculating the processing time of part of your subJob.
Starts measuring the time a subJob takes to be executed.

tChronometerStart Standard properties


These properties are used to configure tChronometerStart running in the Standard Job framework.
The Standard tChronometerStart component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Global Variables

Global Variables STARTTIME: the start time to calculate the processing time
of subjob(s). This is a Flow variable and it returns a long.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule You can use tChronometerStart as a start or middle


component. It can precede one or more processing tasks in
the subJob. It can precede one or more subJobs in the main
Job.

Related scenario
For related scenario, see Measuring the processing time of a subJob and part of a subJob on page
467.


tChronometerStop
Operates as a chronometer device that stops calculating the processing time of one or more subJobs
in the main Job, or that stops calculating the processing time of part of your subJob. tChronometerStop
displays the total execution time.
Measures the time a subJob takes to be executed.

tChronometerStop Standard properties


These properties are used to configure tChronometerStop running in the Standard Job framework.
The Standard tChronometerStop component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Since options Select either check box to select measurement starting


point:
Since the beginning: stops time measurement launched at
the beginning of a subJob.
Since a tChronometerStart: stops time measurement
launched at one of the tChronometerStart components used
on the data flow of the subJob.

Display duration in console When selected, it displays subJob execution information on


the console.

Display component name When selected, it displays the name of the component on
the console.

Caption Enter the desired text, for example to identify your subJob.

Display human readable duration When selected, it displays subJob execution information in
readable time units.

Global Variables

Global Variables STOPTIME: the stop time to calculate the processing time of
subjob(s). This is a Flow variable and it returns a long.
DURATION: the processing time of subjob(s). This is a Flow
variable and it returns a long.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule Cannot be used as a start component.

Measuring the processing time of a subJob and part of a


subJob
This scenario is a subJob that does the following in a sequence:
• generates 1000 000 rows of first and last names,
• gathers first names with their corresponding last names,
• stores the output data in a delimited file,
• measures the duration of the subJob as a whole,
• measures the duration of the name replacement operation,
• displays the gathered information about the processing time on the Run log console.
To measure the processing time of the subJob:
• Drop the following components from the Palette onto the design workspace: tRowGenerator,
tMap, tFileOutputDelimited, and tChronometerStop.
• Connect the first three components using Main Row links.

Note: When connecting tMap to tFileOutputDelimited, you will be prompted to name the output
table. The name used in this example is "new_order".

• Connect tFileOutputDelimited to tChronometerStop using an OnComponentOk link.


• Select tRowGenerator and click the Component tab to display the component view.
• In the component view, click Basic settings. The Component tab opens on the Basic settings view
by default.


• Click Edit schema to define the schema of the tRowGenerator. For this Job, the schema is
composed of two columns: First_Name and Last_Name, so click the [+] button twice to add two
columns and rename them.
• Click the RowGenerator Editor three-dot button to open the editor and define the data to be
generated.

• In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows
for RowGenerator field and click OK. The RowGenerator Editor closes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Double-click on the tMap component to open the Map editor. The Map editor opens displaying the
input metadata of the tRowGenerator component.

• In the Schema editor panel of the Map editor, click the plus button of the output table to add two
rows and define them.


• In the Map editor, drag the First_Name row from the input table to the Last_Name row in the
output table and drag the Last_Name row from the input table to the First_Name row in the output
table.
• Click Apply to save changes.
• You will be prompted to propagate changes. Click Yes in the popup message.
• Click OK to close the editor.

• Select tFileOutputDelimited and click the Component tab to display the component view.
• In the Basic settings view, set tFileOutputDelimited properties as needed.

• Select tChronometerStop and click the Component tab to display the component view.
• In the Since options panel of the Basic settings view, select Since the beginning option to measure
the duration of the subJob as a whole.


• Select/clear the other check boxes as needed. In this scenario, we want to display the subJob
duration on the console preceded by the component name.
• If needed, enter a text in the Caption field.
• Save your Job and press F6 to execute it.

Note: You can measure the duration of the subJob the same way by placing tChronometerStop
below tRowGenerator, and connecting the latter to tChronometerStop using an OnSubjobOk link.


tCloudStart
Starts instances on Amazon EC2 (Amazon Elastic Compute Cloud).
This component accesses the cloud provider to be used (Amazon EC2) and launches instances, which
are virtual servers in that cloud. If an instance to be launched does not exist, tCloudStart creates it.

tCloudStart Standard properties


These properties are used to configure tCloudStart running in the Standard Job framework.
The Standard tCloudStart component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services.
These access credentials are generated from the Security
Credential tab of your Amazon account page.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Cloud provider Select the cloud provider to be used.

Image Enter the name of the Amazon Machine Image (AMI) to


be used to launch an instance. This AMI defines the basic
configuration of that instance.

Region and Zone Enter the region and the zone to be used as the geographic
location where you want to launch an instance.
The syntax used to express a location is predefined by
Amazon, for example, us-east-1 representing the US East
(Northern Virginia) region and us-east-1a representing
one of the Availability Zones within that region. For further
information about available regions for Amazon, see
Amazon's documentation about regions and endpoints and
as well Amazon's FAQ about region and Availability Zone.

Instance name Enter the name of the instance to be launched. For example,
you can enter Talend.
Note that uppercase letters are converted to lowercase.

Instance count Enter the number of instances to be launched. At runtime,


the name specified in the Instance name field, for example
Talend, will be used as the initial part of each instance
name, and letters and numbers will be randomly added to
complete each name.

Instance type Select the type of the instance(s) to be launched. Each type
is predefined by Amazon and defines the performance of
every instance you want to launch.


This drop-down list presents the API name of each instance


type. For further information, see Amazon's documentation
about instance types.

Proceed with a Key pair Select this check box to use Amazon Key Pair for your login
to Amazon EC2. Once selecting it, a drop-down list appears
to allow you to select :
• Use an existing Key Pair to enter the name of that Key
Pair in the field next to the drop-down list. If required,
Amazon will prompt you at runtime to find and use
that Key Pair.
• Create a Key Pair to enter the name of the new Key
Pair in the field next to the drop-down list and define
the location where you want to store this Key Pair in
the Advanced settings tab view.

Security group Add rows to this table and enter the names of the security
groups to which you need to assign the instance(s) to be
launched. The security groups set in this table must exist on
your Amazon EC2.
A security group applies specific rules on inbound traffic
to instances assigned to the group, such as the ports to be
used. For further information about security groups, see
Amazon's documentation about security groups.
Note that an instance can be assigned to a group by setting
its security group name or key pair name to jclouds#<$group_name>, where <$group_name> identifies the
group to which the instance belongs. In this way, you can
change the status of all instances or running instances
in one group at the same time using the tCloudStop
component.
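For example, if you set the security group name of several instances to jclouds#talend (talend being an arbitrary group name chosen here for illustration), you can later change the status of all of them at once with tCloudStop by selecting Instances in a specific group and entering talend as the group name.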

Advanced settings

Key Pair folder Browse to, or enter the path to the folder you use to store
the created Key Pair file.
This field appears when you select Creating a Key Pair in
the Basic settings tab view.

Volumes Add rows and define the volume(s) to be created for the
instances to be launched in addition to the volumes
predefined and allocated by the given Amazon EC2.
The parameters to be set in this table are the same
parameters used by Amazon for describing a volume.
If you need to remove automatically an additional volume
after terminating its related instance, select the check box
in the Delete on termination column.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables NODE_GROUP: the name of the instance. This is an After


variable and it returns a string.


NODES: the instances launched. This is an After variable and


it returns an object.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone to launch an instance


on Amazon EC2. You can use this component to start the
instance you need to deploy Jobs on.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCloudStop
Changes the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud).
This component accesses the cloud provider to be used (Amazon EC2) and suspends, resumes or
terminates given instance(s).

tCloudStop Standard properties


These properties are used to configure tCloudStop running in the Standard Job framework.
The Standard tCloudStop component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access key and Secret key Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services.
These access credentials are generated from the Security
Credential view of your Amazon account page.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Cloud provider Select the cloud provider to be used.

Action Select the action you need tCloudStop to take in order to


change the status of a given instance. This action may be:
• Suspend
• Resume
• Terminate
Note that if you terminate an instance, this instance will be
deleted, while a suspended instance can still be resumed.

Predicate Select the instance(s) of which you need to change the


status. The options are:
• Running instances: status of all the running instances
will be changed.
• Instances in a specific group: status of the instances of
a specific instance group will be changed. You need to
enter the name of that group in the Group name field.
• Running instances in a specific group: status of the
running instances of a specific instance group will be
changed. You need to enter the name of that group in
the Group name field.
• Instance with predefined id: status of a given instance
will be changed. You need to enter the ID of that
instance in the Id field. You can find this ID on your
Amazon EC2.
An instance group is composed of the instances using the
same instance name you have defined in the Instance name
field of tCloudStart.


Group name Enter the name of the group in which you want to change
the status of given instances whose security group name or
key pair name is set to jclouds#<$group_name> in the
tCloudStart component, where <$group_name> identifies
the group to which the instance belongs.
This field is available only when Instances in a specific
group or Running instances in a specific group is selected
from the Predicate list.

Id Enter the ID of the instance of which you need to change


the status, for instance, "${region}/${instance id}". This field
appears when you select Instance with predefined id from
the Predicate list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone to change the status


of given instances on Amazon EC2. You can use this
component to suspend, resume or terminate the instance(s)
you have deployed Jobs on.
This component often works alongside tCloudStart to
change the status of the instances launched by the latter
component.

Related scenarios
No scenario is available for the Standard version of this component yet.


tCombinedSQLAggregate
Provides a set of matrix based on values or calculations.
tCombinedSQLAggregate collects data values from one or more columns of a table for statistical
purposes. This component has real-time capabilities since it runs the data transformation on the
DBMS itself.

tCombinedSQLAggregate Standard properties


These properties are used to configure tCombinedSQLAggregate running in the Standard Job
framework.
The Standard tCombinedSQLAggregate component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Group by Define the aggregation sets, the values of which will be


used for calculations.

  Output Column: Select the column label in the list offered


according to the schema structure you defined. You can add
as many output columns as you wish to make more precise
aggregations.


  Input Column: Select the input column label to match the


output column's expected content, in case the output label
of the aggregation set needs to be different.

Operations Select the type of operation along with the value to use for
the calculation and the output field.

  Output Column: Select the destination field in the list.

  Function: Select any of the following operations to perform


on data: count, min, max, avg, sum, first, last, distinct and
count (distinct).

  Input column: Select the input column from which you want
to collect the values to be aggregated.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.


Filtering and aggregating table columns directly on the


DBMS
The following scenario creates a Job that opens a connection to a MySQL database and:
• populates a database table with the input data,
• creates the output table for the filtered data,
• instantiates the schema from a database table in part (for column filtering),
• filters two columns in the same table to get only the data that meets two filtering conditions,
• collects data from the filtered column(s), grouped by specific value(s) and writes aggregated data
in a target database table.


Adding and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tMysqlConnection,
tFixedFlowInput, tMysqlOutput, tCreateTable, tCombinedSQLInput, tCombinedSQLFilter,
tCombinedSQLAggregate, tCombinedSQLOutput, tMysqlCommit, tMysqlInput and tLogRow.
2. Connect tMysqlConnection to tFixedFlowInput using a Trigger > On Subjob Ok link
3. Do the same to connect tFixedFlowInput to tCreateTable, tCreateTable to tCombinedSQLInput,
tCombinedSQLInput to tMysqlCommit, and tMysqlCommit to tMysqlInput.
4. Connect tFixedFlowInput and tMysqlOutput using a Row > Main link.
5. Connect tCombinedSQLInput to tCombinedSQLFilter using a Row > Combine link.
6. Do the same to connect tCombinedSQLFilter to tCombinedSQLAggregate, and tCombinedSQLAggregate
to tCombinedSQLOutput
7. Connect tMysqlInput and tLogRow using a Row > Main link.

Configuring the components


The schema defined through tCombinedSQLInput can be different from that of the source table as you
can just instantiate the desired columns of the source table. Therefore, tCombinedSQLInput also plays
a role of column filtering.
In this scenario, the source database table has seven columns: id, first_name, last_name, city, state,
date_of_birth, and salary while tCombinedSQLInput only instantiates four columns that are needed for
the aggregation: id, state, date_of_birth, and salary from the source table.
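In SQL terms, this column filtering amounts to selecting only the needed columns from the source table, roughly:

select id, state, date_of_birth, salary from employees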

Opening a MySQL connection

Procedure
1. Launch MySQL Workbench and start a local connection on port 3306.
2. Create a new schema and name it test.
3. Back in the design workspace, select tMysqlConnection and click the Component tab to define its
basic settings.


4. In the Basic settings view, set the database connection details manually or select Repository from
the Property Type list and select your DB connection if it has already been defined and stored in
the Metadata area of the Repository tree view.
For more information on centralizing DB connection details in the Repository, see Talend Studio
User Guide.

Populating the database table with input data

Procedure
1. In the design workspace, select tFixedFlowInput and click the Component tab to define its basic
settings

2. In the Basic settings view, in the Number of rows field, enter 500.
3. In this scenario, the source database table has seven columns: id, first_name, last_name, city, state,
date_of_birth, and salary
Click the [...] button next to Edit schema to define the following data structure.


4. Click the floppy disk icon to save the schema as a generic schema for later reuse.
5. In the Select folder window, select default and click OK.
6. Choose a name for your generic schema and click Finish.
7. Click OK.
8. The first column of the Values table automatically reflects the data structure you entered
previously.
9. In the Values table, enter a value for each column.
10. In the design workspace, select tMysqlOutput and click the Component tab to define its basic set
tings.

The output schema will automatically be the same as the previous component, in this case
tFixedFlowInput.
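For reference, the employees table populated here could be expressed in MySQL roughly as follows; the column types are assumptions, since the scenario only fixes the column names:

create table employees (id int, first_name varchar(64), last_name varchar(64),
city varchar(64), state varchar(64), date_of_birth date, salary float)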

Creating the target database table

Procedure
1. In the design workspace, select tCreateTable and click the Component tab to define its basic set
tings.


2. Click the [...] button next to Edit schema to define the following data structure.

The schema you enter at this step must reflect the different aggregation operations you
want to perform on the input data.
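As an illustration, the empl_by_state target table created by tCreateTable could be expressed in MySQL roughly as follows; the exact column types are assumptions based on the aggregation results they hold:

create table empl_by_state (state varchar(64), empl_count int, avg_salary float,
min_salary float, max_salary float, oldest_empl date, youngest_empl date)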

Extracting and filtering data

Procedure
1. In the design workspace, select tCombinedSQLInput and click the Component tab to access the
configuration panel.

2. Enter the source table name, in this case employees in the Table field.
3. In the Schema field, select Repository from the list and click the [...] button to the right of the empty
field to load the schema you saved while configuring the settings for tFixedFlowInput.
4. In the Repository Content window, expand Generic schemas and select your schema.


5. Click the [...] button to the right of Edit schema.


6. Select View schema, and in the first column of the table, clear the check boxes for first_name,
last_name and city.

Filtering and aggregating the input data

Procedure
1. In the design workspace, select tCombinedSQLFilter and click the Component tab to access the
configuration panel.

2. Click the Sync columns button to retrieve the schema from the previous component, or configure
the schema manually by selecting Built-in from the Schema list and clicking the [...] button next
to Edit schema.
When you define the data structure for tCombinedSQLFilter, column names automatically appear
in the Input column list in the Conditions table.
In this scenario, the tCombinedSQLFilter component instantiates four columns: id, state,
date_of_birth, and salary.
3. In the Conditions table, set input parameters, operators and expected values in order to only
extract the records that fulfill these criteria.
Click two times on the [+] button under the Conditions table, and in Input column, select state and
date_of_birth from the drop-down list.
In this scenario, the tCombinedSQLFilter component filters the state and date_of_birth columns in
the source table to extract the employees who were born after Oct. 19, 1960 and who live in the
states Utah, Ohio and Iowa.
4. For the column state, select IN as operator from the drop-down list, and enter ('Utah','Ohio','Iowa')
as value.
5. For the column date_of_birth, select > as operator from the drop-down list, and enter ('1960-10-19')
as value.
6. Select And in the Logical operator between conditions list to apply the two conditions at the same
time. You can also customize the conditions by selecting the Use custom SQL box and editing the
conditions in the code box.
7. In the design workspace, select tCombinedSQLAggregate and click the Component tab to access
the configuration panel.


8. Click the [...] button next to Edit schema to enter the following configuration:

The tCombinedSQLAggregate component instantiates four columns: id, state, date_of_birth, and
salary, coming from the previous component.

9. The Group by table helps you define the data sets to be processed based on a defined column. In
this example: State.
In the Group by table, click the [+] button to add one line.
10. In the Output column drop-down list, select State. This column will be used to hold the data
filtered on State.
11. The Operations table helps you define the type of aggregation operations to be performed.
The Output column list available depends on the schema you want to output (through the
tCombinedSQLOutput component). In this scenario, we want to group employees based on the
state they live in. Then we want to count the number of employees per state, calculate the
average/lowest/highest salaries as well as the oldest/youngest employees for each state.
12. In the Operations table, click the [+] button to add a line and then click in the Output column list
to select the output column that will hold the computed data.
13. In the Function field, select the relevant operation to be carried out.
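
Taken together, the filter and aggregation settings defined in this procedure are pushed down to the database as a single statement. The following is a rough SQL sketch of that statement, using the column and table names of this scenario; the SQL actually generated by the components may differ in detail:

SELECT
    state,
    COUNT(id)          AS empl_count,
    AVG(salary)        AS avg_salary,
    MIN(salary)        AS min_salary,
    MAX(salary)        AS max_salary,
    MIN(date_of_birth) AS oldest_empl,
    MAX(date_of_birth) AS youngest_empl
FROM employees
WHERE state IN ('Utah', 'Ohio', 'Iowa')
  AND date_of_birth > '1960-10-19'
GROUP BY state;

The tCombinedSQLOutput component configured in the next procedure inserts the result of this query into the empl_by_state table.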

Writing the output data into MySQL

Procedure
1. In the design workspace, select tCombinedSQLOutput and click the Component tab to access the
configuration panel.

2. On the Database type list, select the relevant database.


3. On the Component list, select the relevant database connection component if more than one
connection is used.
4. In the Table field, enter the name of the target table that will store the results of the
aggregation operations, empl_by_state in this case.
The tCombinedSQLOutput component requires the output table to already exist in the database.
That is why the empl_by_state table was created earlier in the scenario.
In this example, the Schema field doesn't need to be filled out as the database is not Oracle.
5. Click the Sync columns button to retrieve the schema from the previous component.
In this scenario, tCombinedSQLOutput instantiates seven columns coming from the previous
component in the Job design (tCombinedSQLAggregate): state, empl_count, avg_salary, min_salary,
max_salary, oldest_empl and youngest_empl.

Committing the data into the database

Procedure
1. In the design workspace, select tCombinedSQLCommit and click the Component tab to access the
configuration panel.
2. On the Component list, select the relevant database connection component if more than one
connection is used.
3. Clear the check box Close Connection.


Retrieving the filtered and aggregated data

Procedure
1. In the design workspace, select tMysqlInput and click the Component tab to define its basic settings.

2. Select the Use an existing connection check box and choose tMysqlConnection_1 from the list.
3. Click the [...] button next to Edit schema to enter the following schema:

4. In the Table Name field, enter empl_by_state and, in the Query field, enter select * from
empl_by_state.
5. In the design workspace, select tLogRow and click the Component tab to define its basic settings.


6. Click the Sync columns button to retrieve the schema from the previous component and select the
Table (print values in cells of a table) mode.

Saving and executing the Job


Procedure
1. Save your Job and press F6 to execute it.
2. The Run tab opens, where you can observe the result of the Job execution.
3. The output data retrieved by the tLogRow is visible in a table.

Results
Rows are inserted into a seven-column table empl_by_state in the database. The table shows, per
defined state, the number of employees, the average salary, the lowest and highest salaries as well as
the oldest and youngest employees.


tCombinedSQLFilter
Filters data by reorganizing, deleting or adding columns based on the source table, and filters the
given data source using the defined filter conditions.
tCombinedSQLFilter allows you to alter the schema of a source table through column name mapping
and to define a row filter on that table. Therefore, it can be used to filter columns and rows at the
same time. This component has real-time capabilities since it runs the data filtering on the DBMS
itself.

tCombinedSQLFilter Standard properties


These properties are used to configure tCombinedSQLFilter running in the Standard Job framework.
The Standard tCombinedSQLFilter component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Logical operator between conditions Select the logical operator between the filter conditions
defined in the Conditions panel.
Two operators are available: Or, And.


Conditions Select the type of WHERE clause along with the values and
the columns to use for row filtering.

  Input Column: Select the column to filter in the list.

  Operator: Select the type of the WHERE clause: =, < >, >, <,
>=, <=, LIKE, IN, NOT IN, and EXIST IN.

  Values: Type in the values to be used in the WHERE clause.

  Negate: Select this check box to enable the condition that is


opposite to the current setting.

Use custom SQL Customize the WHERE clause by selecting this check box and
editing it in the SQL Condition field.
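
For example, a custom condition entered in the SQL Condition field could look like the following sketch (the state and date_of_birth columns come from the related scenario and must be replaced by columns of your own input schema):

state IN ('Utah', 'Ohio', 'Iowa') AND date_of_birth > '1960-10-19'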

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.

Related Scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.


tCombinedSQLInput
Extracts fields from a database table based on its schema definition.
Then it passes on the field list to the next component via a Combine row link. The schema of
tCombinedSQLInput can be different from that of the source database table but must correspond to it
in terms of the column order.
tCombinedSQLInput extracts fields from a database table based on its schema. This component also
has column filtering capabilities since its schema can be different from that of the database table.

tCombinedSQLInput Standard properties


These properties are used to configure tCombinedSQLInput running in the Standard Job framework.
The Standard tCombinedSQLInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Table Name of the source database table.

Schema Name of the source table's schema. This field has to be


filled if the database is Oracle.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Add additional columns This option allows you to call SQL functions to perform
actions on columns, provided that these are not insert, update or delete actions,
or actions that require pre-processing.

  Name: Type in the name of the schema column to be altered.

  SQL expression: Type in the SQL statement to be executed in order to alter the data in the corresponding column.
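
For instance, to replace the values of a column with their upper-case equivalent, the Name field could contain the column to alter (city, say, assuming such a column exists in the schema) and the SQL expression field could contain a standard SQL function call such as:

UPPER(city)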

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.

Related scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.


tCombinedSQLOutput
Inserts records from the incoming flow to an existing database table.

tCombinedSQLOutput Standard properties


These properties are used to configure tCombinedSQLOutput running in the Standard Job framework.
The Standard tCombinedSQLOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Database Type Select the database type.

Component list Select the relevant DB connection component in the list if


more than one connection is used for the current Job.

Table Name of the target database table.

Schema Name of the target database table's schema. This field has
to be filled if the database is Oracle.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Jobs. Related topic: see Talend Studio User
Guide.

Action on data Select INSERT from the list to insert the records from the
incoming flow to the target database table.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediary component. The use


of the corresponding connection and commit components
is recommended when using this component to allow a
unique connection to be open and then closed during the
Job execution.

Related scenario
For a related scenario, see Filtering and aggregating table columns directly on the DBMS on page 478.


tContextDump
Copies the context setup of the current Job to a flat file, a database table, etc., which can then be used
by tContextLoad.
Together with tContextLoad, this component makes it simple to apply the context setup of one Job to
another.
tContextDump dumps the context setup of the current Job to the subsequent component.

tContextDump Standard properties


These properties are used to configure tContextDump running in the Standard Job framework.
The Standard tContextDump component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the fields that will
be processed and passed on to the next component.

Note:
The schema of tContextDump is read only and made
up of two columns, Key and Value, corresponding to the
parameter name and the parameter value of the Job
context.

Hide Password Select this check box to hide password values, that is, to
display the value of context parameters whose Type is
Password as *.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule As a start component, tContextDump dumps the context


setup of the current Job to a file, a database table, etc.

Related scenarios
No scenario is available for the Standard version of this component yet.


tContextLoad
Loads a context from a flow.
This component also performs two controls. It warns when the parameters defined in the incoming
flow are not defined in the context, and, the other way around, it also warns when a context value is
not initialized in the incoming flow. Note that neither case blocks the processing.
tContextLoad dynamically modifies the values of the active context.

tContextLoad Standard properties


These properties are used to configure tContextLoad running in the Standard Job framework.
The Standard tContextLoad component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the fields that will
be processed and passed on to the next component.
In tContextLoad, the schema must be made of two columns,
including the parameter name and the parameter value to
be loaded.

If a variable loaded, but not in the context If a variable is loaded but does not appear in the context,
select how the notification must be displayed: as an error,
a warning or an information message (info).

If a variable in the context, but not loaded If a variable appears in the context but is not loaded, select
how the notification must be displayed: as an error, a
warning or an information message (info).

Print operations Select this check box to display the context parameters set
in the Run view.

Disable errors Select this check box to prevent the error from displaying.

Disable warnings Select this check box to prevent the warning from
displaying.

Disable infos Select this check box to prevent the information from
displaying.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.

Advanced settings

tStat Catcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
KEY_NOT_INCONTEXT: the variables that are loaded but do not
appear in the context. This is an After variable and it returns
a string.
KEY_NOT_LOADED: the variables that appear in the context but
are not loaded. This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component relies on the data flow to load the context
values to be used, therefore it requires a preceding input
component and thus cannot be a start component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Print
operations option dynamically at runtime.
When a dynamic parameter is defined, the corresponding
Print operations option in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on
Dynamic settings and context variables, see Talend Studio
User Guide.

Limitation tContextLoad does not create any non-defined variable in


the default context.

Reading data from different MySQL databases using dynamically loaded connection parameters
The Job in this scenario is made of two subJobs. The first subJob aims at dynamically loading the
context parameters from two text files, and the second subJob uses the loaded context parameters to
connect to two different databases and to display the content of an existing database table of each
of them. With the context settings in the Job, we can decide which database to connect to and choose
whether to display the set context parameters on the console dynamically at runtime.

Dropping and linking the components


Procedure
1. Drop a tFileInputDelimited component and a tContextLoad component from the Palette onto the
design workspace, and link them using a Row > Main connection to form the first subJob.
2. Drop a tMysqlInput component and a tLogRow component onto the design workspace, and link
them using a Row > Main connection to form the second subJob.
3. Link the two subJobs using a Trigger > On Subjob Ok connection.

Preparing the contexts and context variables


Procedure
1. Create two delimited files corresponding to the two contexts in this scenario, namely two
databases we will access, and name them test_connection.txt and prod_connection.txt, which
contain the database connection details for testing and actual production purposes respectively.
Each file is made of two columns, containing the parameter names and the corresponding values
respectively. Below is an example of test_connection.txt (a sketch of prod_connection.txt is shown
after this procedure):

host;localhost
port;3306
database;test
username;root
password;talend

2. Select the Contexts view of the Job, and click the [+] button at the bottom of the view to add
seven rows in the table to define the following parameters:
• host, String type
• port, String type
• database, String type
• username, String type
• password, Password type
• filename, File type
• printOperations, Boolean type


Note that the host, port, database, username and password parameters correspond to the parameter
names in the delimited files and are used to set up the desired database connection; the filename
parameter is used to define the delimited file to read at Job execution; and the printOperations
parameter is used to decide whether to print the context parameters set by the tContextLoad
component on the console.
3. Click the Contexts tab and click the [+] button at the upper right corner of the panel to open the
Configure Contexts dialog box.
4. Select the default context, click the Edit button and rename the context to Test.
5. Click New to add a new context named Production. Then click OK to close the dialog box.

6. Back in the Contexts tab view, define the value of the filename variable under each context by
clicking in the respective Value field and browsing to the corresponding delimited file.
7. Select the Prompt check box next to the Value field of the filename variable for both contexts to
show the Prompt fields and enter the prompt message to be displayed at the execution time.
8. For the printOperations variable, click in the Value field under the Production context and select
false from the list; click in the Value field under the Test context and select true from the list.
Then select the Prompt check box under both contexts and enter the prompt message to be
displayed at the execution time.
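
The prod_connection.txt file follows the same key;value layout as test_connection.txt, but holds the connection details of the production database. For example (the values below are purely illustrative and not taken from the scenario):

host;prod-db-server
port;3306
database;production
username;prod_user
password;prod_password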


Configuring the components


Procedure
1. In the tFileInputDelimited component Basic settings panel, fill the File name/Stream field with
the relevant context variable we just defined: context.filename.

2. Define the file schema manually (Built-in). It contains two columns defined as: key and value.
3. Accept the defined schema to be propagated to the next component (tContextLoad).
4. In the Dynamic settings view of the tContextLoad component, click the [+] button to add a row
in the table, and fill the Code field with context.printOperations to use context variable
printOperations we just defined. Note that the Print operations check box in the Basic settings
view now becomes highlighted and unusable.

5. Then double-click to open the tMysqlInput component Basic settings view.


6. Fill the Host, Port, Database, Username, and Password fields with the relevant variables stored
in the delimited files and defined in the Contexts tab view: context.host, context.port,
context.database, context.username, and context.password respectively in this
example, and fill the Table Name field with the actual database table name to read data from,
customers for both databases in this example.


7. Then fill in the Schema information. If you stored the schema in the Repository Metadata, then
you can retrieve it by selecting Repository and the relevant entry in the list.
In this example, the schema of both database tables is made of four columns: id (INT, 2 characters
long), firstName (VARCHAR, 15 characters long), lastName (VARCHAR, 15 characters long), and city
(VARCHAR, 15 characters long).
8. In the Query field, type in the SQL query to be executed on the DB table specified. In this
example, simply click Guess Query to retrieve all the columns of the table, which will be displayed
on the Run tab, through the tLogRow component.
9. In the Basic settings view of the tLogRow component, select the Table option to display data
records in the form of a table.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job, and press F6 to run the Job using the default context, which is Test in
this use case.
A dialog box appears to prompt you to specify the delimited file to read and decide whether to
display the set context parameters on the console.


You can specify a file other than the default one if needed, and clear the Show loaded variables
check box if you do not want to see the set context variables on the console. To run the Job using
the default settings, click OK.

The context parameters and content of the database table in the Test context are all displayed on
the Run console.
2. Now select the Production context and press F6 to launch the Job again. When the prompt dialog
box appears, simply click OK to run the Job using the default settings.


The content of the database table in the Production context is displayed on the Run console.
Because the printOperations variable is set to false, the set context parameters are not displayed
on the console this time.


tConvertType
Converts one Talend java type to another automatically, and thus avoids compiling errors.
tConvertType allows specific conversions at runtime from one Talend java type to another.

tConvertType Standard properties


These properties are used to configure tConvertType running in the Standard Job framework.
The Standard tConvertType component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Auto Cast This check box is selected by default. It performs an


automatic java type conversion.

Manual Cast This mode is not visible if the Auto Cast check box is
selected. It allows you to manually specify the columns
where a java type conversion is needed.

Set empty values to Null before converting This check box is selected to set the empty values of String
or Object type to null for the input data.

Die on error This check box is selected to kill the Job when an error
occurs.


Note:
Not available for Map/Reduce Jobs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component cannot be used as a start component as it


requires an input flow to operate.

Converting java types


This Java scenario describes a four-component Job where the tConvertType component is used to
convert Java types in three columns, and a tMap is used to adapt the schema and have as an output
the first of the three columns and the sum of the two others after conversion.

Note:
In this scenario, the input schemas for the input delimited file are stored in the repository, you can
simply drag and drop the relevant file node from Repository - Metadata - File delimited onto the
design workspace to automatically retrieve the tFileInputDelimited component's setting. For more
information, see Talend Studio User Guide.


Dropping the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tConvertType, tMap,
and tLogRow.
2. In the Repository tree view, expand Metadata and from File delimited drag the relevant node,
JavaTypes in this scenario, to the design workspace.
The Components dialog box displays.
3. From the component list, select tFileInputDelimited and click Ok.
A tFileInputDelimited component called Java types displays in the design workspace.
4. Connect the components using Row > Main links.

Configuring the components


Procedure
1. Double-click tFileInputDelimited to enter its Basic settings view.
2. Set Property Type to Repository since the file details are stored in the repository. The following
fields are pre-defined using the fetched data.

The input file used in this scenario is called input. It is a text file that holds string, integer, and
float java types.

Fill in all other fields as needed. For more information, see tFileInputDelimited on page 1015.
In this scenario, the header and the footer are not set and there is no limit for the number of
processed rows.


3. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is
made of three columns, StringtoInteger, IntegerField, and FloatToInteger.

4. Click Ok to close the dialog box.


5. Double-click tConvertType to enter its Basic settings view.

6. Set Schema Type to Built in, and click Sync columns to automatically retrieve the columns from
the tFileInputDelimited component.
7. Click Edit schema to describe manually the data structure of this processing component.

In this scenario, we want to convert a string type data into an integer type and a float type data
into an integer type.
Click OK to close the Schema of tConvertType dialog box.
8. Double-click tMap to open the Map editor.
The Map editor displays the input metadata of the tFileInputDelimited component.


9. In the Schema editor panel of the Map editor, click the plus button of the output table to add two
rows and name them StringToInteger and Sum.
10. In the Map editor, drag the StringToInteger row from the input table to the StringToInteger row in
the output table.
11. In the Map editor, drag each of the IntegerField and the FloatToInteger rows from the input table to
the Sum row in the output table and click OK to close the Map editor.

12. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977.


Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to execute it.

The string type data is converted into an integer type and displayed in the StringToInteger column
on the console. The float type data is converted into an integer and added to the IntegerField
value to give the addition result in the Sum column on the console.


tCosmosDBBulkLoad
Imports data files in different formats (CSV, TSV or JSON) into the specified Cosmos database so that
the data can be further processed.

tCosmosDBBulkLoad Standard properties


These properties are used to configure tCosmosDBBulkLoad running in the Standard Job framework.
The Standard tCosmosDBBulkLoad component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

MongoDB directory Fill in this field with the MongoDB home directory.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Collection Type in the name of the collection to import data to.

Drop collection if exist Select this check box to remove the collection if it already
exists.


Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Required authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Data file Type in the full path of the file from which the data will be
imported or click the [...] button to browse to the desired
data file.
Make sure that the data file is in standard format. For
example, the fields in CSV files should be separated with
commas.

File type Select the proper file type from the list. CSV, TSV and JSON
are supported.

The JSON file starts with an array Select this check box to allow tCosmosDBBulkLoad to read
the JSON files starting with an array.
This check box appears when the File type you have
selected is JSON.

Action on data Select the action that you want to perform on the data.
• Insert: Insert the data into the database.
Note that when inserting data from CSV or TSV files
into the MongoDB database, you need to specify fields
either by selecting the First line is header check box or
defining them in the schema.
• Upsert: Insert the data if they do not exist or update
the existing data.
Note that when upserting data into the MongoDB
database, you need to specify a list of fields for the
query portion of the upsert operation.


Upsert fields Customize the fields that you want to upsert as needed.
This table is available when you select Upsert from the
Action on data list.

First line is header Select this check box to use the first line in CSV or TSV files
as a header.
This check box is available only when you select CSV or TSV
from the File type list.

Ignore blanks Select this check box to ignore the empty fields in CSV or
TSV files.
This check box is available only when you select CSV or TSV
from the File type list.

Print log Select this check box to print logs.

Advanced settings

Additional arguments Complete this table to use the additional arguments as


required.
For example, you can use the argument "--jsonArray" to
accept the import of data expressed with multiple MongoDB
documents within a single JSON array. For more information
about the additional arguments, go to
http://docs.mongodb.org/manual/reference/program/mongoimport/ and
read the description of options.

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component can be used together with the


tCosmosDBInput component to verify if the data is
imported as expected.

Limitation The MongoDB client tool needs to be installed on the


machine where Jobs using this component are executed.


tCosmosDBConnection
Creates a connection to a CosmosDB database so that the connection can be reused by other components.

tCosmosDBConnection Standard properties


These properties are used to configure tCosmosDBConnection running in the Standard Job framework.
The Standard tCosmosDBConnection component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the


password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors at the end of 10-minute inactivity of
these cursors. In this situation, an idle cursor will stay open
until either the results of this cursor are exhausted or you
manually close it using the cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box being
clear, a MongoDB server automatically stops idle cursors
after a given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.

Usage

Usage rule This component is generally used with other CosmosDB


components, particularly tCosmosDBClose.


tCosmosDBInput
Retrieves certain documents from a Cosmos database collection by supplying a query document
containing the fields the desired documents should match.

tCosmosDBInput Standard properties


These properties are used to configure tCosmosDBInput running in the Standard Job framework.
The Standard tCosmosDBInput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Set read preference Select this check box and from the Read preference drop-
down list that is displayed, select the member to which you
need to direct the read operations.
If you leave this check box clear, the Job uses the default
Read preference, that is to say, uses the primary member in
a replica set.
For further information, see MongoDB's documentation
about Replication and its Read preferences.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.


For details about the other mechanisms in this list,


see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Collection Name of the collection in the database.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
If a column in the database is a JSON document and you
need to read the entire document, put an asterisk (*) in the
DB column column, without quotation marks around it.

Query Specify the query condition. This field is available only


when you have selected Find query from the Query type
drop-down list.
For example, type in "{id:4}" to retrieve the record
whose id is 4 from the collection specified in the Collection
field.
Different from the query statements required in the
MongoDB client software, the query here refers to the
contents inside find(), such as the query here {id:4}
versus the MongoDB client query db.blog.find({id:4}).

Mapping Each column of the schema defined for this component


represents a field of the documents to be read. In this table,
you need to specify the parent nodes of these fields, if any.
For example, in the document reading as follows

{
    _id: ObjectId("5099803df3f4948bd2f98391"),
    person: { first: "Joe", last: "Walker" }
}
The first and the last fields have person as their parent node
but the _id field does not have any parent node. So once
completed, this Mapping table should read as follows:

Column Parent node path


_id
first "person"
last "person"

Sort by Specify the column and choose the order for the sort
operation.
This field is available only when you have selected Find
query from the Query type drop-down list.

Limit Type in the maximum number of records to be retrieved.


This field is available only when you have selected Find
query from the Query type drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors at the end of 10-minute inactivity of
these cursors. In this situation, an idle cursor will stay open
until either the results of this cursor are exhausted or you
manually close it using the cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box being
clear, a MongoDB server automatically stops idle cursors
after a given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.

Enable external sort Since the aggregation pipeline stages have a maximum
memory use limit (100 megabytes) and a stage exceeding
this limit will produce errors, when handling large datasets,
select this check box to avoid aggregation stages exceeding
this limit.
For further information about this external sort, see Large
sort operation with external sort.


Usage

Usage rule As a start component, tCosmosDBInput allows you to


retrieve records from a collection in the Cosmos database
and transfer them to the following component for display or
storage.


tCosmosDBOutput
Inserts, updates, upserts or deletes documents in a Cosmos database collection based on the incoming
flow from the preceding component in the Job.

tCosmosDBOutput Standard properties


These properties are used to configure tCosmosDBOutput running in the Standard Job framework.
The Standard tCosmosDBOutput component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Set write concern Select this check box to set the level of acknowledgement
requested for write operations. Then you need to
select the level of this operation.
For further information, see the related MongoDB
documentation on http://docs.mongodb.org/manual/core/
write-concern/.

Bulk write Select this check box to insert, update or remove data in
bulk. Note this feature is available only when the version of
MongoDB you are using is 2.6+.
Then you need to select Ordered or Unordered to define
how the MongoDB database processes the data sent by the
Studio.


• If you select Ordered, MongoDB processes the queries


sequentially.
• If you select Unordered, MongoDB optimizes the bulk
write operations without keeping the order in which
the individual operations were inserted in the bulk
write.
In the Bulk write size field, enter the size of each query
group to be processed by MongoDB. In the documentation
of MongoDB, some restrictions and expected behaviors as
to this size are explained. You can find the details on http://
docs.mongodb.org/manual/core/bulk-write-operations/.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism
best adapted to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.
For further information about the MongoDB Authentication
database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Collection Name of the collection in the database.

Drop collection if exist Select this check box to drop the collection if it already
exists.

Action on data The following operations are available:


• Insert: insert documents.
• Set: modifies the existing fields of an existing
document and appends a field if it does not exist in
this document.
If you need to apply this action on all the documents
in the collection to be used, select the Update all
document check box that is displayed; otherwise, only
the first document is updated.


• Update: replaces the existing documents with the


incoming data but keeps the technical ID of these
documents.
• Upsert: inserts a document if it does not exist;
otherwise it applies the same rules as Update.
• Upsert with set: inserts a document if it does not exist;
otherwise it applies the same rules as Set.
If you need to apply this action on all the documents
in the collection to be used, select the Update all
document check box that is displayed; otherwise, only
the first document is updated.
• Delete: delete documents.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Mapping Each column of the schema defined for this component


represents a field of the documents to be read. In this table,
you need to specify the parent nodes of these fields, if any.
For example, in the document reading as follows

{
  _id: ObjectId("5099803df3f4948bd2f98391"),
  person: { first: "Joe", last: "Walker" }
}


The first and the last fields have person as their parent node
but the _id field does not have any parent node. So once
completed, this Mapping table should read as follows:

Column    Parent node path
_id
first     "person"
last      "person"
Not available when the Generate JSON Document check box
is selected in Advanced settings.

Die on error This check box is cleared by default, meaning that rows on
error are skipped and the process completes for the
error-free rows.

Advanced settings

Generate JSON Document Select this check box for JSON configuration:
Configure JSON Tree: click the [...] button to open the
interface for JSON tree configuration. For more information,
see Configuring a JSON Tree on page 3897.
Group by: click the [+] button to add lines and choose the
input columns for grouping the records.
Remove root node: select this check box to remove the root
node.
Data node and Query node (available for update and upsert
actions): type in the name of data node and query node
configured on the JSON tree.
These nodes are mandatory for update and upsert actions.
They enable the update and upsert actions but are not
stored in the database.

No query timeout Select this check box to prevent MongoDB servers from
stopping idle cursors after 10 minutes of inactivity. In this
situation, an idle cursor stays open until either the results
of this cursor are exhausted or you manually close it using
the cursor.close() method.
A cursor for MongoDB is a pointer to the result set of a
query. By default, that is to say, with this check box cleared,
a MongoDB server automatically stops idle cursors after a
given inactivity period to avoid excess memory use.
For further information about MongoDB cursors, see https://
docs.mongodb.org/manual/core/cursors/.

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Usage

Usage rule tCosmosDBOutput executes the action defined on the


collection in the database based on the flow incoming from
the preceding component in the Job.

Limitation • The "multi" parameter, which allows to update multiple


documents at a time, is not supported. Therefore, if
two documents have the same key, the first is always
updated, but the second never will.


• For the update operation, the key cannot be a JSON


array.


tCosmosDBRow
Executes the commands of the Cosmos database.

tCosmosDBRow Standard properties


These properties are used to configure tCosmosDBRow running in the Standard Job framework.
The Standard tCosmosDBRow component belongs to the Cloud and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Use existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

API Select the database API to be used. Then the corresponding


parameters to be defined are displayed in the Component
view.
In the current version of this component, only the MongoDB
API is supported. For this reason, MongoDB database is
often mentioned in the documentation of the CosmosDB
components.

Use replica set address or multiple query routers Select this check box to show the Server addresses table.
In the Server addresses table, define the sharded MongoDB
databases or the MongoDB replica sets you want to connect
to.

Server and Port Enter the IP address and listening port of the database
server.
Available when the Use replica set address or multiple
query routers check box is not selected.

Database Enter the name of the MongoDB database to be connected


to.

Authentication mechanism Among the mechanisms listed on the Authentication


mechanism drop-down list, the NEGOTIATE one is
recommended if you are not using Kerberos, because it
automatically selects the authentication mechanism best
suited to the MongoDB version you are using.
For details about the other mechanisms in this list,
see MongoDB Authentication from the MongoDB
documentation.

Set Authentication database If the username to be used to connect to MongoDB has


been created in a specific Authentication database of
MongoDB, select this check box to enter the name of this
Authentication database in the Authentication database
field that is displayed.


For further information about the MongoDB Authentication


database, see User Authentication database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available when the Use authentication check box is
selected.
If the security system you have selected from the
Authentication mechanism drop-down list is Kerberos, you
need to enter the User principal, the Realm and the KDC
server fields instead of the Username and the Password
fields.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

Execute command Select this check box to enter MongoDB commands in the
Command field for execution.
• Command: in this field, enter the command to be
executed, if this command contains one single
variable.
For example, if you need to construct the command

{"isMaster": 1}

You simply need to enter isMaster within quotation
marks.
• Construct command from keys and values: if the
command to be executed contains multiple variables,
select this check box and in the Command keys and
values table, add the variables and their respective
values to be used.


For example, if you need to construct the following


command

{ renameCollection : "<source_names
pace>" , to : "<target_namespace>" ,
dropTarget : < true | false > }

You need to add three rows to the Command keys and


values table and enter one variable-value pair to each
row within quotation marks:

"renameCollection" "old_name"
"to" "new_name"
"dropTarget" "false"

• Construct command from a JSON string: if you want


to directly enter the command to be used, select this
check box and enter this command in the JSON string
command field that is displayed. Only one command is
allowed per tCosmosDBRow.
For example:

"{createIndexes: 'restaurants',
indexes : [{key : {restaurant_id
: 1}, name: 'id_index_2', unique:
true}]}"

Note that you must use single quotation marks to


surround the string values used in the command and
double quotation marks to surround the command
itself.
For further information about the MongoDB commands
you can use in this field, see https://docs.mongodb.org/
manual/reference/command/.
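For illustration, using the example values above, the command assembled from the Command keys and values table is equivalent to the following (a sketch; old_name and new_name are the placeholder values from the example):

{ renameCollection : "old_name" , to : "new_name" , dropTarget : false }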

Die on error This check box is cleared by default, meaning that rows on
error are skipped and the process completes for the
error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at the
component level.

Usage

Usage rule tCosmosDBRow allows you to manipulate the Cosmos


database through the MongoDB commands.


tCouchbaseDCPInput
Queries the documents from the Couchbase database, under the Database Change Protocol (DCP), a
streaming protocol.

tCouchbaseDCPInput Standard properties


These properties are used to configure tCouchbaseDCPInput running in the Standard Job framework.
The Standard tCouchbaseDCPInput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.

Password Provide the authentication credentials to a bucket.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. The content
column stores the documents to be used, the key column
the IDs of these documents and the other columns the
Couchbase technical information.

Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.


Ensure that the credentials you are using have the


appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Advanced settings

Connect Timeout Define the timeout interval (in seconds) for the connection
to be aborted.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
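For example, you could print the NB_LINE variable of this component in a downstream tJava component as follows (a sketch; the component label tCouchbaseDCPInput_1 depends on your Job):

System.out.println("Rows read: " + ((Integer)globalMap.get("tCouchbaseDCPInput_1_NB_LINE")));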

Usage

Usage rule As a start component, tCouchbaseDCPInput reads the


documents from the Couchbase database.


tCouchbaseDCPOutput
Upserts documents in the Couchbase database based on the incoming flat data from preceding
components, under the Database Change Protocol (DCP), a streaming protocol.
This means that it adds a new document or replaces its value if it already exists.

tCouchbaseDCPOutput Standard properties


These properties are used to configure tCouchbaseDCPOutput running in the Standard Job framework.
The Standard tCouchbaseDCPOutput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Password Provide the authentication credentials to a bucket.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.


Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Field to use as ID Enter, without double quotation marks, the name of the
column from the schema to provide IDs for the documents
to be written to Couchbase.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_SUCCESS: the number of rows successfully processed.
This is an After variable and it returns an integer.


NB_REJECT: the number of rows rejected. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Preceded by an input component, tCouchbaseDCPOutput


wraps flat data into documents for storage in the Couchbase
database.


tCouchbaseInput
Queries the documents from the Couchbase database.

tCouchbaseInput Standard properties


These properties are used to configure tCouchbaseInput running in the Standard Job framework.
The Standard tCouchbaseInput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.

Username and Password Provide the authentication credentials to your Couchbase


cluster.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
When using non-JSON documents, define an id column of
the String type, then define a content column. The type
of this content column should be String for the string
documents and byte[] for the binary documents.
For JSON documents, define the fields that are present in
your JSON documents.


Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Document type Data stored in a Couchbase database could be JSON, strings


or binary. From this drop-down list, select the type of the
data you need to use with Couchbase.
Note that it is not recommended to mix JSON, binary and
string documents in a same bucket, as this mixture could
make the document processing error-prone.
If you need to use N1QL to query string or binary
documents, the only possible way is to use the document
ID to get the document. For example, if you need to get a
document for which the ID number is 2, the N1QL query
should be

SELECT meta().id as `_meta_id_` FROM


`bucket_name` where meta().id = '2';

Note that the quotation marks around _meta_id_ and
bucket_name are backticks (`).

Query Type Select the type of queries to be used from the following
options:
• Select All: select all the contents of a given bucket.
• N1QL: use a N1QL statement to perform fine-tuned
queries.
• Document ID: use the document IDs to select
documents. You need to enter the ID to be used in
the Document ID field that is displayed. Only one
document ID is allowed per component.

Use N1QL query Select this check box and in the Query field that is
displayed, enter a N1QL query statement to perform
complex actions.
Only one statement is allowed; do not put quotation
marks around your statement.
• When you use wildcards in your query such as SELECT
*, the returned result of this query is wrapped in the
bucket name used in this query. In this situation, define
only one column for the result in the schema of this
component.
For example, when performing this query

SELECT * FROM `travel_sample` limit 3


The returned result is wrapped in the


travel_sample bucket, reading like this:

[
{
"travel_sample": {
"callsign": "MILE-AIR",
"country": "United States",
"iata": "Q5",
"icao": "MLA",
"id": 10,
"name": "40-Mile Air",
"type": "airline"
}
},
{
"travel_sample": {
"callsign": "TXW",
"country": "United States",
"iata": "TQ",
"icao": "TXW",
"id": 10123,
"name": "Texas Wings",
"type": "airline"
}
},
{
"travel_sample": {
"callsign": "atifly",
"country": "United States",
"iata": "A1",
"icao": "A1F",
"id": 10226,
"name": "Atifly",
"type": "airline"
}
}
]

In the schema, define one single column called, for


example, travel_sample to store the result and
select String as its type.
• If you use a query without wildcards, such as

SELECT callsign, country, iata, icao, id, name, type
FROM `travel_sample` limit 3;


The returned result is not wrapped, reading like this:

[
{
"callsign": "MILE-AIR",
"country": "United States",
"iata": "Q5",
"icao": "MLA",
"id": 10,
"name": "40-Mile Air",
"type": "airline"
},
{
"callsign": "TXW",
"country": "United States",
"iata": "TQ",
"icao": "TXW",
"id": 10123,
"name": "Texas Wings",
"type": "airline"
},
{
"callsign": "atifly",
"country": "United States",
"iata": "A1",
"icao": "A1F",
"id": 10226,
"name": "Atifly",
"type": "airline"
}
]

In this situation, define the columns that represent


the structure of the actual business data, such as the
following columns: callsign, country, iata, icao,
id, name, and type.

Advanced settings

Connect Timeout Enter, without quotation marks, the timeout interval (in
seconds) for the connection to be aborted.

Limit rows Enter the maximum number of rows to be read. This field is
not available when you use a N1QL query.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule As a start component, tCouchbaseInput reads the documents


from the Couchbase database.


tCouchbaseOutput
Upserts documents in the Couchbase database based on the incoming flat data from preceding
components.
This means that it adds a new document or replaces its value if it already exists.

tCouchbaseOutput Standard properties


These properties are used to configure tCouchbaseOutput running in the Standard Job framework.
The Standard tCouchbaseOutput component belongs to the Databases NoSQL family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Bootstrap nodes Enter the name or IP of the node to be bootstrapped by


the Couchbase SDK. As Couchbase recommends specifying
multiple nodes to bootstrap, enter the names or IPs of these
nodes in this field, separated by commas (,).
For further information about Couchbase bootstrapping, see
How Couchbase SDKs connect to the cluster.
You can find the node names on the Servers page in your
Couchbase Web Console. If you need further information,
contact the administrator of your Couchbase cluster or
consult your Couchbase documentation.
Note that the Couchbase servers do not support proxies; for
this reason, the Couchbase components from Talend do not
support proxies either.

Username and Password Provide the authentication credentials to your Couchbase


cluster.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
If you are using Couchbase V5.0 and onwards, enter the
same value you put in the Bucket field as password, because
since Couchbase V5.0, no password is associated with a
bucket. However, on Couchbase, you need to create a user
with appropriate role to access the buckets.
For further information about the access control and
other important requirements on the Couchbase side, see
Couchbase release note of your version.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
When using non-JSON documents, define an id column of
the String type, then define a content column. The type
of this content column should be String for the string
documents and byte[] for the binary documents.


For JSON documents, define the fields that are present in
your JSON documents.

Bucket Enter, within double quotation marks, the name of the data
bucket in the Couchbase database.
Ensure that the credentials you are using have the
appropriate rights and permissions to access this bucket.
If you are using Couchbase V5.0 and onwards, this bucket
name is the user name you have created in the Security tab
of your Couchbase UI.

Document type Data stored in a Couchbase database could be JSON, strings


or binary. From this drop-down list, select the type of the
data you need to use with Couchbase.
Note that it is not recommended to mix JSON, binary and
string documents in a same bucket, as this mixture could
make the document processing error-prone.

Field to use as ID Enter, without double quotation marks, the name of the
column from the schema to provide IDs for the documents
to be written to Couchbase.

Partial update Select this check box to update only a subset of a


document, without changing any other property that is not
provided by the incoming data.
If you leave this check box clear, when a document already
exists in the database, that is to say, when this document
and a document from the incoming data have the same ID,
the whole existing document is replaced with the incoming
one.

Use N1QL Query with parameters Select this check box to use variables in your N1QL
queries. Once you select it, the Query field and the Query
Parameters table are displayed for you to enter your query
and define the variables to be used in your query.
Only one query is allowed per tCouchbaseOutput.
For example, enter this query in the Query field:

INSERT INTO 'travel-sample' (KEY, VALUE)
VALUES
($nm,
{
"name":$nm,
"type":$tp,
"country":$cnty,
"callsign":$call,
"id":$zid
}
)

Then you need to define all of the variables (the strings


starting with $) used in this query in the Query Parameters
table.

Query Parameter Name    Column
nm                      name
tp                      type
cnty                    countries
call                    company
zid                     docid


This table creates a map between the variables in your


query and the columns from the schema you have defined
in the component for your data. The values in the Column
column are the column names from this schema; the values
in the Query Parameter Name column are the variables from
your query.
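For illustration, with hypothetical row values where name is "Atifly", type is "airline", countries is "United States", company is "A1F" and docid is 10226, the statement executed by the component is equivalent to:

INSERT INTO 'travel-sample' (KEY, VALUE)
VALUES
("Atifly",
{
"name":"Atifly",
"type":"airline",
"country":"United States",
"callsign":"A1F",
"id":10226
}
)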

Advanced settings

Connect Timeout Enter, without quotation marks, the timeout interval (in
seconds) for the connection to be aborted.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_SUCCESS: the number of rows successfully processed.
This is an After variable and it returns an integer.
NB_REJECT: the number of rows rejected. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Preceded by an input component, tCouchbaseOutput
wraps flat data into documents for storage in the Couchbase
database.

tCreateTable
Creates a table for a specific type of database.

tCreateTable Standard properties


These properties are used to configure tCreateTable running in the Standard Job framework.
The Standard tCreateTable component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Database Type Select the type of the database. The connection properties
may differ slightly according to the database type selected.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to
be shared in the Basic settings view of the connection
component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.

DB Version Select the version of the database.

Host The IP address or hostname of the database.

Port The listening port number of the database.

Database name The name of the database.

Schema The name of the database schema.


This property is available for DB2, Exasol, Greenplum,


Informix, MS SQL Server, Oracle, PostgresPlus, Postgresql,
Redshift, Sybase, and Vertica database types.

Access File The path to the Access database file.

Firebird File The path to the Firebird database file.

Interbase File The path to the Interbase database file.

SQLite File The path to the SQLite database file.

Running Mode Select the Server Mode that corresponds to your database
setup.
This property is available only for the HSQLDb database
type.

Use TLS/SSL Sockets Select this check box to enable the security mode if
required.
This property is available only for the HSQLDb database
type.

DB Alias The name of the database.


This property is available only for the HSQLDb database
type.

Framework Type Select the framework type for your database.


This property is available only for the JavaDb database type.

DB Root Path Browse to your database root.


This property is available only for the JavaDb database type.

ODBC Name The name of the ODBC database.

Connection Type Select the Oracle database connection type.


• Oracle SID: select this connection type to uniquely
identify a particular database on a system.
• Oracle Service: select this connection type to use
the TNS alias that you give when you connect to the
remote database.
• Oracle OCI: select this connection type to use Oracle
Call Interface with a set of C-language software APIs
that provide an interface to the Oracle database.
• Oracle Custom: select this connection type to access a
clustered database.
• WALLET: select this connection type to store
credentials in an Oracle wallet.

Account In the Account field, enter, in double quotation marks, the


account name that has been assigned to you by Snowflake.
This property is available only for the Snowflake database
type.

Username and Password The database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the


password between double quotes and click OK to save the


settings.

Role Enter, in double quotation marks, the default access control


role to use to initiate the Snowflake session.
This role must already exist and has been granted to the
user ID you are using to connect to Snowflake. If this field
is left empty, the PUBLIC role is automatically granted. For
information about Snowflake access control model, see
Understanding the Access Control Model.
This property is available only for the Snowflake database
type.

Table name The name of the table to be created.

Table Action Select the action to be carried out on the table.


• Create table: the specified table doesn't exist and gets
created.
• Create table if not exists: the specified table is created
if it does not exist.
• Drop table if exists and create: the table is removed if it
already exists and gets created again.

Temporary Table Select this check box to create a temporary table during
an operation, which is automatically dropped at the end
of the operation. Since temporary tables exist in a special
schema, you cannot specify a schema name when creating a
temporary table, and the name of the temporary table must
be distinct from the name of any other table, sequence,
index, and view in the same schema.
Note that once you select to create a temporary table, you
should empty the values when you edit schema.
This field is available only when Postgresql is selected from
the Database Type drop-down list.

Unlogged Table Select this check box to create an unlogged table during an
operation. This way, data is loaded considerably faster than
an ordinary table where the data is logged and then written.
However, the data in an unlogged table is not crash-safe.
This field is available only when Postgresql is selected from
the Database Type drop-down list and Temporary Table is
not selected.

Case Sensitive Select this check box to make the table/column name case
sensitive.
This property is available only for the HSQLDb database
type.

Temporary Table Select this check box if you want to save the created table
temporarily.
This property is available only for the MySQL database type.

Create Select the type of the table to be created.


• SET TABLE: the table that does not allow duplicate
rows.
• MULTISET TABLE: the table that allows duplicate rows.


This property is available only for the Teradata database


type.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Additional JDBC Parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.
This property is available for the AS/400 and MSSQL Server
database types.

Create projection Select this check box to create a projection.


This property is available only for the Vertica database type.

Enforce database delimited identifiers Select this check box to enable delimited identifiers.
This property is available only for the Snowflake database
type.
For more information on delimited identifiers, see
https://docs.intersystems.com/latest/csp/docbook/
DocBook.UI.Page.cls?KEY=GSQL_identifiers.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

QUERY The query statement being processed. This is a Flow


variable and it returns a string.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Creating new table in a Mysql Database


The Job described below aims at creating a table in a database, made of a dummy schema taken from
a delimited file schema stored in the Repository. This Job is composed of a single component.

Procedure
1. Drop a tCreateTable component from the Databases family in the Palette to the design
workspace.
2. In the Basic settings view, from the Database Type list, select Mysql for this scenario.

3. From the Table Action list, select Create table.


4. Select the Use Existing Connection check box only if you are using a dedicated DB connection
component tMysqlConnection on page 2425. In this example, we won't use this option.
5. In the Property type field, select Repository so that the connection fields that follow are
automatically filled in. If you have not defined your DB connection metadata in the DB connection
directory under the Metadata node, fill in the details manually as Built-in.
6. In the Table Name field, fill in a name for the table to be created.
7. If you want to retrieve the Schema from the Metadata (it doesn't need to be a DB connection
Schema metadata), select Repository then the relevant entry.


8. In either case (Built-in or Repository), click Edit Schema to check the data type mapping and
define the data structure.

9. Click the Reset DB Types button in case the DB type column is empty or shows discrepancies
(marked in orange). This allows you to map any data type to the relevant DB data type. Then, click
OK to validate your changes and close the dialog box.
10. Save your Job and press F6 to execute it.

Results
The table is created empty but with all columns defined in the Schema.
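For reference, with a simple two-column schema (an id column of Integer type and a name column of String type), the statement issued for MySQL would be roughly equivalent to the following (a sketch; the actual column types come from the DB type mapping in the schema and the table name is the one you entered):

CREATE TABLE my_table (id INT, name VARCHAR(255));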


tCreateTemporaryFile
Creates a temporary file in a specified directory. This component allows you to either keep the
temporary file or delete it after the Job execution.

tCreateTemporaryFile Standard properties


These properties are used to configure tCreateTemporaryFile running in the Standard Job framework.
The Standard tCreateTemporaryFile component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Remove file when execution is over Select this check box to delete the temporary file after the
Job execution.

Use default temporary system directory Select this check box to create the file in the default system
temporary directory.

Directory Specify the directory under which the temporary file will be
created.
This field is available only when the Use default temporary
system directory check box is cleared.

Use Prefix Select this check box to use a string as the prefix of the
temporary file name.
A file name prefix string helps you prevent existing files from
being overwritten.

Prefix Specify the file name prefix string for the temporary file.
The prefix string needs to be at least three characters in
length.
To prevent existing files from being overwritten, it is
suggested to use a prefix string that is different from those
of any existing file names in the directory.
This option is available only when the Use Prefix check box
is selected.

Template Enter the temporary file name which should contain the
characters XXXX, such as talend_XXXX.
This option is unavailable when the Use Prefix check box is
selected.

Suffix Enter the filename extension of the temporary file.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.


Global Variables

Global Variables FILEPATH: the path where the file was created. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component of


a Job or subJob.

Connections Outgoing links (from this component to another):


Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Creating a temporary file and writing data into it


This scenario describes a Job that creates a temporary file in the default system temporary directory,
writes data into the file, and finally displays the data in the file on the console.


Adding and linking the components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tCreateTemporaryFile component, a tJava
component, a tRowGenerator component, a tFileOutputDelimited component, a
tFileInputDelimited component, and a tLogRow component.
2. Connect tRowGenerator to tFileOutputDelimited using a Row > Main connection.
3. Do the same to connect tFileInputDelimited to tLogRow.
4. Connect tCreateTemporaryFile to tJava using a Trigger > OnSubjobOk connection.
5. Do the same to connect tJava to tRowGenerator and connect tRowGenerator to
tFileInputDelimited.

Configuring the components


Creating the temporary file

Procedure
1. Double-click tCreateTemporaryFile to open its Basic settings view.


2. Select the Remove file when execution is over check box to delete the created temporary file after
the Job execution.
3. Select the Use default temporary system directory check box to create the file in the default
system temporary directory.
4. In the Template field, enter the temporary file name which should contain the characters XXXX. In
this example, it is talend_XXXX.
5. In the Suffix field, enter the filename extension of the temporary file. In this example, it is dat.
6. Double-click tJava to open its Basic settings view.

7. In the Code field, enter the following code to display the default system temporary directory and
the path to the temporary file that will be created on the console:

System.out.println("The default system temporary directory is:\r" + (String)System


.getProperty("java.io.tmpdir"));
System.out.println("The path to the temporary file is:\r" + (String)global
Map.get("tCreateTemporaryFile_1_FILEPATH"));

Writing the data into the file

Procedure
1. Double-click tRowGenerator to open its RowGenerator Editor.


2. Click the [+] button to add two columns: id of Integer type and name of String type. Then in the
Functions column, select the predefined function Numeric.sequence(String,int,int) for id and
TalendDataGenerator.getFirstName() for name.
3. In the Number of Rows for RowGenerator field, enter 5 to generate five rows.
4. Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
5. Double-click tFileOutputDelimited to open its Basic settings view.

6. In the File Name field, press Ctrl+Space and from the global variable list displayed select
((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")).

Reading the data from the file

Procedure
1. Double-click tFileInputDelimited to open its Basic settings view.


2. In the File name/Stream field, press Ctrl+Space and from the global variable list displayed select
((String)globalMap.get("tCreateTemporaryFile_1_FILEPATH")).
3. Click the [...] button next to Edit schema and in the dialog box displayed define the schema by
adding two columns: id of Integer type and name of String type.

4. Click OK to validate the changes and accept the propagation prompted by the pop-up dialog box.
5. Double-click tLogRow to open its Basic settings view.

6. In the Mode area, select Table (print values in cells of a table) to display the output data in a
better way.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.


2. Press F6 or click Run on the Run tab to run the Job.

The file talend_MHTI.dat is created under the default system temporary directory C:\Users\lena_li
\AppData\Local\Temp\ during the Job execution, the five generated rows of data are written into it,
and then the file is deleted after the Job execution.


tDB2BulkExec
Executes the Insert action on the provided data and gains in performance during Insert operations to
a DB2 database.

tDB2BulkExec Standard properties


These properties are used to configure tDB2BulkExec running in the Standard Job framework.
The Standard tDB2BulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database


Table Schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create table: The table is removed and created
again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear table: The table content is deleted.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: You create the schema and store it locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository, hence can reuse it. Related topic:
see Talend Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Use Ingest Command Select this check box to populate data into DB2 using the
INGEST command. For more information about the INGEST
command, see http://www.ibm.com/developerworks/
data/library/techarticle/dm-1304ingestcmd and https://
www-01.ibm.com/support/knowledgecenter/SSEPGG_10


.1.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0057198.html?
cp=SSEPGG_10.1.0%2F3-5-2-4-59.
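For reference, a minimal INGEST statement of the kind this option relies on might look like the following (a sketch adapted from the IBM documentation linked above; the script actually generated depends on your Load From, Action on Data, Content Format and Mapping settings):

INGEST FROM FILE my_file.csv
  FORMAT DELIMITED BY ','
  (
    $id INTEGER EXTERNAL,
    $name CHAR(32)
  )
  INSERT INTO my_table
    VALUES($id, $name);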

Load From Select the source of the data to be populated.


• FILE: loads data from a file.
• PIPE: loads data from a pipe.
• FOLDER: loads data from multiple files in a folder.
This list is available only when the Use Ingest Command
check box is selected.

Data File Name of the file to be loaded.

Warning:
This file is located on the machine specified by the URI
in the Host field so it should be on the same machine as
the database server.

This field is not visible when PIPE or FOLDER is selected


from the Load From drop-down list.

Pipe Name Enter the name of the pipe.


This field is available only when PIPE is selected from the
Load From drop-down list.

Folder Specify the path to the folder holding the files to be loaded.
This field is available only when FOLDER is selected from
the Load From drop-down list.

Action on Data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new records to the table. If duplicates are
found, Job stops.
• Replace: Add new records to the table. If an old record
in the table has the same value as a new record for
a PRIMARY KEY or a UNIQUE index, the old record is
deleted before the new record is inserted.
• Update: Make changes to existing records.
• Delete: Remove the records that match the input data.
• Merge: Merge the input data to the table.
Delete and Merge are available only when the Use Ingest
Command check box is selected.

File Glob Pattern Specify the glob pattern for the files to be loaded.
This field is available only when FOLDER is selected from
the Load From drop-down list.

Where Clause Enter the WHERE clause to filter the data to be processed.
This field is available only when update or delete is
selected from the Action on Data drop-down list.

Custom Insert Values Clause Select this check box and in the Insert Values Clause field
displayed enter the VALUES clause for the insert operation.
This check box is available only when the Use Ingest
Command check box is selected and insert is selected from
the Action on Data drop-down list.


Custom Update Set Clause Select this check box and specify the SET clause for the
update operation by completing the Set Mapping table.
This check box is available only when the Use Ingest
Command check box is selected and update is selected from
the Action on Data drop-down list.

Set Mapping Complete this table to specify the SET clause for the update
operation.
• Column: the name of the column. By default, the fields
in the Column column are same as what they are in the
schema.
• Expression: the expression for the corresponding
column.
This table is available only when the Custom Update Set
Clause check box is selected.

Merge Clause Specify the MERGE clause for the merge operation.
This table is available only when the Use Ingest Command
check box is selected and merge is selected from the Action
on Data drop-down list.

Content Format Select the format of the input file, either Delimited or
Positional.
This list is available only when the Use Ingest Command
check box is selected.

Delimited By Enter the character that separates the fields in the


delimited file.
This field is available only when Delimited is selected from
the Content Format drop-down list.

Optionally Enclosed By Enter the character that encloses the string in the delimited
file.
This field is available only when Delimited is selected from
the Content Format drop-down list.

Fixed Length Enter the length (in bytes) of the record in the positional
file.
This field is available only when Positional is selected from
the Content Format drop-down list.

Mapping Complete this table to specify the mapping relationship between the source column and the DB2 table column.
• Column: the name of the column. By default, the fields in the Column column are the same as those in the schema.
• Is Table Column: select the check box if the
corresponding column is a table column.
• Start Position: the starting position of the
corresponding column.
• End Position: the ending position of the corresponding
column.
The Start Position and End Position columns are
available only when Positional is selected from the
Content Format drop-down list.


This table is available only when the Use Ingest Command check box is selected.
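As an illustrative example only (column names and positions are assumptions, and the end positions are taken as inclusive), a positional file with 26-byte records could be mapped as follows, with Fixed Length set to 26:
    Column: ID, Is Table Column: selected, Start Position: 1, End Position: 6
    Column: NAME, Is Table Column: selected, Start Position: 7, End Position: 26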

Script Generated Folder Specify the directory under which the script file will be created.
This field is available only when the Use Ingest Command
check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB connection you are creating.

Note:
You can set the encoding parameters through this field.

Field terminated by Character, string or regular expression to separate fields.

Date Format Use this field to define the way months and days are ordered.

Time Format Use this field to define the way hours, minutes and seconds
are ordered.

Timestamp Format Use this field to define the way date and time are ordered.

Remove load pending When this check box is selected, tables blocked in "pending" status following a bulk load are unblocked.

Load options Click + to add data loading options:
• Parameter: select a loading parameter from the list.
• Value: enter a value for the parameter selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
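These counters can be read in a downstream component such as tJava once the bulk load has finished, using the standard globalMap mechanism. A minimal sketch (the instance name tDB2BulkExec_1 is an assumption and depends on your Job):
    // Read the After variables exposed by the bulk-load component.
    Integer inserted = (Integer) globalMap.get("tDB2BulkExec_1_NB_LINE_INSERTED");
    Integer updated = (Integer) globalMap.get("tDB2BulkExec_1_NB_LINE_UPDATED");
    System.out.println("Rows inserted: " + inserted + ", rows updated: " + updated);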

Usage

Usage rule This dedicated component offers performance and flexibility of DB2 query handling.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
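A minimal sketch of a typical setup (all names below are assumptions): define a String context variable such as context.db2Connection, enter it in the Code field of the Dynamic settings table, and give it the name of the connection component to use at run time, for example tDB2Connection_1 or tDB2Connection_2. The value can then be changed per context group or passed at execution time, for example with --context_param db2Connection=tDB2Connection_2 on an exported Job.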

Related scenarios
For tDB2BulkExec related topics, see:
• Inserting transformed data in MySQL database on page 2482.
• Truncating and inserting file data into an Oracle database on page 2681.


tDB2Close
Closes a transaction committed in the connected DB.

tDB2Close Standard properties


These properties are used to configure tDB2Close running in the Standard Job framework.
The Standard tDB2Close component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with DB2 components, especially with tDB2Connection and tDB2Commit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tDB2Commit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.
tDB2Commit validates the data processed through the Job into the connected DB.

tDB2Commit Standard properties


These properties are used to configure tDB2Commit running in the Standard Job framework.
The Standard tDB2Commit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tDB2Commit to your Job, your data will be committed row
by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tDB2*
components, especially with the tDB2Connection and
tDB2Rollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tDB2Commit related scenario, see Inserting data in mother/daughter tables on page 2426


tDB2Connection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.

tDB2Connection Standard properties


These properties are used to configure tDB2Connection running in the Standard Job framework.
The Standard tDB2Connection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host name Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Table Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.


This option is incompatible with the Use dynamic job and


Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
This check box is not available when the Specify a data
source alias check box is selected.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box is not visible when the Use or register a
shared DB Connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
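In plain JDBC terms, the difference described above can be sketched as follows. This is only an illustration of the behavior, not the code generated by the Studio, and the connection details are placeholders:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AutoCommitSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:db2://dbserver:50000/SAMPLE", "user", "password");

            // Auto Commit selected: each statement is committed as its own transaction.
            conn.setAutoCommit(true);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO T1 VALUES (1)"); // committed immediately
            }

            // Auto Commit cleared: changes are held until an explicit commit,
            // which is what a commit component issues at the end of the subjob.
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO T1 VALUES (2)");
                st.executeUpdate("INSERT INTO T1 VALUES (3)");
            }
            conn.commit(); // both rows become visible together
            conn.close();
        }
    }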

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other


tDB2* components, especially with the tDB2Commit and
tDB2Rollback components.


Related scenarios
For tDB2Connection related scenario, see tMysqlConnection on page 2425


tDB2Input
Executes a DB query with a strictly defined order which must correspond to the schema definition.
Then tDB2Input passes on the field list to the next component via a Row > Main link.
If double quotes exist in the column names of a table, the double quotation marks cannot be retrieved
when retrieving the column. Therefore, it is recommended not to use double quotes in column names
in a DB2 database table.

tDB2Input Standard properties


These properties are used to configure tDB2Input running in the Standard Job framework.
The Standard tDB2Input component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Table name Select the source table from which to capture any changes made to the data.

Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
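For example, if the schema defines the columns ID, NAME and CITY in that order (illustrative names), the query must return the fields in the same order. As elsewhere in the Studio, the query is entered as a double-quoted string:
    "SELECT ID, NAME, CITY FROM MYSCHEMA.CUSTOMERS"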

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box is not available when the Use an existing
connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component covers all possible SQL queries for DB2 databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see Reading data from different MySQL databases using dynamically loaded connection parameters on page 497.


tDB2Output
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tDB2Output writes, updates, makes changes or suppresses entries in a database.

tDB2Output Standard properties


These properties are used to configure tDB2Output running in the Standard Job framework.
The Standard tDB2Output component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.


For more information about setting up and storing database


connection parameters, see Talend Studio User Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Table schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
Default: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. You do not
have the possibility to rollback the operation.
Truncate table with reuse storage: The table content is
deleted. You do not have the possibility to rollback the
operation. However, you can reuse the existing storage
allocated to the table, even if the storage is considered
empty.

Warning:
If you select the Use an existing connection check
box, and then select Truncate table or Truncate table
with reuse storage from the Action on table list, a
commit statement will be invoked before the truncate
operation because the truncate statement must be the
first statement in a transaction.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.


Update or insert: Update the record with the given


reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .


This check box is not available when the Use an existing


connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not insert, update, or delete actions, or actions that require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.
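As an illustration only (the column names are assumptions), a load timestamp could be added next to an existing column by filling the table as follows:
    Name: LAST_LOADED
    SQL expression: "CURRENT TIMESTAMP"
    Position: After
    Reference column: NAME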

Use field options Select this check box to customize a request, especially
when there is double action on data.


Convert columns and table names to uppercase Select this check box to uppercase the names of the
columns and the name of the table.

Debug query mode Select this check box to display each step during processing
entries in a database.

Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.

Note:
Make sure the Nullable check box is selected for the corresponding columns in the schema.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, the Update or the Delete option in the Action
on data field.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use Batch check box is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in a DB2 database. It also allows you to create a reject
flow using a Row > Rejects link to filter data in error. For
an example of tMysqlOutput in use, see Retrieving data in
error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tDB2Output related topics, see
• Inserting a column and altering data using tMysqlOutput on page 2466.


tDB2Rollback
Avoids committing part of a transaction involuntarily.
tDB2Rollback cancels the transaction committed in the connected DB.

tDB2Rollback Standard properties


These properties are used to configure tDB2Rollback running in the Standard Job framework.
The Standard tDB2Rollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tDB2Connection component in the list if more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tDB2*
components, especially with the tDB2Connection and
tDB2Commit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection


parameters on page 497. For more information on Dynamic


settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tDB2Rollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429 of the tMysqlRollback.


tDB2Row
Acts on the actual DB structure or on the data (although without handling data) depending on
the nature of the query and the database. The SQLBuilder tool helps you easily write your SQL statements.
tDB2Row is the specific component for this database query. It executes the SQL query stated onto
the specified database. The row suffix means the component implements a flow in the job design
although it doesn't provide output.

tDB2Row Standard properties


These properties are used to configure tDB2Row running in the Standard Job framework.
The Standard tDB2Row component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.


Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to


properly sequence the fields in order to match the schema
definition.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime .
This check box is not available when the Use an existing
connection check box is selected.


Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.

Note:
You can set the encoding parameters through this field.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the same query several times, as performance levels are increased.
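A minimal sketch, assuming an ORDERS table (names, types and values below are illustrative): with a query such as "INSERT INTO ORDERS (ORDER_ID, AMOUNT) VALUES (?, ?)" entered in the Query field, the Set PreparedStatement Parameter table could be filled as follows:
    Parameter Index: 1, Parameter Type: Int, Parameter Value: 1001
    Parameter Index: 2, Parameter Type: Double, Parameter Value: 250.75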

tStatCatcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tDB2Row related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622
• Removing and regenerating a MySQL table index on page 2497.


tDB2SCD
Addresses Slowly Changing Dimension needs by regularly reading a source of data and logging the changes into a dedicated SCD table.
tDB2SCD reflects and tracks changes in a dedicated DB2 SCD table.

tDB2SCD Standard properties


These properties are used to configure tDB2SCD running in the Standard Job framework.
The Standard tDB2SCD component belongs to the Business Intelligence and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.


Database Name of the database.

Table Schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to maximize system performance.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating.


Note:
You can set the encoding parameters through this field.

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step during
processing entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as an output component. It requires an input component and a Row main link as input.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the


Component List box in the Basic settings view becomes


unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenarios
For related topics, see tMysqlSCD on page 2508.


tDB2SCDELT
Addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and
logs the changes into a dedicated DB2 SCD table.

tDB2SCDELT Standard properties


These properties are used to configure tDB2SCDELT running in the Standard Job framework.
The Standard tDB2SCDELT component belongs to the Business Intelligence and the Databases
families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally. Enter properties


manually.

  Repository: Select the repository file where Properties are


stored. The fields that come after are pre-filled in using the
fetched data.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host The IP address of the database server.

Port Listening port number of database server.


Database Name of the database

Username and Password User authentication data for a dedicated database.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Source table Name of the input DB2 SCD table.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table Select to perform one of the following operations on the


table defined:
None: No action carried out on the table.
Drop and create table: The table is removed and created
again
Create table: A new table gets created.
Create table if not exists: A table gets created if it does not
exist.
Clear table: The table content is deleted. You have the
possibility to rollback the operation.
Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Surrogate Key Select the surrogate key column from the list.

Creation Select the method to be used for the surrogate key generation.


For more information regarding the creation methods, see


SCD management methodology on page 2511.

Source Keys Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.

Source fields value include Null Select this check box to allow the source columns to have
Null values.

Note:
The source columns here refer to the fields defined in
the SCD type 1 fields and SCD type 2 fields tables.

Use SCD Type 1 fields Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.

Use SCD Type 2 fields Use type 2 if changes need to be tracked down. SCD Type
2 should be used to trace updates for example. Select the
columns of the schema that will be checked for changes.

SCD type 2 fields Click the [+] button to add as many rows as needed, each
row for a column. Click the arrow on the right side of
the cell and select the column whose value changes will
be tracked using Type 2 SCD from the drop-down list
displayed.
This table is available only when the Use SCD type 2 fields
option is selected.

Start date Specify the column that holds the start date for type 2 SCD.
This list is available only when the Use SCD type 2 fields
option is selected.

End date Specify the column that holds the end date for type 2 SCD.
This list is available only when the Use SCD type 2 fields
option is selected.

Note: To avoid duplicated change records, it is


recommended to select a column that can identify each
change for this field.

Log active status Select this check box and from the Active field drop-down
list displayed, select the column that holds the true or false
status value, which helps to spot the active record for type 2
SCD.
This option is available only when the Use SCD type 2 fields
option is selected.

Log versions Select this check box and from the Version field drop-down
list displayed, select the column that holds the version
number of the record for type 2 SCD.
This option is available only when the Use SCD type 2 fields
option is selected.
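To make the type 2 options above more concrete, a type 2 change typically closes the current version of a record (using the end date and active status columns) and inserts a new version (with a new start date and an incremented version number). The statements below are only a hypothetical illustration of that pattern, with invented table, column and value names; they are not the SQL generated by the component:

UPDATE dim_customer SET scd_end = CURRENT DATE, scd_active = 0 WHERE customer_id = 42 AND scd_active = 1;
INSERT INTO dim_customer (customer_id, city, scd_start, scd_end, scd_active, scd_version) VALUES (42, 'Paris', CURRENT DATE, NULL, 1, 2);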


Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB connection you are creating (see the example after this table).

Note:
You can set the encoding parameters through this field.

Debug mode Select this check box to display each step during
processing entries in a database.

tStat Catcher Statistics Select this check box to collect log data at the component
level.
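As an illustration of the Additional JDBC parameters field, DB2 JDBC properties are usually supplied as semicolon-terminated key=value pairs, for example (the property names below are examples only and depend on the JDBC driver you use):

currentSchema=MYSCHEMA;fullyMaterializeLobData=true;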

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
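As an illustration, in the Java code generated for the Job these variables are read from the globalMap. The sketch below could be placed in a tJava component executed after this one; the component name tDB2SCDELT_1 is a hypothetical unique name and depends on your Job:

// Read the After variable once the component has finished executing.
// globalMap is provided by the generated Job code.
String errorMessage = (String) globalMap.get("tDB2SCDELT_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("tDB2SCDELT_1 reported an error: " + errorMessage);
}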

Usage

Usage rule This component is used as an output component. It requires an input component and a Row main link as input.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
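As a hypothetical illustration of this mechanism, assume the Code field holds the context variable context.DbConnectionCode and that its run-time value must match the unique name of the connection component to be used; the exported Job could then be pointed at a given connection with a command-line context parameter such as:

--context_param DbConnectionCode=tDB2Connection_1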

Related Scenarios
For related scenarios, see:
• Tracking data changes in a Snowflake table using the tJDBCSCDELT component on page 1879.
• Tracking data changes in a PostgreSQL table using the tPostgreSQLSCDELT component on page
2948.


tDB2SP
Offers a convenient way to call the database stored procedures.

tDB2SP Standard properties


These properties are used to configure tDB2SP running in the Standard Job framework.
The Standard tDB2SP component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host Database server IP address.

Port Listening port number of the DB server.

Database Name of the database.

Username and Password DB user authentication data.

To enter the password, click the [...] button next to the password field, and then in the pop-up dialog box enter the password between double quotes and click OK to save the settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

SP Name Type in the exact name of the stored procedure.

Is Function / Return result in Select this check box if only a single value is to be returned.
From the list, select the schema column on which the returned value is based.

Parameters Click the plus button and select the various schema
columns that will be required by the procedure. Note
that the SP schema can hold more columns than there are
parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter to be returned as a value, likely
after modification through the procedure (function).
RECORDSET: Input parameter to be returned as a set of
values rather than a single value.
For a plain JDBC illustration of IN and OUT parameters, see the sketch after this table.

Note:
Check Inserting data in mother/daughter tables on page
2426 if you want to analyze a set of records from a
database table or DB query and return single records.

Specify a data source alias Select this check box and specify the alias of a data source
created on the Talend Runtime side to use the shared
connection pool defined in the data source configuration.
This option works only when you deploy and run your Job in
Talend Runtime.
This check box is not available when the Use an existing
connection check box is selected.

Data source alias Enter the alias of the data source created on the Talend
Runtime side.
This field is available only when the Specify a data source
alias check box is selected.
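For reference, the parameter types in the Parameters table map onto standard JDBC stored procedure calls. The sketch below is only a plain JDBC illustration of one IN and one OUT parameter (the connection details, procedure name GET_STATE_LABEL and values are hypothetical); the component generates the equivalent logic from the Parameters table:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class StoredProcedureSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical DB2 connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/SAMPLE", "db2admin", "secret");
        try (CallableStatement call = conn.prepareCall("{call GET_STATE_LABEL(?, ?)}")) {
            call.setInt(1, 42);                          // IN: value read from a schema column
            call.registerOutParameter(2, Types.VARCHAR); // OUT: value written back to a schema column
            call.execute();
            System.out.println("Label: " + call.getString(2));
        }
        conn.close();
    }
}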

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB connection you are creating.

Note:
You can set the encoding parameters through this field.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related scenarios, see:


• Retrieving personal information using a stored procedure on page 2404.


• Using tMysqlSP to find a State Label using a stored procedure on page 2528.
• Checking number format using a stored procedure on page 2735.
• Executing a stored procedure using tMDMSP on page 2180.
Check Inserting data in mother/daughter tables on page 2426 as well if you want to analyze a set of
records from a database table or DB query and return single records.


Dynamic database components

Talend provides a number of database components that allow you to change dynamically the type of
database you want to work on. These components are available in the Database Common group under
the Databases family of the Palette for standard data integration Jobs.
Each of these components has only one property, the Database list, on its Basic settings view for you
to select the type of database of your interest.
For more information on these dynamic database components, see:
• tDBBulkExec on page 596
• tDBClose on page 597
• tDBColumnList on page 598
• tDBCommit on page 599
• tDBConnection on page 600
• tDBInput on page 601
• tDBLastInsertId on page 603
• tDBOutput on page 604
• tDBOutputBulk on page 606
• tDBOutputBulkExec on page 607
• tDBRollback on page 608
• tDBRow on page 609
• tDBSCD on page 610
• tDBSCDELT on page 611
• tDBSP on page 612
• tDBTableList on page 613


tDBBulkExec
Offers gains in performance while executing the Insert operations on a database.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds a database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.

tDBBulkExec Standard properties


These properties are used to configure tDBBulkExec running in the Standard Job framework.
The Standard tDBBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessBulkExec on page 79)
• Amazon (tRedshiftBulkExec on page 2964)
• Greenplum (tGreenplumBulkExec on page 1311)
• IBM DB2 (tDB2BulkExec on page 553)
• Informix (tInformixBulkExec on page 1706)
• Ingres (tIngresBulkExec on page 1747)
• Microsoft SQL Server (tMSSqlBulkExec on page 2348)
• MySQL (tMysqlBulkExec on page 2412)
• Netezza (tNetezzaBulkExec on page 2616)
• Oracle (tOracleBulkExec on page 2676)
• ParAccel (tParAccelBulkExec on page 2803)
• PostgreSQL (tPostgresqlBulkExec on page 2906)
• PostgresPlus (tPostgresPlusBulkExec on page 2865)
• Snowflake (tSnowflakeBulkExec on page 3384)
• Sybase (ASE and IQ) (tSybaseBulkExec on page 3658)
• Sybase IQ (tSybaseIQBulkExec on page 3673)
• Vertica (tVerticaBulkExec on page 3822)


tDBClose
Closes the transaction committed in a connected database.
This component works with a variety of databases depending on your selection.

tDBClose Standard properties


These properties are used to configure tDBClose running in the Standard Job framework.
The Standard tDBClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessClose on page 82)
• Amazon Aurora (tAmazonAuroraClose on page 146)
• Amazon Mysql (tAmazonMysqlClose on page 185)
• Amazon Oracle (tAmazonOracleClose on page 207)
• Amazon Redshift (tRedshiftClose on page 2980)
• AS400 (tAS400Close on page 237)
• FireBird (tFirebirdClose on page 1179)
• Greenplum (tGreenplumClose on page 1315)
• IBM DB2 (tDB2Close on page 559)
• Exasol (tEXAClose on page 895)
• Informix (tInformixClose on page 1711)
• Ingres (tIngresClose on page 1751)
• Interbase (tInterbaseClose on page 1784)
• JDBC (tJDBCClose on page 1850)
• MemSQL (tMemSQLClose (deprecated))
• Microsoft SQL Server (tMSSqlClose on page 2353)
• MySQL (tMysqlClose on page 2416)
• Netezza (tNetezzaClose on page 2620)
• Oracle (tOracleClose on page 2684)
• ParAccel (tParAccelClose on page 2807)
• PostgreSQL (tPostgresqlClose on page 2910)
• PostgresPlus (tPostgresPlusClose on page 2869)
• SAPHana (tSAPHanaClose on page 3303)
• SQLite (tSQLiteClose on page 3504)
• Snowflake (tSnowflakeClose on page 3398)
• Sybase (ASE and IQ) (tSybaseClose on page 3663)
• Teradata (tTeradataClose on page 3726)
• Vertica (tVerticaClose on page 3828)


tDBColumnList
Iterates on all columns of a given database table and lists column names.
This component works with a variety of databases depending on your selection.

tDBColumnList Standard properties


These properties are used to configure tDBColumnList running in the Standard Job framework.
The Standard tDBColumnList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Microsoft SQL Server (tMSSqlColumnList on page 2355)
• MySQL (tMysqlColumnList on page 2418)


tDBCommit
Validates the data processed through the Job into the connected database.
This component works with a variety of databases depending on your selection.

tDBCommit Standard properties


These properties are used to configure tDBCommit running in the Standard Job framework.
The Standard tDBCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessCommit on page 84)
• Amazon Aurora (tAmazonAuroraCommit on page 148)
• Amazon Mysql (tAmazonMysqlCommit on page 187)
• Amazon Oracle (tAmazonOracleCommit on page 209)
• AS400 (tAS400Commit on page 239)
• Amazon Redshift (tRedshiftCommit on page 2982)
• FireBird (tFirebirdCommit on page 1181)
• Greenplum (tGreenplumCommit on page 1317)
• IBM DB2 (tDB2Commit on page 561)
• Exasol (tEXACommit on page 897)
• Informix (tInformixCommit on page 1713)
• Ingres (tIngresCommit on page 1753)
• Interbase (tInterbaseCommit on page 1786)
• JDBC (tJDBCCommit on page 1854)
• Microsoft SQL Server (tMSSqlCommit on page 2358)
• MySQL (tMysqlCommit on page 2423)
• Netezza (tNetezzaCommit on page 2622)
• Oracle (tOracleCommit on page 2686)
• ParAccel (tParAccelCommit on page 2809)
• PostgreSQL (tPostgresqlCommit on page 2912)
• PostgresPlus (tPostgresPlusCommit on page 2871)
• SAPHana (tSAPHanaCommit on page 3304)
• SQLite (tSQLiteCommit on page 3506)
• Sybase (ASE and IQ) (tSybaseCommit on page 3665)
• Teradata (tTeradataCommit on page 3728)
• VectorWise (tVectorWiseCommit on page 3803)
• Vertica (tVerticaCommit on page 3830)


tDBConnection
Opens a connection to a database to be reused in the subsequent subJob or subJobs.
This component works with a variety of databases depending on your selection.

tDBConnection Standard properties


These properties are used to configure tDBConnection running in the Standard Job framework.
The Standard tDBConnection component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessConnection on page 86)
• Amazon Aurora (tAmazonAuroraConnection on page 150)
• Amazon Mysql (tAmazonMysqlConnection on page 189)
• Amazon Oracle (tAmazonOracleConnection on page 211)
• Amazon Redshift (tRedshiftConnection on page 2984)
• AS400 (tAS400Connection on page 241)
• Exasol (tEXAConnection on page 899)
• FireBird (tFirebirdConnection on page 1183)
• Greenplum (tGreenplumConnection on page 1319)
• IBM DB2 (tDB2Connection on page 563)
• Informix (tInformixConnection on page 1715)
• Ingres (tIngresConnection on page 1755)
• Interbase (tInterbaseConnection on page 1788)
• JDBC (tJDBCConnection on page 1856)
• MemSQL (tMemSQLConnection (deprecated))
• Microsoft SQL Server (tMSSqlConnection on page 2360)
• MySQL (tMysqlConnection on page 2425)
• Netezza (tNetezzaConnection on page 2624)
• Oracle (tOracleConnection on page 2688)
• ParAccel (tParAccelConnection on page 2811)
• PostgreSQL (tPostgresqlConnection on page 2914)
• PostgresPlus (tPostgresPlusConnection on page 2873)
• SAPHana (tSAPHanaConnection on page 3306)
• SQLite (tSQLiteConnection on page 3508)
• Snowflake (tSnowflakeConnection on page 3401)
• Sybase (ASE and IQ) (tSybaseConnection on page 3667)
• Teradata (tTeradataConnection on page 3730)
• VectorWise (tVectorWiseConnection on page 3805)
• Vertica (tVerticaConnection on page 3832)


tDBInput
Extracts data from a database.
This component works with a variety of databases depending on your selection.

tDBInput Standard properties


These properties are used to configure tDBInput running in the Standard Job framework.
The Standard tDBInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessInput on page 91)
• Amazon Aurora (tAmazonAuroraInput on page 153)
• Amazon Mysql (tAmazonMysqlInput on page 192)
• Amazon Oracle (tAmazonOracleInput on page 214)
• Amazon Redshift (tRedshiftInput on page 2987)
• AS400 (tAS400Input on page 243)
• Exasol (tEXAInput on page 902)
• FireBird (tFirebirdInput on page 1185)
• Greenplum (tGreenplumInput on page 1327)
• IBM DB2 (tDB2Input on page 566)
• Informix (tInformixInput on page 1717)
• Ingres (tIngresInput on page 1757)
• Interbase (tInterbaseInput on page 1790)
• JDBC (tJDBCInput on page 1861)
• MemSQL (tMemSQLInput (deprecated))
• Microsoft SQL Server (tMSSqlInput on page 2368)
• MySQL (tMysqlInput on page 2437)
• Netezza (tNetezzaInput on page 2626)
• Oracle (tOracleInput on page 2692)
• ParAccel (tParAccelInput on page 2813)
• PostgreSQL (tPostgresqlInput on page 2916)
• PostgresPlus (tPostgresPlusInput on page 2875)
• SAPHana (tSAPHanaInput on page 3308)
• SAS (tSasInput (deprecated))
• SQLite (tSQLiteInput on page 3510)
• Snowflake (tSnowflakeInput on page 3404)
• Sybase (ASE and IQ) (tSybaseInput on page 3669)
• Teradata (tTeradataInput on page 3742)
• VectorWise (tVectorWiseInput on page 3807)


• Vertica (tVerticaInput on page 3834)


tDBLastInsertId
Obtains the primary key value of the record that was last inserted in a database table by a user.
This component works with a variety of databases depending on your selection.

tDBLastInsertId Standard properties


These properties are used to configure tDBLastInsertId running in the Standard Job framework.
The Standard tDBLastInsertId component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• AS400 (tAS400LastInsertId on page 250)
• Microsoft SQL Server (tMSSqlLastInsertId on page 2372)
• MySQL (tMysqlLastInsertId on page 2453)

603
tDBOutput

tDBOutput
Writes, updates, makes changes or suppresses entries in a database.
This component works with a variety of databases depending on your selection.

tDBOutput Standard properties


These properties are used to configure tDBOutput running in the Standard Job framework.
The Standard tDBOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutput on page 95)
• Amazon Aurora (tAmazonAuroraOutput on page 163)
• Amazon Mysql (tAmazonMysqlOutput on page 195)
• Amazon Oracle (tAmazonOracleOutput on page 218)
• Amazon Redshift (tRedshiftOutput on page 2996)
• AS400 (tAS400Output on page 252)
• Exasol (tEXAOutput on page 906)
• FireBird (tFirebirdOutput on page 1189)
• Greenplum (tGreenplumOutput on page 1330)
• IBM DB2 (tDB2Output on page 570)
• Informix (tInformixOutput on page 1720)
• Ingres (tIngresOutput on page 1761)
• Interbase (tInterbaseOutput on page 1794)
• JDBC (tJDBCOutput on page 1865)
• MemSQL (tMemSQLOutput (deprecated))
• Microsoft SQL Server (tMSSqlOutput on page 2375)
• MySQL (tMysqlOutput on page 2460)
• Netezza (tNetezzaOutput on page 2637)
• Oracle (tOracleOutput on page 2699)
• ParAccel (tParAccelOutput on page 2817)
• PostgreSQL (tPostgresqlOutput on page 2920)
• PostgresPlus (tPostgresPlusOutput on page 2879)
• SAPHana (tSAPHanaOutput on page 3312)
• SAS (tSasOutput (deprecated))
• SQLite (tSQLiteOutput on page 3515)
• Snowflake (tSnowflakeOutput on page 3412)
• Sybase (ASE and IQ) (tSybaseOutput on page 3689)
• Teradata (tTeradataOutput on page 3749)
• VectorWise (tVectorWiseOutput on page 3811)


• Vertica (tVerticaOutput on page 3838)


tDBOutputBulk
Writes a file with columns based on the defined delimiter and the standards of the selected database
type.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds a database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.

tDBOutputBulk Standard properties


These properties are used to configure tDBOutputBulk running in the Standard Job framework.
The Standard tDBOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutputBulk on page 101)
• Amazon Redshift (tRedshiftOutputBulk on page 3002)
• Greenplum (tGreenplumOutputBulk on page 1336)
• Informix (tInformixOutputBulk on page 1726)
• Ingres (tIngresOutputBulk on page 1766)
• Microsoft SQL Server (tMSSqlOutputBulk on page 2382)
• MySQL (tMysqlOutputBulk on page 2480)
• Oracle (tOracleOutputBulk on page 2706)
• ParAccel (tParAccelOutputBulk on page 2823)
• PostgreSQL (tPostgresqlOutputBulk on page 2927)
• PostgresPlus (tPostgresPlusOutputBulk on page 2885)
• Snowflake (tSnowflakeOutputBulk on page 3416)
• Sybase (ASE and IQ) (tSybaseOutputBulk on page 3695)
• Vertica (tVerticaOutputBulk on page 3844)


tDBOutputBulkExec
Executes the Insert action in a database.
This component works with a variety of databases depending on your selection.
The tDBOutputBulk and tDBBulkExec components are used together in a two-step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT statement
that feeds a database of the selected database type. These two steps are fused together in the
tDBOutputBulkExec component, detailed in a separate section. The advantage of using two separate
steps is that the data can be transformed before it is loaded into the database.

tDBOutputBulkExec Standard properties


These properties are used to configure tDBOutputBulkExec running in the Standard Job framework.
The Standard tDBOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessOutputBulk on page 101)
• Amazon Redshift (tRedshiftOutputBulk on page 3002)
• Greenplum (tGreenplumOutputBulk on page 1336)
• Informix (tInformixOutputBulk on page 1726)
• Ingres (tIngresOutputBulk on page 1766)
• Microsoft SQL Server (tMSSqlOutputBulk on page 2382)
• MySQL (tMysqlOutputBulk on page 2480)
• Oracle (tOracleOutputBulk on page 2706)
• ParAccel (tParAccelOutputBulk on page 2823)
• PostgreSQL (tPostgresqlOutputBulk on page 2927)
• PostgresPlus (tPostgresPlusOutputBulk on page 2885)
• Snowflake (tSnowflakeOutputBulkExec on page 3423)
• Sybase (ASE and IQ) (tSybaseOutputBulk on page 3695)
• Vertica (tVerticaOutputBulk on page 3844)


tDBRollback
Cancels the transaction commit in a connected database to avoid committing part of a transaction
involuntarily.
This component works with a variety of databases depending on your selection.

tDBRollback Standard properties


These properties are used to configure tDBRollback running in the Standard Job framework.
The Standard tDBRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessRollback on page 108)
• Amazon Aurora (tAmazonAuroraRollback on page 170)
• Amazon Mysql (tAmazonMysqlRollback on page 201)
• Amazon Oracle (tAmazonOracleRollback on page 224)
• Amazon Redshift (tRedshiftRollback on page 3014)
• AS400 (tAS400Rollback on page 257)
• Exasol (tEXARollback on page 912)
• FireBird (tFirebirdRollback on page 1194)
• Greenplum (tGreenplumRollback on page 1342)
• IBM DB2 (tDB2Rollback on page 576)
• Informix (tInformixRollback on page 1733)
• Ingres (tIngresRollback on page 1775)
• Interbase (tInterbaseRollback on page 1800)
• JDBC (tJDBCRollback on page 1870)
• Microsoft SQL Server (tMSSqlRollback on page 2390)
• MySQL (tMysqlRollback on page 2491)
• Netezza (tNetezzaRollback on page 2643)
• Oracle (tOracleRollback on page 2715)
• ParAccel (tParAccelRollback on page 2830)
• PostgreSQL (tPostgresqlRollback on page 2934)
• PostgresPlus (tPostgresPlusRollback on page 2891)
• SAPHana (tSAPHanaRollback on page 3318)
• SQLite (tSQLiteRollback on page 3520)
• Sybase (ASE and IQ) (tSybaseRollback on page 3703)
• Teradata (tTeradataRollback on page 3755)
• VectorWise (tVectorWiseRollback on page 3816)
• Vertica (tVerticaRollback on page 3852)


tDBRow
Executes the stated SQL query onto a database.
This component works with a variety of databases depending on your selection.

tDBRow Standard properties


These properties are used to configure tDBRow running in the Standard Job framework.
The Standard tDBRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Access (tAccessRow on page 110)
• Amazon Mysql (tAmazonMysqlRow on page 203)
• Amazon Oracle (tAmazonOracleRow on page 226)
• Amazon Redshift (tRedshiftRow on page 3016)
• AS400 (tAS400Row on page 259)
• Exasol (tEXARow on page 914)
• FireBird (tFirebirdRow on page 1196)
• Greenplum (tGreenplumRow on page 1344)
• IBM DB2 (tDB2Row on page 578)
• Informix (tInformixRow on page 1735)
• Ingres (tIngresRow on page 1777)
• Interbase (tInterbaseRow on page 1802)
• JDBC (tJDBCRow on page 1872)
• MemSQL (tMemSQLRow (deprecated))
• Microsoft SQL Server (tMSSqlRow on page 2392)
• MySQL (tMysqlRow on page 2493)
• Netezza (tNetezzaRow on page 2645)
• Oracle (tOracleRow on page 2717)
• ParAccel (tParAccelRow on page 2832)
• PostgreSQL (tPostgresqlRow on page 2936)
• PostgresPlus (tPostgresPlusRow on page 2893)
• SAPHana (tSAPHanaRow on page 3319)
• SQLite (tSQLiteRow on page 3522)
• Snowflake (tSnowflakeRow on page 3440)
• Sybase (ASE and IQ) (tSybaseRow on page 3705)
• Teradata (tTeradataRow on page 3757)
• VectorWise (tVectorWiseRow on page 3818)
• Vertica (tVerticaRow on page 3854)


tDBSCD
Reflects and tracks changes in a dedicated database SCD table.
This component works with a variety of databases depending on your selection.

tDBSCD Standard properties


These properties are used to configure tDBSCD running in the Standard Job framework.
The Standard tDBSCD component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Greenplum (tGreenplumSCD on page 1348)
• IBM DB2 (tDB2SCD on page 582)
• Informix (tInformixSCD on page 1739)
• Ingres (tIngresSCD on page 1781)
• Microsoft SQL Server (tMSSqlSCD on page 2397)
• MySQL (tMysqlSCD on page 2508)
• Netezza (tNetezzaSCD on page 2649)
• Oracle (tOracleSCD on page 2722)
• ParAccel (tParAccelSCD on page 2836)
• PostgreSQL (tPostgresqlSCD on page 2940)
• PostgresPlus (tPostgresPlusSCD on page 2897)
• Sybase (ASE and IQ) (tSybaseSCD on page 3709)
• Teradata (tTeradataSCD on page 3762)
• Vertica (tVerticaSCD on page 3858)


tDBSCDELT
Reflects and tracks changes in a dedicated SCD table through SQL queries.
This component works with a variety of databases depending on your selection.

tDBSCDELT Standard properties


These properties are used to configure tDBSCDELT running in the Standard Job framework.
The Standard tDBSCDELT component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• IBM DB2 (tDB2SCDELT on page 586)
• MySQL (tMysqlSCDELT on page 2522)
• Oracle (tOracleSCDELT on page 2726)
• PostgreSQL (tPostgresqlSCDELT on page 2944)
• PostgresPlus (tPostgresPlusSCDELT on page 2901)
• Sybase (ASE and IQ) (tSybaseSCDELT on page 3713)
• Teradata (tTeradataSCDELT on page 3766)


tDBSP
Calls a database stored procedure.
This component works with a variety of databases depending on your selection.

tDBSP Standard properties


These properties are used to configure tDBSP running in the Standard Job framework.
The Standard tDBSP component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• IBM DB2 (tDB2SP on page 591)
• Informix (tInformixSP on page 1743)
• JDBC (tJDBCSP on page 1889)
• Microsoft SQL Server (tMSSqlSP on page 2401)
• MySQL (tMysqlSP on page 2526)
• Oracle (tOracleSP on page 2731)
• Sybase (ASE and IQ) (tSybaseSP on page 3718)


tDBTableList
Lists the names of specified database tables using a SELECT statement based on a WHERE clause.
This component works with a variety of databases depending on your selection.

tDBTableList Standard properties


These properties are used to configure tDBTableList running in the Standard Job framework.
The Standard tDBTableList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings
This component serves as an entry point for the following databases. To configure this component,
select a type of database from the Database list and click Apply on its Basic settings view. For more
information about specific database properties, see the relevant documentation:
• Microsoft SQL Server (tMSSqlTableList on page 2410)
• MySQL (tMysqlTableList on page 2532)
• Oracle (tOracleTableList on page 2739)


tDBFSConnection
Connects to a given DBFS (Databricks Filesystem) system so that the other DBFS components can
reuse the connection it creates to communicate with this DBFS.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.

tDBFSConnection Standard properties


These properties are used to configure tDBFSConnection running in the Standard Job framework.
The Standard tDBFSConnection component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.

Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component is generally used with other DBFS components.


tDBFSGet
Copies files from a given DBFS (Databricks Filesystem) system, pastes them in a user-defined directory
and, if need be, renames them.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.

tDBFSGet Standard properties


These properties are used to configure tDBFSGet running in the Standard Job framework.
The Standard tDBFSGet component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
DBFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.

Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.

DBFS directory In the DBFS directory field, enter the path pointing to the
data to be used in the DBFS file system.

Local directory Browse to, or enter the local directory to store the files
copied from DBFS.

Overwrite file Options to overwrite or not the existing file with the new
one.


Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from DBFS.
Regular expression is available.
- New name: give a new name to the obtained file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component combines DBFS connection and data extraction, and is thus used as a single-component subJob to copy data from DBFS to a user-defined local directory.
It runs standalone and does not generate input or output
flow for the other components. It is often connected to the
Job using OnSubjobOk or OnComponentOk link, depending
on the context.


tDBFSPut
Connects to a given DBFS (Databricks Filesystem) system, copies files from a user-defined directory,
pastes them in this system and, if need be, renames these files.
The DBFS (Databricks Filesystem) components are designed for quick and straightforward data
transferring with Databricks. If you need to handle more sophisticated scenarios for optimal
performance, use Spark Jobs with Databricks.

tDBFSPut Standard properties


These properties are used to configure tDBFSPut running in the Standard Job framework.
The Standard tDBFSPut component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
DBFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Endpoint In the Endpoint field, enter the URL address of your Azure
Databricks workspace. This URL can be found in the
Overview blade of your Databricks workspace page on your
Azure portal. For example, this URL could look like https://
westeurope.azuredatabricks.net.

Token Click the [...] button next to the Token field to enter the
authentication token generated for your Databricks user
account. You can generate or find this token on the User
settings page of your Databricks workspace. For further
information, see Token management from the Azure
documentation.

DBFS directory In the DBFS directory field, enter the path pointing to the
data to be used in the DBFS file system.

Local directory The local directory where the files to be loaded into DBFS are stored.

Overwrite file Options to overwrite or not the existing file with the new
one.


Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from the
local directory. Regular expression is available.
- New name: give a new name to the loaded file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Usage

Usage rule This component combines DBFS connection and data loading, and is thus usually used as a single-component subJob to copy data from a user-defined local directory to DBFS.
It runs standalone and does not generate input or output
flow for the other components. It is often connected to the
Job using OnSubjobOk or OnComponentOk link, depending
on the context.


tDBSQLRow
Acts on the actual DB structure or on the data (although without handling data) depending on
the nature of the query and the database. The SQLBuilder tool helps you write your SQL
statements easily.
tDBSQLRow is the generic component for database queries. It executes the SQL query stated on
the specified database. The Row suffix means the component implements a flow in the Job design
although it does not provide output. For performance reasons, a specific DB component should always
be preferred to this generic component.
To use this component, relevant DBMSs' ODBC drivers should be installed and the corresponding
ODBC connections should be configured via the database connection configuration wizard.

tDBSQLRow Standard properties


These properties are used to configure tDBSQLRow running in the Standard Job framework.
The Standard tDBSQLRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Datasource Name of the data source defined via the database connection configuration wizard.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:


• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Name of the source table where changes made to data
should be captured.

Query type Either Built-in or Repository.

  Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.

  Repository: Select the relevant query stored in the Repository. The Query field is filled in accordingly.

Query Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition (see the example after this table).

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.
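For instance, with a schema made of the two columns id and name (a hypothetical example), a matching query would list the fields in the same order:

SELECT id, name FROM person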

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating.

Note:
You can set the encoding parameters through this field.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.


Note:
This option is very useful if you need to execute the same query several times, as performance levels are increased. See the sketch after this table for a plain JDBC illustration.

Commit every Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.
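The following plain JDBC sketch illustrates what the Use PreparedStatement option relies on: a query with two "?" placeholders whose positions, types and values correspond to two rows of the Set PreparedStatement Parameter table. The table, column and connection details are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PreparedStatementSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://localhost:50000/SAMPLE", "db2admin", "secret");
        // Query with two placeholders; it is compiled once and can be executed many times.
        String query = "UPDATE person SET city = ? WHERE id = ?";
        try (PreparedStatement ps = conn.prepareStatement(query)) {
            ps.setString(1, "Paris"); // Parameter Index 1, Type String, Value "Paris"
            ps.setInt(2, 100);        // Parameter Index 2, Type Int, Value 100
            ps.executeUpdate();
        }
        conn.close();
    }
}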

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
Note that the relevant DBRow component should be preferred for your DBMS; most DBMSs have their own specific DBRow components.

Resetting a DB auto-increment
This scenario describes a single-component Job which aims at re-initializing the DB auto-increment to
1. This Job has no output and is generally to be used before running a script.

Warning:
As a prerequisite of this Job, the relevant DBMS's ODBC driver must have been installed and the
corresponding ODBC connection must have been configured.


Procedure
1. Drag and drop a tDBSQLRow component from the Palette to the design workspace.

2. Double-click tDBSQLRow to open its Basic settings view.

3. Select Repository in the Property Type list as the ODBC connection has been configured and
saved in the Repository. The subsequent fields are filled in automatically.
For more information on storing DB connections in the Repository, see Talend Studio User Guide.
4. The Schema is built-in for this Job and it does not really matter in this example as the action is
made on the table auto-increment and not on data.
5. The Query type is also built-in. Click the [...] button next to the Query statement box to launch
the SQLBuilder editor, or else type directly in the statement box:
Alter table <TableName> auto_increment = 1
6. Press Ctrl+S to save the Job and F6 to run.
The database auto-increment is reset to 1.


tDenormalize
Denormalizes the input flow based on one column.

tDenormalize Standard properties


These properties are used to configure tDenormalize running in the Standard Job framework.
The Standard tDenormalize component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

To denormalize In this table, define the parameters used to denormalize your columns.
Column: Select the column to denormalize.
Delimiter: Type in the separator you want to use to
denormalize your data between double quotes.
Merge same value: Select this check box to merge identical
values.
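Functionally, denormalizing on one column amounts to grouping the rows on the other columns and joining the values of the selected column with the chosen delimiter. The Java sketch below shows that logic on hypothetical two-column rows; the LinkedHashSet plays the role of the Merge same value option by dropping duplicates:

import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class DenormalizeSketch {
    public static void main(String[] args) {
        // Hypothetical input rows: {grouping column, column to denormalize}.
        String[][] rows = { {"Smith", "Anna"}, {"Smith", "Ben"}, {"Jones", "Carl"}, {"Smith", "Anna"} };
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new LinkedHashSet<>()).add(row[1]);
        }
        // Join each group with the configured delimiter (a comma here).
        groups.forEach((key, values) -> System.out.println(key + ": " + String.join(",", values)));
    }
}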

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level. Note that this check box is not available in the Map/
Reduce version of the component.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as an intermediate step in a data flow.

Limitation Note that this component may change the order in the
incoming Java flow.

Denormalizing on one column


This scenario illustrates a Job denormalizing one column in a delimited file.

Denormalizing on one column


Procedure
1. Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to
the design workspace.
2. Connect the components using Row main connections.
3. On the tFileInputDelimited Component view, set the filepath to the file to be denormalized.


4. Define the Header, Row Separator and Field Separator parameters.


5. The input file schema is made of two columns, Fathers and Children.

6. In the Basic settings of tDenormalize, define the column that contains multiple values to be
grouped.
7. In this use case, the column to denormalize is Children.

8. Set the Delimiter to separate the grouped values. Note that only one column can be
denormalized.
9. Select the Merge same value check box, if you know that some values to be grouped are strictly
identical.
10. Save your Job and press F6 to execute it.


Results

All values from the column Children (set as column to denormalize) are grouped by their Fathers
column. Values are separated by a comma.

Denormalizing on multiple columns


This scenario illustrates a Job denormalizing two columns from a delimited file.

Denormalizing on multiple columns


Procedure
1. Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to
the design workspace.
2. Connect all components using a Row main connection.
3. On the tFileInputDelimited Basic settings panel, set the filepath to the file to be denormalized.

4. Define the Row and Field separators, the Header and other information if required.
5. The file schema is made of four columns including: Name, FirstName, HomeTown, WorkTown.


6. In the tDenormalize component Basic settings, select the columns that contain the repetition.
These are the columns which are meant to occur multiple times in the document. In this use
case, FirstName, HomeCity and WorkCity are the columns against which the denormalization is
performed.
7. Add as many line to the table as you need using the plus button. Then select the relevant columns
in the drop-down list.

8. In the Delimiter column, define the separator between double quotes, to split concatenated values.
For the FirstName column, type in "#", for HomeCity, type in "§", and for WorkCity, type in "¤".
9. Save your Job and press F6 to execute it.

The result shows the denormalized values concatenated using a comma.


10. Back in the tDenormalize component's Basic settings, in the To denormalize table, select the
Merge same value check box to remove the duplicate occurrences.
11. Save your Job again and press F6 to execute it.


Results

This time, the console shows the results with no duplicate instances.


tDenormalizeSortedRow
Synthesizes sorted input flow to save memory.
tDenormalizeSortedRow combines in a group all input sorted rows. Distinct values of the
denormalized sorted row are joined with item separators.
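The memory saving comes from the input already being sorted on the grouping column: a group can be emitted as soon as the key changes, so only the current group is kept in memory. The Java sketch below illustrates the idea on hypothetical sorted key/value rows:

import java.util.ArrayList;
import java.util.List;

public class SortedDenormalizeSketch {
    public static void main(String[] args) {
        // Hypothetical input, already sorted on the first column.
        String[][] sortedRows = { {"1", "a"}, {"1", "b"}, {"2", "c"}, {"2", "d"} };
        String currentKey = null;
        List<String> buffer = new ArrayList<>();
        for (String[] row : sortedRows) {
            if (currentKey != null && !currentKey.equals(row[0])) {
                // The key changed: the previous group is complete and can be emitted.
                System.out.println(currentKey + ";" + String.join(",", buffer));
                buffer.clear();
            }
            currentKey = row[0];
            buffer.add(row[1]);
        }
        if (currentKey != null) {
            System.out.println(currentKey + ";" + String.join(",", buffer));
        }
    }
}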

tDenormalizeSortedRow Standard properties


These properties are used to configure tDenormalizeSortedRow running in the Standard Job
framework.
The Standard tDenormalizeSortedRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component in the Job.

  Built-in: You create the schema and store it locally for the
relevant component. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Input rows count Enter the number of input rows.

To denormalize Enter the name of the column to denormalize.


Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles flows of data therefore it requires


input and output components.

Regrouping sorted rows


This Java scenario describes a four-component Job. It aims at reading a given delimited file row by
row, sorting input data by sort type and order, denormalizing all input sorted rows and displaying the
output on the Run log console.
• Drop the following components from the Palette onto the design workspace: tFileInputDelimited,
tSortRow, tDenormalizeSortedRow, and tLogRow.
• Connect the four components using Row Main links.

• In the design workspace, select tFileInputDelimited.


• Click the Component tab to define the basic settings for tFileInputDelimited.


• Set Property Type to Built-In.


• Fill in a path to the processed file in the File Name field. The name_list file used in this example
holds two columns, id and first name.

• If needed, define row and field separators, header and footer, and the number of processed rows.
• Set Schema to Built in and click the three-dot button next to Edit Schema to define the data to
pass on to the next component. The schema in this example consists of two columns, id and name.

• In the design workspace, select tSortRow.


• Click the Component tab to define the basic settings for tSortRow.


• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the
tFileInputDelimited component.
• In the Criteria panel, use the plus button to add a line and set the sorting parameters for the
schema column to be processed. In this example we want to sort the id column in ascending
order.
• In the design workspace, select tDenormalizeSortedRow.
• Click the Component tab to define the basic settings for tDenormalizeSortedRow.

• Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow
component.
• In the Input rows count field, enter the number of the input rows to be processed or press
Ctrl+Space to access the context variable list and select the variable:
tFileInputDelimited_1_NB_LINE.
• In the To denormalize panel, use the plus button to add a line and set the parameters for the
column to be denormalized. In this example we want to denormalize the name column.
• In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information about tLogRow, see tLogRow on page 1977.
• Save your Job and press F6 to execute it.


The result displayed on the console shows how the name column was denormalized.
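For illustration only, with hypothetical sorted input rows such as 1;andrew, 1;anna and 2;bill, the console output might read:

1|andrew,anna
2|bill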


tDie
Triggers the tLogCatcher component for exhaustive log before killing the Job.
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally
make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated
and passed on to the output defined.
This component throws an error and kills the Job. If you simply want to throw a warning, see the
tWarn documentation.

tDie Standard properties


These properties are used to configure tDie running in the Standard Job framework.
The Standard tDie component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Die message Enter the message to be displayed before the Job is killed.

Error code Enter the error code if need be, as an integer.

Note:
Any value greater than 255 cannot be used as
an error code on Linux.

Priority Set the level of priority, as an integer.

Global Variables

Global Variables DIE_MESSAGES: the die message. This is an After variable


and it returns a string.
DIE_CODE: the error code of the die message. This is an
After variable and it returns an integer.
DIE_PRIORITY: the priority level of the die message. This is
an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
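For example, these variables can be reused elsewhere in the Job with the usual globalMap syntax; a minimal sketch, assuming the component is labeled tDie_1 (adapt the label to the one in your own Job):

((String)globalMap.get("tDie_1_DIE_MESSAGES"))
((Integer)globalMap.get("tDie_1_DIE_CODE"))
((Integer)globalMap.get("tDie_1_DIE_PRIORITY"))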


Usage

Usage rule This component cannot be used as a start component and it


is generally used with a tLogCatcher for the log purpose.

Related scenarios
For use cases in relation with tDie, see tLogCatcher scenarios:
• Catching messages triggered by a tWarn component on page 1971
• Catching the message triggered by a tDie component on page 1973


tDotNETInstantiate
Invokes the constructor of a .NET object that is intended for later reuse.
tDotNETInstantiate instantiates an object in .NET for later reuse.

tDotNETInstantiate Standard properties


These properties are used to configure tDotNETInstantiate running in the Standard Job framework.
The Standard tDotNETInstantiate component belongs to the DotNET family.
The component in this framework is available in all Talend products.

Basic settings

Dll to load Type in the path, or browse to the DLL library containing
the class(es) of interest or enter the assembly's name
to be used. For example, System.Data, Version=2.0.0.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089 for an
OleDb assembly.

 Fully qualified class name(i.e. ClassLibrary1. Enter a fully qualified name for the class of interest.
NameSpace2.Class1)

Value(s) to pass to the constructor Click the plus button to add one or more values to be
passed to the constructor for the object. Or, leave this table
empty to call a default constructor for the object.
The valid value(s) should be the parameters required by the
class to be used.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables INSTANCE: the instance of a .NET object. This is an After


variable and it returns an object.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component can be used as a start component in a flow


or an independent subJob.
To use this component, you must first install the runtime
DLLs, for example janet-win32.dll for Windows 32-bit
version and janet-win64.dll for Windows 64-bit version,
from the corresponding Microsoft Visual C++ Redistributable
Package. This allows you to avoid errors like the
UnsatisfiedLinkError on dependent DLL.
So ensure that the runtime and all of the other DLLs which
the DLL to be called depends on are installed and their
versions are consistent among one another.

Note: The required DLLs can be installed in the


System32 folder or in the bin folder of the Java runtime
to be used. If you need to export a Job using this
component to run it outside the Studio, you have to
specify the runtime container of interest by setting
the -Djava.library.path argument accordingly. For users
of Talend solutions with ESB, to run a Job using this
component in ESB Runtime, you need to copy the
runtime DLLs to the %KARAF_HOME%/lib/wrapper/
directory.

Related scenario
For a related scenario, see Utilizing .NET in Talend on page 643.


tDotNETRow
Facilitates data transform by utilizing custom or built-in .NET classes.
tDotNETRow sends data to and from libraries and classes within .NET or other custom DLL files.

tDotNETRow Standard properties


These properties are used to configure tDotNETRow running in the Standard Job framework.
The Standard tDotNETRow component belongs to the DotNET family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.

Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data

Use a static method Select this check box to invoke a static method in .NET and
this will disable Use an existing instance check box.

Propagate a data to output Select this check box to propagate the transformed data to the
output.

Use an existing instance Select this check box to reuse an existing instance of a .NET
object from the Existing instance to use list.
Existing instance to use: Select an existing instance of .NET
objects created by the other .NET components from the list.

Note: This check box will be disabled if you have


selected Use a static method and selecting this check
box will disable Dll to load, Fully qualified class
name(i.e. ClassLibrary1.NameSpace2.Class1) and
Value(s) to pass to the constructor.

Dll to load Type in the path, or browse to the DLL library containing
the class(es) of interest or enter the assembly's name
to be used. For example, System.Data, Version=2.0.0.0,
Culture=neutral, PublicKeyToken=b77a5c561934e089 for an
OleDb assembly.

 Fully qualified class name(i.e. ClassLibrary1. Enter a fully qualified name for the class of interest.
NameSpace2.Class1)

Method name Fill this field with the name of the method to be invoked
in .NET.


Value(s) to pass to the constructor Click the plus button to add one or more lines for values to
be passed to the constructor for the object. Or, leave this
table empty to call a default constructor for the object.
The valid value(s) should be the parameters required by the
class to be used.

Method Parameters Click the plus button to add one or more lines for
parameters to be passed to the method.

Output value target column Select a column in the output row from the list to put value
into it.

Advanced settings

Create a new instance at each row Select this check box to create a new instance at each row
that passes through the component.

Method doesn't return a value Select this check box to invoke a method without returning
a value as a result of the processing.

Returns an instance of a .NET Object Select this check box to return an instance of a .NET object
as a result of an invoked method.

Store the returned value for later use Select this check box to store the returned value of a
method for later reuse in another tDotNETRow component.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is utilized to integrate with .NET objects.


To use this component, you must first install the runtime
DLLs, for example janet-win32.dll for Windows 32-bit
version and janet-win64.dll for Windows 64-bit version,
from the corresponding Microsoft Visual C++ Redistributable Package. This allows you to avoid
errors like the UnsatisfiedLinkError on dependent DLL.
So ensure that the runtime and all of the other DLLs which
the DLL to be called depends on are installed and their
versions are consistent among one another.

Note:
The required DLLs can be installed in the System32
folder or in the bin folder of the Java runtime to be used.
If you need to export a Job using this component to run
it outside the Studio, you have to specify the runtime
container of interest by setting the -Djava.library.path
argument accordingly. For users of Talend solutions
with ESB, to run a Job using this component in ESB
Runtime, you need to copy the runtime DLLs to the
%KARAF_HOME%/lib/wrapper/ directory.

Integrating .Net into Talend Studio: Introduction


This article describes the way to integrate .Net into Talend Studio, for example, invoking dll methods
in a Talend Studio Job.
Based on the runtime dlls (such as janet-win64.dll), Talend Studio provides the capability of
integrating .NET and Java, through which you can access C++ libraries and invoke their methods
easily in Java. Normally, for a Talend Studio user, this can be implemented in two ways: utilizing the
components in the DotNET family (that is, tDotNetInstantiate and tDotNetRow) in Talend Studio and
custom code. This article discusses the first method.
In a Talend Studio Job, the tDotNetInstantiate component can be used as a start component in a
flow or an independent subJob. It loads a system assembly or a custom dll by creating a .NET object.
The object can then be used by the subsequent tDotNetRow components for invoking the methods.
You also need to specify the class and set the parameters of the constructor for a tDotNetInstantiate
component.
The tDotNetRow component references a .NET object created by a tDotNetInstantiate component.
It can be used mid-flow, at the start of the flow, or at the end of the flow. You need to specify the method to be
invoked and set the parameters for the method. This component also passes the output of the method
to a specified column defined in the schema. So, you need to add columns in the schema of the
component and specify the column which the output values are passed to.

Note: For information about configuring the tDotNetInstantiate and tDotNetRow components, see
Talend Components Reference Guide.

This article shows the way to invoke dll methods in a Talend Studio Job, which uses the two DotNet
family components.

Integrating .Net into Talend Studio: Prerequisites


The prerequisites for invoking dll methods in a Talend Studio Job:
• Obtain the janet dll (that is, janet-win64.dll) for the .NET version you use (.NET 3.5 or .NET 4.0).


• Place the file in a directory that the system variable Path points to (for example, %JAVA_HOME%
\bin, C:\Windows\System32, etc). You can also place it in another directory. In this case, you
need to add the directory as a library path using
-Djava.library.path=path_to_directory_containing_the_dll (see the sketch after this list).
• The system assembly or the dll to integrate already exists.
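As a minimal sketch only (the directory and Job names below are placeholders, not values from this article), the library path can be passed to the JVM that runs the exported Job by adding the argument to the java command, for example:

java -Djava.library.path="C:\dotnet\runtime" -cp <exported_Job_classpath> <Job_main_class>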

Integrating .Net into Talend Studio: configuring the Job

Configuring tDotNetInstantiate

About this task


In the Basic settings of the tDotNetInstantiate component, take the following steps.

Procedure
1. Specify the DLL to load in the Dll to load field. The DLL can be a system assembly or a custom
DLL.

For system assemblies, you can specify the name of the desired system assembly (for example,
“System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken
=b77a5c561934e089”); for custom dlls, you need to provide the absolute path to the dll (for
example, "C:\\WINDOWS\\system32\\ClassLibrary1.dll").

2. Specify the class name and the namespace in the Fully qualified class name field.
3. Set parameter values for the constructor in the Value(s) to pass to the constructor field.

Configuring tDotNetRow

About this task


The tDotNetRow component invokes methods of a .Net object created by a tDotNetInstantiate
component and passes the output (if any) to the next component. This component can also
create .Net objects, which can also be reused by subsequent components.
In the Basic settings of the tDotNetRow component, take the following steps.


Procedure
1. Add columns in the schema by clicking the Edit schema button or using the schema propagated
to this component. You need to specify one of the columns of the schema for holding the output
value (if any) using the Output value target column drop-down list.

2. Select Propagate data to output to pass the data from input to output.
3. Take either of the following two options.
• If you have deployed a tDotNetInstantiate component for creating the .Net object, select Use
an existing instance and select the component from the Existing instance to use drop-down
list to refer the corresponding .Net object.
• You can also create a new .Net object for use. To achieve this, make sure Use an existing
instance is not selected, set Dll to load, Fully qualified class name, Method Name, and Value(s)
to pass to the constructor options as needed.
4. Provide the name of the method to invoke in the Method Name field.
5. Provide the parameter values for the method in rows of the Method Parameters field. As
prompted, you can use input row values as parameter values (for example, input_row.colu
mn_name).


Note:
• For information about other options of this component, refer to Talend Components
Reference Guide.
• See Utilizing .NET in Talend section in Talend Components Reference Guide for an example of
this article.

Utilizing .NET in Talend


This scenario describes a three-component Job that uses a DLL library containing a class called
Test1.Class1 and invokes a method on it that processes the value and outputs the result onto the
console.

Prerequisites
Before replicating this scenario, you need first to build up your runtime environment.
• Create the DLL to be loaded by tDotNETInstantiate
This example class built into .NET reads as follows:
using System;
using System.Collections.Generic;
using System.Text;

namespace Test1
{
    public class Class1
    {
        // Value passed in through the constructor and echoed back by getValue()
        string s = null;

        public Class1(string s)
        {
            this.s = s;
        }

        public string getValue()
        {
            return "Return Value from Class1: " + s;
        }
    }
}
This class reads the input value and adds the text Return Value from Class1: in front of this value. It
is compiled using the latest .NET.
• Install the runtime DLL from the latest .NET. In this scenario, we use janet-win32.dll on Windows
32-bit version and place it in the System32 folder.
Thus the runtime DLL is compatible with the DLL to be loaded.

Connecting components
Procedure
1. Drop the following components from the Palette to the design workspace: tDotNETInstantiate,
tDotNETRow and tLogRow.
2. Connect tDotNETInstantiate to tDotNETRow using a Trigger On Subjob OK connection.
3. Connect tDotNETRow to tLogRow using a Row Main connection.

Configuring tDotNETInstantiate
Procedure
1. Double-click tDotNETInstantiate to display its Basic settings view and define the component
properties.

2. Click the three-dot button next to the Dll to load field and browse to the DLL file to be loaded.
Alternatively, you can fill the field with an assembly. In this example, we use:
"C:/Program Files/ClassLibrary1/bin/Debug/ClassLibrary1.dll"
3. Fill the Fully qualified class name field with a valid class name to be used. In this example, we
use:
"Test1.Class1"
4. Click the plus button beneath the Value(s) to pass to the constructor table to add a new line for
the value to be passed to the constructor.
In this example, we use:
"Hello world"


Configuring tDotNETRow
Procedure
1. Double-click tDotNETRow to display its Basic settings view and define the component properties.

2. Select the Propagate data to output check box.


3. Select the Use an existing instance check box and select tDotNETInstantiate_1 from the Existing
instance to use list on the right.
4. Fill the Method Name field with a method name to be used. In this example, we use "getValue", a
custom method.
5. Click the three-dot button next to Edit schema to add one column to the schema.

Click the plus button beneath the table to add a new column to the schema and click OK to save
the setting.
6. Select newColumn from the Output value target column list.

Configuring tLogRow
Procedure
1. Double-click tLogRow to display its Basic settings view and define the component properties.


2. Click the Sync columns button to retrieve the schema defined in the preceding component.
3. Select Table in the Mode area.

Results
Save your Job and press F6 to execute it.

From the result, you can read that the text Return Value from Class1 is added in front of the
retrieved value Hello world.


tDropboxConnection
Creates a Dropbox connection to a given account that the other Dropbox components can reuse.

tDropboxConnection Standard properties


These properties are used to configure tDropboxConnection running in the Standard Job framework.
The Standard tDropboxConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is used standalone as a subJob to create


the Dropbox connection to be used. In a Job design, it is
often connected to the other Dropbox components using
the Trigger links such as On Subjob Ok link.

Related scenario
See Uploading files to Dropbox on page 655


tDropboxDelete
Removes a given folder or file from Dropbox.

tDropboxDelete Standard properties


These properties are used to configure tDropboxDelete running in the Standard Job framework.
The Standard tDropboxDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Dropbox pointing to the folder or the file
you need to remove.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used standalone in a subJob to


remove data from Dropbox.


Related scenarios
No scenario is available for the Standard version of this component yet.


tDropboxGet
Downloads a selected file from a Dropbox account to a specified local directory.

tDropboxGet Standard properties


These properties are used to configure tDropboxGet running in the Standard Job framework.
The Standard tDropboxGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path on Dropbox pointing to the file you need to
download.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

Save As File Select this check box to display the File field and browse
to, or enter the local directory where you want to store the
downloaded file. The existing file, if any, is replaced.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component.
The schema of this component is read-only. You can click
the button next to Edit schema to view the predefined
schema that contains the following two columns:
• fileName: the name of the downloaded file.
• content: the content of the downloaded file.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used alone or along with other
components via the Iterate link or a trigger link such as On
Subjob OK.

Related scenarios
No scenario is available for the Standard version of this component yet.


tDropboxList
Lists the files stored in a specified directory on Dropbox.

tDropboxList Standard properties


These properties are used to configure tDropboxList running in the Standard Job framework.
The Standard tDropboxList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path Enter the path pointing to the folder you need to list the
files from, or enter the path pointing to the exact file you
need to read.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

List Type Select the type of data you need to list from the specified
path.

Include subdirectories Select this check box to list files from any existing sub-
folders in addition to the files in the directory defined in
the Path field.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

NAME The name of the remote file being processed. This is a Flow
variable and it returns a string.


PATH The path to the folder or the file being processed on


Dropbox. This is a Flow variable and it returns a string.

LAST_MODIFIED The timestamp of the last modification of the file being


processed. This is a Flow variable and it returns a long.

SIZE The volume of the file being processed. This is a Flow


variable and it returns a long.

IS_FILE The boolean result of the file listing. This is a Flow variable
and it returns a boolean. The result Yes indicates that the
listed data is of the type File; otherwise, the type is Folder.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
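For example, in an Iterate flow these variables can be read with the usual globalMap syntax; a minimal sketch, assuming the component is labeled tDropboxList_1 (adapt the label to the one in your own Job):

((String)globalMap.get("tDropboxList_1_NAME"))
((String)globalMap.get("tDropboxList_1_PATH"))
((Long)globalMap.get("tDropboxList_1_SIZE"))
((Boolean)globalMap.get("tDropboxList_1_IS_FILE"))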

Usage

Usage rule This component is typically used standalone.

Related scenarios
No scenario is available for the Standard version of this component yet.


tDropboxPut
Uploads data to Dropbox from either a local file or a given data flow.

tDropboxPut Standard properties


These properties are used to configure tDropboxPut running in the Standard Job framework.
The Standard tDropboxPut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Use Existing Connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Token Enter the access token required by the Dropbox account you
need to connect to. This access token allows the Studio to
make Dropbox API calls for that Dropbox account.
Note that a Dropbox App should have been created under
that account before generating the access token. For further
information about a Dropbox access token, see https://
www.dropbox.com/developers/blog/94/generate-an-access-
token-for-your-own-account.

Use HTTP Proxy If you are using a proxy, select this check box and enter the
host and port information of that proxy in the corresponding
fields that are displayed.

Path (File Only) Enter the path pointing to the file you need to write
contents in. This file will be created on the fly if it does not
exist.
Note that the path string should start with a slash (/). It is
the root folder of the Dropbox App for which you are using
the current access token.

Upload Mode Select upload mode to be used:


• Rename if Existing: the uploaded file is automatically
renamed. For example, a file named test.txt might be
renamed to test (1).txt.
• Replace if Existing: the uploaded file replaces the
existing one.
• Update specified Revision: the file you are uploading
is used to update a specific revision of that file. If the
revision you specify is the latest revision, then the
existing file on Dropbox is replaced; if it is an older
revision, the file you are uploading is renamed to
indicate that a conflict is encountered; if the revision
does not exist, an error is returned.

Upload Incoming content as File Select this radio button to read data directly from the input
flow of the preceding component and write the data into
the file specified in the Path field.


Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Note that the schema of this component is read-only with
a single column named content and it receives data from
the content column of its input schema only. This means
that you must use a content column in the input data flow
to carry the data to be uploaded. This type of column is
typically provided by the tFileInputRaw component. For further
information, see tFileInputRaw on page 1085.
The Schema field is not available when you have selected
the Expose as OutputStream or the Upload local file radio
button.

Upload local file Select this radio button to upload a locally stored file to
Dropbox. In the File field that is displayed, you need to enter
the path or browse to this file.

Expose as OutputStream Select this check box to expose the output stream of this
component as a variable named OUTPUTSTREAM so that
the other components can reuse this variable to write the
contents to be uploaded into the exposed output stream.
For example, you can use the Use output stream feature
of the tFileOutputDelimited component to feed a given
tDropboxPut's exposed output stream. For further
information, see tFileOutputDelimited on page 1113.
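For example, as shown in the scenario below, a tFileOutputDelimited component configured with the Use output stream option can reference the exposed stream with the following expression, assuming the tDropboxPut component is labeled tDropboxPut_1:

((java.io.OutputStream)globalMap.get("tDropboxPut_1_OUTPUTSTREAM"))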

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is used either standalone in a subJob


to directly upload a local file to Dropbox or as an end
component of a Job flow to upload given data being
handled in this flow.

Uploading files to Dropbox


In this scenario, a six-component Job consisting of three subJobs is created to write data onto
Dropbox using different upload modes.


Before replicating this scenario, you need to create a Dropbox App under the Dropbox account to be
used. In this scenario, the Dropbox App to be used is named talenddrop and thus the root folder
in which files are uploaded is talenddrop, too. In addition, the access token to this folder has been
generated from the App console provided by Dropbox.
For further information about a Dropbox App, see https://www.dropbox.com/developers/apps/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job from the Job Designs node in
the Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2. In the workspace, enter the name of the component to be used and select this component from
the list that appears. In this scenario, the components are tDropboxConnection, tFixedFlowInput,
tFileOutputDelimited, tFileInputRaw and two tDropboxPut components.
The tFixedFlowInput component generates some data to be uploaded to Dropbox in this scenario.
In the real-world case, you can use other components such as tMysqlInput or tMap in the place of
tFixedFlowInput to design a sophisticated process to prepare your data to be handled.
3. Connect tFixedFlowInput to tFileOutputDelimited using the Row > Main link.
4. Do the same to connect tFileOutputDelimited to one of the two tDropboxPut components and
connect tFileInputRaw to the other tDropboxPut component.
5. Connect tDropboxConnection to tFixedFlowInput using the Trigger > On Subjob Ok link. Then
connect tFixedFlowInput to tFileInputRaw using the same type of link.

Connecting to Dropbox
Procedure
1. Double-click tDropboxConnection to open its Component view.


2. In the Access token field, paste the token that you have generated via the App console of Dropbox
for accessing the Dropbox App folder to be used.

Generating the output stream


Defining the input data

Procedure
1. Double-click tFixedFlowInput to open its Component view.

In this scenario, only three rows of sample data are created to indicate three countries and their
calling codes.

33;France
86;China
81;Japan

2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the [+] button twice to add two rows and in the Column column, rename them to code and
country, respectively.


4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
5. In the Mode area, select the Use Inline Table radio button. The code and the country column have
been automatically created in this table.
6. Enter the sample data mentioned above in this table.

Defining the output stream

Procedure
1. Double-click tFileOutputDelimited to open its Component view.

2. Select the Use output stream check box to write the data to be outputted into a given output
stream.
3. In the Output stream field, enter the code to define the output stream you need to write data
in. In this scenario, it is the output stream of the tDropboxPut_1 component linked with the
current component. Thus the code used to write the data reads as follows:
((java.io.OutputStream)globalMap.get("tDropboxPut_1_OUTPUTSTREAM"))
Note that in this example code, the tDropboxPut component has the number 1 as its affix, which
represents its component ID distributed automatically within this Job. If the tDropboxPut
component you are using has a different ID, you need to adapt the code to that ID number.
4. Click Edit schema to verify that the schema of this component is identical with that of the
preceding tFixedFlowInput component. If not so, click the Sync columns button to make both of
the schemas identical.
5. Navigate to the Advanced settings tab.


6. Select the Custom the flush buffer size check box. This automatically adds 1 in the Row number
field.

Exposing the tDropboxPut output stream


Procedure
1. Double-click the tDropboxPut component linked with tFileOutputDelimited to open its
Component view.

2. Select the Use existing connection check box to reuse the connection created by
tDropboxConnection.
3. In the Path field, enter the path pointing to the file you need to write data in, with a slash (/) at
the beginning of the path. For example, enter /calling_code.csv.
4. In the Upload mode area, select the Rename if Existing radio button.
5. Select the Expose As OutputStream radio button to expose the output stream of this component
so that the other component, tFileOutputDelimited in this scenario, can write data in the stream.

Defining the media data to be uploaded


Procedure
1. Double-click tFileInputRaw to open its Component view.

This component is used to read a picture named esb_architecture.png into the data flow. In the
real-world practice, this file can be of many other formats, such as pdf, xls, ppt or mp3.


2. In the Filename field, enter the path or browse to the file you need to upload.
3. In the Mode area, select the Read the file as a bytes array radio button.

Uploading the incoming contents


Procedure
1. Double-click the tDropboxPut component linked with tFileInputRaw to open its Component view.

2. Select the Use existing connection check box to reuse the connection created by
tDropboxConnection.
3. In the Path field, enter the path pointing to the file you need to write data in, with a slash (/) at
the beginning of the path. For example, enter /architecture.png.
4. In the Upload mode area, select Rename if existing.
5. Select the Upload incoming content as file radio button. This displays the Edit schema button to
allow you to view the read-only schema of this component.

Executing the Job


Then you can press F6 to run this Job.
Once done, check the uploaded files in the Dropbox App folder of your Dropbox, in this scenario, the
talenddrop folder.


tDTDValidator
Helps control the data and structure quality of the file to be processed.
Validates the XML input file against a DTD file and sends the validation log to the defined output.

tDTDValidator Standard properties


These properties are used to configure tDTDValidator running in the Standard Job framework.
The Standard tDTDValidator component belongs to the XML family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component.
The schema of this component is read-only. It contains
standard information regarding the file validation.

DTD file Filepath to the reference DTD file.

XML file Filepath to the XML file to be validated.

If XML is valid, display /
If XML is invalid, display Type in a message to be displayed in the Run console based
on the result of the comparison.

Print to console Select this check box to display the validation message.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
DIFFERENCE: the result of the validation. This is a Flow
variable and it returns a string.
VALID: the validation result. This is a Flow variable and it
returns a boolean.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component can be used as standalone component but


it is usually linked to an output component to gather the log
data.

Validating XML files


This scenario describes a Job that validates the specified type of files from a folder, displays the
validation result on the Run tab console, and outputs the log information for the invalid files into a
delimited file.

Validating XML files


Procedure
1. Drop the following components from the Palette to the design workspace: tFileList,
tDTDValidator, tMap, tFileOutputDelimited.
2. Connect the tFileList to the tDTDValidator with an Iterate link and the remaining components
using Row > Main connections.
3. Set the tFileList component properties, to fetch an XML file from a folder.

Click the plus button to add a filemask line and enter the filemask: *.xml. Remember Java code
requires double quotes.
Set the path of the XML files to be verified.
Select No from the Case Sensitive drop-down list.


4. In the tDTDValidator Component view, the schema is read-only as it contains standard log
information related to the validation process.

In the Dtd file field, browse to the DTD file to be used as reference.
5. Click in the XML file field, press Ctrl+Space bar to access the variable list, and double-click the
current filepath global variable: tFileList.CURRENT_FILEPATH.
6. In the various messages to display in the Run tab console, use the jobName variable to recall
the job name tag. Recall the filename using the relevant global variable:
((String)globalMap.get("tFileList_1_CURRENT_FILE")). Remember Java code requires double quotes.
Select the Print to Console check box.
7. In the tMap component, drag and drop the information data from the standard schema that you
want to pass on to the output file.

8. Once the Output schema is defined as required, add a filter condition to select the log
information data only when the XML file is invalid.
As a best practice, type the expected value first, then the operator suited to the type of data
being filtered, then the variable that should meet the requirement. In this case: 0 ==
row1.validate.
9. Then connect (if not already done) the tMap to the tFileOutputDelimited component using a Row
> Main connection. Name it as relevant, in this example: log_errorsOnly.
10. In the tFileOutputDelimited Basic settings, define the destination filepath, the field delimiters and
the encoding.
11. Save your Job and press F6 to run it.


On the Run console the messages defined display for each of the files. At the same time the
output file is filled with the log data for invalid files.


tDynamoDBInput
Retrieves data from an Amazon DynamoDB table and sends them to the component that follows for
transformation.

tDynamoDBInput Standard properties


These properties are used to configure tDynamoDBInput running in the Standard Job framework.
The Standard tDynamoDBInput component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Access Key Enter the access key ID that uniquely identifies an AWS
Account. For further information about how to get your
Access Key and Secret Key, see Getting Your AWS Access
Keys.

Secret Key Enter the secret access key, constituting the security
credentials in combination with the access Key.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


with an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Use End Point Select this check box and in the Server Url field displayed,
specify the Web service URL of the DynamoDB database
service.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
AWS Region, see Regions and Endpoints.

Action Select the operation to be performed from the drop-down


list, either Query or Scan. For more information, see Query
and Scan Operations in DynamoDB.


Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
If a column stores JSON documents, select JSON from the
DB Type drop-down list.

Table Name Specify the name of the table to be queried or scanned.

Use advanced key condition expression Select this check box and in the Advanced key condition
expression field displayed, specify the key condition
expressions used to determine the items to be read from the
table or index.

Key condition expression Specify the key condition expressions used to determine the
items to be read. Click the [+] button to add as many rows
as needed, each row for a key condition expression, and set
the following attributes for each expression:
• Key Column: Enter the name of the key column.
• Function: Select the function for the key condition
expression.
• Value1: Specify the value used in the key condition
expression.
• Value2: Specify the second value used in the key
condition expression if needed, depending on the
function you selected.
Note that only the items that meet all the key conditions
defined in this table can be returned.
This table is not available when the Use advanced key
condition expression check box is selected.

Use filter expression Select this check box to use the filter expression for the
query or scan operation.

Use advanced filter expression Select this check box and in the Advanced filter expression
field displayed, specify the filter expressions used to refine
the data after it is queried or scanned and before it is
returned to you.


This check box is available when the Use filter expression


check box is selected.

Filter expression Specify the filter expressions used to refine the results
returned to you. Click the [+] button to add as many rows
as needed, each row for a filter expression, and set the
following attributes for each expression:
• Column: Enter the name of the column used to refine
the results.
• Function: Select the function for the filter expression.
• Value1: Specify the value used in the filter expression.
• Value2: Specify the second value used in the filter
expression if needed, depending on the function you
selected.
Note that only the items that meet all the filter conditions
defined in this table can be returned.
This table is available when the Use filter expression check
box is selected and the Use advanced filter expression
check box is cleared.

Value mapping Specify the placeholders for the expression attribute values.
• value: Enter the expression attribute value.
• placeholder: Specify the placeholder for the
corresponding value.
For more information, see Expression Attribute Values.

Name mapping Specify the placeholders for the attribute names that
conflict with the DynamoDB reserved words.
• name: Enter the name of the attribute that conflicts
with a DynamoDB reserved word.
• placeholder: Specify the placeholder for the
corresponding attribute name.
For more information, see Expression Attribute Names.
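As an illustration only (the attribute name and values below are hypothetical), a filter on an attribute named size, which is a DynamoDB reserved word, could be set up as follows:

Advanced filter expression: "#sz >= :minSize"
Name mapping: name: "size" placeholder: "#sz"
Value mapping: value: 100 placeholder: ":minSize"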

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this


check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Writing and extracting JSON documents from DynamoDB


Use tDynamoDBOutput to write a JSON document to a DynamoDB table and then use
tDynamoDBInput to extract a child element of this JSON document.
Prerequisites:
• A Talend Studio with Big Data
• Your AWS credentials that have been granted the access to your Amazon DynamoDB.
The sample data to be used reads like this:

21058;{"accountId" : "900" , "accountName" : "xxxxx" , "action" : "Create",


"customerOrderNumber" : { "deliveryCode" : "261" , "deliveryId" : "313"}}
21059;{"accountId" : "901" , "accountName" : "xxxxy" , "action" : "Delete",
"customerOrderNumber" : { "deliveryCode" : "262" , "deliveryId" : "314"}}

This data has two columns: DeliveryId and EventPayload, separated by a semicolon (;). The JSON
document itself is stored in the EventPayload column.


Designing the data flow around the DynamoDB components


Drop tFixedFlowInput, tDynamoDBOutput, tDynamoDBInput and tLogRow on the design workspace of your
Studio to create the Job.

Procedure
1. In the Integration perspective of the Studio, create an empty Standard Job from the Job Designs
node in the Repository tree view.
2. In the workspace, enter the name of the component to be used and select this component from
the list that appears. In this scenario, the components are tFixedFlowInput, tDynamoDBOutput,
tDynamoDBInput and tLogRow.
The tFixedFlowInput component is used to load the sample data into the data flow. In the real-
world practice, use the input component specific to the data format or the source system to be
used instead of tFixedFlowInput.
3. Connect tFixedFlowInput to tDynamoDBOutput and connect tDynamoDBInput to tLogRow using
the Row > Main link.
4. Connect tFixedFlowInput to tDynamoDBInput using the Trigger > On Subjob Ok link.

Writing the sample JSON documents to DynamoDB


Configure tFixedFlowInput to load the sample data in the data flow and configure tDynamoDBOutput
to write this data in a DynamoDB table.

About this task

Procedure
1. Double-click tFixedFlowInput in its Component view.

Example

2. Click the ... button next to Edit schema to open the schema editor.


Example

3. Click the + button twice to add two rows, each representing a column of the sample data, and in
the Column column, name these columns to DeliveryId and EventPayload, respectively.
4. On the row for the DeliveryId column, select the check box in the Key column to use
this DeliveryId column as the partition key column of the DynamoDB table to be used. A
DynamoDB table requires a partition key column.
5. Click OK to validate these changes and once prompted, accept the propagation of the schema to
the connected component, tDynamoDBOutput.
6. In the Mode area, select the Use Inline Content radio button and enter the sample data in the field
that is displayed:

Example

21058;{"accountId" : "900" , "accountName" : "xxxxx" , "action" : "Create",


"customerOrderNumber" : { "deliveryCode" : "261" , "deliveryId" : "313"}}
21059;{"accountId" : "901" , "accountName" : "xxxxy" , "action" : "Delete",
"customerOrderNumber" : { "deliveryCode" : "262" , "deliveryId" : "314"}}

7. Double-click tDynamoDBOutput to open its Component view.


Example

8. Click the ... button next to Edit schema to open the schema editor. This component should have
retrieved the schema from tFixedFlowInput.

Example

9. In the DB Type column, select JSON for the EventPayload column, as this is the column in
which the JSON documents are stored.
10. In the Access key and Secret key fields, enter the credentials of the AWS account to be used to
access your DynamoDB database.
11. From the Region drop-down list, select the AWS region to be used. If you do not know which
region to select, ask the administrator of your AWS system for more information.
12. From the Action on table drop-down list, select Drop table if exist and create.
13. From the Action on data drop-down list, select Insert.
14. In the Table name field, enter the name to be used for the DynamoDB table to be created.


15. In the Partition Key field, enter the name of the column to be used to provide partition keys. In this
example, it is DeliveryId.
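For reference only, the write side configured above corresponds roughly to the following calls with the AWS SDK for Java Document API: the table is created with DeliveryId as its partition key, and each EventPayload value is stored as a JSON document. The table name delivery_events, the region, the key type (string) and the credentials are placeholders for this sketch, not values taken from the component.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;

public class WriteJsonDocuments {
    public static void main(String[] args) throws Exception {
        // Placeholder client; the component builds it from the Access key/Secret key and Region settings.
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().withRegion("us-east-1").build();
        DynamoDB dynamoDB = new DynamoDB(client);

        // Create the table (the component can also drop an existing one first);
        // DeliveryId is the partition key. The key type S (string) is an assumption for this sketch.
        Table table = dynamoDB.createTable("delivery_events",
                java.util.Arrays.asList(new KeySchemaElement("DeliveryId", KeyType.HASH)),
                java.util.Arrays.asList(new AttributeDefinition("DeliveryId", ScalarAttributeType.S)),
                new ProvisionedThroughput(5L, 5L));
        table.waitForActive();

        // DB Type JSON: the EventPayload column is written as a JSON document (a DynamoDB map).
        String payload = "{\"accountId\" : \"900\", \"accountName\" : \"xxxxx\", \"action\" : \"Create\","
                + " \"customerOrderNumber\" : { \"deliveryCode\" : \"261\", \"deliveryId\" : \"313\"}}";
        table.putItem(new Item().withPrimaryKey("DeliveryId", "21058").withJSON("EventPayload", payload));
    }
}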

Extracting a JSON document using advanced filters


Configure tDynamoDBInput to use an advanced filter to read a JSON document from DynamoDB and
use tLogRow to output this document in the console of the Studio.

About this task

Procedure
1. Double-click tDynamoDBInput to open its Component view.

Example

2. Click the ... button next to Edit schema to open the schema editor.


Example

3. Click the + button twice to add two rows, each representing a column of the sample data, and in
the Column column, name these columns DeliveryId and EventPayload, respectively.
4. On the row for the DeliveryId column, select the check box in the Key column to use
this DeliveryId column as the partition key column of the DynamoDB table to be used. A
DynamoDB table requires a partition key column.
5. In the DB Type column, select JSON for the EventPayload column, as this is the column in
which the JSON documents are stored.
6. In the Access key and Secret key fields, enter the credentials of the AWS account to be used to
access your DynamoDB database.
7. From the Region drop-down list, select the same region as you selected in the previous steps for
tDynamoDBOutput.
8. From the Action drop-down list, select Scan.
9. In the Table Name field, enter the name of the DynamoDB table to be created by
tDynamoDBOutput.
10. Select the Use filter expression check box and then the Use advanced filter expression check box.
11. In the Advanced filter expression field, enter the filter to be used to select JSON documents.

Example

"EventPayload.customerOrderNumber.deliveryCode = :value"

The part on the left of the equals sign reflects the structure within a JSON document of the
sample data, in the EventPayload column. The purpose is to use the value of the deliveryCode
element to filter the document to be read (see the SDK sketch at the end of this scenario).
You need to define the :value placeholder in the Value mapping table.
12. Under the Value mapping table, click the + button to add one row and do the following:
a) In the value column, enter the value of the JSON element to be used as a filter.

Example
In this example, this element is deliveryCode and you need to extract the JSON document
in which the value of the deliveryCode element is 261. As this value is a string, enter 261
within double quotation marks.


If this value is an integer, do not use any quotation marks.


b) In the Placeholder column, enter the name of the placeholder to be defined, without any
quotation marks. In this example, it is :value, as you have put in the Advanced filter
expression.
A placeholder name must start with a colon (:).
13. Double-click tLogRow to open its Component view and select the Table radio box to display the
extracted data in a table in the console of the Studio.
14. Press Ctrl+S to save the Job and press F6 to run it.

Results
Once done, the retrieved JSON document is displayed in the console of the Run view of the Studio.

In the created DynamoDB table, you can see both of the sample JSON documents.
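The advanced filter used in this scenario behaves like a DynamoDB Scan with a filter expression. The sketch below shows roughly the equivalent call with the AWS SDK for Java; the table name delivery_events and the region are placeholders, and the actual request is assembled by tDynamoDBInput from the component settings.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import java.util.HashMap;
import java.util.Map;

public class ScanByDeliveryCode {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard().withRegion("us-east-1").build();

        // The :value placeholder from the Advanced filter expression is bound here,
        // exactly like the row added to the Value mapping table in the component.
        Map<String, AttributeValue> values = new HashMap<>();
        values.put(":value", new AttributeValue().withS("261"));

        ScanRequest request = new ScanRequest()
                .withTableName("delivery_events")
                .withFilterExpression("EventPayload.customerOrderNumber.deliveryCode = :value")
                .withExpressionAttributeValues(values);

        // Only the items whose nested deliveryCode element equals "261" are returned.
        ScanResult result = client.scan(request);
        result.getItems().forEach(System.out::println);
    }
}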


tDynamoDBOutput
Creates, updates or deletes data in an Amazon DynamoDB table.

tDynamoDBOutput Standard properties


These properties are used to configure tDynamoDBOutput running in the Standard Job framework.
The Standard tDynamoDBOutput component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Access Key Enter the access key ID that uniquely identifies an AWS
Account. For further information about how to get your
Access Key and Secret Key, see Getting Your AWS Access
Keys.

Secret Key Enter the secret access key, constituting the security
credentials in combination with the access Key.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Inherit credentials from AWS role Select this check box to leverage the instance profile
credentials. These credentials can be used on Amazon
EC2 instances, and are delivered through the Amazon
EC2 metadata service. To use this option, your Job must
be running within Amazon EC2 or other services that
can leverage IAM Roles for access to resources. For more
information, see Using an IAM Role to Grant Permissions to
Applications Running on Amazon EC2 Instances.

Assume role If you temporarily need some access permissions associated


to an AWS IAM role that is not granted to your user account,
select this check box to assume that role. Then specify
the values for the following parameters to create a new
assumed role session.

Use End Point Select this check box and in the Server Url field displayed,
specify the Web service URL of the DynamoDB database
service.

Region Specify the AWS region by selecting a region name from the
list or entering a region between double quotation marks
(e.g. "us-east-1") in the list. For more information about the
AWS Region, see Regions and Endpoints.

Action on table Select an operation to be performed on the table defined.


• Default: No operation is carried out.
• Drop and create table: The table is removed and
created again.


• Create table: The table does not exist and gets created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exist and create: The table is removed if it
already exists and created again.

Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Insert new items from the input flow.
• Update: Update existing items according to the input
flow.
• Delete: Remove existing items according to the input
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
If a column stores JSON documents, select JSON from the
DB Type drop-down list.

Table Name Specify the name of the table to be written.

Partition Key Specify the partition key of the specified table.

Sort Key Specify the sort key of the specified table.

Advanced settings

STS Endpoint Select this check box and in the field displayed, specify
the AWS Security Token Service endpoint, for example,
sts.amazonaws.com, where session credentials are
retrieved from.
This check box is available only when the Assume role
check box is selected.


Read Capacity Unit Specify the number of read capacity units. For more
information, see Amazon DynamoDB Provisioned
Throughput.

Write Capacity Unit Specify the number of write capacity units. For more
information, see Amazon DynamoDB Provisioned
Throughput.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as an end component of a


Job or subJob and it always needs an input link.

Related scenarios
No scenario is available for the Standard version of this component yet.


tEDIFACTtoXML
Transforms an EDIFACT message file into the XML format for better readability to users and
compatibility with processing tools.
This component reads a United Nations/Electronic Data Interchange For Administration, Commerce
and Transport (UN/EDIFACT) message and transforms it into the XML format according to the
EDIFACT version and the EDIFACT family.

tEDIFACTtoXML Standard properties


These properties are used to configure tEDIFACTtoXML running in the Standard Job framework.
The Standard tEDIFACTtoXML component belongs to the XML family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component.
The schema of this component is fixed and read-only, with
only one column: document.

EDI filename Filepath to the EDIFACT message file to be transformed.

EDI version Select the EDIFACT version of the input file.

Ignore new line Select this check box to skip carriage returns in the input
file.

Die on error Select this check box to stop Job execution when an error
is encountered. By default, this check box is cleared, and
therefore illegal rows are skipped and the process is
completed for the error free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually linked to an output component to


gather the transformation result.

Reading an EDIFACT message file and saving it to XML


This scenario describes a simple Job that reads a UN/EDIFACT Customs Cargo (CUSCAR) message file
and saves it as an XML file.

Adding and linking the components


Procedure
1. Drop the tEDIFACTtoXML component and the tFileOutputXML component from the Palette to the
design workspace.
2. Connect the tEDIFACTtoXML component and the tFileOutputXML component using a Row > Main
connection.

Results

Configuring the components


Procedure
1. Double-click the tEDIFACTtoXML component to show its Basic settings view.

2. Fill the EDI filename field with the full path to the input EDIFACT message file.
In this use case, the input file is 99a_cuscar.edi.


3. From the EDI version list, select the EDIFACT version of the input file, D99A in this use case.
4. Select the Ignore new line check box to skip the carriage return characters in the input file during
the transformation.
5. Leave the other parameters as they are.
6. Double-click the tFileOutputXML component to show its Basic settings view.

7. Fill the File Name field with the full path to the output XML file you want to generate.
In this use case, the output XML is 99a_cuscar.xml.
8. Leave the other parameters as they are.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

Results
The input EDIFACT CUSCAR message file is transformed into the XML format and the output XML file
is generated as defined.


tELTGreenplumInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Provides the table schema to be used for the SQL statement to execute.

tELTGreenplumInput Standard properties


These properties are used to configure tELTGreenplumInput running in the Standard Job framework.
The Standard tELTGreenplumInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTGreenplumInput is to be used along with the


tELTGreenplumMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTGreenplumMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.

tELTGreenplumMap Standard properties


These properties are used to configure tELTGreenplumMap running in the Standard Job framework.
The Standard tELTGreenplumMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Greenplum Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTGreenplumMap is used along with tELTGreenplumInput


and tELTGreenplumOutput. Note that the Output link to be
used with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Mapping data using a simple implicit join


In this scenario, a tELTGreenplumMap component is deployed to retrieve the data from the source
table employee_by_statecode, compare its statecode column against the table statecode, and then map
the desired columns from the two tables to the output table employee_by_state.
Before the Job execution, the three tables, employee_by_statecode, statecode and employee_by_state
look like:


Dropping components

Procedure
1. Add the following components from the Palette to the workspace:
• tGreenplumConnection
• two tELTGreenplumInput
• tELTGreenplumMap
• tELTGreenplumOutput
• tGreenplumCommit
• tGreenplumInput
• tLogRow
2. Rename the following components:
• tGreenplumConnection to connect_to_greenplum_host
• two tELTGreenplumInput to employee+statecode and statecode
• tELTGreenplumMap to match+map
• tELTGreenplumOutput to map_data_output
• tGreenplumCommit to commit_to_host
• tGreenplumInput to read_map_output_table
• tLogRow to show_map_data
3. Connect the components in the Job:
• link tGreenplumConnection to tELTGreenplumMap using an OnSubjobOk trigger
• link tELTGreenplumMap to tGreenplumCommit using an OnSubjobOk trigger
• link tGreenplumCommit to tGreenplumInput using an OnSubjobOk trigger
• link tGreenplumInput to tLogRow using a Row > Main connection
The two tELTGreenplumInput components and tELTGreenplumOutput will be linked to
tELTGreenplumMap later once the relevant tables have been defined.


Configuring the components


Procedure
1. Double-click tGreenplumConnection to open its Basic settings view in the Component tab.

a) In the Host and Port fields, enter the context variables for the Greenplum server.
b) In the Database field, enter the context variable for the Greenplum database.
c) In the Username and Password fields, enter the context variables for the authentication
credentials.
For more information on context variables, see Talend Studio User Guide.
2. Double-click employee+statecode to open its Basic settings view in the Component tab.

a) In the Default table name field, enter the name of the source table, namely
employee_by_statecode.
b) Click the [...] button next to the Edit schema field to open the schema editor.

c) Click the [+] button to add three columns, namely id, name and statecode, with the data type as
INT4, VARCHAR, and INT4 respectively.
d) Click OK to close the schema editor.


e) Link employee+statecode to tELTGreenplumMap using the output employee_by_statecode.


3. Double-click statecode to open its Basic settings view in the Component tab.

a) In the Default table name field, enter the name of the lookup table, namely statecode.
4. Click the [...] button next to the Edit schema field to open the schema editor.

a) Click the [+] button to add two columns, namely state and statecode, with the data type as
VARCHAR and INT4 respectively.
b) Click OK to close the schema editor.
c) Link statecode to tELTGreenplumMap using the output statecode.
5. Click tELTGreenplumMap to open its Basic settings view in the Component tab.

a) Select the Use an existing connection check box.


6. Click the [...] button next to the ELT Greenplum Map Editor field to open the map editor.


7. Click the [+] button on the upper left corner to open the table selection box.

a) Select tables employee_by_statecode and statecode in sequence and click Ok. The tables appear
on the left panel of the editor.
8. On the upper right corner, click the [+] button to add an output table, namely employee_by_state.
a) Click Ok to close the map editor.
9. Double-click tELTGreenplumOutput to open its Basic settings view in the Component tab.


a) In the Default table name field, enter the name of the output table, namely employee_by_state.
10. Click the [...] button next to the Edit schema field to open the schema editor.

a) Click the [+] button to add three columns, namely id, name and state, with the data type as
INT4, VARCHAR, and VARCHAR respectively.
b) Click OK to close the schema editor.
c) Link tELTGreenplumMap to tELTGreenplumOutput using the table output employee_by_state.
d) Click OK on the pop-up window below to retrieve the schema of tELTGreenplumOutput.

Now the map editor's output table employee_by_state shares the same schema as that of
tELTGreenplumOutput.
11. Double-click tELTGreenplumMap to open the map editor.
a) Drop the column statecode from table employee_by_statecode to its counterpart of the table
statecode, looking for the records in the two tables that have the same statecode values.
b) Drop the columns id and name from table employee_by_statecode as well as the column statecode
from table statecode to their counterparts in the output table employee_by_state.
c) Click Ok to close the map editor.
12. Double-click tGreenplumInput to open its Basic settings view in the Component tab.

a) Select the Use an existing connection check box.


b) In the Table name field, enter the name of the source table, namely employee_by_state.
c) In the Query field, enter the query statement, namely "SELECT * FROM \"employee_by_state\"".
13. Double-click tLogRow to open its Basic settings view in the Component tab.

a) In the Mode area, select Table (print values in cells of a table) for a better display.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the desired employee records have been written to the table employee_by_state,
presenting clearer geographical information about the employees.
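For reference, the mappings defined above make tELTGreenplumMap assemble a single INSERT ... SELECT statement with an implicit join on the statecode columns, which the ELT components push to Greenplum for execution. The sketch below shows roughly what that statement looks like if run directly over JDBC; the connection URL and credentials are placeholders, and the exact SQL generated by the Studio may differ slightly.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ImplicitJoinInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; in the Job they come from tGreenplumConnection.
        // Greenplum is reachable through the PostgreSQL JDBC driver.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://greenplum-host:5432/sampledb", "gpadmin", "secret");

        // Roughly the statement built in the ELT Greenplum Map editor:
        // an INSERT ... SELECT with an implicit join on the statecode columns.
        String insert =
                "INSERT INTO employee_by_state (id, name, state) "
              + "SELECT employee_by_statecode.id, employee_by_statecode.name, statecode.state "
              + "FROM employee_by_statecode, statecode "
              + "WHERE employee_by_statecode.statecode = statecode.statecode";

        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(insert);
        }
        conn.close();
    }
}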


Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTGreenplumOutput
Executes the SQL Insert, Update and Delete statements on the Greenplum database.
The three ELT Greenplum components are closely related, in terms of their operating conditions.
These components should be used to handle Greenplum DB schemas to generate Insert statements,
including clauses, which are to be executed in the DB output table defined.
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.

tELTGreenplumOutput Standard properties


These properties are used to configure tELTGreenplumOutput running in the Standard Job framework.
The Standard tELTGreenplumOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTGreenplumOutput is to be used along with the


tELTGreenplumMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTHiveInput
Replicates the schema of the input Hive table, which the tELTHiveMap component that follows
will use.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component provides, for the tELTHiveMap component that follows, the input schema of the Hive
table to be used.

tELTHiveInput Standard properties


These properties are used to configure tELTHiveInput running in the Standard Job framework.
The Standard tELTHiveInput component belongs to the ELT family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Schema A schema is a row description. It defines the number


of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.

  Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

Edit schema Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Default table name Enter the name of the input table to be used.

Default schema name Enter the name of the database schema to which the input
table to be used is related.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTHiveMap is used along with a tELTHiveInput and


tELTHiveOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Note:
The ELT components do not handle actual data flow but
only schema information.

Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.

Related scenarios
• Joining table columns and writing them into Hive on page 710
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTHiveMap
Builds graphically the Hive QL statement in order to transform data.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component uses the tables provided as input, to feed the parameter in the built statement. The
statement can include inner or outer joins to be implemented between tables or between one table
and its aliases.
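To make the idea concrete, the statement built in the ELT Hive Map editor is ordinary Hive QL, typically an INSERT ... SELECT with a join, executed on the cluster rather than in the Job itself. The sketch below, with hypothetical table names and a hypothetical HiveServer2 endpoint, shows the kind of statement the component generates and pushes down; it is an illustration, not the component's actual code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveQlJoinSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint, database and credentials.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "hive", "");

        // The kind of Hive QL an ELT map produces: an INSERT ... SELECT with an inner join.
        String hiveQl =
                "INSERT INTO TABLE customer_enriched "
              + "SELECT c.id, c.name, r.region_name "
              + "FROM customers c JOIN regions r ON (c.region_id = r.id)";

        try (Statement stmt = conn.createStatement()) {
            stmt.execute(hiveQl);
        }
        conn.close();
    }
}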

tELTHiveMap Standard properties


These properties are used to configure tELTHiveMap running in the Standard Job framework.
The Standard tELTHiveMap component belongs to the ELT family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear,
the default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine on which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information


of the Microsoft HD Insight cluster to be used. For
example, the address could be
your_hdinsight_cluster_name.azurehdinsight.net and the
authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.


In the Hostname field, enter the Primary Blob Service


Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used such as HortonWorks Data
Platform V1.2.0 (Bimota) supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave


both the Force MapR ticket authentication check box


and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of
hive.metastore.kerberos.principal. This is the service
principal of the Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of
javax.jdo.option.ConnectionURL. This is the JDBC
connection string to the Hive Metastore.
5. Driver class uses the value of
javax.jdo.option.ConnectionDriverName. This is the
name of the driver for the JDBC connection.
6. Username uses the value of
javax.jdo.option.ConnectionUserName. This, as well as
the Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of
javax.jdo.option.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog


box enter the password between double quotes and


click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the
dfs.client.use.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you


have chosen a machine called masternode as the


NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Hive Map editor The ELT Map editor helps you to define the output schema
as well as build graphically the Hive QL statement to be
executed. The column names of schema can be different
from the column names in the database.
If you use context variables in the Expression column in the
Map editor to map the input and the output schemas, put
single quotation marks around these context variables, for
example, 'context.v_erpName'.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component


you are using. Among these options, the following ones


requires specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insightcluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.


Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you connect to a
distribution supporting Tez but not officially supported
by Talend .
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

When you need to enable Hive components to access HBase:


These parameters are available only when the Use an existing connection check box is clear.

Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, by default,
registered along with your Hive installation.

Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.
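As an illustration only, if your cluster administrator has already deployed the Tez archive to HDFS,
the Lib path (folder or file) field could point to a location such as /apps/tez/tez.tar.gz; this path
is hypothetical and should be replaced with the actual value of the tez.lib.uris property defined in
the tez-site.xml file of your cluster.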

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
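As an illustration only, to lower the replication factor of the files written by the Job, you could
add a row with dfs.replication as the property and 1 as its value (both between double quotation
marks, as for any value in this table); dfs.replication is one of the standard properties documented
in hdfs-default.xml and is used here purely as an example.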

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information about
Hive dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce computations
by selecting the Set memory check box to set proper memory
allocations for the computations to be performed by the
Hadoop system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.

Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
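As a minimal illustration, assuming the component instance is named tELTHiveMap_1, this variable can
be read in a downstream tJava component as follows (the instance name and the tJava usage are
assumptions made for this example only):

// read the After variable of tELTHiveMap_1 and print it if an error occurred
String msg = (String) globalMap.get("tELTHiveMap_1_ERROR_MESSAGE");
if (msg != null) {
    System.out.println("tELTHiveMap_1 reported: " + msg);
}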


Usage

Usage rule tELTHiveMap is used along with a tELTHiveInput and
tELTHiveOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Note:
The ELT components do not handle actual data flow but
only schema information.

Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Joining table columns and writing them into Hive


This scenario applies only to Talend products with Big Data.
This scenario uses a four-component Job to join the columns selected from two Hive tables and write
them into another Hive table.

Preparing the Hive tables


Procedure
1. Create the Hive table you want to write data in. In this scenario, this table is named agg_result,
and you can create it using the following statement in tHiveRow: create table agg_result
(id int, name string, address string, sum1 string, postal string,
state string, capital string, mostpopulouscity string) partitioned by
(type string) row format delimited fields terminated by ';' location
'/user/ychen/hive/table/agg_result'
In this statement, '/user/ychen/hive/table/agg_result' is the directory used in this scenario to store
this created table in HDFS. You need to replace it with the directory you want to use in your
environment.
For further information about tHiveRow, see tHiveRow on page 1634.


2. Create two input Hive tables containing the columns you want to join and aggregate these
columns into the output Hive table, agg_result. The statements to be used are: create table
customer (id int, name string, address string, idState int, id2 int,
regTime string, registerTime string, sum1 string, sum2 string) row
format delimited fields terminated by ';' location '/user/ychen/
hive/table/customer' and create table state_city (id int, postal
string, state string, capital int, mostpopulouscity string) row format
delimited fields terminated by ';' location '/user/ychen/hive/table/
state_city'
3. Use tHiveRow to load data into the two input tables, customer and state_city. The statements to
be used are: "LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO
TABLE customer" and "LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv'
OVERWRITE INTO TABLE state_city"
The two files, customer.csv and State_City.csv, are two local files we created for this scenario. You
need to create your own files to provide data to the input Hive tables. The data schema of each
file should be identical with their corresponding table.
You can use tRowGenerator and tFileOutputDelimited to create these two files easily. For
further information about these two components, see tRowGenerator on page 3134 and
tFileOutputDelimited on page 1113.
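As a purely illustrative example, the first lines of customer.csv could look like the following, with
one field per column of the customer table and the fields separated by semicolons to match the
declared row format (the values themselves are invented for this example):

1;Griffith;Lincoln Street Los Angeles;22;101;2011-06-01;2011-06-01;500;600
2;Collins;Pike Place Seattle;21;102;2012-01-15;2012-01-15;700;800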
For further information about the Hive query language, see https://cwiki.apache.org/confluence/
display/Hive/LanguageManual.

Linking the components


Procedure
1. In the Integration perspective of Talend Studio, create an empty Job from the Job Designs node
in the Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2. Drop two tELTHiveInput components, a tELTHiveMap and a tELTHiveOutput onto the workspace.
3. Connect them using the Row > Main link.
Each time you connect two components, a wizard pops up to prompt you to name the
link you are creating. This name must be the same as that of the Hive table you want the active
component to process. In this scenario, the input tables the two tELTHiveInput components will
handle are customer and state_city and the output table tELTHiveOutput will handle is agg_result.

Configuring the input schemas


Procedure
1. Double-click the tELTHiveInput component using the customer link to open its Component view.


2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the button as many times as required to add columns and rename them to replicate the
schema of the customer table we created earlier in Hive.

4. In the Default table name field, enter the name of the input table, customer, to be processed by
this component.
5. Double-click the other tELTHiveInput component using the state_city link to open its Component
view.

6. Click the [...] button next to Edit schema to open the schema editor.


7. Click the button as many times as required to add columns and rename them to replicate the
schema of the state_city table we created earlier in Hive.

8. In the Default table name field, enter the name of the input table, state_city, to be processed by
this component.

Mapping the input and the output schemas


Configuring the connection to Hive

Procedure
1. Click tELTHiveMap, then, click Component to open its Component view.

2. In the Version area, select the Hadoop distribution you are using and the Hive version.
3. In the Connection mode list, select the connection mode you want to use. If your distribution is
HortonWorks, this mode is Embedded only.


4. In the Host field and the Port field, enter the authentication information for the component to
connect to Hive. For example, the host is talend-hdp-all and the port is 9083.
5. Select the Set Jobtracker URI check box and enter the location of the Jobtracker. For example,
talend-hdp-all:50300.
6. Select the Set NameNode URI check box and enter the location of the NameNode. For example,
hdfs://talend-hdp-all:8020. If you are using WebHDFS, the location should be
webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.

Mapping the schemas

Procedure
1. Click ELT Hive Map Editor to map the schemas.

2. On the input side (left in the figure), click the Add alias button to add the table to be used.

3. In the pop-up window, select the customer table, then click OK.
4. Repeat the operations to select the state_city table.
5. Drag and drop the idstate column from the customer table onto the id column of the state_city
table. Thus an inner join is created automatically.
6. On the output side (the right side in the figure), the agg_result table is empty at first. Click the
[+] button at the bottom of this side to add as many columns as required and rename them to replicate
the schema of the agg_result table you created earlier in Hive.


Note:
The type column is the partition column of the agg_result table and should not be replicated in
this schema. For further information about the partition column of the Hive table, see the Hive
manual.

7. From the customer table, drop id, name, address, and sum1 to the corresponding columns in the
agg_result table.
8. From the state_city table, drop postal, state, capital and mostpopulouscity to the corresponding
columns in the agg_result table.
In this scenario, context variables are not used in the Expression column in the Map editor. If you
use context variables, put them in single quotation marks. For example:
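For instance, a hypothetical context variable named operator would be entered as 'context.operator' in
the Expression column.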

9. Click OK to validate these changes.

Configuring the output schema


Procedure
1. Double-click tELTHiveOutput to open its Component view.


2. If this component does not have the same schema as the preceding component, a warning icon
appears. In this case, click the Sync columns button to retrieve the schema from the preceding one
and once done, the warning icon disappears.
3. In the Default table name field, enter the output table you want to write data in. In this example,
it is agg_result.
4. In the Field partition table, click the [+] button to add one row. This allows you to write data in the partition
column of the agg_result table.
This partition column was defined the moment we created the agg_result table using
partitioned by (type string) in the Create statement presented earlier. This partition
column is type, which describes the type of a customer.
5. In Partition column, enter type without any quotation marks and in Partition value, enter
prospective in single quotation marks.
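With this configuration, the statement generated by the ELT components conceptually targets the
prospective partition. A simplified Hive QL sketch of such a statement (the exact text generated by
the Job may differ) is:

INSERT INTO TABLE agg_result PARTITION (type='prospective')
SELECT customer.id, customer.name, customer.address, customer.sum1,
state_city.postal, state_city.state, state_city.capital, state_city.mostpopulouscity
FROM customer JOIN state_city ON (customer.idState = state_city.id);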

Executing the Job


Procedure
Press F6 to run this Job.

Results
Once done, verify agg_result in Hive using, for example,

select * from agg_result;


This figure presents only a part of the table. You can see that the selected input columns have been
aggregated and written into the agg_result table and that the partition column is filled with the value
prospective.

Related scenarios
• Joining table columns and writing them into Hive on page 710
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTHiveOutput
Works alongside tELTHiveMap to write data into the Hive table.
The three ELT Hive components are closely related, in terms of their operating conditions. These
components should be used to handle Hive DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
This component executes the query built by the preceding tELTHiveMap component to write data into
the specified Hive table.

tELTHiveOutput Standard properties


These properties are used to configure tELTHiveOutput running in the Standard Job framework.
The Standard tELTHiveOutput component belongs to the ELT family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Action on data Select the action to be performed on the data to be written
in the Hive table.
With the Insert option, the data to be written in the Hive
table will be appended to the existing data if there is any.

Schema A schema is a row description. It defines the number


of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.

  Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

Edit schema Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Default table name Enter the default name of the output table you want to
write data in.


Default schema name Enter the name of the default database schema to which the
output table to be used is related.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field that appears.
If this table is related to a different database schema from
the default one, you also need to enter the name of that
database schema. The syntax is schema_name.table_name.

The target table uses the Parquet format If the table in which you need to write data is a PARQUET
table, select this check box.
Then from the Compression list that appears, select the
compression mode you need to use to handle the PARQUET
file. The default mode is Uncompressed.

Field Partition In Partition Column, enter the name, without any quotation
marks, of the partition column of the Hive table you want to
write data in.
In Partition Value, enter the value you want to use, in single
quotation marks, for its corresponding partition column.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTHiveMap is used along with a tELTHiveInput and
tELTHiveOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Note:
The ELT components do not handle actual data flow but
only schema information.

Usage with Dataproc The ELT Hive components require Tez to be installed on the
Google Cloud Dataproc cluster to be used.
• Use the initialization action explained in this Google
Cloud Platform documentation: Apache Tez on
Dataproc.
• For more details about the general concept of the
initialization actions in a Google Cloud Dataproc
cluster, see the related Google documentation:
Initialization actions.

Related scenarios
• Joining table columns and writing them into Hive on page 710.
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTInput
Adds as many Input tables as required for the SQL statement to be executed.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of database
(if any) instead of these generic ELT components. For example, for Teradata, it is recommended to use the
tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.

tELTInput Standard properties


These properties are used to configure tELTInput running in the Standard Job framework.
The Standard tELTInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
to the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Mapping Specify the metadata mapping file for the database to
be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTInput is to be used along with the tELTJDBCMap. Note


that the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMap
Uses the tables provided as input to feed the parameter in the built SQL statement. The statement
can include inner or outer joins to be implemented between tables or between one table and its
aliases.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of database
(if any) instead of these generic ELT components. For example, for Teradata, it is recommended to use the
tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.

tELTMap Standard properties


These properties are used to configure tELTMap running in the Standard Job framework.
The Standard tELTMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.
Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curves.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file where the
properties are stored.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Driver JAR Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.

Class name Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.D
river.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMap is used along with tELTInput and tELTOutput.
Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Aggregating Snowflake data using context variables as table and connection names

This scenario shows you an example of aggregating Snowflake data from two source tables STUDENT
and TEACHER to one target table FULLINFO using the ELT components. In this example, all input
and output table names and connection names are set to context variables.


Creating the Job

Before you begin


• A new Job has been created and the context variables SourceTableS with the value STUDENT,
SourceTableT with the value TEACHER, and TargetTable with the value FULLINFO have
been added to the Job. For more information about how to use context variables, see the related
documentation about using contexts and variables.
• The source table STUDENT with three columns, SID and TID of NUMBER(38,0) type and SNAME
of VARCHAR(50) type, has been created in Snowflake, and the following data has been written
into the table.

#SID;SNAME;TID
11;Alex;22
12;Mark;23
13;Stephane;21
14;Cedric;22
15;Bill;21
16;Jack;23
17;John;22
18;Andrew;23

• The source table TEACHER with three columns, TID of NUMBER(38,0) type and TNAME and
TPHONE of VARCHAR(50) type, has been created in Snowflake, and the following data has been
written into the table.

#TID;TNAME;TPHONE
21;Peter;+86 15812343456
22;Michael;+86 13178964532
23;Candice;+86 13923187456

Procedure
1. Add a tSnowflakeConnection component, a tSnowflakeClose component, two tELTInput
components, a tELTMap component, and a tELTOutput component to your Job.
2. On the Basic setting view of the first tELTInput component, enter the name of the first
source table in the Default Table Name field. In this example, it is the context variable
context.SourceTableS.


3. Repeat step 2 to set the value of the default table name for the second tELTInput component
and the tELTOutput component to context.SourceTableT and context.TargetTable
respectively.
4. Link the first tELTInput component to the tELTMap component using the Link > context.Source
TableS (Table) connection.
5. Link the second tELTInput component to the tELTMap component using the Link > context.Source
TableT (Table) connection.
6. Link the tELTMap component to the tELTOutput component using the Link > *New Output*
(Table) connection. The link will be renamed automatically to context.TargetTable
(Table).
7. Link the tSnowflakeConnection component to the tELTMap component using a Trigger > On
Subjob Ok connection.
8. Link the tELTMap component to the tSnowflakeClose component.

Connecting to Snowflake
Configure the tSnowflakeConnection component to connect to Snowflake.

Procedure
1. Double-click the tSnowflakeConnection component to open its Basic settings view.
2. In the Account field, enter the account name assigned by Snowflake.
3. In the Snowflake Region field, select the region where the Snowflake database is located.
4. In the User Id and the Password fields, enter the authentication information accordingly.
Note that this user ID is your user login name. If you do not know your user login name yet, ask
the administrator of your Snowflake system for details.
5. In the Warehouse field, enter the name of the data warehouse to be used in Snowflake.
6. In the Schema field, enter the name of the database schema to be used.
7. In the Database field, enter the name of the database to be used.

Configuring the input components

Procedure
1. Double-click the first tELTInput component to open its Basic settings view.


2. Click the [...] button next to Edit schema and in the schema dialog box displayed, define the
schema by adding three columns, SID and TID of INT type and SNAME of VARCHAR type.
3. Select Mapping Snowflake from the Mapping drop-down list.
4. Repeat the previous steps to configure the second tELTInput component, and define its schema by
adding three columns, TID of INT type and TNAME and TPHONE of VARCHAR type.

Configuring the output component

Procedure
1. Double-click the tELTOutput component to open the Basic settings view.
2. Select Create table from the Action on table drop-down list to create the target table.
3. Select the Table name from connection name is variable check box.
4. Select Mapping Snowflake from the Mapping drop-down list.

Configuring the map component for aggregating Snowflake data

Procedure
1. Click the tELTMap component to open its Basic settings view.

2. Select the Use an existing connection check box and from the Component List displayed, select
the connection component you have configured to open the Snowflake connection.
3. Select Mapping Snowflake from the Mapping drop-down list.
4. Click the [...] button next to ELT Map Editor to open its map editor.
5. Add the first input table context.SourceTableS by clicking the [+] button in the upper left
corner of the map editor and then selecting the relevant table name from the drop-down list in
the pop-up dialog box.
6. Do the same to add the second input table context.SourceTableT.
7. Drag the column TID from the first input table context.SourceTableS and drop it onto the
corresponding column TID in the second input table context.SourceTableT.
8. Drag all columns from the input table context.SourceTableS and drop them onto the output
table context.TargetTable in the upper right panel.
9. Do the same to drag two columns TNAME and TPHONE from the input table context.Source
TableT and drop them onto the bottom of the output table. When done, click OK to close the
map editor.
10. Click the Sync columns button on the Basic settings view of the tELTOutput component to set its
schema.


Closing the Snowflake connection


Configure the tSnowflakeClose component to close the connection to Snowflake.

Procedure
1. Double-click the tSnowflakeClose component to open the Component tab.
2. From the Connection Component drop-down list, select the component that opens the connection
you need to close, tSnowflakeConnection_1 in this example.

Executing the Job

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, Talend Studio executes the Job successfully and inserts eight rows into the
target table.
You can then create and run another Job to retrieve data from the target table by using the
tSnowflakeInput component and the tLogRow component. You will find that the aggregated data
are displayed on the console as shown in the screenshot below.
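For reference, the expected result can also be derived from the sample data above: an inner join of
STUDENT and TEACHER on the TID column produces the following eight rows (shown here in the same
semicolon-separated form as the input data; the actual column order depends on how you defined the
output schema):

11;Alex;22;Michael;+86 13178964532
12;Mark;23;Candice;+86 13923187456
13;Stephane;21;Peter;+86 15812343456
14;Cedric;22;Michael;+86 13178964532
15;Bill;21;Peter;+86 15812343456
16;Jack;23;Candice;+86 13923187456
17;John;22;Michael;+86 13178964532
18;Andrew;23;Candice;+86 13923187456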

For more information about how to retrieve data from Snowflake, see Writing data into and
reading data from a Snowflake table on page 3407.

Related scenarios
• Aggregating table columns and filtering on page 745.
• Mapping data using an Alias table on page 749.
• Mapping data using a subquery on page 800, a related scenario using subquery


tELTOutput
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
The three ELT components are closely related, in terms of their operating conditions. These
components should be used to handle DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Note that it is highly recommended to use the ELT components dedicated to a specific type of database
(if any) instead of these generic ELT components. For example, for Teradata, it is recommended to use the
tELTTeradataInput, tELTTeradataMap and tELTTeradataOutput components instead.

tELTOutput Standard properties


These properties are used to configure tELTOutput running in the Standard Job framework.
The Standard tELTOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on table Select an operation to be performed on the table defined.


• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exist and create: The table is removed if it
already exists and created again.
• Clear table: The table content is deleted. You have the
possibility to roll back the operation.
• Truncate table: The table content is deleted. You do
not have the possibility to roll back the operation.

Action on data On the data of the table defined, you can perform the
following operation:
• Insert: Adds new entries to the table. If duplicates are
found, the Job stops.
• Update: Updates entries in the table.
• Delete: Deletes the entries which correspond to the
entry flow.

Schema and Edit schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.


• Change to built-in property: choose this option to


change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.
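For example, a hypothetical clause such as mytable.id > 1000 (the table and column names are purely
illustrative) would restrict the update or delete operation to the matching rows.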

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Use update statement without subqueries Select this option to generate an UPDATE statement for the
database.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view.

Clause SET Select the column names that will be used to generate the
SET clauses.
SET clauses will not be generated for the columns that are
not selected.
This field appears when Update is selected from the Action
on data drop-down list in the Basic settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
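As a minimal illustration, assuming the component instance is named tELTOutput_1, the insert counter
can be read once the subJob has finished, for example in a tJava component connected with an
OnSubjobOk trigger (both the instance name and this wiring are assumptions made for the example):

// read the After variable of tELTOutput_1 holding the number of inserted rows
Integer inserted = (Integer) globalMap.get("tELTOutput_1_NB_LINE_INSERTED");
System.out.println("Rows inserted: " + inserted);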

Usage

Usage rule tELTOutput is to be used along with the tELTMap. Note that
the Output link to be used with these components must
correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data
flow but only schema information.

Limitation Avoid using any keyword for the database as the table/
column name or using any special character in the table/
column name. If you want to, you can enclose the table/
column name in a pair of \" to see whether it works. For
example, when you want to use the keyword number as an
Oracle database column name, you can have the Db Column
value in the schema editor set to \"number\". But note
that this solution does not always work.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMSSqlInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Provides the table schema to be used for the SQL statement to execute.

tELTMSSqlInput Standard properties


These properties are used to configure tELTMSSqlInput running in the Standard Job framework.
The Standard tELTMSSqlInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
to the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMSSqlInput is to be used along with the
tELTMSSqlMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMSSqlMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.

tELTMSSqlMap Standard properties


These properties are used to configure tELTMSSqlMap running in the Standard Job framework.
The Standard tELTMSSqlMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT MSSql Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.
Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curves.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.
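As an illustration only, for a SQL Server instance listening on its default port you would typically
enter the server host name or IP address (for example 192.168.1.15) in Host, 1433 in Port, and the
name of your database in Database; these values are hypothetical and must be replaced with your own.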

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMSSqlMap is used along with a tELTMSSqlInput and


tELTMSSqlOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMSSqlOutput
Executes SQL Insert, Update and Delete statements on the MSSql database.
The three ELT MSSql components are closely related, in terms of their operating conditions. These
components should be used to handle MSSql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.

tELTMSSqlOutput Standard properties


These properties are used to configure tELTMSSqlOutput running in the Standard Job framework.
The Standard tELTMSSqlOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Use update statement without subqueries Select this option to generate an UPDATE statement for the
MSSql database.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view. For an
illustration of the difference, see the sketch after this table.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
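As an illustration of the Use update statement without subqueries option above, the two sketches
below show the general shapes an ELT-generated update can take on an MSSql database. The table and
column names (dbo.target, dbo.source, id, col1) are hypothetical, and the exact SQL produced by a
Job may differ.

    -- Option cleared: the update is expressed with correlated subqueries
    UPDATE dbo.target
    SET col1 = (SELECT s.col1 FROM dbo.source s WHERE s.id = dbo.target.id)
    WHERE EXISTS (SELECT 1 FROM dbo.source s WHERE s.id = dbo.target.id);

    -- Option selected: a join-style Transact-SQL update, without subqueries
    UPDATE t
    SET t.col1 = s.col1
    FROM dbo.target t
    INNER JOIN dbo.source s ON s.id = t.id;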

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTMSSqlOutput is to be used along with the


tELTMSSqlMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but only schema information.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMysqlInput
Adds as many Input tables as required for the most complicated Insert statement.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlInput provides the table schema to be used for the SQL statement to execute.

tELTMysqlInput Standard properties


These properties are used to configure tELTMysqlInput running in the Standard Job framework.
The Standard tELTMysqlInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter the default table name, between double quotation
marks.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMysqlInput is to be used along with the tELTMysqlMap.


Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMysqlMap
Uses the tables provided as input to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlMap helps to graphically build the SQL statement using the table provided as input.

tELTMysqlMap Standard properties


These properties are used to configure tELTMysqlMap running in the Standard Job framework.
The Standard tELTMysqlMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Mysql Map editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database connection created.
This property is not available when the Use an existing connection check box in the Basic settings
view is selected.

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMysqlMap is used along with a tELTMysqlInput and


tELTMysqlOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Aggregating table columns and filtering


This scenario describes a Job that gathers together several input DB table schemas and implements
a clause to filter the output using an SQL statement.

Building a Job

Procedure
1. Add the following components from the Palette onto the design workspace. Label these
components to best describe their functionality.
• three tELTMysqlInput components
• a tELTMysqlMap
• a tELTMysqlOutput
2. Double-click the first tELTMysqlInput component to display its Basic settings view.


3. Select Repository from the Schema list, click the three dot button preceding Edit schema, and
select your DB connection and the desired schema from the Repository Content dialog box.
The selected schema name appears in the Default Table Name field automatically.
In this use case, the DB connection is Talend_MySQL and the schema for the first input component
is owners.
4. Set the second and third tELTMysqlInput components in the same way but select cars and resellers
respectively as their schema names.

Note: In this use case, all the involved schemas are stored in the Metadata node of the
Repository tree view for easy retrieval. For further information concerning metadata, see
Talend Studio User Guide.
You can also select the three input components by dropping the relevant schemas from
the Metadata area onto the design workspace and double-clicking tELTMysqlInput from
the Components dialog box. Doing so allows you to skip the steps of labeling the input
components and defining their schemas manually.

5. Connect the three tELTMysqlInput components to the tELTMysqlMap component using links
named strictly after the actual DB table names: owners, cars, and resellers.
6. Connect the tELTMysqlMap component to the tELTMysqlOutput component and name the link
agg_result, which is the name of the database table you will save the aggregation result to.
7. Click the tELTMysqlMap component to display its Basic settings view.

8. Select Repository from the Property Type list, and select the same DB connection that you use for
the input components.
All the database details are automatically retrieved.

Tip: Leave all the other settings as they are.

9. Double-click the tELTMysqlMap component to launch the ELT Map editor to set up joins between
the input tables and define the output flow.


10. Add the input tables by clicking the green plus button at the upper left corner of the ELT Map
editor and selecting the relevant table names in the Add a new alias dialog box.
11. Drop the ID_Owner column from the owners table to the corresponding column of the cars table.
12. In the cars table, select the Explicit join check box in front of the ID_Owner column.
As the default join type, INNER JOIN is displayed on the Join list.
13. Drop the ID_Reseller column from the cars table to the corresponding column of the resellers
table to set up the second join, and define the join as an inner join in the same way.
14. Select the columns to be aggregated into the output table, agg_result.
15. Drop the ID_Owner, Name, and ID_Insurance columns from the owners table to the output table.
16. Drop the Registration, Make, and Color columns from the cars table to the output table.
17. Drop the Name_Reseller and City columns from the resellers table to the output table.
With the relevant columns selected, the mappings are displayed in yellow and the joins are
displayed in dark violet.
18. Set up a filter in the output table. Click the Add filter row button on top of the output table to
display the Additional clauses expression field, drop the City column from the resellers table to the
expression field, and complete a WHERE clause that reads resellers.City ='Augusta'.


19. Click the Generated SQL Select query tab to display the corresponding SQL statement.

20. Click OK to save the ELT Map settings.


21. Double-click the tELTMysqlOutput component to display its Basic settings view.

22. Select an action from the Action on data list as needed.


23. Select Repository as the schema type, and define the output schema in the same way as you
defined the input schemas. In this use case, select agg_result as the output schema, which is the
name of the database table used to store the mapping result.


Note: You can also use a built-in output schema and retrieve the schema structure from the
preceding component; however, make sure that you specify an existing target table having the
same data structure in your database.

Tip: Leave all the other settings as they are.

Running the Job

Procedure
1. Save your Job.
2. Press F6 to launch it.
All selected data is inserted in the agg_result table as specified in the SQL statement.
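For reference, the query built in the ELT Map editor for this Job should be close to the following
sketch: the SELECT part is what appears on the Generated SQL Select query tab, and the
tELTMysqlOutput component wraps it in an INSERT into agg_result. It is an approximation based on
the joins, columns, and filter defined above, assuming the agg_result columns carry the same names
as the mapped input columns; the exact aliasing and formatting generated by the Studio may differ.

    INSERT INTO agg_result
      (ID_Owner, Name, ID_Insurance, Registration, Make, Color, Name_Reseller, City)
    SELECT owners.ID_Owner, owners.Name, owners.ID_Insurance,
           cars.Registration, cars.Make, cars.Color,
           resellers.Name_Reseller, resellers.City
    FROM owners
    INNER JOIN cars ON owners.ID_Owner = cars.ID_Owner
    INNER JOIN resellers ON cars.ID_Reseller = resellers.ID_Reseller
    WHERE resellers.City = 'Augusta';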

Mapping data using an Alias table


This scenario describes a Job that maps information from two input tables and an alias table, serving
as a virtual input table, to an output table. The employees table contains employees' IDs, their
department numbers, their names, and the IDs of their respective managers. The managers are also
considered as employees and hence included in the employees table. The dept table contains the
department information. The alias table retrieves the names of the managers from the employees
table.


Building a Job

Procedure
1. Drop two tELTMysqlInput components, a tELTMysqlMap component, and a tELTMysqlOutput
component to the design workspace, and label them to best describe their functionality.
2. Double-click the first tELTMysqlInput component to display its Basic settings view.

3. Select Repository from the Schema list, and define the DB connection and schema by clicking the
three dot button preceding Edit schema.
The DB connection is Talend_MySQL and the schema for the first input component is employees.

Note:
In this use case, all the involved schemas are stored in the Metadata node of the Repository
tree view for easy retrieval. For further information concerning metadata, see Talend Studio
User Guide.

4. Set the second tELTMysqlInput component in the same way but select dept as its schema.
5. Double-click the tELTMysqlOutput component to display its Basic settings view.

6. Select an action from the Action on data list as needed, Insert in this use case.
7. Select Repository as the schema type, and define the output schema in the same way as you
defined the input schemas. In this use case, select result as the output schema, which is the name
of the database table used to store the mapping result.
The output schema contains all the columns of the input schemas plus a ManagerName column.

Note: Leave all the other parameters as they are.


Connecting the components

Procedure
1. Connect the two tELTMysqlInput components to the tELTMysqlMap component using Link
connections named strictly after the actual input table names, employees and dept in this use case.
2. Connect the tELTMysqlMap component to the tELTMysqlOutput component using a Link
connection. When prompted, click Yes to allow the ELT Mapper to retrieve the output table
structure from the output schema.
3. Click the tELTMysqlMap component and select the Component tab to display its Basic settings
view.

4. Select Repository from the Property Type list, and select the same DB connection that you use for
the input components.
All the DB connection details are automatically retrieved.

Note: Leave all the other parameters as they are.

Configuring the Job

Procedure
1. Click the three-dot button next to ELT Mysql Map Editor or double-click the tELTMysqlMap
component on the design workspace to launch the ELT Map editor.
With the tELTMysqlMap component connected to the output component, the output table is
displayed in the output area.
2. Add the input tables, employees and dept, in the input area by clicking the green plus button and
selecting the relevant table names in the Add a new alias dialog box.
3. Create an alias table based on the employees table by selecting employees from the Select the
table to use list and typing in Managers in the Type in a valid alias field in the Add a new alias
dialog box.


4. Drop the DeptNo column from the employees table to the dept table.
5. Select the Explicit join check box in front of the DeptNo column of the dept table to set up an
inner join.
6. Drop the ManagerID column from the employees table to the ID column of the Managers table.
7. Select the Explicit join check box in front of the ID column of the Managers table and select LEFT
OUTER JOIN from the Join list to allow the output rows to contain Null values.

8. Drop all the columns from the employees table to the corresponding columns of the output table.
9. Drop the DeptName and Location columns from the dept table to the corresponding columns of
the output table.
10. Drop the Name column from the Managers table to the ManagerName column of the output table.


11. Click on the Generated SQL Select query tab to display the SQL query statement to be executed.

Running the Job

Procedure
1. Save your Job.
2. Press F6 to run it.
The output database table result contains all the information about the employees, including the
names of their respective managers.
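For reference, the query shown on the Generated SQL Select query tab should resemble the sketch
below, which the tELTMysqlOutput component then inserts into the result table. It is an
approximation: only a representative set of employees columns (ID, Name, DeptNo, ManagerID) is
listed, and the exact column order and aliasing follow the schemas defined in the Job.

    SELECT employees.ID, employees.Name, employees.DeptNo, employees.ManagerID,
           dept.DeptName, dept.Location,
           Managers.Name AS ManagerName
    FROM employees
    INNER JOIN dept ON employees.DeptNo = dept.DeptNo
    LEFT OUTER JOIN employees Managers ON employees.ManagerID = Managers.ID;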

Related scenarios
• Mapping data using a subquery on page 800, a related scenario using a subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTMysqlOutput
tELTMysqlOutput executes SQL Insert, Update, and Delete statements on the Mysql database.
The three ELT Mysql components are closely related, in terms of their operating conditions. These
components should be used to handle Mysql DB schemas to generate Insert statements, including
clauses, which are to be executed in the DB output table defined.
tELTMysqlOutput carries out the action on the specified table and inserts the data according to the
output schema defined in the ELT Mapper.

tELTMysqlOutput Standard properties


These properties are used to configure tELTMysqlOutput running in the Standard Job framework.
The Standard tELTMysqlOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table. If duplicates are found, the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between inverted commas.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTMysqlOutput is to be used along with the


tELTMysqlMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTNetezzaInput
Allows you to add as many Input tables as required for the most complicated Insert statement.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Provides the table schema to be used for the SQL statement to execute.

tELTNetezzaInput Standard properties


These properties are used to configure tELTNetezzaInput running in the Standard Job framework.
The Standard tELTNetezzaInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Type in the default table name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTNetezzaInput is to be used along with the


tELTNetezzaMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name

Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTNetezzaMap
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.
Helps you to build the SQL statement graphically, using the table provided as input.

tELTNetezzaMap Standard properties


These properties are used to configure tELTNetezzaMap running in the Standard Job framework.
The Standard tELTNetezzaMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Netezza Map Editor The ELT Map editor allows you to define the output schema
and graphically build the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line (fastest): Links between the schema and the Web
service parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are filled in using fetched data.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTNetezzaMap is used along with tELTNetezzaInput and


tELTNetezzaOutput. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800, a related scenario using a subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTNetezzaOutput
Performs the action (insert, update or delete) on data in the specified Netezza table through the SQL
statement generated by the tELTNetezzaMap component.
The three ELT Netezza components are closely related, in terms of their operating conditions. These
components should be used to handle Netezza database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTNetezzaOutput Standard properties


These properties are used to configure tELTNetezzaOutput running in the Standard Job framework.
The Standard tELTNetezzaOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit Schema A schema is a row description. It defines the number of
fields that will be processed and passed on to the next
component.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name, between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field that appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTNetezzaOutput is to be used along with the


tELTNetezzaMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.


Note:
The ELT components do not handle actual data flow but only schema information.

Related scenarios
• Mapping data using a simple implicit join on page 686
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTOracleInput
Provides the Oracle table schema that will be used by the tELTOracleMap component to generate the
SQL SELECT statement.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTOracleInput Standard properties


These properties are used to configure tELTOracleInput running in the Standard Job framework.
The Standard tELTOracleInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation marks.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTOracleInput is to be used along with the


tELTOracleMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Updating Oracle database entries on page 769
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTOracleMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTOracleInput
components.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTOracleMap Standard properties


These properties are used to configure tELTOracleMap running in the Standard Job framework.
The Standard tELTOracleMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Oracle Map Editor The ELT Map editor allows you to define the output schema
and graphically build the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.

Property type Either Built-in or Repository.


  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Connection type Drop-down list of the available drivers.

DB Version Select the Oracle version you are using.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Mapping Automatically set mapping parameter.

Advanced settings

Additional JDBC Parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Use Hint Options Select this check box to activate the hint configuration
area to help you optimize a query's execution. In this area,
parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.
For an example of a hint applied to a generated statement, see the sketch after this table.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.
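For example, assuming the hint /*+ PARALLEL(4) */ is to be placed right after the SELECT keyword of
the generated SELECT statement, the executed query takes a shape like the sketch below. The hint,
the table, and the filter are illustrative only; use the hint and position that suit your own query.

    SELECT /*+ PARALLEL(4) */ owners.ID_OWNER, owners.NAME
    FROM owners
    WHERE owners.ID_OWNER > 0;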

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.

768
tELTOracleMap

For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tELTOracleMap is used along with a tELTOracleInput and


tELTOracleOutput. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Updating Oracle database entries


This scenario is based on the data aggregation scenario, Aggregating table columns and filtering on
page 745. As the data update action is available in Oracle database, this scenario describes a Job that
updates particular data in the Agg_Result table.


Adding components
As described in Aggregating table columns and filtering on page 745, configure a Job for data
aggregation using the corresponding ELT components for Oracle database - tELTOracleInput,
tELTOracleMap, and tELTOracleOutput. Execute the Job to save the aggregation result in a database
table named Agg_Result.

Note:
When defining filters in the ELT Map editor, note that strings are case sensitive in Oracle database.

Procedure
1. Launch the ELT Map editor and add a new output table named update_data.
2. Add a filter row to the update_data table to set up a relationship between input and output tables:
owners.ID_OWNER = agg_result.ID_OWNER.
3. Drop the MAKE column from the cars table to the update_data table.
4. Drop the NAME_RESELLER column from the resellers table to the update_data table.
5. Add a model enclosed in single quotation marks, 'A8' in this use case, to the MAKE column from
the cars table, preceded by a double pipe.

6. Add Sold by enclosed in single quotation marks in front of the NAME_RESELLER column from
the resellers table, with a double pipe in between.


7. Check the Generated SQL select query tab to be executed.

8. Click OK to validate the changes in the ELT Mapper.


9. Deactivate the tELTOracleOutput component labeled Agg_Result by right-clicking it and selecting
Deactivate Agg_Result from the contextual menu.
10. Drop a new tELTOracleOutput component from the Palette to the design workspace, and label it
Update_Data to better identify its functionality.
11. Connect the tELTOracleMap component to the new tELTOracleOutput component using the link
corresponding to the new output table defined in the ELT Mapper, update_data in this use
case.
12. Double-click the new tELTOracleOutput component to display its Basic settings view.

13. From the Action on data list, select Update.


14. Check the schema, and click Sync columns to retrieve the schema structure from the preceding
component if necessary.
15. In the WHERE clauses area, add a clause that reads agg_result.MAKE = 'Audi' to update
data relating to the make of Audi in the database table agg_result.
16. Fill the Default Table Name field with the name of the output link, update_data in this use
case.
17. Select the Use different table name check box, and fill the Table name field with the name of the
database table to be updated, agg_result in this use case. Leave the other parameters as they
are.

Running the Job

Procedure
1. Save your Job.


2. Click Run to execute the Job.


The relevant data in the database table is updated as defined.
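Based on the expressions and clauses defined above, the UPDATE statement run by this Job should be
close to the sketch below. This is only an approximation: the join conditions between owners, cars,
and resellers are assumed to be the ones inherited from the aggregation scenario, and the exact
statement generated by the Studio may be structured differently.

    UPDATE agg_result
    SET (MAKE, NAME_RESELLER) =
        (SELECT cars.MAKE || 'A8', 'Sold by ' || resellers.NAME_RESELLER
         FROM owners, cars, resellers
         WHERE owners.ID_OWNER = agg_result.ID_OWNER
           AND owners.ID_OWNER = cars.ID_OWNER
           AND cars.ID_RESELLER = resellers.ID_RESELLER)
    WHERE agg_result.MAKE = 'Audi';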

Related scenarios
• Updating Oracle database entries on page 769
• Mapping data using a subquery on page 800, a related scenario using a subquery
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTOracleOutput
Performs the action (insert, update, delete, or merge) on data in the specified Oracle table through the
SQL statement generated by the tELTOracleMap component.
The three ELT Oracle components are closely related, in terms of their operating conditions. These
components should be used to handle Oracle database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTOracleOutput Standard properties


These properties are used to configure tELTOracleOutput running in the Standard Job framework.
The Standard tELTOracleOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic Settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.
MERGE: Updates and/or adds data to the table. Note
that the options available for the MERGE operation are
different to those available for the Insert, Update or Delete
operations.

Note:
Following global variables are available:
• NB_LINE_INSERTED: Number of lines inserted
during the Insert operation.
• NB_LINE_UPDATED: Number of lines updated during
the Update operation.
• NB_LINE_DELETED: Number of lines deleted during
the Delete operation.
• NB_LINE_MERGED: Number of lines inserted and/or
updated during the MERGE operation.

Schema and Edit schema A schema is a row description. It defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.


• Update repository connection: choose this option


to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Use Merge Update (for MERGE) Select this check box to update the data in the output table.
Column: Lists the columns in the entry flow.
Update: Select the check box which corresponds to the name of the column you want to update.
Use Merge Update Where Clause: Select this check box and
enter the WHERE clause required to filter the data to be
updated, if necessary.
Use Merge Update Delete Clause: Select this check box and
enter the WHERE clause required to filter the data to be
deleted and updated, if necessary.

Use Merge Insert (for MERGE) Select this check box to insert the data in the table.
Column: Lists the entry flow columns.
Check All: Select the check box corresponding to the name
of the column you want to insert.
Use Merge Insert Where Clause: Select this check box and
enter the WHERE clause required to filter the data to be
inserted.

Default Table Name Enter a default name for the table, between double
quotation marks.

Default Schema Name Enter a name for the default Oracle schema, between
double quotation marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to define a different output table
name, between double quotation marks, in the Table name
field which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.


Advanced settings

Use Hint Options Select this check box to activate the hint configuration
area when you want to use a hint to optimize a query's
execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL
statement.
- SQL STMT: select the SQL statement you need to use.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTOracleOutput is to be used along with the


tELTOracleInput and tELTOracleMap components. Note that
the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Managing data using the Oracle MERGE function


The sample Job described in this scenario allows you to add new customer information and update
existing customer information in a database table using the Oracle MERGE command.


Linking the components


Procedure
1. Add the following components from the Palette to the design workspace: tELTOracleInput,
tELTOracleMap, and tELTOracleOutput.
2. Label tELTOracleInput as new_customer, tELTOracleMap as ELT Mapper, and tELTOracleOutput
as merge_data.
3. Link tELTOracleInput to tELTOracleMap using a Row > New Output (table) connection.
4. When prompted, enter NEW_CUSTOMER as the table name, which should be the actual database
table name.
5. Link tELTOracleMap to tELTOracleOutput using a Row > New Output (table) connection.
6. When prompted, enter customers_merge as the name of the database table, which holds the
merge results.

Configuring the components


Procedure
1. Double-click the tELTOracleInput component to display its Basic settings view.

2. Select Repository from the Schema list and click the [...] button preceding Edit schema.
3. Select your database connection and the desired schema from the Repository Content dialog box.

The selected schema name appears in the Default Table Name field automatically.
• In this use case, the database connection is Talend_Oracle and the schema is
new_customers.
• In this use case, the input schema is stored in the Metadata node of the Repository tree view
for easy retrieval. For further information concerning metadata, see Talend Studio User Guide.


• You can also select the input component by dropping the relevant schema from the Metadata
area onto the design workspace and double-clicking tELTOracleInput from the Components
dialog box. Doing so allows you to skip the steps of labeling the input component and
defining its schema manually.
4. Click the tELTOracleMap component to display its Basic settings view.

5. Select Repository from the Property Type list, and select the same database connection that you
use for the input components.

Remember: All the database details are automatically retrieved. Leave the other settings as
they are.

6. Double-click the tELTOracleMap component to launch the ELT Map editor for setting up the data
transformation flow.
Display the input table by clicking the green plus button at the upper left corner of the ELT Map
editor and selecting the relevant table name in the Add a new alias dialog box.
In this use case, the only input table is new_customers.


7. Select all the columns in the input table and drop them to the output table.

8. Click the Generated SQL Select query tab to display the query statement to be executed.

Click OK to validate the ELT Map settings and close the ELT Map editor.
9. Double-click the tELTOracleOutput component to display its Basic settings view.
a) From the Action on data list, select MERGE.
b) Click the Sync columns button to retrieve the schema from the preceding component.
c) Select the Use Merge Update check box to update the data using Oracle's MERGE function.
10. In the table that appears, select the check boxes for the columns you want to update.
In this use case, you update all the data according to the customer ID. Therefore, select all the
check boxes except the one for the ID column.

Warning: The columns defined as the primary key cannot and must not be made subject to
updates.

11. Select the Use Merge Insert check box to insert new data while updating the existing data by
leveraging the Oracle MERGE function.
12. In the table that appears, select the check boxes for the columns into which you want to insert
new data.


In this use case, insert all the new customer data. Therefore, select all the check boxes by clicking
the Check All check box.
13. Fill the Default Table Name field with the name of the target table already existing in your
database. In this example, fill in customers_merge.
14. Leave the other parameters as they are.

Executing the Job


Procedure
1. Save the Job.
2. Click Run to execute the Job.
The data is updated and inserted in the database. The query used is displayed on the console.
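
The exact statement depends on the schema of the input table, but with ID as the matching key and NAME standing in for any other column (hypothetical names), the MERGE statement generated by this configuration has roughly the following shape; the columns listed in the UPDATE SET and INSERT parts correspond to the check boxes selected in the Use Merge Update and Use Merge Insert tables:

MERGE INTO customers_merge
USING NEW_CUSTOMER
ON (customers_merge.ID = NEW_CUSTOMER.ID)
WHEN MATCHED THEN
  UPDATE SET customers_merge.NAME = NEW_CUSTOMER.NAME
WHEN NOT MATCHED THEN
  INSERT (customers_merge.ID, customers_merge.NAME)
  VALUES (NEW_CUSTOMER.ID, NEW_CUSTOMER.NAME)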


tELTPostgresqlInput
Provides the Postgresql table schema that will be used by the tELTPostgresqlMap component to
generate the SQL SELECT statement.
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql database table schemas to generate SQL
statements, including clauses, which are to be executed in the database output table defined.

tELTPostgresqlInput Standard properties


These properties are used to configure tELTPostgresqlInput running in the Standard Job framework.
The Standard tELTPostgresqlInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the nature and
number of fields to be processed. The schema is either
built-in or remotely stored in the Repository. The Schema
defined is then passed on to the ELT Mapper to be included
in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter the default table name, between double quotation
marks.

Default Schema Name Enter the default schema name, between double quotation
marks.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTPostgresqlInput is to be used along with the


tELTPostgresqlMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTPostgresqlMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTPostgresql
Input components.
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql database table schemas to generate SQL
statements, including clauses, which are to be executed in the database output table defined.

tELTPostgresqlMap Standard properties


These properties are used to configure tELTPostgresqlMap running in the Standard Job framework.
The Standard tELTPostgresqlMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Postgresql Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTPostgresqlMap is used along with a tELTPostgresql


Input and tELTPostgresqlOutput. Note that the Output link
to be used with these components must correspond strictly
to the syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTPostgresqlOutput
Performs the action (insert, update or delete) on data in the specified Postgresql table through the
SQL statement generated by the tELTPostgresqlMap component.
The three ELT Postgresql components are closely related, in terms of their operating conditions.
These components should be used to handle Postgresql database table schemas to generate SQL
statements, including clauses, which are to be executed in the database output table defined.

tELTPostgresqlOutput Standard properties


These properties are used to configure tELTPostgresqlOutput running in the Standard Job framework.
The Standard tELTPostgresqlOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description, that is to say, it defines the
number of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter the default table name between double quotation
marks.

Default Schema Name Enter the default schema name between double quotation
marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Use update statement without subqueries Select this option to generate an UPDATE statement that
does not use subqueries.
This option is available when Update is selected from the
Action on data drop-down list in the Basic settings view.
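
As a general illustration of the difference (not the component's literal output; the CUSTOMERS and STAGING table names are hypothetical), an update that relies on a subquery versus one that does not could look like:

-- Update written with a correlated subquery
UPDATE CUSTOMERS
SET NAME = (SELECT STAGING.NAME FROM STAGING WHERE STAGING.ID = CUSTOMERS.ID)
WHERE CUSTOMERS.ID IN (SELECT STAGING.ID FROM STAGING);

-- Update written without subqueries, using the PostgreSQL UPDATE ... FROM form
UPDATE CUSTOMERS
SET NAME = STAGING.NAME
FROM STAGING
WHERE CUSTOMERS.ID = STAGING.ID;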

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTPostgresqlOutput is to be used along with the


tELTPostgresqlMap. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTSybaseInput
Provides the Sybase table schema that will be used by the tELTSybaseMap component to generate the
SQL SELECT statement.
The three ELT Sybase components are closely related, in terms of their operating conditions. These
components should be used to handle Sybase database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTSybaseInput Standard properties


These properties are used to configure tELTSybaseInput running in the Standard Job framework.
The Standard tELTSybaseInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number and
nature of the fields to be processed. The schema is either
built-in (local) or stored remotely in the Repository. The
Schema defined is then passed on to the ELT Mapper for
inclusion in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository. Hence, it can be re-used for other projects and
Jobs. Related topic: see Talend Studio User Guide.

Default Table Name Enter a default name for the table, between double
quotation marks.

Default Schema Name Enter a default name for the Sybase schema, between
double quotation marks.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTSybaseInput is intended for use with tELTSybaseMap.


Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
ELT components only handle schema information. They
do not handle actual data flow.

Limitation This component requires installation of its related jar files.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTSybaseMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTSybaseInpu
t components.
The three ELT Sybase components are closely related, in terms of their operating conditions. These
components should be used to handle Sybase database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTSybaseMap Standard properties


These properties are used to configure tELTSybaseMap running in the Standard Job framework.
The Standard tELTSybaseMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Sybase Map Editor The ELT Map editor allows you to define the output schema
and make a graphical build of the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Can be either Built-in or Repository.

  Built-in : No property data is stored centrally.

  Repository : Select the Repository file where the component


properties are stored. The following fields are pre-filled
using collected data.

DB Version Select the version of the Sybase database to be used from


the drop-down list.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTSybaseMap is intended for use with tELTSybaseInpu


t and tELTSybaseOutput. Note that the Output link to be
used with these components must correspond strictly to the
syntax of the table name.

Note:
The ELT components only handle schema information.
They do not handle actual data flow.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTSybaseOutput
Performs the action (insert, update or delete) on data in the specified Sybase table through the SQL
statement generated by the tELTSybaseMap component.
The three ELT Sybase components are closely related, in terms of their operating conditions. These
components should be used to handle Sybase database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTSybaseOutput Standard properties


These properties are used to configure tELTSybaseOutput running in the Standard Job framework.
The Standard tELTSybaseOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description, that is to say, it defines the
number and nature of the fields to be processed and passed
on to the next component. The schema is either Built-in
(local) or stored remotely in the Repository . The Schema
defined is then passed on to the ELT Mapper for inclusion in
the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository. Hence, it can be re-used for other projects and
Jobs. Related topic: see Talend Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter a default name for the table, between double
quotation marks.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.

Default Schema Name Enter a default name for the Sybase schema, between
double quotation marks.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTSybaseOutput is intended for use with the


tELTSybaseInput and tELTSybaseMap components. Note
that the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
ELT components only handle schema information. They
do not handle actual data flow.

Limitation This component requires installation of its related jar files.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTTeradataInput
Provides the Teradata table schema that will be used by the tELTTeradataMap component to generate
the SQL SELECT statement.
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTTeradataInput Standard properties


These properties are used to configure tELTTeradataInput running in the Standard Job framework.
The Standard tELTTeradataInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, that is to say, it defines the
nature and number of fields to be processed. The schema
is either built-in or remotely stored in the Repository. The
Schema defined is then passed on to the ELT Mapper to be
included in the Insert SQL statement.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Default Table Name Enter a default name for the table, between double
quotation marks.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTTeradataInput is to be used along with the


tELTTeradataMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTTeradataMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTTeradataIn
put components.
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTTeradataMap Standard properties


These properties are used to configure tELTTeradataMap running in the Standard Job framework.
The Standard tELTTeradataMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Teradata Map editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of schema can be different
from the column names in the database.

Style link Select the way in which links are displayed.


Auto: By default, the links between the input and output
schemas and the Web service parameters are in the form of
curves.
Bezier curve: Links between the schema and the Web
service parameters are in the form of curve.
Line: Links between the schema and the Web service
parameters are in the form of straight lines.
This option slightly optimizes performance.


Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where Properties are


stored. The following fields are pre-filled in using fetched
data.

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

Query band Select this check box to use the Teradata Query Banding
feature to add metadata to the query to be processed,
such as the user running the query. This can help you, for
example, identify the origin of this query.
Once you select the check box, the Query Band parameters
table is displayed, in which you need to enter the metadata
information to be added. This information takes the form of
key/value pairs, for example, DpID in the Key column and
Finance in the Value column.
This check box actually generates the SET QUERY_BAND
FOR SESSION statement with the key/value pairs declared
in the Query Band parameters table. For further information
about this statement, see https://docs.teradata.com/search/
all?query=End+logging+syntax.
This check box is not available when you have selected
the Use an existing connection check box. In this
situation, if you need to use the Query Band feature, set it
in the Advanced settings tab of the Teradata connection
component to be used.
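
With the DpID/Finance pair from the example above, the statement generated at the start of the session would look like the following (standard Teradata syntax):

SET QUERY_BAND = 'DpID=Finance;' FOR SESSION;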

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tELTTeradataMap is used along with a tELTTeradataInput


and tELTTeradataOutput. Note that the Output link to be
used with these components must faithfully reflect the
name of the tables.

Note:
The ELT components do not handle actual data flow but
only schema information.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Mapping data using a subquery


The sample Job described in this scenario maps the data from two input tables, PreferredSubject and
CourseScore, to the output table, TotalScoreOfPreferredSubject, using a subquery.

Prerequisite
Ensure that you have added an Oracle database connection in the Metadata > Db Connections section
prior to creating the Job. For more information, see the Centralizing database metadata section of the
Talend Data Integration Studio User Guide.

The Standard Job and the Prejob design


In this scenario, design the Standard Job such as the following:


Design the Prejob that includes the data in this scenario as follows:

The PreferredSubject table contains the student's preferred subject data. To reproduce this scenario,
you can load the following data to the Oracle table from a CSV file:

SeqID;StuName;Subject;Detail
1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art.


The CourseScore table contains the student's subject score data. To reproduce this scenario, you can
load the following data to the Oracle table from a CSV file:

SeqID;StuName;Subject;Course;Score;Detail
1;Amanda;science;math;85;science score
2;Amanda;science;physics;75;science score
3;Amanda;science;chemistry;80;science score
4;Amanda;art;chinese;85;art score
5;Amanda;art;history;95;art score
6;Amanda;art;geography;80;art score
7;Ford;science;math;95;science score
8;Ford;science;physics;85;science score
9;Ford;science;chemistry;80;science score
10;Ford;art;chinese;75;art score
11;Ford;art;history;80;art score
12;Ford;art;geography;85;art score
13;Kate;science;math;65;science score
14;Kate;science;physics;75;science score
15;Kate;science;chemistry;80;science score
16;Kate;art;chinese;85;art score
17;Kate;art;history;80;art score
18;Kate;art;geography;95;art score

Before the Job execution, the output table TotalScoreOfPreferredSubject does not contain any data:

SeqID;StuName;PreferredSubject;TotalScore
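
For reference, given the sample data above and the mapping built in this scenario (the total score of each student in his or her preferred subject), the output table can be expected to contain rows similar to the following once the Job has run:

SeqID;StuName;PreferredSubject;TotalScore
1;Amanda;art;260
2;Ford;science;260
3;Kate;art;260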

Creating the Prejob


Create the Prejob that contains the data that you wish to load to the Oracle table.
See the Prejob design image in The Standard Job and the Prejob design section.

Procedure
1. Create a Standard Job.
2. Add the following components:
• Prejob
• two tFixedFlowInput components
• two tOracleOutput components
• two tOracleInput components
• one tCreateTable component
• two tLogRow components
3. Configure the first tFixedFlowInput component:
a) Select the tFixedFlowInput component to display the Basic settings view.
b) Select Use Inline Content(delimited file) from the Mode options.
c) Add the following data to the Content field:

1;Amanda;art;Amanda prefers art.
2;Ford;science;Ford prefers science.
3;Kate;art;Kate prefers art.

d) Click ... next to the Edit Schema field to open the Schema Editor.
e) Add four columns with the following names and corresponding parameters:


4. Configure the second tFixedFlowInput component:


a) Repeat steps 3a and 3b.
b) Add the following data to the Content field:

1;Amanda;science;math;85;science score
2;Amanda;science;physics;75;science score
3;Amanda;science;chemistry;80;science score
4;Amanda;art;chinese;85;art score
5;Amanda;art;history;95;art score
6;Amanda;art;geography;80;art score
7;Ford;science;math;95;science score
8;Ford;science;physics;85;science score
9;Ford;science;chemistry;80;science score
10;Ford;art;chinese;75;art score
11;Ford;art;history;80;art score
12;Ford;art;geography;85;art score
13;Kate;science;math;65;science score
14;Kate;science;physics;75;science score
15;Kate;science;chemistry;80;science score
16;Kate;art;chinese;85;art score
17;Kate;art;history;80;art score
18;Kate;art;geography;95;art score

c) Click ... next to the Edit Schema field to open the Schema Editor.
d) Add six columns with the following names and corresponding parameters:

5. Select the first tOracleOutput component to open the Basic settings view.
a) Select Repository from the Property Type drop-down list.
b) Specify the Oracle database connection that you have previously added by clicking .... This
automatically populates the database information in the fields provided.
Repeat this step and its substeps to configure the second tOracleOutput component.
6. Select the tCreateTable component to open the Basic settings view.
a) Select Oracle from the Database Type drop-down list.


b) Select Repository from the Property Type drop-down list.


c) Specify the Oracle database connection that you have previously added by clicking .... This
automatically populates the database information in the fields provided.
d) Enter TotalScoreOfPreferredSubject in the Table Name field.
e) Select Drop table if exists and create from the Table Action drop-down list.
f) Click ... next to the Edit schema field to open the Schema editor.
g) Add four columns with the following corresponding names and parameters:
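
The column parameters are shown in the Studio schema editor and not reproduced here. Based on the output schema used later in this scenario (SeqID and TotalScore as INTEGER, StuName and PreferredSubject as VARCHAR), the table created by tCreateTable should be roughly equivalent to the following DDL (the VARCHAR2 lengths are assumptions):

CREATE TABLE TotalScoreOfPreferredSubject (
    SeqID INTEGER,
    StuName VARCHAR2(100),
    PreferredSubject VARCHAR2(100),
    TotalScore INTEGER
)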

Adding the components


Procedure
1. Add the following components by typing their names in the design workspace or dropping them
from the Palette:
• two tELTOracleInput components
• two tELTOracleMap components
• one tELTOracleOutput component
• one tOracleInput component
• one tLogRow component
2. Rename the tELTOracleMap components to SubqueryMap and ELTMap.

Configuring the input components


Procedure
1. Select the first tELTOracleInput component to display the Basic settings tab.
2. Enter "PreferredSubject" in the Default Table Name field.
3. Click [...] next to Edit schema to define the schema of the input table PreferredSubject in the
schema editor.
4. Click [+] to add four columns:
• SeqID with the DB Type set to INTEGER
• StuName, Subject, and Detail with the DB Type set to VARCHAR


Click OK to validate these changes and close the schema editor.


5. Connect the first tELTOracleInput component to the second tELTOracleMap component using the
Link > PreferredSubject(Table).
6. Select the second tELTOracleInput component to display the Basic settings tab.
7. Enter "CourseScore" in the Default Table Name field.
8. Click [...] next to Edit schema to define the schema of the input table CourseScore in the schema
editor.
9. Click the [+] button to add six columns:
• SeqID and Score with the DB Type set to INTEGER
• StuName, Subject, Course, and Detail with the DB Type set to VARCHAR

Click OK to validate these changes and close the schema editor.


10. Connect the second tELTOracleInput component to the first tELTOracleMap component using the
Link > CourseScore(Table).


Configuring the output component


Procedure
1. Select the tELTOracleOutput component to display the Basic settings view.

2. Enter "TotalScore OfPreferredSubject" in the Default Table Name field.


3. Click [...] next to Edit schema to define the schema of the output table in the schema editor.
4. Click [+] to add four columns:
• SeqID and TotalScore with the DB Type set to INTEGER
• StuName and PreferredSubject with the DB Type set to VARCHAR

Click OK to validate these changes and close the schema editor.


5. Click Sync columns to synchronize the Input and Output tables of the tELTOracleOutput
component.


Configuring data mapping to generate a subquery


Procedure
1. Click the SubqueryMap component (next to the second tELTOracleInput) to open its Basic settings
view.

Note: Specify the Oracle database connection information in the second ELTMap component in
the Job.

2. Click [...] next to ELT Oracle Map Editor to open its map editor.

3. Add the input table CourseScore by clicking [+] in the upper left corner of the map editor and
then selecting the relevant table name from the drop-down list in the pop-up dialog box.
4. Add an output table by clicking [+] in the upper right corner of the map editor and then entering
the table name TotalScore in the corresponding field in the pop-up dialog box.
5. Drag StuName, Subject, and Score columns in the input table and then drop them to the output
table.
6. Click the Add filter row button in the upper right corner of the output table and select Add an
other(GROUP...) clause from the pop-up menu. Then in the Additional other clauses (GROUP/
ORDER BY...) field displayed, enter the clause GROUP BY CourseScore.StuName,
CourseScore.Subject.
Add the aggregate function SUM for the column Score of the output table by changing the
expression of this column to SUM(CourseScore.Score).
7. Click the Generated SQL Select query for 'table1' output tab at the bottom of the map editor to
display the corresponding generated SQL statement.

This SQL query will appear as a subquery in the SQL query generated by the ELTMap component.
8. Click OK to validate these changes and close the map editor.
9. Connect SubqueryMap to ELTMap using the Link > TotalScore (table1) link. Note that
the link is renamed automatically to TotalScore (Table_ref) since the output table TotalScore is a
reference table.
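
The query displayed in step 7 is shown only in the Studio; based on the mapping above (the three dragged columns, the SUM expression, and the GROUP BY clause), it should be close to the following (formatting and aliases may differ):

SELECT
    CourseScore.StuName,
    CourseScore.Subject,
    SUM(CourseScore.Score)
FROM CourseScore
GROUP BY CourseScore.StuName, CourseScore.Subject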

Mapping the input and output schemas


Procedure
1. Right-click ELTMap and select Link > *New Output* (Table) from the contextual menu.
2. Click TotalScoreOfPreferredSubject. In the pop-up dialog box, click Yes to get the schema from
the target component.
3. Click ELTMap to open its Basic settings view.
4. Select Repository from the Property Type drop-down list. Specify the Oracle database you
previously added to automatically propagate the database connection information.

5. Click [...] next to ELT Oracle Map Editor to open its map editor.
6. Add the input table PreferredSubject by clicking the [+] button in the upper left corner
of the map editor and selecting the relevant table name from the drop-down list in the pop-up
dialog box.
Repeat the step to add another input table TotalScore.


7. Drag the StuName column in the input table PreferredSubject and drop it to the corresponding
column in the input table TotalScore. Then select the Explicit join check box for the StuName
column in the input table TotalScore.
Repeat the step for the Subject column.
8. Drag the SeqID column in the input table PreferredSubject and drop it to the corresponding
column in the output table.
Repeat the step to drag the StuName and Subject columns in the input table PreferredSubject and
the Score column in the input table TotalScore and drop them to the corresponding column in the
output table.
9. Click the Generated SQL Select query for "table2" output tab at the bottom of the map editor to
display the corresponding generated SQL statement.

The SQL query generated in the SubqueryMap component appears as a subquery in the SQL query
generated by this component. Aliases are automatically added for the selected columns in the
subquery.
10. Click OK to validate these changes and close the map editor.
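
Based on the joins and mappings described in this procedure, the statement displayed in step 9 should resemble the following, with the SubqueryMap query embedded as a derived table (the TotalScore alias and the exact formatting may differ):

SELECT
    PreferredSubject.SeqID,
    PreferredSubject.StuName,
    PreferredSubject.Subject,
    TotalScore.Score
FROM PreferredSubject, (
    SELECT
        CourseScore.StuName AS StuName,
        CourseScore.Subject AS Subject,
        SUM(CourseScore.Score) AS Score
    FROM CourseScore
    GROUP BY CourseScore.StuName, CourseScore.Subject
) TotalScore
WHERE PreferredSubject.StuName = TotalScore.StuName
  AND PreferredSubject.Subject = TotalScore.Subject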

Executing the Job


Procedure
Click Run to execute the Job.

The SELECT statement is generated and the mapped data is written into the output table.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTTeradataOutput
Performs the action (insert, update or delete) on data in the specified Teradata table through the SQL
statement generated by the tELTTeradataMap component.
The three ELT Teradata components are closely related, in terms of their operating conditions. These
components should be used to handle Teradata database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTTeradataOutput Standard properties


These properties are used to configure tELTTeradataOutput running in the Standard Job framework.
The Standard tELTTeradataOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform the
following operation:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operations.

Default Table Name Enter a default name for the table, between double
quotation marks.
Note that the table must exist already. If it does not exist,
you can use tCreateTable to create one first. For more
information about tCreateTable, see tCreateTable on page
540.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to enter a different output table name,
between double quotation marks, in the Table name field
which appears.

Mapping Specify the metadata mapping file for the database to


be used. The metadata mapping file is used for the data
type conversion between database and Java. For more
information about the metadata mapping, see the related
documentation for Type mapping.

Advanced settings

Clause SET Select the column names that will be used to generate the
SET clauses.
SET clauses will not be generated for the columns that are
not selected.
This field appears when Update is selected from the Action
on data drop-down list in the Basic settings view.
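
A minimal sketch, assuming a hypothetical CUSTOMERS target table in which only the NAME and CITY columns are selected in Clause SET: the generated UPDATE then contains SET entries for those two columns only, along these lines (the values and the WHERE condition depend on the mapping and on the Where clauses setting):

UPDATE CUSTOMERS
SET NAME = 'Amanda',
    CITY = 'Paris'
WHERE CUSTOMERS.ID = 1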

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tELTTeradataOutput is to be used along with the


tELTTeradataMap. Note that the Output link to be used with
these components must correspond strictly to the syntax of
the table name.
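As an illustration, if the target table is named customers (a hypothetical name), the output link
coming from tELTTeradataMap should also be named customers so that the generated statement
addresses the right table.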

Note:
The ELT components do not handle actual data flow but
only schema information.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTVerticaInput
Provides the Vertica table schema that will be used by the tELTVerticaMap component to generate the
SQL SELECT statement.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTVerticaInput Standard properties


These properties are used to configure tELTVerticaInput running in the Standard Job framework.
The Standard tELTVerticaInput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTVerticaInput is used along with tELTVerticaMap. Note


that the Output link to be used with these components must
correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTVerticaMap
Builds the SQL SELECT statement using the table schema(s) provided by one or more tELTVerticaInput
components.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTVerticaMap Standard properties


These properties are used to configure tELTVerticaMap running in the Standard Job framework.
The Standard tELTVerticaMap component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

DB Version Select the version of the Vertica database being used.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

ELT Vertica Map Editor The ELT Map editor allows you to define the output schema
as well as build graphically the SQL statement to be
executed. The column names of the schema can be different
from the column names in the database.

Style link Select a way in which links are displayed.


• Auto: By default, the links between the input and
output schemas and the Web service parameters are in
the form of curves.
• Bezier curve: The links between the schema and the
Web service parameters are in the form of curve.
• Line (fastest): The links between the schema and the
Web service parameters are in the form of straight
lines. This option slightly optimizes performance.

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The database connection fields that
follow are completed automatically using the data retrieved.

Host Type in the IP address or hostname of the database.

Port Type in the listening port number of the database.

Database Type in the name of the database you want to use.

Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating.

Username and Password Type in the database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tELTVerticaMap is used along with tELTVerticaInput and


tELTVerticaOutput. Note that the Output link to be used
with these components must correspond strictly to the
syntax of the table name.


Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tELTVerticaOutput
Performs the action (insert, update or delete) on data in the specified Vertica table through the SQL
statement generated by the tELTVerticaMap component.
The three ELT Vertica components are closely related, in terms of their operating conditions. These
components should be used to handle Vertica database table schemas to generate SQL statements,
including clauses, which are to be executed in the database output table defined.

tELTVerticaOutput Standard properties


These properties are used to configure tELTVerticaOutput running in the Standard Job framework.
The Standard tELTVerticaOutput component belongs to the ELT family.
The component in this framework is available in all Talend products.

Basic settings

Action on data On the data of the table defined, you can perform one of the
following operations:
• Insert: Add new entries to the table. If duplicates are
found, Job stops.
• Update: Updates entries in the table.
• Delete: Deletes entries which correspond to the entry
flow.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Sync columns Click this button to retrieve the schema from the previous
component connected in the Job.


Where clauses (for UPDATE and DELETE only) Enter a clause to filter the data to be updated or deleted
during the update or delete operation.
This field is available only when Update or Delete is
selected from the Action on data drop-down list.
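As an illustration, a clause such as "customers.status = 'inactive'" (hypothetical table and column
names) would limit the update or delete operation to the matching rows only.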

Default Table Name Type in the default table name.

Default Schema Name Type in the default schema name.

Table name from connection name is variable Select this check box when the name of the connection
to this component is set to a variable, such as a context
variable.

Use different table name Select this check box to use a different output table name.

Table name Type in the output table name.


This field is available only when the Use different table
name check box is selected.

Advanced settings

Direct Select this check box to write the data directly to disk,
bypassing memory.
This check box is not visible when the Set SQL Label check
box is selected.

Set SQL Label Select this check box and specify the label that identifies
the query. For more information, see How to label queries
for profiling.
This check box is not visible when the Direct check box is
selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule tELTVerticaOutput is used along with the tELTVerticaMap.


Note that the Output link to be used with these components
must correspond strictly to the syntax of the table name.

Note:
The ELT components do not handle actual data flow but
only schema information.

Related scenarios
• Aggregating table columns and filtering on page 745
• Mapping data using an Alias table on page 749
• Mapping data using a subquery on page 800
• Aggregating Snowflake data using context variables as table and connection names on page 725


tESBConsumer
Calls the defined method from the invoked Web service and returns the class as defined, based on the
given parameters.

tESBConsumer Standard properties


These properties are used to configure tESBConsumer running in the Standard Job framework.
The Standard tESBConsumer component belongs to the ESB family.
The component in this framework is available in all Talend products.

Basic settings

Service configuration Description of Web service bindings and configuration. The


Endpoint field gets filled in automatically upon completion
of the service configuration.

Input Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Response Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.


• Change to built-in property: choose this option to


change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Fault Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Use Service Registry This option is only available if you subscribed to Talend
Enterprise ESB solutions.
Select this check box to enable the Service Registry. It
provides dynamic endpoint lookup and allows services to
be redirected based upon information retrieved from the
registry. It works in runtime only.
Enter the authentication credentials in the Username and
Password field.
If SAML token is registered in the service registry, you need
to specify the client's role in the Role field. You can also
select the Propagate Credentials check box to make the call
on behalf of an already authenticated user by propagating
the existing credentials. You can enter the username and
the password to authenticate via STS to propagate using
username and password, or provide the alias, username


and the password to propagate using certificate. For more


information, see the Use Authentication option. Select
the Encryption/Signature body check box to enable XML
Encryption/XML Signature. For more information, see the
chapter about XKMS Service in the Talend ESB Infrastructure
Services Configuration Guide.
In the Correlation Value field, specify a correlation ID or
leave this field empty. For more information, see the Use
Business Correlation option.
For more information about how to set up and use the
Service Registry, see the Talend Administration Center User
Guide and Talend ESB Infrastructure Services Configuration
Guide.

Use Service Locator Maintains the availability of the service to help meet
demands and service level agreements (SLAs).
This option will not show if the Use Service Registry check
box is selected.

 Use Service Activity Monitor Captures events and stores this information to facilitate
in-depth analysis of service activity and track-and-trace
of messages throughout a business transaction. This can
be used to analyze service response times, identify traffic
patterns, perform root cause analysis and more.
This option is disabled when the Use Service Registry check
box is selected if you subscribed to Talend Enterprise ESB
solutions.

 Use Authentication Select this check box to enable the authentication option.
Select from Basic HTTP, HTTP Digest, Username Token,
and SAML Token (ESB runtime only). Enter a username
and a password in the corresponding fields as required.
Authentication with Basic HTTP, HTTP Digest, and
Username Token works in both the studio and runtime.
Authentication with the SAML Token works in runtime only.
When SAML Token (ESB runtime only) is selected, you can
either provide the user credentials to send the request or
make the call on behalf of an already authenticated user by
propagating the existing credentials. Select from:
-: Enter the username and the password in the
corresponding fields to access the service.
Propagate using U/P: Enter the user name and the password
used to authenticate against STS.
Propagate using Certificate: Enter the alias and the
password used to authenticate against STS.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This option will not show if the Use Service Registry check
box is selected.

Use Business Correlation Select this check box to create a correlation ID in this
component.
You can specify a correlation ID in the Correlation Value
field. In this case the correlation ID will be passed on to the
service it calls so that chained service calls will be grouped


under this correlation ID. If you leave this field empty, this
value will be generated automatically at runtime.
When this option is enabled, tESBConsumer will also extract
the correlation ID from the response header and store it in
the component variable for further use in the flow.
This option will be enabled automatically when the Use
Service Registry check box is selected.

Use GZip Compress Select this check box to compress the incoming messages
into GZip format before sending.

Die on error Select this check box to kill the Job when an error occurs.

Advanced settings

Log messages Select this check box to log the message exchange
between the service provider and the consumer.

Service Locator Custom Properties This table appears when Use Service Locator is selected.
You can add as many lines as needed in the table to
customize the relevant properties. Enter the name and the
value of each property between double quotation marks
in the Property Name field and the Property Value field
respectively.

Service Activity Custom Properties This table appears when Use Service Activity Monitor is
selected. You can add as many lines as needed in the table
to customize the relevant properties. Enter the name and
the value of each property between double quotation marks
in the Property Name field and the Property Value field
respectively.

Connection time out(second) Set a value in seconds for Web service connection time out.
This option only works in the studio. To use it after the
component is deployed in runtime:
1. Create a configuration file with the name
org.apache.cxf.http.conduits-<endpoint_name>.cfg in the
<TalendRuntimePath>/container/etc/ folder.
2. Specify the url of the Web service and the
client.ConnectionTimeout parameter in
milliseconds in the configuration file. If you need
to use the Receive time out option, specify the
client.ReceiveTimeout in milliseconds too.
The url can be a full endpoint address or a regular
expression containing wildcards, for example:

url = http://localhost:8040/*
client.ConnectionTimeout=10000000
client.ReceiveTimeout=20000000

where http://localhost:8040/* matches all URLs starting
with http://localhost:8040/.

Receive time out(second) Set a value in seconds to wait for the server answer.


This option only works in the studio. For how to use it after
the component is deployed in runtime, see the Connection
time out option.


Disable Chunking Select this check box to disable encoding the payload
as chunks. In general, chunking will perform better as
the streaming can take place directly. But sometimes the
payload is truncated with chunking enabled. If you are
getting strange errors when trying to interact with a service,
try turning off chunking to see if that helps.

Trust server with SSL/TrustStore file and TrustStore password Select this check box to validate the server
certificate to the client via an SSL protocol and fill in the
corresponding fields:
TrustStore file: Enter the path (including filename) to
the certificate TrustStore file that contains the list of
certificates that the client trusts.
TrustStore password: Enter the password used to check the
integrity of the TrustStore data.

Use http proxy/Proxy host, Proxy port, Proxy user, and Proxy password Select this check box if you are using a
proxy server and fill in the necessary information.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

HTTP Headers Click [+] as many times as required to add the name-value
pair(s) for HTTP headers to define the parameters of the
requested HTTP operation.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
CORRELATION_ID: the correlation ID by which chained
service calls will be grouped. This is a Flow variable and it
returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
HTTP_RESPONSE_CODE: HTTP response status code. This is
an After variable and it returns an Integer.
HTTP_HEADERS: the set of HTTP headers from the response.
This is a Flow variable and it returns a map object of type
java.util.Map<String, java.util.List<?>>.
The header name is the map key and the header values are
represented by java.util.List<?>.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.
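As an illustration, the snippet below is a minimal sketch of reading these variables from the
globalMap in a tJava or tJavaRow component; the instance name tESBConsumer_1 is an assumption and
must match the label of the component in your own Job. Remember that Flow variables such as
CORRELATION_ID and HTTP_HEADERS have to be read while the flow is still running, for example in a
tJavaRow placed on the Response flow.

// Minimal sketch (hypothetical instance name tESBConsumer_1).
// HTTP_RESPONSE_CODE is an After variable; CORRELATION_ID and HTTP_HEADERS are Flow variables.
Integer status = (Integer) globalMap.get("tESBConsumer_1_HTTP_RESPONSE_CODE");
String correlationId = (String) globalMap.get("tESBConsumer_1_CORRELATION_ID");
java.util.Map<String, java.util.List<?>> headers =
        (java.util.Map<String, java.util.List<?>>) globalMap.get("tESBConsumer_1_HTTP_HEADERS");

System.out.println("HTTP status: " + status + ", correlation ID: " + correlationId);
if (headers != null) {
    for (java.util.Map.Entry<String, java.util.List<?>> header : headers.entrySet()) {
        System.out.println(header.getKey() + " = " + header.getValue());
    }
}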

Usage

Usage rule This component can be used as an intermediate component.
It must be linked to an output component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Use
Authentication or Use HTTP proxy option dynamically at
runtime. You can add two rows in the table to set both
options.
Once a dynamic parameter is defined, the corresponding
option becomes highlighted and unusable in the Basic
settings view or Advanced settings view.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
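For example, the Code field could be filled with a hypothetical Boolean context variable such as
context.useProxy so that the Use HTTP proxy option is switched on or off according to the value the
variable holds at runtime.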

Limitation A JDK is required for this component to operate.

Using tESBConsumer to retrieve the valid email


This scenario describes a Job that uses a tESBConsumer component to retrieve the valid email.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tESBConsumer, two tXMLMap, and two tLogRow components.
2. Right-click the tFixedFlowInput component, select Row > Main from the contextual menu and
click the first tXMLMap component.
3. Right-click the tXMLMap component, select Row > *New Output* (Main) from the contextual
menu and click the tESBConsumer component. Enter payload in the popup dialog box to name


this row and accept the propagation that prompts you to get the schema from the tESBConsumer
component.
4. Right-click the tESBConsumer component, select Row > Response from the contextual menu and
click the second tXMLMap component.
5. Right-click the second tXMLMap component, select Row > *New Output* (Main) from the
contextual menu and click the second tLogRow component. Enter response in the popup dialog
box to name this row.
6. Right-click the tESBConsumer component again, select Row > Fault from the contextual menu
and click the other tLogRow component.

Configuring the components


The tLogRow components will monitor the exchanges from the response and fault messages and do
not need any configuration. Press Ctrl+S to save your Job.

Configuring the tESBConsumer component

About this task


In this scenario, a public web service which is available at http://www.webservicex.net/ValidateEmail.asmx
will be called by the tESBConsumer component to return true or false for an email address. You can
view the WSDL definition of the service at http://www.webservicex.net/ValidateEmail.asmx?WSDL
for the service description.

Procedure
1. In the design workspace, double-click the tESBConsumer component to open its Basic settings
view in the Component tab.

2. Click the three-dot button next to Service configuration.


3. In the dialog box that appears, type in: http://www.webservicex.net/ValidateEmail.asmx?WSDL in


the WSDL field and click the refresh button to retrieve port name and operation name. In the Port
Name list, select the port you want to use, ValidateEmailSoap in this example.
Select the Populate schema to repository on finish to retrieve the schema from the WSDL
definition, which will be used by the tFixedFlowInput component. This option is only available to
users of Talend Studio with ESB. If you don't have this option, please ignore it. The schema can
be created manually in the tFixedFlowInput component, which will be shown later.
Click Finish to validate your settings and close the dialog box.
4. Click the Advanced settings view in the Component tab.

5. Select the Log messages check box to show the exchange log in the execution console.

Configuring the tFixedFlowInput component

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view in the Component
tab.


2. For users of Talend Studio with ESB who have retrieved the schema from the service WSDL
definition in the configuration of the tESBConsumer component, select Repository from the
Schema list. Then click the [...] of the next field to show the Repository Content dialog box. Select
the metadata under the IsValidEmail node to use it as the schema of the input message. Click OK
to close the dialog box.
For users of Talend Studio without ESB, please go to the next step.


3. For users of Talend Studio without ESB, the schema needs to be created manually. Select Built-In
from the Schema list.

Click the three-dot button next to Edit Schema. In the schema dialog box, click the plus button to
add a new line of String type and name it Email. Click OK to close the dialog box.

4. In the Number of rows field, set the number of rows as 1.


5. In the Mode area, select Use Single Table and input the following request in double quotation
marks into the Value field:
[email protected]

Configuring the tXMLMap component in the input flow

About this task


Talend data integration uses schemas based on rows and columns since it has roots in relational data
warehouse integration. But SOAP messages use the XML format. XML is hierarchical and supports a
richer structure than rows or columns. So we need tXMLMap to convert from the relational row/
column structure to the schema expected by the SOAP service.
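To make the target structure concrete, the sketch below shows in plain Java the kind of payload
that the mapping steps in the following procedure produce for one input row. It is illustrative
only: the element names and the namespace come from the procedure, while the email value is a
placeholder, and in the actual Job the document is built by tXMLMap rather than by hand.

// Illustrative only: the request payload that the tXMLMap mapping described below builds.
// The email value is a placeholder; in the Job it comes from the Email column of the input row.
String email = "user@example.com";
String payload =
      "<IsValidEmail xmlns=\"http://www.webservicex.net\">"
    + "<Email>" + email + "</Email>"
    + "</IsValidEmail>";
System.out.println(payload);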


Procedure
1. In the design workspace, double-click the tXMLMap component to open the Map Editor.
2. In the output table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmail in the dialog box that appears.
3. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter
http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu.
Enter Email in the dialog box that appears.
5. Right-click the Email node and select As loop element from the contextual menu.
6. Click the Email node in the input table and drop it to the Expression column in the row of the
Email node in the output table.

7. Click OK to validate the mapping and close the Map Editor.

Configuring the tXMLMap component in the output flow

About this task


The tXMLMap in the output flow will convert the response message from the XML format to the row/
column structure.

Procedure
1. In the design workspace, double-click the tXMLMap component in the output flow to open the
Map Editor.
2. In the input table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmailResponse in the dialog box that appears.
3. Right-click the IsValidEmailResponse node and select Set A Namespace from the contextual menu.
Enter http://www.webservicex.net in the dialog box that appears.


4. Right-click the IsValidEmailResponse node again and select Create Sub-Element from the
contextual menu. Enter IsValidEmailResult in the dialog box that appears.
5. Right-click the IsValidEmailResult node and select As loop element from the contextual menu.
6. On the lower right part of the map editor, click [+] to add a row of String type to the output
table and name it response.
7. Click the IsValidEmailResult node in the input table and drop it to the Expression column in the
row of the response node in the output table.

8. Click OK to validate the mapping and close the Map Editor.

Executing the Job


Click the Run view to display it and click the Run button to launch the execution of your Job. You can
also press F6 to execute it. In the execution log you will see:


The email address [email protected] is returned as false. The input and output SOAP
messages in XML are also shown in the console.

Using tESBConsumer with custom SOAP Headers


This scenario is similar to the previous one. It describes a Job that uses a tESBConsumer component to
retrieve a valid email address with custom SOAP headers in the request message.


Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: a tESBConsumer, a
tMap, two tFixedFlowInput, three tXMLMap, and two tLogRow.
2. Connect each of the tFixedFlowInput with a tXMLMap using the Row > Main connection.
3. Right-click the first tXMLMap, select Row > *New Output* (Main) from the contextual menu and
click tMap. Enter payload in the popup dialog box to name this row.
Repeat this operation to connect another tXMLMap to tMap and name the output row header.
4. Right-click the tMap component, select Row > *New Output* (Main) from the contextual menu and
click the tESBConsumer component. Enter request in the popup dialog box to name this row and
accept the propagation that prompts you to get the schema from the tESBConsumer component.
5. Right-click the tESBConsumer component, select Row > Response from the contextual menu and
click the third tXMLMap component.
6. Right-click the third tXMLMap component, select Row > *New Output* (Main) from the contextual
menu and click one of the tLogRow components. Enter response in the popup dialog box to name
this row.
7. Right-click the tESBConsumer component again, select Row > Fault from the contextual menu
and click the other tLogRow component.

Configuring the components


The tLogRow components will monitor the exchanges from the response and fault messages and do
not need any configuration. Press Ctrl+S to save your Job.

Configuring the tESBConsumer component

About this task


In this scenario, a public web service which is available at http://www.webservicex.net/ValidateEmail.asmx
will be called by the tESBConsumer component to return true or false for an email address. You can
view the WSDL definition of the service at http://www.webservicex.net/ValidateEmail.asmx?WSDL
for the service description.


Procedure
1. In the design workspace, double-click the tESBConsumer component to open its Basic settings
view in the Component tab.

2. Click the [...] button next to Service configuration.

3. In the dialog box that appears, type in: http://www.webservicex.net/ValidateEmail.asmx?WSDL in


the WSDL field and click the refresh button to retrieve port name and operation name. In the Port
Name list, select the port you want to use, ValidateEmailSoap in this example.
Select the Populate schema to repository on finish to retrieve the schema from the WSDL
definition, which will be used by the tFixedFlowInput component. This option is only available to
users of Talend Studio with ESB. If you don't have this option, please ignore it. The schema can
be created manually in the tFixedFlowInput component, which will be shown later.


Click Finish to validate your settings and close the dialog box.
4. In the Advanced settings view, select the Log messages check box to log the content of the
messages.

Configuring the tFixedFlowInput components

Procedure
1. Double-click the first tFixedFlowInput component to open its Basic settings view in the
Component tab.

2. For users of Talend Studio with ESB who have retrieved the schema from the service WSDL
definition in the configuration of the tESBConsumer component, select Repository from the
Schema list. Then click the [...] of the next field to show the Repository Content dialog box. Select
the metadata under the IsValidEmail node to use it as the schema of the input message. Click OK
to close the dialog box.
For users of Talend Studio without ESB, please go to the next step.


3. For users of Talend Studio without ESB, the schema needs to be created manually. Select Built-In
from the Schema list.

Click the [...] button next to Edit Schema. In the schema dialog box, click the [+] button to add a
new line of String type and name it Email. Click OK to close the dialog box.


4. In the Number of rows field, set the number of rows as 1.


5. In the Mode area, select Use Single Table and enter "[email protected]" into the Value
field, which is the payload of the request message.
6. Configure the second tFixedFlowInput as the first one, except for its schema.
Add two rows of String type to the schema and name them id and company respectively.

Give the value Hello world! to id and Talend to company, which are the headers of the request
message.


Configuring the tXMLMap components in the input flow

About this task


Talend data integration uses schemas based on rows and columns since it has roots in relational data
warehouse integration. But SOAP messages use the XML format. XML is hierarchical and supports a
richer structure than rows or columns. So we need tXMLMap to convert from the relational row/
column structure to the schema expected by the SOAP service.

Procedure
1. In the design workspace, double-click the first tXMLMap component to open the Map Editor.
2. In the output table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmail in the dialog box that appears.
3. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter
http://www.webservicex.net in the dialog box that appears.
4. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu.
Enter Email in the dialog box that appears.
5. Right-click the Email node and select As loop element from the contextual menu.
6. Click the Email node in the input table and drop it to the Expression column in the row of the
Email node in the output table.


7. Click OK to validate the mapping and close the Map Editor.


8. Configure the other tXMLMap in the same way. Add a row of Document type to the output table
and name it header. Create two sub-elements to it, id and company. Map the id and the company
nodes in the input table to the corresponding nodes in the output table.


Configuring the tMap component

Procedure
1. In the design workspace, double-click tMap to open the Map Editor.

2. On the lower right part of the map editor, click [+] to add two rows of Document type to the
output table and name them payload and headers respectively.
3. Click the payload node in the input table and drop it to the Expression column in the row of the
payload node in the output table.
4. Click the header node in the input table and drop it to the Expression column in the row of the
headers node in the output table.

Configuring the tXMLMap component in the output flow

About this task


The tXMLMap in the output flow will convert the response message from the XML format to the row/
column structure.

Procedure
1. In the design workspace, double-click the tXMLMap component in the output flow to open the
Map Editor.
2. In the input table, right-click the root node and select Rename from the contextual menu. Enter
IsValidEmailResponse in the dialog box that appears.
3. Right-click the IsValidEmailResponse node and select Set A Namespace from the contextual menu.
Enter http://www.webservicex.net in the dialog box that appears.


4. Right-click the IsValidEmailResponse node again and select Create Sub-Element from the
contextual menu. Enter IsValidEmailResult in the dialog box that appears.
5. Right-click the IsValidEmailResult node and select As loop element from the contextual menu.
6. On the lower right part of the map editor, click [+] to add a row of String type to the output
table and name it response.
7. Click the IsValidEmailResult node in the input table and drop it to the Expression column in the
row of the response node in the output table.

8. Click OK to validate the mapping and close the Map Editor.

Executing the Job


Click the Run view to display it and click the Run button to launch the execution of your Job. You can
also press F6 to execute it.


As shown in the execution log, the email address [email protected] is returned as false. The
input and output SOAP messages in XML are also shown in the console. The SOAP header is sent with
the request to the service.


tESBProviderFault
Serves a Talend Job cycle result as a Fault message of the Web service in case of a request-response
communication style.
It acts as the Fault message of the Web Service response at the end of a Talend Job cycle.

tESBProviderFault Standard properties


These properties are used to configure tESBProviderFault running in the Standard Job framework.
The Standard tESBProviderFault component belongs to the ESB family.
This component is relevant only when used with one of the Talend solutions with ESB, as it should be
used with the Service Repository node and the Data Service creation related wizard(s).

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

ESB service settings Fault title: Value of the faultString column in the Fault
message.

Note:
The Row > Fault flow of tESBConsumer has a pre-defined
schema whose column, faultString, is filled up with the
content of the field Fault title of tESBProviderFault.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component should only be used with the


tESBProviderRequest component.

Limitation A JDK is required for this component to operate.

Requesting airport names based on country codes


This scenario applies only to Talend Open Studio for ESB, Talend Data Services Platform and Talend
Data Fabric.
This scenario involves two Jobs, one as the data service provider and the other as the data service
consumer. The former listens to the requests from the consumer via tESBProviderRequest, matches
the country code wrapped in the request against a MySQL database table that holds the country code/
airport pairs via tXMLMap, and finally returns the correct airport name via tESBProviderResponse or,
if no match is found, an error message via tESBProviderFault. The consumer sends requests to the
provider and receives the airport information or error messages via tESBConsumer.

Building the data service provider to publish a service


The data service airport has already been defined under the Services node of the Repository tree view.
Its schema has three major elements as shown below:


For how to define a service in the Studio, see Talend Studio User Guide.

Assigning a Job to the defined service

Procedure
1. Right-click getAirportInformationByISOCountryCode under the Web service airport and from the
contextual menu, select Assign Job.
2. In the Operation Choice window, select Create a new Job and Assign it to this Service Operation.

3. Click Next to open the Job description window. The Job name
airportSoap_getAirportInformationByISOCountryCode is automatically filled in.


4. Click Finish to create the Job and open it in the workspace. Three components are already
available.

Adding components to arrange the data flow

Procedure
1. Drop tXMLMap and tMysqlInput from the Palette to the workspace.
2. Link tESBProviderRequest to tXMLMap using a Row > Main connection.
3. Link tMysqlInput to tXMLMap using a Row > Main connection.
4. Link tXMLMap to tESBProviderResponse using a Row > *New Output*(Main) connection.
In the new Output name pop-up window, enter the output table name, airport_response.
Click OK in the pop-up window that asks whether to get the schema of the target component.


5. Link tXMLMap to tESBProviderFault using a Row > *New Output*(Main) connection.


In the new Output name pop-up window, enter the output table name, fault_message.
Click OK in the pop-up window that asks whether to get the schema of the target component.

Configuring how requests are processed

Procedure
1. Double-click tMysqlInput to display its Basic settings view.

2. Fill in the basic settings for the MySQL connection and database table.
Click the [...] button to open the schema editor.


3. Click the [+] button to add two columns, id and name, with the type of string.
Click OK to close the editor.
Click Guess Query to retrieve the SQL query.
4. Double-click tXMLMap to open its mapper.

5. In the main : row1 table of the input flow side (left), right-click the column name payload and from
the contextual menu, select Import from Repository. Then the Metadata wizard is opened.


Select the schema of the request message and click OK to validate this selection. In this example,
the schema is getAirportInformationByISOCountryCode.
6. Do the same to import the hierarchical schemas for the response/fault messages (right). In this
example, these schemas are getAirportInformationByISOCountryCodeResponse and
getAirportInformationByISOCountryCodeFault respectively.
7. Then to create the join to the lookup data, drop the CountryAbbreviation node from the main flow
onto the id column of the lookup flow.
8. On the lookup flow table, click the wrench icon on the upper right corner to open the setting
panel.
Set Lookup Model as Reload at each row, Match Model as All matches and Join Model as Inner
join.
9. On the airport_response output flow table, click the wrench icon on the upper right corner to open
the setting panel.
Set the All in one option as true. This ensures that only one response is returned for each request
if multiple airport matches are found in the database.
10. On the fault_message output flow table, click the wrench icon on the upper right corner to open
the setting panel.
Set the Catch Lookup Inner Join Reject option as true to monitor the mismatches between the
country code in the request and the records in the database table. Once such a situation occurs, a
fault message will be generated by tESBConsumer and outputted via its Row > Fault flow.


Note:
The Row > Fault flow of tESBConsumer has a predefined schema in which the faultString
column is filled with the content of the field Fault title of tESBProviderFault.

11. Drop the name column in the lookup flow onto the Expression area next to the
tns:getAirportInformationByISOCountryCodeResult node in the airport_response output flow.
Drop the tns:CountryAbbreviation node in the main flow onto the Expression area next to the
tns:getAirportInformationByISOCountryCodeFaultString node in the fault_message output flow. This
way, the incorrect country code in the request will be shown in the faultDetail column of the Row
> Fault flow of tESBConsumer.
Click OK to close the editor and validate this configuration.
12. Double-click tESBProviderFault to display its Basic settings view:

13. In the field Fault title, enter the context variable context.fault_message.
For how to define context variables, see Talend Studio User Guide.

Publishing the service to listen to requests

Procedure
1. Press Ctrl +S to save the Job.
2. Press F6 to run this Job.

Results
The data service is published and will listen to all the requests until you click the Kill button to stop it
as by default, the Keep listening option of tESBProviderRequest is selected automatically.
Now is the time to configure the consumer Job that interacts with the data service.


Building the data service consumer to request the service


Built upon tESBConsumer, the consumer Job sends two requests that contain the country codes to the
Web service for the relevant airport names. If a wrong country code is wrapped in the request, an error
message is returned. The country codes and the MySQL database records are as follows:

Dropping and linking the components

Procedure
1. Drop a tFileInputDelimited, a tXMLMap, a tESBConsumer and two tLogRow from the Palette to
the workspace.
2. Rename one tLogRow as response and the other as fault_message.
3. Link tFileInputDelimited to tXMLMap using a Row > Main connection.
4. Link tXMLMap to tESBConsumer using a Row > *New Output*(Main) connection.
In the new Output name pop-up window, enter the output table name, for example request.
Click OK in the pop-up window that asks whether to get the schema of the target component.
5. Link tESBConsumer to response using the Row > Response connection.
6. Link tESBConsumer to fault_message using the Row > Fault connection.

Configuring the components

Procedure
1. Double-click tFileInputDelimited to open its Basic settings view.


2. In the File name/stream field, enter the context variable for the file that has the country codes,
context.filepath.
3. Click the [...] button to open the schema editor.

4. Click the [+] button to add a column, country_code, for example, with the type of string.
Click OK to close the editor.
5. Double-click tXMLMap to open its Map editor.


6. In the request table of the output flow side, right-click the column name payload and from the
contextual menu, select Import from Repository. Then the Metadata wizard is opened.

Select the schema of the request message and click OK to validate this selection. In this example,
the schema is getAirportInformationByISOCountryCode.
7. Drop the country_code column in the main flow onto the Expression area next to the
tns:CountryAbbreviation node in the request output flow.
Click OK to close the editor and validate this configuration.
8. Double-click tESBConsumer to open its service configuration wizard:


9. Click the Browse... button to select the desired WSDL file. The Port name and Operation are
automatically filled up once the WSDL file is selected.
Click OK to close the wizard.
10. Double-click response to open its Basic settings view:

11. Select Vertical (each row is a key/value list) and then Print label for a better view of the results.
Do the same to the other tLogRow, fault_message.

Executing the Job

Procedure
1. Press Ctrl +S to save the Job.
2. Press F6 to run this Job.


As shown above, two messages are returned, one giving the airport name that matches the
country code CN and the other giving the error details caused by the country code CC.


tESBProviderRequest
Wraps a Talend Job as a web service.
It waits for a request message from a consumer and passes it to the next component.

tESBProviderRequest Standard properties


These properties are used to configure tESBProviderRequest running in the Standard Job framework.
The Standard tESBProviderRequest component belongs to the ESB family.
This component is relevant only when used with one of the Talend solutions with ESB, as it should be
used with the Service Repository node and the Data Service creation related wizard(s).

Basic settings

Property Type Either Built-in or Repository.

  Built-in: No WSDL file is configured for the Job.

  Repository: Select the desired web service from the


Repository, to the granularity of the port name and
operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema is created and stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Keep listening Check this box when you want to ensure that the provider
(and therefore Talend Job) will continue listening for
requests after processing the first incoming request.


Advanced settings

Log messages (Studio only) Select this check box to log the message exchange
between the service provider and the consumer. This option
works in the Studio only.

Response timeout, sec Specify the time limit in seconds for sending response to
the consumer. This parameter is necessary to avoid locking
of message exchanges.

Request processing queue size Specify the maximum number of received requests that
can be processed in parallel by the components between
tESBProviderRequest and tESBProviderResponse. Note that
this parameter is different from the queueSize in the
<TalendRuntimePath>\etc\org.apache.cxf.workqueues-default.cfg
which defines pool configuration
for incoming requests on CXF level.

Request processing timeout, sec Specify the time limit in seconds for requests to be
processed by the components between the tESBProviderRequest
and the tESBProviderResponse.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
CORRELATION_ID: the correlation ID by which chained
service calls will be grouped. This is a Flow variable and it
returns a string.
SECURITY_TOKEN: the user identity information in the
request header. This is a Flow variable and it returns an
XML node.
HEADERS_SOAP: the headers of the SOAP request. This is a
Flow variable and it returns all SOAP request headers.
HEADERS_HTTP: the headers of the HTTP request. This is a
Flow variable and it returns all HTTP request headers.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component allows a Talend Job to be wrapped as a
service: a request to the service is passed into the Job and
the Job result is returned as the service response.
The tESBProviderResponse component can both deliver the
payload of a SOAP message and also access the HTTP and
SOAP headers of a service.
The tESBProviderRequest component should be used
with the tESBProviderResponse component to provide
a Job result as a response, in case of a request-response
communication style.
When the SAML Token or the Service Registry is enabled in
the service runtime options and if the SAML Token exists
in the request header, the tESBProviderRequest component
will get and store the SAML Token in the component
variable for further use in the flow.
The tESBProviderRequest component will get the
Correlation Value in the request header if it exists and
store it in the component variable. When the Business
Correlation or the Service Registry is enabled in the service
runtime options, the Correlation Value will also be added to
the response. In this case, tESBProviderRequest will create a
Correlation Value if it does not exist.
Note that the Service Registry option is only available if you
subscribed to Talend Enterprise ESB solutions. For more
information about how to set the runtime options, see the
corresponding section in the Talend Studio User Guide.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to turn on or off the Keep
listening option dynamically at runtime.
When a dynamic parameter is defined, the corresponding
Keep listening option in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation A JDK is required for this component to operate.
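
As an illustration of the Dynamic settings option described above, one possible setup is to define a
Boolean (or true/false string) context variable and reference it in the Code field; the variable name
used here is only an example:

context variable: keepListening = true
Code field value: context.keepListening

Changing the value of keepListening in the selected context at runtime then turns the Keep listening
option on or off without editing the component.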

Sending a message without expecting a response


This scenario applies only to Talend Open Studio for ESB, Talend Data Services Platform and Talend
Data Fabric.
The Jobs, which are built upon the components under the ESB/Web Services family, act as the
implementations of web services defined in the Services node of the Repository. They require the
creation of and association with relevant services. For more information about services, see the
related topics in the Talend Studio User Guide.


In this scenario, a provider Job and a consumer Job are needed. In the meantime, the related
service should already exist in the Services node, with the WSDL URI being http://127.0.0.1:8088/
esb/provider/?WSDL, the port name being TEST_ProviderJobSoapBinding and the operation being
invoke(anyType):anyType.
The provider Job consists of a tESBProviderRequest, a tXMLMap, and two tLogRow components.

• Drop the following components from the Palette onto the design workspace: a tESBProviderRequest,
a tXMLMap, and two tLogRow.
• Double-click tESBProviderRequest_1 in the design workspace to display its Component view and
set its Basic settings.

• Select Repository from the Property Type list and click the three-dot button to choose the service,
to the granularity of port name and operation.

• Click OK.
• Click the three-dot button next to Edit schema to view the schema of tESBProviderRequest_1.


• Click OK.
• Connect tESBProviderRequest_1 to tLogRow_1.
• Double-click tLogRow_1 in the design workspace to display its Component view and set its Basic
settings.

• Click the three-dot button next to Edit schema and define the schema as follows.

• Connect tLogRow_1 to tXMLMap_1.


• Connect tXMLMap_1 to tLogRow_2 and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.


• On the lower right part of the map editor, click the plus button to add one row to the payload
table and name this row as payload.
• In the Type column of this payload row, select Document as the data type. The corresponding XML
root is added automatically to the top table on the right side which represents the output flow.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in response in the popup dialog
box.
• Right-click the response node and select As loop element from the contextual menu.
• Repeat this operation to create a sub-element request of the root node in the input table and set
the request node as loop element.
• Click the request node in the input table and drop it to the Expression column in the row of the
response node in the output table.

• Click OK to validate the mapping and close the map editor.


• Double-click tLogRow_2 in the design workspace to display its Component view and set its Basic
settings.


• Click the three-dot button next to Edit Schema and define the schema as follows.

• Save the Job.


The consumer Job consists of a tFixedFlowInput, a tXMLMap, a tESBConsumer, and two tLogRow
components.

• Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tXMLMap, a tESBConsumer, and two tLogRow.
• Double-click tFixedFlowInput_1 in the design workspace to display its Component view and set its
Basic settings.


• Edit the schema of the tFixedFlowInput_1 component.

• Click the plus button to add a new line of string type and name it payloadString.
• Click OK.
• In the Number of rows field, set the number of rows as 1.
• In the Mode area, select Use Single Table and enter "world" (in double quotation marks) in the Value field.
• Connect tFixedFlowInput_1 to tXMLMap_1.
• Connect tXMLMap_1 to tESBConsumer_1 and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.
• In the output table, right-click the root node to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in request in the popup dialog
box.
• Right-click the request node and select As loop element from the contextual menu.
• Click the payloadString node in the input table and drop it to the Expression column in the row of
the request node in the output table.


• Click OK to validate the mapping and close the Map Editor.


• Start the provider Job. In the execution log you can see:

...
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
...

• In the tESBConsumer_1 Component view, set its Basic settings.

• Click the three-dot button next to the Service Configuration to open the editor.


• In the WSDL field, type in: http://127.0.0.1:8088/esb/provider?WSDL.


• Click the Refresh button to retrieve port name and operation name.
• Click OK.
• In the Basic settings of the tESBConsumer, set the Input Schema as follows:

• Set the Response Schema as follows:

• Set the Fault Schema as follows:

• Connect tESBConsumer_1 to tLogRow_1 and tLogRow_2.


• In the design workspace, double-click the tLogRow_1 component to display its Component view
and set its Basic settings.


• Click the three-dot button next to Edit Schema and define the schema as follows:

• In the Job Design, double-click tLogRow_2 to display its Component view and set its Basic
settings.

• Click the three-dot button next to Edit Schema and define the schema as follows.

• Save the Job.


• Run the provider Job. In the execution log you will see:

INFO: Setting the server's publish address to be http://127.0.0.1:8088/esb/provider


2011-04-21 14:14:36.793:INFO::jetty-7.2.2.v20101205
2011-04-21 14:14:37.856:INFO::Started
[email protected]:8088
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
• Run the consumer Job. In the execution log of the Job you will see:

Starting job CallProvider at 14:15 21/04/2011.

[statistics] connecting to socket on port 3942


[statistics] connected
TEST_ESBProvider2
TEST_ESBProvider2SoapBingding
|
[tLogRow_2] payloadString: <request>world</request>
{http://talend.org/esb/service/job}TEST_ESBProvider2
{http://talend.org/esb/service/job}TEST_ESBProvider2SoapBinding
invoke
[tLogRow_1] payload: null
[statistics] disconnected
Job CallProvider2 ended at 14:16 21/04/2011. [exit code=0]

• In the provider's log you will see the trace log:

web service [endpoint: http://127.0.0.1:8088/esb/provider]


published
[tLogRow_1] payload: <?xml version="1.0" encoding="UTF-8"?>
<request>world</request>
### world
[tLogRow_2] content: world
[tLogRow_3] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
web service [endpoint: http://127.0.0.1:8088/esb/provider] unpublished
[statistics] disconnected
Job ESBProvider2 ended at 14:16 21/04/2011. [exit code=0]


tESBProviderResponse
Serves a Talend Job cycle result as a response message.
It acts as a service provider response builder at the end of each Talend Job cycle.

tESBProviderResponse Standard properties


These properties are used to configure tESBProviderResponse running in the Standard Job framework.
The Standard tESBProviderResponse component belongs to the ESB family.
This component is relevant only when used with one of the Talend solutions with ESB, as it should be
used with the Service Repository node and the Data Service creation related wizard(s).

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule The tESBProviderResponse component should only be used
with the tESBProviderRequest component to provide a Job
result as a response for a web service provider, in case of a
request-response communication style.

Limitation A JDK is required for this component to operate.

Returning Hello world response


This scenario applies only to Talend Open Studio for ESB, Talend Data Services Platform and Talend
Data Fabric.
The Jobs, which are built upon the components under the ESB/Web Services family, act as the
implementations of web services defined in the Services node of the Repository. They require the
creation of and association with relevant services. For more information about services, see the
related topics in the Talend Studio User Guide.
In this scenario, a provider Job and a consumer Job are needed. In the meantime, the related
service should already exist in the Services node, with the WSDL URI being http://127.0.0.1:8088/
esb/provider/?WSDL, the port name being TEST_ProviderJobSoapBinding and the operation being
invoke(anyType):anyType.
The provider Job consists of a tESBProviderRequest, a tESBProviderResponse, a tXMLMap, and two
tLogRow components.


• Drop the following components from the Palette onto the design workspace: a tESBProviderRequest,
a tESBProviderResponse, a tXMLMap, and two tLogRow.
• In the design workspace, double-click tESBProviderRequest_1 to display its Component view and
set its Basic settings.

• Select Repository from the Property Type list and click the three-dot button to choose the service,
to the granularity of port name and operation.

• Click OK.
• Click the three-dot button next to Edit schema to view its schema.


• Connect tESBProviderRequest_1 to tLogRow_1.


• Double-click tLogRow_1 to display its Component view and set its Basic settings.

• Click the three-dot button next to Edit schema and define the schema as follows.

• Connect tLogRow_1 to tXMLMap_1.


• Connect tXMLMap_1 to tLogRow_2 and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.
• On the lower right part of the map editor, click the plus button to add one row to the payload
table and name this row as payload.


• In the Type column of this payload row, select Document as the data type. The corresponding XML
root is added automatically to the top table on the right side which represents the output flow.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in response in the popup dialog
box.
• Right-click the response node and select As loop element from the contextual menu.
• Repeat this operation to create a sub-element request of the root node in the input table and set
the request node as loop element.
• Click the request node in the input table and drop it to the Expression column in the row of the
response node in the output table.

• Click OK to validate the mapping and close the map editor.


• In the design workspace, double-click tLogRow_2 to display its Component view and set its Basic
settings.


• Click the three-dot button next to Edit schema and define the schema as follows.

• Connect tLogRow_2 to tESBProviderResponse_1.


• In the design workspace, double-click tESBProviderResponse_1 to open its Component view and
set its Basic settings.

• Click the three-dot button next to Edit schema and define the schema as follows.

• Save the provider Job.


The consumer Job consists of a tFixedFlowInput, a tXMLMap, a tESBConsumer, and two tLogRow
components.


• Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tXMLMap, a tESBConsumer, and two tLogRow.
• Double-click tFixedFlowInput_1 in the design workspace to display its Component view and set its
Basic settings.

• Click the three-dot button next to Edit schema.

• Click the plus button to add a new line of string type and name it payloadString.
• Click OK.


• In the Number of rows field, set the number of rows as 1.


• In the Mode area, select Use Single Table and enter "world" (in double quotation marks) in the Value field.
• Connect tFixedFlowInput to tXMLMap.
• Connect tXMLMap to tESBConsumer and name this row as payload.
• In the design workspace, double-click tXMLMap_1 to open the Map Editor.
• In the payload table, right-click root to open the contextual menu.
• From the contextual menu, select Create Sub-Element and type in request in the popup dialog
box.
• Right-click the request node and select As loop element from the contextual menu.
• Click the payloadString node in the input table and drop it to the Expression column in the row of
the request node in the output table.

• Click OK to validate the mapping and close the Map Editor.


• Start the provider Job. In the execution log you can see:

...
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
...

• In the tESBConsumer_1 Component view, set its Basic settings.


• Click the three-dot button next to the Service Configuration to open the editor.

• In the WSDL field, type in: http://127.0.0.1:8088/esb/provider/?WSDL


• Click the Refresh button to retrieve port name and operation name.
• Click OK.
• In the Basic settings of the tESBConsumer, set the Input Schema as follows:

• Set the Response Schema as follows:


• Set the Fault Schema as follows:

• Connect tESBConsumer_1 to tLogRow_1 and tLogRow_2.


• In the design workspace, double-click tLogRow_1 to display its Component view and set its Basic
settings.

• Click the three-dot button next to Edit Schema and define the schema as follows.

• In the Job Design, double-click tLogRow_2 to display its Component view and set its Basic
settings.


• Click the three-dot button next to Edit Schema and define the schema as follows:

• Save the consumer Job.


• Run the provider Job. In the execution log you will see:

2011-04-21 15:28:26.874:INFO::jetty-7.2.2.v20101205
2011-04-21 15:28:27.108:INFO::Started
[email protected]:8088
web service [endpoint: http://127.0.0.1:8088/esb/provider] published
• Run the consumer Job. In the execution log of the Job you will see:

Starting job CallProvider at 15:29 21/04/2011.

[statistics] connecting to socket on port 3690


[statistics] connected
TEST_ProviderJob
TEST_ProviderJobSoapBingding
|
{http://talend.org/esb/service/job}TEST_ProviderJob
{http://talend.org/esb/service/job}TEST_ProviderJobSoapBinding
invoke
[tLogRow_2] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
[statistics] disconnected
Job ConsumerJob ended at 15:29 21/04/2011. [exit code=0]

• In the provider's log you will see the trace log:


[tLogRow_1] payload: <?xml version="1.0" encoding="UTF-8"?>


<request>world</request>
### world
[tLogRow_2] content: world
[tLogRow_3] payload: <?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://talend.org/esb/service/job">Hello, world!</response>
web service [endpoint: http://127.0.0.1:8088/esb/provider] unpublished
[statistics] disconnected
Job ProviderJob ended at 15:29 21/04/2011. [exit code=0]


tEXABulkExec
Imports data into an EXASolution database table quickly, using the IMPORT command provided by the
EXASolution database.
The import will be cancelled after a configurable number of records fail to import. Erroneous records
can be sent to a log table in the same database or to a local log file.
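
The statement issued by the component is an EXASolution IMPORT command. A simplified sketch of what
such a statement can look like for a CSV source is shown below; the schema, table and file names are
placeholders, the exact clauses depend on the component settings, and the full syntax is described in
the EXASolution User Manual.

IMPORT INTO my_schema.my_table
FROM CSV AT 'http://192.168.0.10:4580/' FILE 'data.csv'
COLUMN SEPARATOR = ';'
SKIP = 1
ERRORS INTO my_schema.import_errors REJECT LIMIT 2;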

tEXABulkExec Standard properties


These properties are used to configure tEXABulkExec running in the Standard Job framework.
The Standard tEXABulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the data
retrieved.

Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution
database cluster.


Schema Enter the name of the schema you want to use.

User and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Enter the name of the table to be written.

Note:
Typically the table names are stored in upper case. If you
need mixed case identifiers, you have to enter the name
in double quotes. For example, "\"TEST_data_LOAD\"".

Action on table On the table defined, you can perform one of the following
operations before running the import:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets created.
• Create table if not exists: The table is created if it does
not exist.
• Truncate table: The table content is deleted. You do
not have the possibility to rollback the operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

Note:
The columns in the schema must be in the same order
as they are in the CSV file. It is not necessary to fill all
columns of the defined table unless the use case or table
definition expects that.

  Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Column Formats Specify the format for Date and numeric columns if the
default cannot be applied.
• Column: The cells in this column are automatically
filled with the defined schema column names.
• Has Thousand Delimiters: Select this check box if
the value of the corresponding numeric column (only
for numeric column) in the file contains thousand
separators.
• Alternative Format: Specify the necessary format
as String value if a special format is expected. The
necessary format will be created from the schema
column length and precision. For more information
about format models, see EXASolution User Manual.

Source table columns If the source is a database, configure the mapping between
the source columns and the target columns in this table.
Specifically configuring the mapping is optional. If you set
nothing here, it is assumed that the source table has the
same structure as the target table.
• Column: The schema column in the target table.
• Source column name: The name of the column in the
source table.

Column Separator Enter the separator for the columns of a row in the local
file.

Column Delimiter Enter the delimiter that encapsulates the field content in
the local file.

Row Separator Enter the character used to separate the rows in the local file.

Null representation Enter the string that represents a NULL value in the local
file. If not specified, NULL values are represented as the
empty string.

Skip rows Enter the number of rows (for example, header or any other
prefix rows) to be omitted.

Encoding Enter the character set used in the local file. By default, it is
UTF8.

Trim column values Specify whether spaces are deleted at the border of CSV
columns.
• No trim: no spaces are trimmed.
• Trim: spaces from both left and right sides are
trimmed.


• Trim only left: spaces from only the left side are
trimmed.
• Trim only right: spaces from only the right side are
trimmed.

Default Date Format Specify the format for datetime values. By default, it is
YYYY-MM-DD.

Default Timestamp Format Specify the timestamp format used. By default, it is YYYY-
MM-DD HH24:MI:SS.FF3.

Thousands Separator Specify the character used to separate thousand groups in a
numeric text value. In the numeric format, the character will
be applied to the placeholder G. If the text values contain
this char, you have to configure it also in the Column
Formats table.
Note that this setting affects the connection property
NLS_NUMERIC_CHARACTERS that defines the decimal and
group characters used for representing numbers.

Decimal Separator Specify the character used to separate the integer part
of a number from the fraction. In the numeric format, the
character will be applied to the placeholder D.
Note that this setting affects the connection property
NLS_NUMERIC_CHARACTERS that defines the decimal and
group characters used for representing numbers.

Minimal number errors to reject the transfer Specify the maximum number of invalid rows allowed
during the data loading process. For example, the value 2
means the loading process will stop if the third error occurs.

Log Error Destination Specify the location where error messages will be stored.
• No Logging: error messages will not be saved.
• Local Log File: error messages will be stored in a
specified local file.
• Local Error Log File: specify the path to the local
file that stores error messages.
• Add current timestamp to log file name (before
extension): select this check box to add the
current timestamp before the extension of the file
name for identification reasons in case you use
the same file multiple times.
• Logging Table: error messages will be stored in a
specified table. The table will be created if it does not
exist.
• Error Log Table: enter the name of the table that
stores error messages.
• Use current timestamp to build log table: select
this check box to use the current timestamp to
build the log table for identification reasons in
case you use the same table multiple times.

Transfer files secure Select this check box to transfer the file over HTTPS instead
of HTTP.

Test mode (no statements are executed) Select this check box to have the component running in test
mode, where no statements are executed.


Use precision and length from schema Select this check box to check column values that are of
numeric types (that is, Double, Float, BigDecimal, Integer,
Long, and Short) against the Length setting (which sets the
number of integer digits) and the Precision setting (which
sets the number of decimal digits) in the schema. Only
the values with neither their number of integer digits nor
number of decimal digits larger than the Length setting and
the Precision setting are loaded.
For example, with Length set to 4 and Precision set to 3,
the values 8888.8888 and 88888.888 will be dropped;
the values 8888.88 and 888.888 will be loaded.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.
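
The Thousands Separator and Decimal Separator options above are passed to the connection property
NLS_NUMERIC_CHARACTERS. As a rough illustration only, the equivalent SQL statement would look similar
to the following (the separator characters are examples):

ALTER SESSION SET NLS_NUMERIC_CHARACTERS = ',.';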

Global Variables

Global Variables NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_LOG_FILE: the path to the local log file. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used as a standalone component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Settings for different sources of import data


The settings for this component change depending on the source of your import data.
The component handles data coming from any of the following sources:
• Local file
• Remote file
• EXASol database
• Oracle database
• JDBC-compliant database

Local file
The local file is not transferred by uploading the file. Instead, the driver starts a (secure) local
web service, sends its URL to the database, and the database retrieves the file from this local web
service. Because the port of this service cannot be explicitly defined, this method requires a
transparent network between the local Talend Job and the EXASolution database.

File name  Specify the path to the local file to be imported.

Remote file
This method works with a file that is accessible on a server through the following protocols: SCP,
SFTP, FTP, HTTP, or HTTPS.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The connection must contain a URL with one of the following protocols: SCP,
SFTP, FTP, HTTP, or HTTPS.
The URL must not contain the file name. The file name is always dynamic and
must be provided by the component configuration.

Remote file server URL Specify the URL to the file server, without the file name itself.

File name Specify the name of the file you want to fetch from the server.

Query parameters If the web service depends on query parameters, specify them here.
For example, if you want to get a file from an HDFS file system via the web
service, you need to add some additional parameters such as open=true.


Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the web server.

Remote user and Remote users password Enter the user name and password needed to access the web server.
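
The named connections referred to by the Use predefined connection option above are created in the
EXASolution database itself. A rough sketch is shown below; the connection name, URL and credentials
are placeholders, and the exact syntax is described in the EXASolution User Manual.

CREATE CONNECTION ftp_files TO 'ftp://192.168.0.20/import/' USER 'ftpuser' IDENTIFIED BY 'ftppassword';
SELECT * FROM SYS.EXA_DBA_CONNECTIONS;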

EXASol database
An EXASolution database can also serve as a remote source for the data. The source can be a table or
a specific query.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The username and password must be provided by the component and not as part
of the predefined connection.

EXASol database host Specify the host of the remote EXASolution database.
This field can also be used to access a cluster.

Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.

Source query If you want to use a specific query, enter the query in this field.

Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.

Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.

Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.

Remote user and Remote users password Enter the user name and password needed to access the source database.
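
For example, a self defined source query for the settings above could filter and join the source data
before the import; the schema, table and column names below are placeholders.

SELECT e.employee_id, e.employee_name, t.team_name
FROM src_schema.employee e
JOIN src_schema.team t ON t.team_id = e.team_id
WHERE t.team_name = 'Dev Team'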

Oracle database
An Oracle database can also serve as remote source for the data. Access to an Oracle database
requires an Enterprise license for the EXASolution database and does not work with the free edition.
The source can be a table or a specific query.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The username and password must be provided by the component and not as part
of the predefined connection.

Oracle database URL Specify the JDBC URL to the Oracle database.


Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.

Source query If you want to use a specific query, enter the query in this field.

Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.

Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.

Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.

Remote user and Remote users password Enter the user name and password needed to access the source database.
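
For the Oracle database URL field above, a typical JDBC thin-driver URL has the following form, where
the host, port and service name are placeholders:

jdbc:oracle:thin:@//dbhost:1521/ORCL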

JDBC-compliant database
The free edition of the EXASolution database supports MySQL and PostgreSQL databases, and others
are available in the Enterprise edition. The source can be a table or a self defined query.
Nearly all enterprise-grade databases provide a JDBC interface.

Use predefined connection It is possible, via the SQL interface, to set up a named connection in the
EXASolution database itself. Select this option if you want to use such a
connection, and provide its name.
To know what connections are available, look at the table SYS.EXA_DBA_CONNECTIONS
in the database.
The username and password must be provided by the component and not as part
of the predefined connection.

JDBC database URL Specify the JDBC URL to the source database.

Use self defined query Select this check box if you want to use a specific query to get the data.
This method is preferred if, for example, your data needs to be filtered (using a
where condition), joined or converted.

Source query If you want to use a specific query, enter the query in this field.

Database or schema If you are not using a specific query, enter the schema name for the source table
in this field.

Source table If you are not using a specific query, enter the table name in this field.
The mapping between the source table columns and the target table columns
(schema columns) can be set in the advanced settings.

Use user authentication Select this check box if you want to use Basic Authentication when connecting to
the source database.

Remote user and Remote users password Enter the user name and password needed to access the source database.
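
For the JDBC database URL field above, typical URLs for the databases mentioned look like the
following, where the host, port and database names are placeholders:

jdbc:mysql://dbhost:3306/sourcedb
jdbc:postgresql://dbhost:5432/sourcedb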


Importing data into an EXASolution database table from a


local CSV file
This scenario describes a Job that writes employee information into a CSV file, then loads the
data from this local file into a newly created EXASolution database table using the tEXABulkExec
component, and finally retrieves the data from the table and displays it on the console.

Dropping and linking the components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tFileOutputDelimited
component, a tEXABulkExec component, a tEXAInput component, and a tLogRow
component.
2. Connect the tFixedFlowInput component to the tFileOutputDelimited component using a Row >
Main connection.
3. Do the same to connect the tEXAInput component to the tLogRow component.
4. Connect the tFixedFlowInput component to the tEXABulkExec component using a Trigger > On
Subjob Ok connection.
5. Do the same to connect the tEXABulkExec component to the tEXAInput component.

Configuring the components


Preparing the source data

Procedure
1. Double-click the tFixedFlowInput component to open its Basic settings view.


2. Click the [...] button next to Edit schema to open the Schema dialog box.

3. Click the [+] button to add six columns: EmployeeID of the Integer type, EmployeeName, OrgTeam
and JobTitle of the String type, OnboardDate of the Date type with the yyyy-MM-dd date pattern,
and MonthSalary of the Double type.
4. Click OK to close the dialog box and accept schema propagation to the next component.


5. In the Mode area, select Use Inline Content (delimited file) and enter the following employee data
in the Content field.

12000;James;Dev Team;Developer;2008-01-01;15000.01
12001;Jimmy;Dev Team;Developer;2008-11-22;13000.11
12002;Herbert;QA Team;Tester;2008-05-12;12000.22
12003;Harry;Doc Team;Technical Writer;2009-03-10;12000.33
12004;Ronald;QA Team;Tester;2009-06-20;12500.44
12005;Mike;Dev Team;Developer;2009-10-15;14000.55
12006;Jack;QA Team;Tester;2009-03-25;13500.66
12007;Thomas;Dev Team;Developer;2010-02-20;16000.77
12008;Michael;Dev Team;Developer;2010-07-15;14000.88
12009;Peter;Doc Team;Technical Writer;2011-02-10;12500.99

6. Double-click the tFileOutputDelimited component to open its Basic settings view.

7. In the File Name field, specify the file into which the input data will be written. In this example, it
is "E:/employee.csv".
8. Click Advanced settings to open the Advanced settings view of the tFileOutputDelimited
component.

9. Select the Advanced separator (for numbers) check box and in the Thousands separator and
Decimal separator fields displayed, specify the separators for thousands and decimal. In this
example, the default values "," and "." are used.
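
With these separators, a row written to "E:/employee.csv" would look similar to the following line,
shown only as an illustration; the exact output depends on how the advanced separators are applied to
each numeric column:

12,000;James;Dev Team;Developer;2008-01-01;15,000.01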

Loading the source data into a newly created EXASolution database table

Procedure
1. Double-click the tEXABulkExec component to open its Basic settings view.


2. Fill in the Host, Port, Schema, User and Password fields with your EXASolution database
connection details.
3. In the Table field, enter the name of the table into which the source data will be written. In this
example, the target database table is named "employee" and it does not exist.
4. Select Create table from the Action on table list to create the specified table.
5. In the Source area, select Local file as the source for the input data, and then specify the file that
contains the source data. In this example, it is "E:/employee.csv".
6. Click the [...] button next to Edit schema to open the Schema dialog box and define the schema,
which should be the same as that of the tFixedFlowInput component.
Then click OK to validate these changes and close the dialog box.
7. Click Advanced settings to open the Advanced settings view of the tEXABulkExec component.

8. In the Column Formats table, for the two numeric fields EmployeeID and MonthSalary, select the
corresponding check boxes in the Has Thousand Delimiters column, and then define their format
model strings in the corresponding fields of the Alternative Format column. In this example,
"99G999" for EmployeeID and "99G999D99" for MonthSalary.
9. Make sure that the Thousands Separator and Decimal Separator fields have values identical to
those of the tFileOutputDelimited component and keep the default settings for the other options.

Retrieving data from the EXASolution database table

Procedure
1. Double-click the tEXAInput component to open its Basic settings view.

2. Fill in the Host name, Port, Schema name, Username and Password fields with your EXASolution
database connection details.
3. In the Table Name field, enter the name of the table from which the data will be retrieved. In this
example, it is "employee".
4. Click the [...] button next to Edit schema to open the Schema dialog box and define the schema,
which should be the same as that of the tFixedFlowInput component.
Then click OK to close the dialog box and accept schema propagation to the next component.
5. Click the Guess Query button to fill the Query field with the following auto-generated SQL
statement that will be executed on the specified table.

SELECT employee.EmployeeID,
employee.EmployeeName,
employee.OrgTeam,
employee.JobTitle,
employee.OnboardDate,
employee.MonthSalary
FROM employee

6. Double-click the tLogRow component to open its Basic settings view.


7. In the Mode area, select the Table (print values in cells of a table) option for better readability of
the output.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the employee data is written into the specified EXASolution database table and
is then retrieved and displayed on the console.


tEXAClose
Closes an active connection to an EXASolution database instance to release the occupied resources.

tEXAClose Standard properties


These properties are used to configure tEXAClose running in the Standard Job framework.
The Standard tEXAClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tEXAConnection component that opens the
connection you need to close from the list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with
other EXASolution components, especially with the
tEXAConnection and tEXACommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
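
As an illustration of the usage rule above, a typical Job layout chains the EXASolution components with
Trigger > On Subjob Ok connections, for example as follows; the subjob content is a placeholder, and
tEXAClose is only needed when the connection has not already been closed by tEXACommit.

tEXAConnection_1 --> (subjob using the shared connection) --> tEXACommit_1 --> tEXAClose_1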

Related scenario
No scenario is available for the Standard version of this component yet.


tEXACommit
Validates the data processed through the Job into the connected EXASolution database.
Using a unique connection, this component commits a global transaction in one go, instead of committing
on every row or every batch, and thus improves performance.

tEXACommit Standard properties


These properties are used to configure tEXACommit running in the Standard Job framework.
The Standard tEXACommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tEXAConnection component for which you want
the commit action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the commit is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tEXACommit to your Job, your data will be committed row
by row. In this case, do not select the Close Connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is more commonly used with
other EXASolution components, especially with the
tEXAConnection and tEXARollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.


tEXAConnection
Opens a connection to an EXASolution database instance that can then be reused by other
EXASolution components.

tEXAConnection Standard properties


These properties are used to configure tEXAConnection running in the Standard Job framework.
The Standard tEXAConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the data
retrieved.

Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution
database cluster.

Schema Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

Additional JDBC Parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other
EXASolution components, especially with the tEXACommit
and tEXARollback components.

Related scenario
For a similar scenario using another database, see Inserting data in mother/daughter tables on page
2426.


tEXAInput
Retrieves data from an EXASolution database based on a query with a strictly defined order which
corresponds to the schema definition, and passes the data to the next component.

tEXAInput Standard properties


These properties are used to configure tEXAInput running in the Standard Job framework.
The Standard tEXAInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the
data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host name Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution database cluster.

Schema name Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Enter the name of the table to be queried.

Query Type and Query Enter the database query, paying particular attention to the proper sequence of the fields in order to match the schema definition.

Guess Query Click the button to generate the query that corresponds to
the table schema in the Query field.

Guess schema Click the button to retrieve the schema from the table.
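As an illustration only (the schema, table, and column names below are assumptions, not values from this guide), for a component schema defined with the columns id, name, and city, the Query field could contain:

"SELECT id, name, city FROM my_schema.customers"

The columns in the SELECT clause must appear in the same order as the columns of the component schema.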


Advanced settings

Change fetch size Select this check box to change the fetch size which
specifies the amount of resultset data sent during one
single communication step with the database. In the Fetch
size field displayed, you need to enter the size in KB.

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespaces from all the String/Char columns.

Trim column Select the check box in the Trim column to remove leading
and trailing whitespaces from the corresponding field.
This table is not available if the Trim all the String/Char
columns check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
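As a minimal sketch (the component instance name tEXAInput_1 and the use of a tJava component are assumptions for illustration), an After variable can be read in a downstream component through the globalMap object:

// In a tJava component triggered with OnComponentOk after tEXAInput_1
Integer rowCount = (Integer) globalMap.get("tEXAInput_1_NB_LINE");
System.out.println("Rows read: " + rowCount);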

Usage

Usage rule This component is usually used as a start component of a Job or subJob and it needs an output link.

Dynamic settings Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
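As a hedged illustration (the variable and component instance names are assumptions, not values from this guide), the Code field of the Dynamic settings table could be filled with an expression such as context.exaConnection, where the context variable exaConnection holds at runtime the name of the connection component to use, for example tEXAConnection_1 or tEXAConnection_2, depending on the context group loaded when the Job is executed.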

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a related scenario, see Importing data into an EXASolution database table from a local CSV file on
page 889.
For similar scenarios using other databases, see:


tEXAOutput
Writes, updates, modifies or deletes data in an EXASolution database by executing the action
defined on the table and/or on the data in the table, based on the flow incoming from the preceding
component.

tEXAOutput Standard properties


These properties are used to configure tEXAOutput running in the Standard Job framework.
The Standard tEXAOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the
data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Enter the host or host list of the EXASol database servers.
EXASol can run in a cluster environment. The valid value
can be a simple IP address (for example, 172.16.173.128
), an IP range list (for example, 172.16.173.128..130 that
represents three servers 172.16.173.128, 172.16.173.129
, and 172.16.173.130), or a comma-separated host list
(for example, server1,server2,server3) of the EXASolution
database cluster.

Port Enter the listening port number of the EXASolution database cluster.

Schema name Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Enter the name of the table to be written. Note that only one table can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
• None: No operation is carried out.
• Drop and create table: The table is removed and
created again.
• Create table: The table does not exist and gets
created.
• Create table if does not exist: The table is created if it
does not exist.
• Drop table if exists and create: The table is removed if
it already exists and created again.
• Clear table: The table content is deleted.
• Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.

Action on data On the data of the table defined, you can perform:
• Insert: Add new entries to the table. If duplicates are found, the Job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record. If the record with
the given reference already exists, an update would be
made.
• Update or insert: Update the record with the given
reference. If the record does not exist, a new record
would be inserted.
• Delete: Remove entries corresponding to the input
flow.


Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You
can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define
primary keys for the update and delete operations. To
do that: Select the Use field options check box and then
in the Key in update column, select the check boxes
next to the column name on which you want to base
the update operation. Do the same in the Key in delete
column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values in retrieved schema in Talend Help Center (https://help.talend.com).

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Use commit control Select this box to display the Commit every field in which
you can define the number of rows to be processed before
committing.


Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Additional Columns This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update, or delete actions, or actions that require particular preprocessing.
• Name: Enter the name of the column to be modified or
inserted.
• SQL expression: Enter the SQL expression to be
executed to modify or insert data in the corresponding
columns.
• Position: Select Before, After or Replace, depending on
the action to be carried out on the reference column.
• Reference column: Type in a column of reference that
can be used to place or replace the new or altered
column.
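As an illustration only (the column names and SQL function below are assumptions, not values from this guide), the Additional Columns table described above could contain a row with Name set to name_upper, SQL expression set to "UPPER(name)", Position set to After, and Reference column set to name, so that an extra column holding the uppercase value of name is written after that column.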

Use field options Select this check box to customize a request for the
corresponding column, particularly if multiple actions are
being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which the data is
updated.
• Key in delete: Select the check box for the
corresponding column based on which the data is
deleted.
• Updatable: Select the check box if the data in the
corresponding column can be updated.
• Insertable: Select the check box if the data in the
corresponding column can be inserted.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use batch mode Select this check box to activate the batch mode for data
processing, and in the Batch Size field displayed enter the
number of records to be processed in each batch.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an EXASolution database. It also allows you to
create a reject flow using a Row > Rejects link to filter data
in error. For a related scenario, see Retrieving data in error
with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).


Related scenario
For similar scenarios using other databases, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tEXARollback
Cancels the transaction commit in the connected EXASolution database.
It allows you to roll back any changes made in the EXASolution database to prevent partial
transaction commit if an error occurs.

tEXARollback Standard properties


These properties are used to configure tEXARollback running in the Standard Job framework.
The Standard tEXARollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component List Select the tEXAConnection component for which you want
the rollback action to be performed.

Close Connection This check box is selected by default and it allows you
to close the database connection once the rollback is
done. Clear this check box to continue to use the selected
connection after the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is more commonly used with other EXASolution components, especially with the tEXAConnection and tEXACommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related Scenario
For a similar scenario using another database, see Rollback from inserting data in mother/daughter tables on page 2429.


tEXARow
Executes SQL queries on an EXASolution database.
Depending on the nature of the query and the database, tEXARow acts on the actual structure of the
database, or indeed the data, although without modifying them. The Row suffix indicates that it is
used to channel a flow in a Job although it does not produce any output data.

tEXARow Standard properties


These properties are used to configure tEXARow running in the Standard Job framework.
The Standard tEXARow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-In or Repository.


• Built-In: No property data stored centrally.
• Repository: Select the repository file in which the
properties are stored. The database connection fields
that follow are completed automatically using the
data retrieved.

Use an existing connection Select this check box and from the list displayed select the
relevant connection component to reuse the connection
details you have already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Enter the host or host list of the EXASol database servers. EXASol can run in a cluster environment. The valid value can be a simple IP address (for example, 172.16.173.128), an IP range list (for example, 172.16.173.128..130 that represents three servers 172.16.173.128, 172.16.173.129, and 172.16.173.130), or a comma-separated host list (for example, server1,server2,server3) of the EXASolution database cluster.

Port Enter the listening port number of the EXASolution database cluster.

Schema name Enter the name of the schema you want to use.

Username and Password Enter the user authentication data to access the
EXASolution database.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.

  Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Enter the name of the table to be processed.

Query Type Either Built-In or Repository.
• Built-In: Enter the query manually or with the help of
the SQLBuilder.
• Repository: Select the appropriate query from the
Repository. The Query field is then completed
automatically.

Guess Query Click the Guess Query button to generate the query that
corresponds to the table schema in the Query field.

Query Enter the database query, paying particular attention to the proper sequence of the fields in order to match the schema definition.


Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the database
connection you are creating. The properties are separated
by semicolon and each property is a key-value pair, for
example, encryption=1;clientname=Talend.
This field is not available if the Use an existing connection
check box is selected.

Propagate QUERY's recordset Select this check box to insert the query results in one of
the flow columns. Select the particular column from the use
column list.

Use PreparedStatement Select this check box to use prepared statements and in
the Set PreparedStatement Parameters table displayed,
add as many parameters as needed and set the following
attributes for each parameter:
• Parameter Index: enter the index of the prepared
statement parameter.
• Parameter Type: click in the cell and select the type of
the parameter from the list.
• Parameter Value: enter the value of the parameter.
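As a minimal sketch (the table and column names are assumptions, not values from this guide), the Query field could contain a parameterized statement such as:

"INSERT INTO employees (id, name) VALUES (?, ?)"

with two rows in the Set PreparedStatement Parameters table, for example Parameter Index 1 / Parameter Type Int / Parameter Value 42, and Parameter Index 2 / Parameter Type String / Parameter Value "Smith".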

Commit every Enter the number of rows to be included in each batch before the data is written. This option guarantees the quality of the transaction (although there is no rollback option) and improves performance.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers query flexibility as it covers all possible SQL query requirements.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related Scenario
For similar scenarios using other databases, see:
• Procedure on page 622,
• Removing and regenerating a MySQL table index on page 2497.


tEXistConnection
Opens a connection to an eXist database in order that a transaction may be carried out.

tEXistConnection Standard properties


These properties are used to configure tEXistConnection running in the Standard Job framework.
The Standard tEXistConnection component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other tEXist*
components, especially with the tEXistGet and tEXistPut
components. If you set the connection properties in the
tEXistConnection component, you can reuse the connection
for other tEXist* components in the same Job.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.


Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For a tEXistConnection related scenario, see tMysqlConnection on page 2425.


tEXistDelete
Deletes specified resources from a remote eXist database.

tEXistDelete Standard properties


These properties are used to configure tEXistDelete running in the Standard Job framework.
The Standard tEXistDelete component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Target Type Either Resource, Collection, or All.

Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component subJob but can also be used as an output or end object.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tEXistGet
Retrieves selected resources from a remote eXist database to a defined local directory.

tEXistGet Standard properties


These properties are used to configure tEXistGet running in the Standard Job framework.
The Standard tEXistGet component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory Path to the file's destination location.

Files Click the plus button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component subJob but can also be used as an output or end object.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Retrieving resources from a remote eXist DB server


This is a single-component Job that retrieves data from a remote eXist DB server and downloads the data to a defined local directory.
This simple Job requires one component: tEXistGet.


Procedure
1. Drop the tEXistGet component from the Palette into the design workspace.
2. Double-click the tEXistGet component to open the Component view and define the properties in
its Basic settings view.

3. Fill in the URI field with the URI of the eXist database you want to connect to.
In this scenario, the URI is xmldb:exist://192.168.0.165:8080/exist/xmlrpc. Note that the URI used in
this use case is for demonstration purposes only and is not an active address.
4. Fill in the Collection field with the path to the collection of interest on the database server, /db/
talend in this scenario.
5. Fill in the Driver field with the driver for the XML database, org.exist.xmldb.DatabaseImpl in this
scenario.
6. Fill in the Username and Password fields by typing in admin and talend respectively in this
scenario.
7. Click the three-dot button next to the Local directory field to set a path for saving the XML file
downloaded from the remote database server.
In this scenario, set the path to your desktop, for example C:/Documents and Settings/galano/Desktop/ExistGet.
8. In the Files field, click the plus button to add a new line in the Filemask area, and fill it with a
complete file name to retrieve data from a particular file on the server, or a filemask to retrieve
data from a set of files. In this scenario, fill in dictionary_en.xml.
9. Save your Job and press F6 to execute it.

The XML file dictionary_en.xml is retrieved and downloaded to the defined local directory.


tEXistList
Lists the resources stored on a remote eXist database.

tEXistList Standard properties


These properties are used to configure tEXistList running in the Standard Job framework.
The Standard tEXistList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password Server authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Files Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Target Type Either Resource, Collection, or All contents.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files iterated upon. This is an After
variable, and it returns an integer.
CURRENT_FILE: the current file name. This is a Flow
variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used along with a tEXistGet component to retrieve the files listed, for example.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a related scenario, see Listing and getting files/folders on an FTP directory on page 1230.


tEXistPut
Uploads specified files from a defined local directory to a remote eXist database.

tEXistPut Standard properties


These properties are used to configure tEXistPut running in the Standard Job framework.
The Standard tEXistPut component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter a path to indicate where the resource is to be saved on the server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory Path to the source location of the file(s).

Files Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component subJob but can also be used as an output or end object.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see http://exist-db.org/exist/apps/doc/documentation.xml.
For further information about the XQuery update extension, see http://exist-db.org/exist/apps/doc/update_ext.xml.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tEXistXQuery
Queries XML files located on remote databases using local files containing XPath queries and outputs
the results to an XML file stored locally.

tEXistXQuery Standard properties


These properties are used to configure tEXistXQuery running in the Standard Job framework.
The Standard tEXistXQuery component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the XML file location on the database.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password DB server authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

XQuery Input File Browse to the local file containing the query to be executed.

Local Output Browse to the directory in which the query results should be
saved.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component Job but can also be used as part of a more complex Job.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tEXistXUpdate
Processes XML file records and updates the existing records on the database server.

tEXistXUpdate Standard properties


These properties are used to configure tEXistXUpdate running in the Standard Job framework.
The Standard tEXistXUpdate component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection/Component List Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

URI URI of the database you want to connect to.

Collection Enter the path to the collection and file of interest on the database server.

Driver This field is automatically populated with the standard driver.

Note:
Users can enter a different driver, depending on their
needs.

Username and Password DB server authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Update File Browse to the local file in the local directory to be used to
update the records on the database.

Advanced settings

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a job level as well as at each component level.

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is typically used as a single component Job but can also be used as part of a more complex Job.
eXist-db is an open source database management system
built using XML technology. It stores XML data according
to the XML data model and features efficient, index-based
XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension,
see XQuery update extension.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tExternalSortRow
Sorts input data based on one or several columns, by sort type and order, using an external sort
application.

tExternalSortRow Standard properties


These properties are used to configure tExternalSortRow running in the Standard Job framework.
The Standard tExternalSortRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide.
Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job flowcharts. Related topic: see Talend Studio User Guide.

File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Field separator Character, string or regular expression to separate fields.

External command "sort" path Enter the path to the external file containing the sorting
algorithm to use.


Criteria Click the plus button to add as many lines as required for
the sort to be complete. By default the first column defined
in your schema is selected.

  Schema column: Select the column label from your schema,


which the sort will be based on. Note that the order is
essential as it determines the sorting priority.

  Sort type: Numerical and Alphabetical order are proposed.


More sorting types to come.

  Order: Ascending or descending order.

Advanced settings

Maximum memory Type in the size of physical memory you want to allocate to
sort processing.

Temporary directory Specify the temporary directory to process the sorting


command.

Set temporary input file directory Select the check box to activate the field in which you can
specify the directory to handle your temporary input file.

Add a dummy EOF line Select this check box when using the tAggregateSortedRow
component.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles flow of data, therefore it requires


input and output components and is defined as an intermediary step.


Related scenario
For related use case, see tSortRow on page 3465.


tExtractDelimitedFields
Generates multiple columns from a delimited string column.
The extracted fields are written in new columns of the output schema. If you need to keep the original
columns in the output of this component, define these columns in the output schema using the same
column names as the original ones.

tExtractDelimitedFields Standard properties


These properties are used to configure tExtractDelimitedFields running in the Standard Job
framework.
The Standard tExtractDelimitedFields component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Field to split Select an incoming field from the Field to split list to split.

Ignore NULL as the source data Select this check box to ignore the Null value in the source
data.
Clear this check box to generate the Null records that
correspond to the Null value in the source data.

Field separator Enter character, string or regular expression to separate


fields for the transferred data.

Note:
Since this component uses regex to split a field and the
regex syntax uses special characters as operators, make
sure to precede the regex operator you use as a field
separator by a double backslash. For example, you have
to use "\\|" instead of "|". A short Java sketch at the end
of these Basic settings illustrates this behavior.

Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
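
To illustrate the note about escaping the field separator, the following plain Java sketch shows why
an unescaped regex operator cuts a record in the wrong places. It only mimics the regex-based
splitting described above; it is not the code generated by the component, and the record used is a
made-up example.

public class FieldSeparatorDemo {
    public static void main(String[] args) {
        String record = "32|Component Team|Developer";

        // "|" alone is the regex alternation operator: it matches the empty string
        // at every position, so the record is cut between every character.
        System.out.println(record.split("|").length);   // many one-character fields

        // "\\|" escapes the operator, so the split happens on the literal pipe.
        System.out.println(record.split("\\|").length); // 3 fields
    }
}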

Advanced settings

Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).

Trim column Select this check box to remove leading and trailing
whitespace from all columns.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

Validate date Select this check box to check the date format strictly
against the input schema.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component handles flow of data therefore it requires


input and output components. It allows you to extract data
from a delimited field, using a Row > Main link, and enables
you to create a reject flow filtering out data whose type does not
match the defined type.

Extracting a delimited string column of a database table


This scenario describes a Job that writes data including a delimited string column into a MySQL
database table and displays the data on the console, then extracts the delimited string column into
multiple columns and displays the data after extraction on the console.

Adding and linking components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tMysqlOutput
component, a tMysqlInput component, a tExtractDelimitedFields component, two tLogRow
components.
2. Link tFixedFlowInput to tMysqlOutput using a Row > Main connection.
3. Do the same to link tMysqlOutput to the first tLogRow, link tMysqlInput to tExtractDelimitedFields,
link tExtractDelimitedFields to the second tLogRow.
4. Link tFixedFlowInput to tMysqlInput using a Trigger > On Subjob Ok connection.

Configuring the components


Populating data in a MySQL database table

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.


2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
three columns: Id of Integer type, and Name and DelimitedField of String type.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
3. In the Mode area, select Use Inline Content(delimited file). Then in the Content field displayed,
enter the data to write to the database. This input data includes a delimited string column. In this
example, the input data is as follows:

1;Adam;32,Component Team,Developer
2;Bill;28,Component Team,Tester
3;Chris;30,Doc Team,Writer
4;David;35,Doc Team,Leader
5;Eddie;33,QA Team,Tester

4. Double-click tMysqlOutput to open its Basic settings view.


5. Fill the Host, Port, Database, Username, Password fields with the MySQL database connection
information.
6. Fill the Table field with the name of the table to be written. In this example, it is employee.
7. Select Drop table if exists and create from the Action on table list.
8. Double-click the first tLogRow to open its Basic settings view.

In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Extracting the delimited string column in the database table into multiple columns

Procedure
1. Double-click tMysqlInput to open its Basic settings view.


2. Fill the Host, Port, Database, Username, Password fields with the MySQL database connection
information.
3. Click the [...] button next to Edit schema and in the pop-up window define the schema of the
tMysqlInput component same as the schema of the tMysqlOutput component.

4. In the Table Name field, enter the name of the table into which the data was written. In this
example, it is employee.
5. Click the Guess Query button to fill the Query field with the SQL query statement to be executed
on the specified table. In this example, it is as follows:

SELECT
`employee`.`Id`,
`employee`.`Name`,
`employee`.`DelimitedField`
FROM `employee`

6. Double-click tExtractDelimitedFields to open its Basic settings view.


7. In the Field to split list, select the delimited string column to be extracted. In this example, it is
DelimitedField.
In the Field separator field, enter the separator used to separate the fields in the delimited string
column. In this example, it is a comma (,).
8. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
five columns: Id of Integer type, and Name, Age, Team, Title of String type.
In this example, the delimited string column DelimitedField is split into three columns Age, Team
and Title, and the Id and Name columns are kept as well.

Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
9. Double-click the second tLogRow to open its Basic settings view.

In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


As shown above, the original input data and the data after extraction are displayed on the
console, and the delimited string column DelimitedField is extracted into three columns Age, Team,
and Title.


tExtractJSONFields
Extracts the desired data from JSON fields based on the JSONPath or XPath query.

tExtractJSONFields Standard properties


These properties are used to configure tExtractJSONFields running in the Standard Job framework.
The Standard tExtractJSONFields component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Read By Select a way of extracting JSON data in the file.


• JsonPath: Extracts JSON data based on the JSONPath
query. With this option selected, you need to select a
JSONPath API version from the API version drop-down
list. It is recommended to read data by JSONPath in
order to gain better performance.
• Xpath: Extracts JSON data based on the XPath query.

JSON field List of the JSON fields to be extracted.


Loop Jsonpath query Enter the path pointing to the node within the JSON field,
on which the loop is based.
Note that if you have selected Xpath from the Read by drop-
down list, the Loop Xpath query field is displayed instead.

Mapping Complete this table to map the columns defined in the


schema to the corresponding JSON nodes.
• Column: The Column cells are automatically filled with
the defined schema column names.
• Json query/JSONPath query: Specify the JSONPath
node that holds the desired data. For more information
about JSONPath expressions, see http://goessner.net/articles/JsonPath/.
A standalone sketch at the end of these Basic settings
illustrates this kind of query.
This column is available only when JsonPath is
selected from the Read By list.
• XPath query: Specify the XPath node that holds the
desired data.
This column is available only when Xpath is selected
from the Read By list.
• Get Nodes: Select this check box to extract the JSON
data of all the nodes or select the check box next to a
specific node to extract the data of that node.
This column is available only when Xpath is selected
from the Read By list.
• Is Array: select this check box when the JSON field to
be extracted is an array instead of an object.
This column is available only when Xpath is selected
from the Read By list.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.
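
As a standalone illustration of the Json query column described above, the sketch below runs a
couple of JSONPath expressions in plain Java. It assumes the Jayway json-path library is on the
classpath (an assumption made for this sketch only, not necessarily the exact library shipped with
the component), and it is not the component's generated code.

import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class JsonQueryDemo {
    public static void main(String[] args) {
        String json = "{\"staff\":[{\"firstname\":\"Andrew\",\"dept\":\"Doc\"},"
                    + "{\"firstname\":\"John\",\"dept\":\"R&D\"}]}";

        // A loop query selects the elements that become output rows.
        List<Object> rows = JsonPath.read(json, "$.staff[*]");

        // Column queries pick the values that feed the schema columns.
        List<String> names = JsonPath.read(json, "$.staff[*].firstname");

        System.out.println(rows.size() + " rows, names: " + names);
    }
}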

Advanced settings

Use the loop node as root Select this check box to use the loop node as the root for
querying the file.
The loop node is set in the Loop Json query text frame in
the Basic Settings view. If this option is checked, only the
child elements of the loop node are available for querying;
otherwise, both the parent elements and the child elements
of the loop node can be queried. You can specify a parent
element through JSON path syntax.
This check box is available only when JsonPath is selected
in the Read By drop-down list of the Basic settings view.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.
This variable functions only if the Die on error check box is
cleared.

Usage

Usage rule This component is an intermediate component. It needs an


input and an output components.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Retrieving error messages while extracting data from JSON


fields
In this scenario, tWriteJSONField wraps the incoming data into JSON fields, data of which is then
extracted by tExtractJSONFields. Meanwhile, the error messages generated due to extraction failure,
which include the concerned JSON fields and errors, are retrieved via a Row > Reject link.

Linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tWriteJSONField, tExtractJSONFields, and tLogRow (X2). The two tLogRow components are
renamed as data_extracted and reject_info.
2. Link tFixedFlowInput and tWriteJSONField using a Row > Main connection.
3. Link tWriteJSONField and tExtractJSONFields using a Row > Main connection.
4. Link tExtractJSONFields and data_extracted using a Row > Main connection.
5. Link tExtractJSONFields and reject_info using a Row > Reject connection.


Configuring the components


Setting up the tFixedFlowInput

Procedure
1. Double-click tFixedFlowInput to display its Basic settings view.

2. Click Edit schema to open the schema editor.


Click the [+] button to add three columns, namely firstname, lastname and dept, with the type of
string.
Click OK to close the editor.
3. Select Use Inline Content and enter the data below in the Content box:

Andrew;Wallace;Doc
John;Smith;R&D
Christian;Dior;Sales

Setting up the tWriteJSONField

Procedure
1. Click tWriteJSONField to display its Basic settings view.

2. Click Configure JSON Tree to open the XML tree editor.

The schema of tFixedFlowInput appears in the Linker source panel.


3. In the Linker target panel, click the default rootTag and type in staff, which is the root node of the
JSON field to be generated.
4. Right-click staff and select Add Sub-element from the context menu.
5. In the pop-up box, enter the sub-node name, namely firstname.


Repeat the steps to add two more sub-nodes, namely lastname and dept.
6. Right-click firstname and select Set As Loop Element from the context menu.
7. Drop firstname from the Linker source panel to its counterpart in the Linker target panel.
In the pop-up dialog box, select Add linker to target node.

Click OK to close the dialog box.


8. Repeat the steps to link the two other items.
Click OK to close the XML tree editor.
9. Click Edit schema to open the schema editor.

10. Click the [+] button in the right panel to add one column, namely staff, which will hold the JSON
data generated.
Click OK to close the editor.


Setting up the tExtractJSONFields

Procedure
1. Double-click tExtractJSONFields to display its Basic settings view.

2. Click Edit schema to open the schema editor.

3. Click the [+] button in the right panel to add three columns, namely firstname, lastname and dept,
which will hold the data of their counterpart nodes in the JSON field staff.
Click OK to close the editor.
4. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.


5. In the Loop XPath query field, enter "/staff", which is the root node of the JSON data.
6. In the Mapping area, type in the node name of the JSON data under the XPath query part. The
data of those nodes will be extracted and passed to their counterpart columns defined in the
output schema.
7. Specifically, define the XPath query "firstname" for the column firstname, "lastname" for the column
lastname, and "" for the column dept. Note that "" is not a valid XPath query and will lead to
execution errors.

Setting up the tLogRow components

Procedure
1. Double-click data_extracted to display its Basic settings view.

2. Select Table (print values in cells of a table) for a better display of the results.
3. Perform the same setup on the other tLogRow component, namely reject_info.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Click F6 to execute the Job.

As shown above, the reject row offers such details as the data extracted, the JSON fields whose
data is not extracted and the cause of the extraction failure.

Collecting data from your favorite online social network


In this scenario, tFileInputJSON retrieves the friends node from a JSON file that contains the data of a
Facebook user and tExtractJSONFields extracts the data from the friends node for flat data output.


Linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileInputJSON,
tExtractJSONFields and tLogRow.
2. Link tFileInputJSON and tExtractJSONFields using a Row > Main connection.
3. Link tExtractJSONFields and tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tFileInputJSON to display its Basic settings view.

2. Click Edit schema to open the schema editor.


Click the [+] button to add one column, namely friends, of the String type.
Click OK to close the editor.
3. Click the [...] button to browse for the JSON file, facebook.json in this case:

{ "user": { "id": "9999912398",


"name": "Kelly Clarkson",
"friends": [
{ "name": "Tom Cruise",
"id": "55555555555555",
"likes": {
"data": [
{ "category": "Movie",
"name": "The Shawshank Redemption",
"id": "103636093053996",
"created_time": "2012-11-20T15:52:07+0000"
},
{ "category": "Community",
"name": "Positiveretribution",
"id": "471389562899413",
"created_time": "2012-12-16T21:13:26+0000"
}
]
}
},
{ "name": "Tom Hanks",
"id": "88888888888888"
"likes": {
"data": [
{ "category": "Journalist",
"name": "Janelle Wang",
"id": "136009823148851",
"created_time": "2013-01-01T08:22:17+0000"
},
{ "category": "Tv show",
"name": "Now With Alex Wagner",
"id": "305948749433410",
"created_time": "2012-11-20T06:14:10+0000"
}
]
}
}
]
}
}

4. Clear the Read by XPath check box.


In the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends column,
retrieving the entire friends node from the source file.
5. Double-click tExtractJSONFields to display its Basic settings view.


6. Click Edit schema to open the schema editor.

7. Click the [+] button in the right panel to add five columns, namely id, name, like_id, like_name and
like_category, which will hold the data of relevant nodes in the JSON field friends.
Click OK to close the editor.
8. In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.

9. In the Loop XPath query field, enter "/likes/data".


10. In the Mapping area, type in the queries of the JSON nodes in the XPath query column. The data
of those nodes will be extracted and passed to their counterpart columns defined in the output
schema.
11. Specifically, define the XPath query "../../id" (querying the "/friends/id" node) for the column id, "../../
name" (querying the "/friends/name" node) for the column name, "id" for the column like_id, "name"
for the column like_name, and "category" for the column like_category.
12. Double-click tLogRow to display its Basic settings view.

13. Select Table (print values in cells of a table) for a better display of the results.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Click F6 to execute the Job.

As shown above, the friends data of the Facebook user Kelly Clarkson is extracted correctly.

Extracting data from a JSON file through looping


This scenario describes a Job that extracts data from a JSON file through multiple loops and displays
the data on the console.


The following lists the content of the JSON file, sample.json.

{
"Guid": "a2hdge9-5517-4e12-b9j6-887ft29e1711",
"Transactions": [
{
"TransactionId": 1,
"Products": [
{
"ProductId": "A1",
"Packs": [
{
"Quantity": 20,
"Price": 40.00,
"Due_Date": "2019/03/01"
}
]
}
]
},
{
"TransactionId": 2,
"Products": [
{
"ProductId": "B1",
"Packs": [
{
"Quantity": 1,
"Price": 15.00,
"Due_Date": "2019/01/01"
},
{
"Quantity": 21,
"Price": 315.00,
"Due_Date": "2019/02/14"
}
]
}
]
},
{
"TransactionId": 3,
"Products": [
{
"ProductId": "C1",
"Packs": [
{
"Quantity": 2,
"Price": 5.00,
"Due_Date": "2019/02/19"
},
{
"Quantity": 3,
"Price": 7.50,
"Due_Date": "2019/05/21"
}
]
}
]
}
]
}

This Job extracts the values of the following elements.


• Guid
• TransactionId
• ProductId
• Quantity
• Price
• Due_Date


Establishing the tExtractJSONFields looping Job

Procedure
1. Create a Job and add a tFileInputJSON component, three tExtractJSONFields components, and a
tLogRow component.
2. Connect the components using Row > Main connections.

Configuring tExtractJSONFields looping input

About this task


This task assumes that you know the structure of the JSON file.

Procedure
1. In the Basic settings view of the tFileInputJSON component, select JsonPath from the Read By
drop-down list.

2. In the Filename field, specify the input JSON file, sample.json in this example.
3. In the schema editor, add two columns, Guid (type String) and Transactions (type Object).


4. Click Yes in the subsequent dialog box to propagate the schema to the next component.
The columns just added appear in the Mapping table of the Basic settings view.
5. In the Basic settings view, enter "$" in the Loop Json query text box to loop the elements within
the root elements.
6. In the Json query column of the Mapping table, enter the following Json query expressions in
double quotation marks.
• $.Guid to extract the value of the Guid element;
• $.Transactions to extract the content of the Transactions element.

Configuring the tExtractJSONFields components for looping

Procedure
1. In the schema editor of the first tExtractJSONFields component, add the following columns in the
output table.
• Guid, type String;
• TransactionId, type Integer;
• Products, type Object

2. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.


3. Set the other options in the Basic settings view as follows.


• JSON field: Transactions;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: "TransactionId" (in double quotation marks);
• Products: "Products" (in double quotation marks);
• Others: unchanged

The settings loop all the elements within the Transactions element and extract the values of the
TransactionId and the Products elements.
4. In the schema editor of the second tExtractJSONFields component, add the following columns in
the output table.
• Guid, type String;
• TransactionId, type Integer;
• ProductId, type String;
• Packs, type Object
5. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.
6. Set the other options in the Basic settings view as follows.
• JSON field: Products;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: empty, for receiving the TransactionId from the previous component;
• ProductId: "ProductId" (in double quotation marks);
• Packs: "Packs" (in double quotation marks);
• Others: unchanged
These settings loop all the elements within the Products element and extract
the values of the ProductId and the Packs elements.


7. In the schema editor of the third tExtractJSONFields component, add the following columns in the
output table.
• Guid, type String;
• TransactionId, type Integer;
• ProductId, type String;
• Quantity, type Integer;
• Price, type Float;
• Due_Date, type Date
8. Close the schema editor and click Yes in the subsequent dialog box to propagate the schema to
the next component.
The columns just added appear in the Mapping table of the Basic settings view.
9. Set the other options in the Basic settings view as follows.
• JSON field: Packs;
• Loop Jsonpath query: "*" (in double quotation marks);
• Guid: empty, for receiving the Guid value from the previous component;
• TransactionId: empty, for receiving the TransactionId value from the previous component;
• ProductId: empty, for receiving the ProductId value from the previous component;
• Quantity: "Quantity" (in double quotation marks);
• Price: "Price" (in double quotation marks);
• Due_Date: "Due_Date" (in double quotation marks);
• Others: unchanged
These settings loop all the elements within the Packs element and extract the
values of the Quantity, the Price, and the Due_Date elements.

Setting the display for tExtractJSONFields values

Procedure
1. Open the Basic settings view of the tLogRow component.

2. Select the preferred option in the Mode section.


Executing tExtractJSONFields loop Job

Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to execute the Job and check the result on the console.

The values of the Guid element, the TransactionId element, the ProductId element, the Quantity
element, the Price element, and the Due_date element are extracted from the source JSON file
and displayed.
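
For reference, the flattened rows derived from sample.json should carry values along the following
lines (this is only a sketch: the exact console layout depends on the tLogRow mode selected, and
the rendering of Due_Date depends on the date pattern set for that schema column):

a2hdge9-5517-4e12-b9j6-887ft29e1711|1|A1|20|40.0|2019/03/01
a2hdge9-5517-4e12-b9j6-887ft29e1711|2|B1|1|15.0|2019/01/01
a2hdge9-5517-4e12-b9j6-887ft29e1711|2|B1|21|315.0|2019/02/14
a2hdge9-5517-4e12-b9j6-887ft29e1711|3|C1|2|5.0|2019/02/19
a2hdge9-5517-4e12-b9j6-887ft29e1711|3|C1|3|7.5|2019/05/21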


tExtractPositionalFields
Extracts data and generates multiple columns from a formatted string using positional fields.
tExtractPositionalFields generates multiple columns from one column using positional fields.

tExtractPositionalFields Standard properties


These properties are used to configure tExtractPositionalFields running in the Standard Job
framework.
The Standard tExtractPositionalFields component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Field Select an incoming field from the Field list to extract.

Ignore NULL as the source data Select this check box to ignore the Null value in the source
data.
Clear this check box to generate the Null records that
correspond to the Null value in the source data.

Customize Select this check box to customize the data format of the
positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in between inverted commas the
padding character used, in order for it to be removed from
the field. A space by default.
Alignment: Select the appropriate alignment parameter.

Pattern Enter the pattern to use as basis for the extraction.


A pattern is a series of length values separated by commas,
interpreted as a string between quotes. Make sure the
values entered in this field are consistent with the schema
defined. A Java sketch at the end of these Basic settings
shows how such a pattern cuts a record.

Die on error Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.


• Update repository connection: choose this option


to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
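
To make the Pattern property above more concrete, here is a minimal Java sketch showing how a
hypothetical pattern such as "2,14,9" cuts a fixed-width record into fields. It only illustrates the
idea of comma-separated length values; it is not the component's generated code, and the record
shown is a made-up example.

public class PositionalPatternDemo {
    public static void main(String[] args) {
        // Hypothetical record cut with the pattern "2,14,9":
        // 2 characters for an id, 14 for a team, 9 for a title.
        String record = "01Component TeamDeveloper";
        int[] lengths = {2, 14, 9};

        int offset = 0;
        for (int length : lengths) {
            System.out.println("[" + record.substring(offset, offset + length).trim() + "]");
            offset += length;
        }
        // Prints [01], [Component Team] and [Developer]
    }
}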

Advanced settings

Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).

Trim Column Select this check box to remove leading and trailing
whitespace from all columns.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
The NB_LINE variable is not available to the Map/Reduce
version.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component handles flow of data therefore it requires


input and output components. It allows you to extract data
from a positional field, using a Row > Main link, and enables
you to create a reject flow filtering out data whose type does not
match the defined type.

Related scenario
For a related scenario, see Extracting name, domain and TLD from e-mail addresses on page 967.


tExtractRegexFields
Extracts data and generates multiple columns from a formatted string using regex matching.

tExtractRegexFields Standard properties


These properties are used to configure tExtractRegexFields running in the Standard Job framework.
The Standard tExtractRegexFields component belongs to the Data Quality and the Processing
families.
The component in this framework is available in all Talend products.

Basic settings

Field to split Select an incoming field from the Field to split list to split.

Regex Enter a regular expression according to the programming


language you are using.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component connected in the Job.

Warning:
Make sure that the output schema does not contain any
column with the same name as the input column to be
split. Otherwise, the regular expression will not work as
expected.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.


Advanced settings

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles flow of data therefore it requires


input and output components. It allows you to extract data
from a formatted string field, using a Row > Main link, and enables
you to create a reject flow filtering out data whose type does not
match the defined type.

Extracting name, domain and TLD from e-mail addresses


This scenario describes a three-component Job where tExtractRegexFields is used to specify a
regular expression that corresponds to one column in the input data, email. The tExtractRegexFields
component is used to perform the actual regular expression matching. This regular expression
includes field identifiers for user name, domain name and Top-Level Domain (TLD) name portions in
each e-mail address. If the given e-mail address is valid, the name, domain and TLD are extracted and
displayed on the console in three separate columns. Data in the other two input columns, id and age, is
extracted and routed to the destination as well.


Setting up the Job


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileInputDelimited,
tExtractRegexFields, and tLogRow.
2. Connect tFileInputDelimited to tExtractRegexFields using a Row > Main link, and do the same to
connect tExtractRegexFields to tLogRow.

Configuring the components


Procedure
1. Double-click the tFileInputDelimited component to open its Basic settings view in the Component
tab.

2. Click the [...] button next to the File name/Stream field to browse to the file where you want to
extract information from.
The input file used in this scenario is called test4. It is a text file that holds three columns: id,
email, and age.

id;email;age
1;[email protected];24
2;[email protected];31
3;[email protected];20

For more information, see tFileInputDelimited on page 1015.


3. Click Edit schema to define the data structure of this input file.
4. Double-click the tExtractRegexFields component to open its Basic settings view.


5. Select the column to split from the Field to split list: email in this scenario.
6. Enter the regular expression you want to use to perform data matching in the Regex panel. In
this scenario, the regular expression "([a-z]*)@([a-z]*).([a-z]*)" is used to match the
three parts of an email address: user name, domain name and TLD name. A standalone Java
sketch of this matching is given after this procedure.
For more information about the regular expression, see http://en.wikipedia.org/wiki/
Regular_expression.
7. Click Edit schema to open the Schema of tExtractRegexFields dialog box, and click the plus button
to add five columns for the output schema.
In this scenario, we want to split the input email column into three columns in the output flow,
name, domain, and tld. The two other input columns will be extracted as they are.

8. Double-click the tLogRow component to open its Component view.


9. In the Mode area, select Table (print values in cells of a table).
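
The following standalone Java sketch shows how the regular expression from step 6 captures the
three parts of an address. The address used here is hypothetical; the sketch is only an illustration
of the matching, not the component's generated code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexDemo {
    public static void main(String[] args) {
        // The three capturing groups mirror the name, domain and tld output columns.
        Pattern pattern = Pattern.compile("([a-z]*)@([a-z]*).([a-z]*)");
        Matcher matcher = pattern.matcher("john@company.com"); // hypothetical address
        if (matcher.matches()) {
            System.out.println(matcher.group(1)); // john
            System.out.println(matcher.group(2)); // company
            System.out.println(matcher.group(3)); // com
        }
    }
}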

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


Results
The tExtractRegexFields component matches all given e-mail addresses with the defined regular
expression and extracts the name, domain, and TLD names and displays them on the console in three
separate columns. The two other columns, id and age, are extracted as they are.


tExtractXMLField
Reads the XML structured data from an XML field and sends the data as defined in the schema to the
following component.

tExtractXMLField Standard properties


These properties are used to configure tExtractXMLField running in the Standard Job framework.
The Standard tExtractXMLField component belongs to the Processing and the XML families.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.


Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.
When this file is selected, the fields that follow are pre-
filled in using fetched data.

Schema type and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

XML field Name of the XML field to be processed.


Related topic: see Talend Studio User Guide.

Loop XPath query Node of the XML tree, which the loop is based on.


Mapping Column: reflects the schema as defined by the Schema type


field.
XPath Query: Enter the fields to be extracted from the
structured input.
Get nodes: Select this check box to retrieve the XML
content of all current nodes specified in the Xpath query
list, or select the check box next to specific XML nodes to
retrieve only the content of the selected nodes.

Limit Maximum number of rows to be processed. If Limit is 0, no


rows are read or processed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Ignore the namespaces Select this check box to ignore namespaces when reading
and extracting the XML data.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an intermediate component. It needs an


input and an output components.


Extracting XML data from a field in a database table


This three-component scenario reads the XML structure included in the fields of a database
table and then extracts the data.

Procedure
1. Drop the following components from the Palette onto the design workspace: tMysqlInput,
tExtractXMLField, and tFileOutputDelimited.
Connect the three components using Main links.

2. Double-click tMysqlInput to display its Basic settings view and define its properties.

3. If you have already stored the input schema in the Repository tree view, select Repository first
from the Property Type list and then from the Schema list to display the Repository Content
dialog box where you can select the relevant metadata.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.
If you have not stored the input schema locally, select Built-in in the Property Type and Schema
fields and enter the database connection and the data structure information manually. For more
information about tMysqlInput properties, see tMysqlInput on page 2437.
4. In the Table Name field, enter the name of the table holding the XML data, customerdetails in this
example.
Click Guess Query to display the query corresponding to your schema.
5. Double-click tExtractXMLField to display its Basic settings view and define its properties.


6. Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7. In the Xml field list, select the column from which you want to extract the XML data. In this
example, the field holding the XML data is called CustomerDetails.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
In the Xpath query column, enter between inverted commas the node of the XML field holding the
data you want to extract, CustomerName in this example. A hypothetical example of such an XML
field value is shown after this procedure.
8. Double-click tFileOutputDelimited to display its Basic settings view and define its properties.

9. In the File Name field, define or browse to the path of the output file you want to write the
extracted data in.
Click Sync columns to retrieve the schema from the preceding component. If needed, click the
three-dot button next to Edit schema to view the schema.
10. Save your Job and click F6 to execute it.
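
The content of the customerdetails table is not shown in this guide; for orientation only, a
hypothetical value of the CustomerDetails field could look like the fragment below. In such a case
the loop would be set on the element holding each customer, and the Xpath query "CustomerName"
would pick the name out of every looped element. The customer names are invented for this sketch.

<CustomerDetails>
  <Customer>
    <CustomerName>Griffith Paving</CustomerName>
  </Customer>
  <Customer>
    <CustomerName>Saint Vincent Hospital</CustomerName>
  </Customer>
</CustomerDetails>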


Results

tExtractXMLField read and extracted the client names under the node CustomerName of the
CustomerDetails field of the defined database table.

Extracting correct and erroneous data from an XML field in


a delimited file
This scenario describes a four-component Job that reads an XML structure from a delimited file,
outputs the main data and rejects the erroneous data.

Procedure
1. Drop the following components from the Palette to the design workspace: tFileInputDelimited,
tExtractXMLField, tFileOutputDelimited and tLogRow.
Connect the first three components using Row Main links.
Connect tExtractXMLField to tLogRow using a Row Reject link.

2. Double-click tFileInputDelimited to open its Basic settings view and define the component
properties.


3. Select Built-in in the Schema list and fill in the file metadata manually in the corresponding
fields.
Click the three-dot button next to Edit schema to display a dialog box where you can define the
structure of your data.
Click the plus button to add as many columns as needed to your data structure. In this example,
we have one column in the schema: xmlStr.
Click OK to validate your changes and close the dialog box.

Note:
If you have already stored the schema in the Metadata folder under File delimited, select
Repository from the Schema list and click the three-dot button next to the field to display the
Repository Content dialog box where you can select the relevant schema from the list. Click Ok
to close the dialog box and have the fields automatically filled in with the schema metadata.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.

4. In the File Name field, click the three-dot button and browse to the input delimited file you want
to process, CustomerDetails_Error in this example.
This delimited file holds a number of simple XML lines separated by double carriage return.
Set the row and field separators used in the input file in the corresponding fields, double carriage
return for the first and nothing for the second in this example.
If needed, set Header, Footer and Limit. None is used in this example.
5. In the design workspace, double-click tExtractXMLField to display its Basic settings view and
define the component properties.


6. Click Sync columns to retrieve the schema from the preceding component. You can click the
three-dot button next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7. In the Xml field list, select the column from which you want to extract the XML data. In this
example, the field holding the XML data is called xmlStr.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
8. In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and
display the component properties.

9. In the File Name field, define or browse to the output file you want to write the correct data in,
CustomerNames_right.csv in this example.
Click Sync columns to retrieve the schema of the preceding component. You can click the three-
dot button next to Edit schema to view/modify the schema.
10. In the design workspace, double-click tLogRow to display its Basic settings view and define the
component properties.
Click Sync Columns to retrieve the schema of the preceding component. For more information on
this component, see tLogRow on page 1977.
11. Save your Job and press F6 to execute it.


Results

tExtractXMLField reads and extracts, in the output delimited file CustomerNames_right, the client
information for which the XML structure is correct, and also displays the erroneous data on the console
of the Run view.


tFileArchive
Creates a new zip, gzip, or tar.gz archive file from one or more files or folders.
The archive file can be compressed using different compression methods.

tFileArchive Standard properties


These properties are used to configure tFileArchive running in the Standard Job framework.
The Standard tFileArchive component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Directory Specify the directory that contains the files to be added to


the archive file.
This field is available when zip or tar.gz is selected from the
Archive format list.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Subdirectories Select this check box if you want to add the files in the
subdirectories to the archive file.
This field is available only when zip is selected from the
Archive format list.

Source File Specify the path to the file that you want to add to the
archive file.
This field is available only when gzip is selected from the
Archive format list.

Archive file Specify the path to the archive file to be created.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Create directory if does not exist Select this check box to create the destination folder if it
does not exist.

Archive format Select an archive file format from the list: zip, gzip, or tar.gz.

Compress level Select the compression level you want to apply.


• Best: the compression quality will be optimum, but the
compression time will be long.
• Normal: the compression quality and time will be
average.
• Fast (no compression): the compression will be fast, but
the quality will be lower.

All files Select this check box if all files in the specified directory
will be added to the archive file. Clear it to specify the file(s)
you want to add to the archive file in the Files table.


Filemask: type in the file name or the file mask using a


special character or a regular expression.
This check box is available when zip or tar.gz is selected
from the Archive format list.

Encoding Select an encoding type from the list or select CUSTOM


and define it manually. This field is compulsory for DB data
handling.
This list is available when zip is selected from the Archive
format list.

Overwrite Existing Archive This check box is selected by default. This allows you to
save an archive by replacing the existing one. But if you
clear the check box, an error is reported, the replacement
fails and the new archive cannot be saved.

Note:
When the replacement fails, the Job still runs.

Encrypt files Select this check box if you want the archive file to be
password protected.
Encrypt method: select an encrypt method from the list, Java
Encrypt, Zip4j AES, or Zip4j STANDARD.
AES Key Strength: select a key strength for the Zip4j AES
method, either AES 128 or AES 256.
Enter Password: enter the encryption password.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
This check box is available only when zip is selected
from the Archive format list. With this check box selected,
the compressed archive file can be decompressed only
by the tFileUnarchive component and not by a common
archiver. For more information about tFileUnarchive, see
tFileUnarchive on page 1168.

ZIP64 mode This option allows for archives with the .zip64 extension to
be created, with three modes available:
• ASNEEDED: archives with the .zip64 extension will be
automatically created based on the file size.
• ALWAYS: archives with the .zip64 extension will be
created, no matter what size the file may be.
• NEVER: no archives with the .zip64 extension will be
created, no matter what size the file may be.
Note that if the file size or the total size of the archive
exceeds 4GB or there are more than 65536 files inside the
archive, you need to set the mode to ALWAYS.

Advanced settings

Use sync flush Select this check box to flush the compressor before
flushing the output stream. Clear this check box to flush
only the output stream.


This check box is available when gzip or tar.gz is selected


from the Archive format list.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ARCHIVE_FILEPATH: the path to the archive file. This is an


After variable and it returns a string.
ARCHIVE_FILENAME: the name of the archive file. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
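For example, a tJava component linked to the tFileArchive subJob with an OnSubjobOk trigger could log the archive that has just been created. The snippet below is only an illustration and assumes the component is named tFileArchive_1; adapt the name to your Job:

   // Retrieve the After variables set by tFileArchive_1 once the archive is created.
   String archivePath = (String) globalMap.get("tFileArchive_1_ARCHIVE_FILEPATH");
   String archiveName = (String) globalMap.get("tFileArchive_1_ARCHIVE_FILENAME");
   System.out.println("Created archive " + archiveName + " at " + archivePath);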

Usage

Usage rule This component must be used as a standalone component.

Connections Outgoing links (from this component to another):
Row: Main; Reject; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Zipping files using a tFileArchive

This scenario creates a single-component Job that zips files and stores the resulting archive in the selected directory.

Procedure
1. Drop the tFileArchive component from the Palette onto the workspace.
2. Double-click it to display its Component view.

3. In the Directory field, click the [...] button, browse your directory and select the directory or the
file you want to compress.
4. Select the Subdirectories check box if you want to include the subfolders and their files in the
archive.
5. Then set the Archive file field by entering the destination path and the name of your archive file.
6. Select the Create directory if not exists check box if you do not have a destination directory yet
and you want to create it.
7. In the Compress level list, select the compression level you want to apply to your archive. In this
example, we use the normal level.
8. Clear the All Files check box if you only want to zip specific files.

9. Add a row to the table by clicking the [+] button and click the name that appears. Between two
star symbols (for example, *RG*), type part of the name of the file that you want to compress.
10. Press F6 to execute your Job.

Results
The tFileArchive component has compressed the selected file(s) and created the archive in the selected directory.

tFileCompare
Compares two files and provides comparison data based on a read-only schema.

tFileCompare Standard properties


These properties are used to configure tFileCompare running in the Standard Job framework.
The Standard tFileCompare component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component.
The schema of this component is read-only.

File to compare Filepath to the file to be checked.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Reference file Filepath to the file the comparison is based on.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

If differences are detected, display and If no difference detected, display Type in a message to be displayed in the Run console based on the result of the comparison.

Print to console Select this check box to display the message.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DIFFERENCE: the result of the comparison. This is a Flow variable and it returns a boolean.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
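For instance, the DIFFERENCE variable can be used in the condition of a Run if trigger coming from the tFileCompare component, so that a follow-up subJob is executed only when the two files differ. This is only a sketch and assumes the component is named tFileCompare_1:

   // Run if condition: true when tFileCompare_1 has detected differences.
   ((Boolean)globalMap.get("tFileCompare_1_DIFFERENCE"))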

Usage

Usage rule This component can be used as a standalone component but it is usually linked to an output component to gather the log
data.

Connections Outgoing links (from this component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Comparing unzipped files

This scenario describes a Job that unarchives a file and compares it to a reference file to make sure it did not change. The output of the comparison is stored in a delimited file and a message is displayed in the console.

Procedure
1. Drag and drop the following components: tFileUnarchive, tFileCompare, and tFileOutputDelimited.
2. Link the tFileUnarchive to the tFileCompare with Iterate connection.
3. Connect the tFileCompare to the output component, using a Main row link.
4. In the tFileUnarchive component Basic settings, fill in the path to the archive to unzip.
5. In the Extraction Directory field, fill in the destination folder for the unarchived file.

6. In the tFileCompare Basic settings, set the File to compare. Press Ctrl+Space to display the list of global variables. Select $_globals{tFileUnarchive_1}{CURRENT_FILEPATH} or "((String)globalMap.get("tFileUnarchive_1_CURRENT_FILEPATH"))" according to the language you work with, to fetch the file path from the tFileUnarchive component.

7. Then set the Reference file on which the comparison is based.


8. In the messages fields, set the messages you want to see if the files differ or if the files are
identical, for example: "[job " + JobName + "] Files differ".
9. Select the Print to Console check box so that the defined message is displayed at the end of the execution.
10. The schema is read-only and contains standard information data. Click Edit schema to have a look at it.

11. Then set the output component as usual with semi-colon as data separators.
12. Save your Job and press F6 to run it.

The message set is displayed to the console and the output shows the schema information data.

tFileCopy
Copies a source file or folder into a target directory.

tFileCopy Standard properties


These properties are used to configure tFileCopy running in the Standard Job framework.
The Standard tFileCopy component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Specify the path to the file to be copied.


This field does not appear when the Copy a directory check
box is selected.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Copy a directory Select this check box to copy a directory including all subdirectories and files in it.

Source directory Specify the source directory to copy.


This field appears only when the Copy a directory check box
is selected.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Destination directory Specify the directory to copy the source file or directory to.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Rename Select this check box if you want to rename the file copied
to the destination.
This field does not appear when the Copy a directory check
box is selected.

Destination filename Specify a new name for the file to be copied.


This field appears only when the Rename check box is
selected.

Remove source file Select this check box to remove the source file after it is
copied to the destination directory.
This field does not appear when the Copy a directory check
box is selected.

Replace existing file Select this check box to overwrite any existing file with the
newly copied file.

This field does not appear when the Copy a directory check
box is selected.

Create the directory if it doesn't exist Select this check box to create the specified destination
directory if it does not exist.
This field does not appear when the Copy a directory check
box is selected.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DESTINATION_FILENAME: the destination file name. This is an After variable and it returns a string.
DESTINATION_FILEPATH: the destination file path. This is
an After variable and it returns a string.
SOURCE_DIRECTORY: the source directory. This is an After
variable and it returns a string.
DESTINATION_DIRECTORY: the destination directory. This is
an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
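As an illustration, a tJava component triggered with OnSubjobOk after the copy could log where the file ended up. The component name tFileCopy_1 below is an assumption; adapt it to your Job:

   // Print the destination of the file copied by tFileCopy_1.
   String copiedTo = (String) globalMap.get("tFileCopy_1_DESTINATION_FILEPATH");
   System.out.println("File copied to " + copiedTo);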

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Restoring files from bin


This scenario describes a Job that iterates on a list of files in a directory, copies each file to a defined
target directory, and then removes the copied files from the source directory.

Procedure
1. Create a new Job and add a tFileList component and a tFileCopy component by typing their names
in the design workspace or dropping them from the Palette.
2. Connect tFileList to tFileCopy using a Row > Iterate link.
3. Double-click tFileList to open its Basic settings view.

4. In the Directory field, browse to or type in the directory to iterate upon.


5. Double-click tFileCopy to open its Basic settings view.

6. In the File Name field, press Ctrl+Space to access the global variable list and select the tFileList_1.CURRENT_FILEPATH variable from the list to fill the field with ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
7. In the Destination directory field, browse to or type in the directory to copy each file to.
8. Select the Remove source file check box to get rid of the files that have been copied.
9. Press Ctrl+S to save your Job and press F6 to execute it.
All the files in the defined source directory are copied to the destination directory and are
removed from the source directory.

tFileDelete
Deletes files from a given directory.

tFileDelete Standard properties


These properties are used to configure tFileDelete running in the Standard Job framework.
The Standard tFileDelete component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Path to the file to be deleted. This field is hidden when
you select the Delete folder check box or the Delete file or
folder check box.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Directory Path to the folder to be deleted. This field is available only when you select the Delete folder check box.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

File or directory to delete Enter the path to the file or to the folder you want to
delete. This field is available only when you select the
Delete file or folder check box.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Fail on error Select this check box to prevent the main Job from being
executed if an error occurs, for example, if the file to be
deleted does not exist.

Delete Folder Select this check box to display the Directory field, where you can indicate the path to the folder to be deleted.

Delete file or folder Select this check box to display the File or directory to delete field, where you can indicate the path to the file or to the folder you want to delete.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DELETE_PATH: the path to the deleted file or folder. This is
an After variable and it returns a string.

CURRENT_STATUS: the execution result of the component.


This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):
Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Deleting files
This very simple scenario describes a Job deleting files from a given directory.

Procedure
1. Drop the following components: tFileList, tFileDelete, tJava from the Palette to the design
workspace.
2. In the tFileList Basic settings, set the directory to loop on in the Directory field.

3. The filemask is "*.txt" and no case check is carried out.


4. In the tFileDelete Basic settings panel, set the File Name field so that the current file selected by the tFileList component is deleted. This deletes all files contained in the directory, as specified earlier.

5. Press Ctrl+Space to access the list of global variables. In Java, the relevant variable to collect the current file is: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
6. Then, in the tJava component, define the message to be displayed in the standard output (Run console). In this Java use case, type the following script in the Code field: System.out.println(((String)globalMap.get("tFileList_1_CURRENT_FILE")) + " has been deleted!");
7. Then save your Job and press F6 to run it.
7. Then save your Job and press F6 to run it.

Results
The message set in the tJava component displays in the log, for each file that has been deleted
through the tFileDelete component.

tFileExist
Checks if a file exists or not.

tFileExist Standard properties


These properties are used to configure tFileExist running in the Standard Job framework.
The Standard tFileExist component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File name/Stream Path to the file you want to check if it exists or not.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):
Trigger: On Subjob Ok; On Subjob Error; Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):
Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend Studio User Guide.

Checking for the presence of a file and creating it if it does not exist

This scenario describes a simple Job that checks if a given file exists, displays a graphical message to confirm that the file does not exist, reads the input data from another given file and writes it to an output delimited file.
A dialog box appears to confirm that the file does not exist.
Click OK to close the dialog box and continue the Job execution process. The missing file, file1 in this scenario, is then written to a delimited file in the defined place.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileExist,
tFileInputDelimited, tFileOutputDelimited, and tMsgBox.
2. Connect tFileExist to tFileInputDelimited using an OnSubjobOk and to tMsgBox using a Run If
link.

3. Connect tFileInputDelimited to tFileOutputDelimited using a Row Main link.

Configuring the components


Procedure
1. In the design workspace, select tFileExist and click the Component tab to define its basic settings.

2. In the File name field, enter the file path or browse to the file you want to check if it exists or not.
3. In the design workspace, select tFileInputDelimited and click the Component tab to define its
basic settings.

4. Browse to the input file you want to read to fill out the File Name field.

Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.

5. Set the row and field separators in their corresponding fields.


6. Set the header, footer and number of processed rows as needed. In this scenario, there is one
header in our table.
7. Set Schema to Built-in and click the Edit schema button to define the data to pass on to the
tFileOutputDelimited component. Define the data present in the file to read, file2 in this scenario.
For more information about schema types, see Talend Studio User Guide.

The schema in file2 consists of five columns: Num, Ref, Price, Quant, and tax.
8. In the design workspace, select the tFileOutputDelimited component.
9. Click the Component tab to define the basic settings of tFileOutputDelimited.

10. Set property type to Built-in.


11. In the File name field, press Ctrl+Space to access the variable list and select the global variable
FILENAME.
12. Set the row and field separators in their corresponding fields.
13. Select the Include Header check box as file2 in this scenario includes a header.
14. Set Schema to Built-in and click Sync columns to synchronize the output file schema (file1) with
the input file schema (file2).

15. In the design workspace, select the tMsgBox component.


16. Click the Component tab to define the basic settings of tMsgBox.

17. Click the If link to display its properties in the Basic settings view.
18. In the Condition panel, press Ctrl+Space to access the variable list and select the global variable
EXISTS. Type an exclamation mark before the variable to negate the meaning of the variable.
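With Java as the generation language and assuming the component is named tFileExist_1, the resulting Run if condition typically reads:

   !((Boolean)globalMap.get("tFileExist_1_EXISTS"))

so that tMsgBox is triggered only when the file does not exist.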

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click the Run button in the Run tab to execute it.

tFileFetch
Retrieves a file through the given protocol (HTTP, HTTPS, FTP, or SMB).

tFileFetch Standard properties


These properties are used to configure tFileFetch running in the Standard Job framework.
The Standard tFileFetch component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Protocol Select the protocol you want to use from the list and fill in
the corresponding fields: http, https, ftp, smb.
The properties differ slightly depending on the type of
protocol selected. The additional fields are defined in this
table, after the basic settings.

URI Type in the URI of the site from which the file is to be
fetched.

Use cache to save resource Select this check box to save the data in the cache.
This option allows you to process the file data flow (in
streaming mode) without saving it on your drive. This is
faster and improves performance.

Domain Enter the Microsoft server domain name.


Available for the smb protocol.

Username and Password Enter the authentication information required to access the
server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
Available for the smb protocol.

Destination Directory Browse to the destination folder where the file fetched is to
be placed.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Destination Filename Enter a new name for the file fetched.


If the Upload file option in the Advanced settings view is
selected, the upload response will be saved in this file.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Create full path according to URI It allows you to reproduce the URI directory path. To save
the file at the root of your destination directory, clear the
check box.
Available for the http, https and ftp protocols.

Add header Select this check box if you want to add one or more HTTP
request headers as fetch conditions. In the Headers table,
enter the name(s) of the HTTP header parameter(s) in the
Name field and the corresponding value(s) in the Value
field.
Available for the http and https protocols.

POST method This check box is selected by default. It allows you to use
the POST method. In the Parameters table, enter the name
of the variable(s) in the Name field and the corresponding
value in the Value field.
Clear the check box if you want to use the GET method.
Available for the http and https protocols.

Die on error Clear this check box to skip the rows in error and to complete the process for the error-free rows.
Available for the http, https and ftp protocols.

Read Cookie Select this check box for tFileFetch to load a web
authentication cookie.
Available for the http, https, ftp and smb protocols.

Save Cookie Select this check box to save the web page authentication
cookie. This means you will not have to log on to the same
web site in the future.
Available for the http, https, ftp and smb protocols.

Cookie file Type in the full path to the file which you want to use to
save the cookie or click [...] and browse to the desired file to
save the cookie.
Available for the http, https, ftp and smb protocols.

Cookie policy Choose a cookie policy from this drop-down list. Four
options are available, BROWSER_COMPATIBILITY, DEFAULT,
NETSCAPE and RFC_2109.
Available for the http, https, ftp and smb protocols.

Single cookie header Check this box to put all cookies into one request header for
maximum compatibility among different servers.
Available for the http, https, ftp and smb protocols.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at each
component level.

Timeout Enter the number of milliseconds after which the protocol connection should close.
Available for the http and https protocols.

Print response to console Select this check box to print the server response in the
console.
Available for the http and https protocols.

Upload file Select this check box to upload one or more files to the
server. For each file to be uploaded, click the [+] button
beneath the table displayed and set the following fields:
• Name: the value of the name attribute of the <input type="file"> field in the original HTML form.
• File: the full path of the file to upload, e.g. "D:/filefetch.txt".
• Content-Type: the content type of the file to upload. The default value is "application/octet-stream".
• Charset: the character set of the file to upload. The default value is "ISO-8859-1".
This option is available for the http and https protocols, with the POST method option in the Basic settings view selected.
With this option selected, the upload response will be saved
in the file specified in the Destination filename field in the
Basic settings view.

Enable proxy server Select this check box if you are connecting via a proxy
and complete the fields which follow with the relevant
information.
Available for the http, https and ftp protocols.

Enable NTLM Credentials Select this check box if you are using an NTLM
authentication protocol.
Domain: The client domain name.
Host: The client's IP address.
Available for the http and https protocols.

Need authentication Select this check box and enter the username and password
in the relevant fields, if they are required to access the
protocol.
Available for the http and https protocols.

Support redirection Select this check box to repeat the redirection request until
redirection is successful and the file can be retrieved.
Available for the http, https and ftp protocols.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
INPUT_STREAM: the content of the file being fetched. This
is a Flow variable and it returns an InputStream.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
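As a sketch of how INPUT_STREAM is typically consumed (see also Reading data from a remote file in streaming mode on page 1020), a downstream tFileInputDelimited component can read the fetched content directly by entering the following expression in its File name/Stream field; the component name tFileFetch_1 is illustrative:

   ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM"))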

Usage

Usage rule This component is generally used as a start component to feed the input flow of a Job and is often connected to
the Job using an OnSubjobOk or OnComponentOk link,
depending on the context.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Fetching data through HTTP


This scenario describes a three-component Job which retrieves a file from an HTTP website, reads
data from the fetched file and displays the data on the console.

Dropping and linking components


Procedure
1. Drop a tFileFetch, a tFileInputDelimited and a tLogRow onto your design workspace.
2. Link tFileFetch to tFileInputDelimited using a Trigger > On Subjob Ok or On Component Ok
connection.
3. Link tFileInputDelimited to tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tFileFetch to open its Basic settings view.

2. Select the protocol you want to use from the list. Here, http is selected.
3. In the URI field, type in the URI where the file to be fetched can be retrieved from. You can paste
the URI directly in your browser to view the data in the file.
4. In the Destination directory field, browse to the folder where the fetched file is to be stored. In
this example, it is D:/Output.
5. In the Destination filename field, type in a new name for the file if you want it to be changed. In
this example, new.txt.
6. If needed, select the Add header check box and define one or more HTTP request headers as fetch
conditions. For example, to fetch the file only if it has been modified since 19:43:31 GMT, October
29, 1994, fill in the Name and Value fields with "If-Modified-Since" and "Sat, 29 Oct 1994 19:43:31
GMT" respectively in the Headers table. For details about HTTP request header definitions, see
Header Field Definitions.
7. Double-click tFileInputDelimited to open its Basic settings view.

8. In the File name field, type in the full path to the fetched file which had been stored locally.

9. Click the [...] button next to Edit schema to open the Schema dialog box. In
this example, add one column output to store the data from the fetched file.

10. Leave other settings as they are.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.

The data of the fetched file is displayed on the console.

Reusing stored cookie to fetch files through HTTP


This scenario describes a two-component Job which logs in to a given HTTP website and then, using a cookie stored in a user-defined local directory, fetches data from this website.

Dropping and linking components


Procedure
1. Drop two tFileFetch components onto your design workspace.
2. Link the two components as subJobs using a Trigger > On Subjob Ok connection.

Configuring the components


Configuring the first subJob

Procedure
1. Double click tFileFetch_1 to open its component view.

2. Select the protocol you want to use from the Protocol list. Here, we use the https protocol.
3. In the URI field, type in the URI through which you can log in the website and fetch the web page accordingly. In this example, the URI is https://www.codeproject.com/script/Membership/LogOn.aspx?download=true.
4. In the Destination directory field, browse to the folder where the fetched web page is to be stored.
This folder will be created on the fly if it does not exist. In this example, type in D:/download.
5. In the Destination Filename field, type in a new name for the file if you want it to be changed. In
this example, codeproject.html.
6. Under the Parameters table, click the plus button to add two rows and fill in the credentials for accessing the desired website.
In the Name column, type in a new name respectively for the two rows. In this example, they are
Email and Password, which are required by the website you are logging in.
In the Value column, type in the authentication information.
7. Select the Save cookie check box.
8. In the Cookie file field, type in the full path to the file which you want to use to save the cookie.
In this example, it is D:/download/cookie.
9. Click Advanced settings to open its view.
10. Select the Support redirection check box so that the redirection request will be repeated until the
redirection is successful.

Configuring the second subJob

Procedure
1. Double-click tFileFetch_2 to open its Component view.

2. From the Protocol list, select http.


3. In the URI field, type in the address from which you fetch the files of your interest. In this example, the address is http://www.codeproject.com/script/articles/download.aspx?file=/KB/DLL/File_List_Downloader/FLD02June2011_Source.zip&rp=http://www.codeproject.com/Articles/203991/File-List-Downloader.
4. In the Destination directory field, type in the directory or browse to the folder where you want to
store the fetched files. This folder can be automatically created if it does not exist yet during the
execution process. In this example, type in D:/download.
5. In the Destination Filename field, type in a new name for the file if you want it to be changed. In
this example, source.zip.
6. Clear the POST method check box to deactivate the Parameters table.
7. Select the Read cookie check box.
8. In the Cookie file field, browse to the file which is used to save the cookie. In this example, it is
D:/download/cookie.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
Then, go to the local directory D:/download to check the downloaded file.

Related scenario
For an example of transferring data in streaming mode, see Reading data from a remote file in
streaming mode on page 1020

tFileInputARFF
Reads an ARFF file row by row to split them up into fields and then sends the fields as defined in the
schema to the next component.

tFileInputARFF Standard properties


These properties are used to configure tFileInputARFF running in the Standard Job framework.
The Standard tFileInputARFF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a connection wizard and store the file connection parameters you set in the component's
Basic settings view.
For more information about setting up and storing file
connection parameters, see Talend Studio User Guide.

File Name Name and path of the ARFF file and/or variable to be
processed.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Schema and Edit Schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a file and separate the fields
with the specified separator.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Displaying the content of an ARFF file

This scenario describes a two-component Job in which the rows of an ARFF file are read, the delimited data is selected and the output is displayed in the Run view.

An ARFF file looks like the following:
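For example, a minimal ARFF file (with invented attribute names and values) could look like this:

   @relation weather
   @attribute outlook {sunny, overcast, rainy}
   @attribute temperature numeric
   @attribute play {yes, no}
   @data
   sunny,85,no
   overcast,64,yes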

It is generally made of two parts: the first part describes the data structure, that is to say the rows that begin with @attribute, and the second part comprises the raw data, which follows the @data expression.

Dropping and linking components


Procedure
1. Drop the tFileInputARFF component from the Palette onto the workspace.
2. In the same way, drop the tLogRow component.
3. Right-click the tFileInputARFF component and select Row > Main from the menu. Then drag the link to the tLogRow component and click it. The link is created.

Configuring the components


Procedure
1. Double-click the tFileInputARFF.
2. In the Component view, in the File Name field, browse your directory in order to select your .arff
file.
3. In the Schema field, select Built-In.
4. Click the [...] button next to Edit schema to add column descriptions corresponding to the file to
be read.

5. Click the [+] button as many times as required to create the number of columns required, according to the source file. Name the columns as follows.

6. For every column, the Nullable check box is selected by default. Leave the check boxes selected,
for all of the columns.
7. Click OK.
8. In the workspace, double-click the tLogRow to display its Component view.

9. Click the [...] button next to Edit schema to check that the schema has been propagated. If not,
click the Sync columns button.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to execute your Job.

The console displays the data contained in the ARFF file, delimited using a vertical line (the
default separator).

tFileInputDelimited
Reads a delimited file row by row to split them up into fields and then sends the fields as defined in
the schema to the next component.

tFileInputDelimited Standard properties


These properties are used to configure tFileInputDelimited running in the Standard Job framework.
The Standard tFileInputDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties are stored.

File Name/Stream File name: Name and path of the file to be processed.
Stream: The data flow to be processed. The data must be
added to the flow in order for tFileInputDelimited to fetch
these data via the corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing it by hand, you can select the variable of interest from the auto-completion list (Ctrl+Space) to fill the current field, on condition that this variable has been properly defined.
Related topic to the available variables: see Talend Studio
User Guide

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Row separator The separator used to identify the end of a row.

Field separator Enter character, string or regular expression to separate fields for the transferred data.

CSV options Select this check box to specify the following CSV
parameters:
• Escape char: enter the escape character between
double quotation marks.
• Text enclosure: enter the enclosure character (only
one character) between double quotation marks.
For example, """ needs to be entered when double
quotation marks (") are used as the enclosure character.

It is recommended to use the standard escape character, that is "\". Otherwise, you should set the same character for Escape char and Text enclosure. For example, if the escape character is set to "\", the text enclosure can be set to any other character. On the other hand, if the escape character is set to a character other than "\", the text enclosure can be set to any other character. However, the escape character will be changed to the same character as the text enclosure. For instance, if the escape character is set to "#" and the text enclosure is set to "@", the escape character will be changed to "@", not "#".
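As an illustration (not taken from the guide), suppose the Field separator is ";", the Text enclosure is """ and the Escape char is "\". A row such as

   1;"Dupont;Jean";"He said \"hello\""

would then be read as three fields: 1, Dupont;Jean and He said "hello", because the enclosure keeps the embedded separator inside one field and the escape character protects the inner quotation marks.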

Header Enter the number of rows to be skipped at the beginning of the file.

Footer Number of rows to be skipped at the end of the file.

Limit Maximum number of rows to be processed. If Limit = 0, no row is read or processed.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Note that if the input value of any non-nullable primitive
field is null, the row of data including that field will be
rejected.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and
Job designs.

Skip empty rows Select this check box to skip the empty rows.

Uncompress as zip file Select this check box to uncompress the input file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

To catch the FileNotFoundException, you also need to select this check box.

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).

Extract lines at random Select this check box to set the number of lines to be
extracted randomly.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Trim all column Select this check box to remove the leading and trailing
whitespaces from all columns. When this check box is
cleared, the Check column to trim table is displayed, which
lets you select particular columns to trim.

Check each row structure against schema Select this check box to check whether the total number
of columns in each row is consistent with the schema. If
not consistent, an error message will be displayed on the
console.

Check date Select this check box to check the date format strictly
against the input schema.

Check columns to trim This table is filled automatically with the schema being
used. Select the check box(es) corresponding to the
column(s) to be trimmed.

Split row before field Select this check box to split rows before splitting fields.

Permit hexadecimal (0xNNN) or octal (0NNNN) for numeric types - it will act the opposite for Byte Select this check box if any of your numeric types (long, integer, short, or byte type) will be parsed from a hexadecimal or octal string.
In the table that appears, select the check box next to the
column or columns of interest to transform the input string
of each selected column to the type defined in the schema.
Select the Permit hexadecimal or octal check box to select
all the columns.
This table appears only when the Permit hexadecimal
(0xNNN) or octal (0NNNN) for numeric types - it will act the
opposite for Byte check box is selected.
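For example (an illustration, not from the guide), with this option enabled for an Integer column, an input string such as 0x1A would be parsed as the decimal value 26 and 017 as 15, a behavior comparable to Java's Integer.decode("0x1A").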

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.

ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a file and separate fields
contained in this file using a defined separator. It allows
you to create a data flow using a Row > Main link or via a
Row > Reject link in which case the data is filtered by data
that does not correspond to the type defined. For further
information, please see Procedure on page 975.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Reading data from a delimited file and displaying the output

The following scenario creates a two-component Job, which aims at reading each row of a file, selecting delimited data and displaying the output in the Run log console.

Dropping and linking components


Procedure
1. Drop a tFileInputDelimited component and a tLogRow component from the Palette to the design
workspace.
2. Right-click on the tFileInputDelimited component and select Row > Main. Then drag it onto the
tLogRow component and release when the plug symbol shows up.

Configuring the components


Procedure
1. Select the tFileInputDelimited component again, and define its Basic settings:

2. Fill in a path to the file in the File Name field. This field is mandatory.

Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.

3. Define the Row separator used to identify the end of a row. Then define the Field separator used to delimit fields in a row.
4. In this scenario, the header and footer limits are not set, and the Limit number of processed rows is set to 50.
5. Set the Schema as either a local (Built-in) or a remotely managed (Repository) to define the data
to pass on to the tLogRow component.
6. You can load and/or edit the schema via the Edit Schema function.
Related topics: see Talend Studio User Guide.
7. Enter the encoding standard the input file is encoded in. This setting is meant to ensure encoding
consistency throughout all input and output files.
8. Select the tLogRow and define the Field separator to use for the output display. Related topic:
tLogRow on page 1977.
9. Select the Print schema column name in front of each value check box to retrieve the column
labels in the output displayed.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Go to Run tab, and click on Run to execute the Job.
The file is read row by row and the extracted fields are displayed on the Run log as defined in
both components Basic settings.

The Log sums up all parameters in a header followed by the result of the Job.

Reading data from a remote file in streaming mode


This scenario describes a four-component Job used to fetch data from a voluminous file almost as soon as it has been read. The data is displayed in the Run view. The advantage of this technique is that you do not have to wait for the entire file to be downloaded before viewing the data.

Dropping and linking components


Procedure
1. Drop the following components onto the workspace: tFileFetch, tSleep, tFileInputDelimited, and
tLogRow.
2. Connect tSleep and tFileInputDelimited using a Trigger > OnComponentOk link and connect
tFileInputDelimited to tLogRow using a Row > Main link.

Configuring the components


Procedure
1. Double-click tFileFetch to display the Basic settings tab in the Component view and set the
properties.

2. From the Protocol list, select the appropriate protocol to access the server on which your data is
stored.
3. In the URI field, enter the URI required to access the server on which your file is stored.
4. Select the Use cache to save the resource check box to add your file data to the cache memory.
This option allows you to use the streaming mode to transfer the data.
5. In the workspace, click tSleep to display the Basic settings tab in the Component view and set the
properties.
By default, tSleep's Pause field is set to 1 second. Do not change this setting. It pauses the second
Job in order to give the first Job, containing tFileFetch, the time to read the file data.
6. In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the
Component view and set the properties.

7. In the File name/Stream field:


- Delete the default content.
- Press Ctrl+Space to view the variables available for this component.
- Select tFileFetch_1_INPUT_STREAM from the auto-completion list, to add the following variable to the Filename field: ((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM")).

8. From the Schema list, select Built-in and click [...] next to the Edit schema field to describe the
structure of the file that you want to fetch. The US_Employees file is composed of six columns: ID,
Employee, Age, Address, State, EntryDate.
Click [+] to add the six columns and set them as indicated in the above screenshot. Click OK.

9. In the workspace, double-click tLogRow to display its Basic settings in the Component view and
click Sync Columns to ensure that the schema structure is properly retrieved from the preceding
component.

Configuring Job execution and executing the Job


Procedure
1. Click the Job tab and then on the Extra view.

2. Select the Multi thread execution check box in order to run the two Jobs at the same time. Bear in mind that the second Job has a one-second delay according to the properties set in tSleep. This option allows you to fetch the data almost as soon as it is read by tFileFetch, thanks to the tFileInputDelimited component.
3. Save the Job and press F6 to run it.

The data is displayed in the console almost as soon as it is read.

tFileInputExcel
Reads an Excel file row by row to split them up into fields using regular expressions and then sends
the fields as defined in the schema to the next component.

tFileInputExcel Standard properties


These properties are used to configure tFileInputExcel running in the Standard Job framework.
The Standard tFileInputExcel component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a connection wizard and store the Excel file connection parameters you set in the component
Basic settings view.
For more information about setting up and storing file
connection parameters, see Talend Studio User Guide.

Read excel2007 file format (xlsx / xlsm) Select this check box to read the .xlsx or .xlsm file of Excel
2007.

File Name/Stream File name: Name of the file and/or the variable to be
processed.
Stream: Data flow to be processed. The data must be added
to the flow in order to be collected by tFileInputExcel via
the INPUT_STREAM variable in the auto-completion list
(Ctrl+Space).
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.

Password Provide the password set for the Excel file in double
quotation marks by clicking the three-dot button to the
right of this frame.
This field is for Excel 2007 (and higher versions) files
protected by passwords and is available when Read
excel2007 file format(xlsx) is selected.
This component supports standard encryption and agile
encryption.

All sheets Select this check box to process all sheets of the Excel file.

Sheet list Click the plus button to add as many lines as needed to the
list of the excel sheets to be processed:
Sheet (name or position): enter the name or position of the
excel sheet to be processed.
Use Regex: select this check box if you want to use a regular
expression to filter the sheets to process.
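For example (an illustrative value, not from the guide), entering ".*2019.*" in the Sheet (name or position) column with Use Regex selected would process every sheet whose name contains 2019.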

Header Enter the number of rows to be skipped at the beginning of the file.

Footer Number of records to be skipped at the end of the file.

Limit Maximum number of lines to be processed.

Affect each sheet(header&footer) Select this check box if you want to apply the parameters
set in the Header and Footer fields to all excel sheets to be
processed.

Note: This option is only available when you select Memory-consuming (User mode) from the Generation mode drop-down list in the Advanced settings view.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

First column and Last column Define the range of the columns to be processed through
setting the first and last columns in the First column and
Last column fields respectively.

Schema and Edit Schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for
this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.


Advanced settings

Advanced separator Select this check box to change the used data separators.

Trim all columns Select this check box to remove the leading and trailing
whitespaces from all columns. When this check box is
cleared, the Check column to trim table is displayed, which
lets you select particular columns to trim.

Check column to trim This table is filled automatically with the schema being
used. Select the check box(es) corresponding to the
column(s) to be trimmed.

Convert date column to string Available when Read excel2007 file format (xlsx) is
selected in the Basic settings view.
Select this check box to show the table Check need convert
date column. Here you can parse the string columns that
contain date values based on the given date pattern.
Column: all the columns available in the schema of the
source .xlsx file.
Convert: select this check box to choose all the columns for
conversion (only if they are all of the string type). You can
also select the individual check box next to each column for
conversion.
Date pattern: set the date format here.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Read real values for numbers Select this check box to read numbers in real values. This
check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.

Stop reading on encountering empty rows Select this check box to ignore the empty line encountered
and, if there are any, the lines that follow this empty line.
This check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.

Generation mode Available when Read excel2007 file format (xlsx) is
selected in the Basic settings view. Select the mode used to
read the Excel 2007 file.
• Less memory consumed for large excel (Event mode):
used for large files. This is a memory-saving mode to
read the Excel 2007 file as a flow. This option helps
prevent Job failure with an out-of-memory error due to
high memory consumption when reading large Excel
files.
With this mode selected, the data will be extracted
with the format symbol, for example, the percent
symbol % and the currency symbol $. Moreover, the
Include phonetic runs check box is selected by default
to allow you to use phonetic strings at index.
• Memory-consuming (User mode): used for small files. It
needs more memory. With this mode selected, the pure
data without the format symbol will be extracted. A
standalone sketch contrasting the two modes follows.
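The two modes correspond roughly to the in-memory (user model) and streaming (event model) APIs of Apache POI, a common Java library for reading .xlsx files; this correspondence and the file path below are assumptions made for illustration only, not a description of the generated Job code.

// Illustrative sketch: contrasts an in-memory (user model) read with a
// streaming (event model) read of an .xlsx file using Apache POI.
import java.io.File;
import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelReadModes {

    // Comparable to Memory-consuming (User mode): the whole workbook is
    // materialized in memory, convenient for small files but heavy for large ones.
    static void userMode(File xlsx) throws Exception {
        try (XSSFWorkbook workbook = new XSSFWorkbook(xlsx)) {
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                System.out.println("row " + row.getRowNum() + " has " + row.getLastCellNum() + " cells");
            }
        }
    }

    // Comparable to Less memory consumed for large excel (Event mode): each sheet
    // is exposed as a raw XML stream, so rows can be parsed as a flow (for example
    // with a SAX handler) instead of being loaded all at once.
    static void eventMode(File xlsx) throws Exception {
        try (OPCPackage pkg = OPCPackage.open(xlsx)) {
            XSSFReader reader = new XSSFReader(pkg);
            Iterator<InputStream> sheets = reader.getSheetsData();
            while (sheets.hasNext()) {
                try (InputStream sheetXml = sheets.next()) {
                    System.out.println("streaming one sheet as XML, first byte: " + sheetXml.read());
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File xlsx = new File("E:/input.xlsx"); // hypothetical path
        userMode(xlsx);
        eventMode(xlsx);
    }
}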


Don't validate the cells Select this check box to skip data validation. This
check box becomes unavailable when you select Read
excel2007 file format (xlsx) in the Basic settings view.

Ignore the warning Select this check box to ignore all warnings generated to
indicate errors in the Excel file. This check box becomes
unavailable when you select Read excel2007 file format
(xlsx) in the Basic settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
CURRENT_SHEET: the name of the sheet being processed.
This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read an Excel file and to output the
data separately depending on the schemas identified in the
file. You can use a Row > Reject link to filter the data which
doesn't correspond to the type defined. For an example of
how to use these two links, see Procedure on page 975.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileInputFullRow
Reads a file row by row and sends complete rows of data as defined in the schema to the next
component via a Row link.

tFileInputFullRow Standard properties


These properties are used to configure tFileInputFullRow running in the Standard Job framework.
The Standard tFileInputFullRow component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

File Name Specify the path to the file to be processed.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Row separator The separator used to identify the end of a row.

Header Enter the number of rows to be skipped at the beginning of the file.

Footer Enter the number of rows to be skipped at the end of the file.


Limit Enter the maximum number of rows to be processed. If the
value is set to 0, no row is read or processed.

Skip empty rows Select this check box to skip the empty rows.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Extract lines at random Select this check box to set the number of lines to be
extracted randomly.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read full rows in delimited files that
can get very large.

Reading full rows in a delimited file


The following scenario creates a two-component Job that aims at reading complete rows in the
delimited file states.csv and displaying the rows on the console.
The content of the file states.csv that holds ten rows of data is as follows:
StateID;StateName
1;Alabama
2;Alaska
3;Arizona
4;Arkansas


5;California
6;Colorado
7;Connecticut
8;Delaware
9;Florida
10;Georgia

Reading full rows in a delimited file


Procedure
1. Create a new Job and add a tFileInputFullRow component and a tLogRow component by typing
their names in the design workspace or dropping them from the Palette.
2. Link the tFileInputFullRow component to the tLogRow component using a Row > Main
connection.

3. Double-click the tFileInputFullRow component to open its Basic settings view on the Component
tab.

4. Click the [...] button next to Edit schema to view the data to be passed onto the tLogRow
component. Note that the schema is read-only and it consists of only one column line.

5. In the File Name field, browse to or enter the path to the file to be processed. In this scenario, it is
E:/states.csv.
6. In the Row Separator field, enter the separator used to identify the end of a row. In this example,
it is the default value \n.


7. In the Header field, enter 1 to skip the header row at the beginning of the file.
8. Double-click the tLogRow component to open its Basic settings view on the Component tab.

In the Mode area, select Table (print values in cells of a table) for better readability of the result.
9. Press Ctrl+S to save your Job and then F6 to execute it.

As shown above, ten rows of data in the delimited file states.csv are read one by one, ignoring
field separators, and the complete rows of data are displayed on the console.
To extract fields from rows, you must use tExtractDelimitedFields, tExtractPositionalFields,
or tExtractRegexFields. For more information, see tExtractDelimitedFields on page 937,
tExtractPositionalFields on page 963 and tExtractRegexFields on page 966.
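For comparison outside the Studio, the plain Java sketch below reads the same states.csv file line by line, skips one header row and prints each complete row as a single value. It only illustrates what the component does with the settings above; it is not the code the Job generates.

// Minimal illustration of "full row" reading: each line is handled as a single
// field and field separators are not interpreted.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFullRows {
    public static void main(String[] args) throws IOException {
        int header = 1; // rows to skip at the beginning of the file, as in step 7
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("E:/states.csv"))) {
            String line;
            int skipped = 0;
            while ((line = reader.readLine()) != null) {
                if (skipped < header) {
                    skipped++;
                    continue;
                }
                System.out.println(line); // the single schema column, named line
            }
        }
    }
}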


tFileInputJSON
Extracts JSON data from a file and transfers the data to a file, a database table, etc.

tFileInputJSON Standard properties


These properties are used to configure tFileInputJSON running in the Standard Job framework.
The Standard tFileInputJSON component belongs to the Internet and the File families.
The component in this framework is available in all Talend products.

Basic settings

Property Type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties
are stored.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically
becomes built-in.

• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Read By Select a way of extracting JSON data in the file.


• JsonPath: Extracts JSON data based on the JSONPath
query. With this option selected, you need to select a
JSONPath API version from the API version drop-down
list. It is recommended to read data by JSONPath in
order to gain better performance.
• Xpath: Extracts JSON data based on the XPath query.
• JsonPath without loop: Extracts JSON data based on the
JSONPath query without setting a loop node.


Use Url Select this check box to retrieve data directly from the Web.

URL Enter the URL path from which you will retrieve data.
This field is available only when the Use Url check box is
selected.

Filename Specify the file from which you will retrieve data.
This field is not visible if the Use Url check box is selected.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Loop Jsonpath query Enter the path pointing to the node within the JSON field,
on which the loop is based.
Note that if you have selected Xpath from the Read By drop-
down list, the Loop Xpath query field is displayed instead.

Mapping Complete this table to map the columns defined in the
schema to the corresponding JSON nodes.
• Column: The Column cells are automatically filled with
the defined schema column names.
• Json query/JSONPath query: Specify the JSONPath
node that holds the desired data. For more information
about JSONPath expressions, see http://goessner.net/
articles/JsonPath/.
This column is available only when JsonPath is
selected from the Read By list.
• XPath query: Specify the XPath node that holds the
desired data.
This column is available only when Xpath is selected
from the Read By list.
• Get Nodes: Select this check box to extract the JSON
data of all the nodes or select the check box next to a
specific node to extract the data of that node.
This column is available only when Xpath is selected
from the Read By list.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Use the loop node as root Select this check box to use the loop node as the root for
querying the file.
The loop node is set in the Loop Json query text frame in
the Basic Settings view. If this option is checked, only the
child elements of the loop node are available for querying;
otherwise, both the parent elements and the child elements
of the loop node can be queried. You can specify a parent
element through JSON path syntax.
This check box is available only when JsonPath is selected
in the Read By drop-down list of the Basic settings view.

Validate date Select this check box to check the date format strictly
against the input schema.
This check box is available only if Xpath is selected from the
Read By drop-down list.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is a start component of a Job and always
needs an output link.

Extracting JSON data from a file using JSONPath without setting a loop node

This scenario describes a two-component Job that extracts data from the JSON file Store.json by
specifying the complete JSON path for each node of interest and displays the flat data extracted on
the console.


The JSON file Store.json contains information about a department store and the content of the file is
as follows:

{"store": {
"name": "Sunshine Department Store",
"address": "Wangfujing Street",
"goods": {
"book": [
{
"category": "Reference",
"title": "Sayings of the Century",
"author": "Nigel Rees",
"price": 8.88
},
{
"category": "Fiction",
"title": "Sword of Honour",
"author": "Evelyn Waugh",
"price": 12.66
}
],
"bicycle": {
"type": "GIANT OCR2600",
"color": "White",
"price": 276
}
}
}}

In the following example, we will extract the store name, the store address, and the bicycle
information from this file.

Adding and linking the components


Procedure
1. Create a new Job and add a tFileInputJSON component and a tLogRow component by typing their
names in the design workspace or dropping them from the Palette.
2. Link the tFileInputJSON component to the tLogRow component using a Row > Main connection.

Configuring the components


Procedure
1. Double-click the tFileInputJSON component to open its Basic settings view.


2. Select JsonPath without loop from the Read By drop-down list. With this option, you need to
specify the complete JSON path for each node of interest in the JSONPath query fields of the
Mapping table.
3. Click the [...] button next to Edit schema to open the schema editor.

4. Click the [+] button to add five columns, store_name, store_address, bicycle_type, and bicycle_color
of String type, and bicycle_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
5. In the Filename field, specify the path to the JSON file that contains the data to be extracted. In
this example, it is "E:/Store.json".
6. In the Mapping table, the Column fields are automatically filled with the schema columns you
have defined.
In the JSONPath query fields, enter the JSONPath query expressions between double quotation
marks to specify the nodes that hold the desired data.


• For the columns store_name and store_address, enter the JSONPath query expressions
"$.store.name" and "$.store.address" relative to the nodes name and address respectively.
• For the columns bicycle_type, bicycle_color, and bicycle_price, enter the JSONPath query
expressions "$.store.goods.bicycle.type", "$.store.goods.bicycle.color", and
"$.store.goods.bicycle.price" relative to the child nodes type, color, and price of the bicycle node respectively.
7. Double-click the tLogRow component to display its Basic settings view.

8. In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to execute the Job.

As shown above, the store name, the store address, and the bicycle information are extracted
from the source JSON data and displayed in a flat table on the console.
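The JSONPath expressions used in the Mapping table can also be checked with a small standalone sketch. The snippet below assumes the Jayway json-path library (com.jayway.jsonpath) on the classpath and the same E:/Store.json file; it is illustrative only and not the code generated by the Job.

// Evaluates the complete JSONPath queries used above, one per schema column.
import java.nio.file.Files;
import java.nio.file.Paths;

import com.jayway.jsonpath.JsonPath;

public class StoreQueries {
    public static void main(String[] args) throws Exception {
        String json = new String(Files.readAllBytes(Paths.get("E:/Store.json")));

        String storeName    = JsonPath.read(json, "$.store.name");
        String storeAddress = JsonPath.read(json, "$.store.address");
        String bicycleType  = JsonPath.read(json, "$.store.goods.bicycle.type");
        String bicycleColor = JsonPath.read(json, "$.store.goods.bicycle.color");
        Number bicyclePrice = JsonPath.read(json, "$.store.goods.bicycle.price");

        System.out.println(storeName + " | " + storeAddress + " | "
                + bicycleType + " | " + bicycleColor + " | " + bicyclePrice);
    }
}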

Extracting JSON data from a file using JSONPath


Based on Extracting JSON data from a file using JSONPath without setting a loop node on page 1034,
this scenario extracts data under the book array of the JSON file Store.json by specifying a loop node
and the relative JSON path for each node of interest, and then displays the flat data extracted on the
console.

Procedure
1. In the Studio, open the Job used in Extracting JSON data from a file using JSONPath without
setting a loop node on page 1034 to display it in the design workspace.


2. Double-click the tFileInputJSON component to open its Basic settings view.

3. Select JsonPath from the Read By drop-down list.


4. In the Loop Json query field, enter the JSONPath query expression between double quotation
marks to specify the node on which the loop is based. In this example, it is "$.store.goods.book[*]".
5. Click the [...] button next to Edit schema to open the schema editor.

Select the five columns added previously and click the x button to remove all of them.
Click the [+] button to add four columns, book_title, book_category, and book_author of String type,
and book_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
6. In the Json query fields of the Mapping table, enter the JSONPath query expressions between
double quotation marks to specify the nodes that hold the desired data. In this example, enter the
JSONPath query expressions "title", "category", "author", and "price" relative to the four child nodes
of the book node respectively.
7. Press Ctrl+S to save the Job.


8. Press F6 to execute the Job.

As shown above, the book information is extracted from the source JSON data and displayed in a
flat table on the console.
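With a loop node, each match of the loop query becomes one output row and the relative queries become lookups inside that match. A standalone equivalent, again assuming the Jayway json-path library and given here for illustration only:

// One map per element of the book array; the relative queries used in the
// Mapping table ("title", "category", "author", "price") become map lookups.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;

import com.jayway.jsonpath.JsonPath;

public class BookLoop {
    public static void main(String[] args) throws Exception {
        String json = new String(Files.readAllBytes(Paths.get("E:/Store.json")));

        // Same expression as the Loop Json query field.
        List<Map<String, Object>> books = JsonPath.read(json, "$.store.goods.book[*]");

        for (Map<String, Object> book : books) {
            System.out.println(book.get("title") + " | " + book.get("category")
                    + " | " + book.get("author") + " | " + book.get("price"));
        }
    }
}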

Extracting JSON data from a file using XPath


Based on Extracting JSON data from a file using JSONPath without setting a loop node on page 1034,
this scenario extracts the store name and the book information from the JSON file Store.json using
XPath queries and displays the flat data extracted on the console.

Procedure
1. In the Studio, open the Job used in Extracting JSON data from a file using JSONPath without
setting a loop node on page 1034 to display it in the design workspace.
2. Double-click the tFileInputJSON component to open its Basic settings view.

3. Select Xpath from the Read By drop-down list.


4. Click the [...] button next to Edit schema to open the schema editor.


Select the five columns added previously and click the x button to remove all of them.
Click the [+] button to add five columns, store_name, book_title, book_category, and book_author of
String type, and book_price of Double type.
Click OK to close the schema editor. In the pop-up dialog box, click Yes to propagate the schema
to the subsequent component.
5. In the Loop XPath query field, enter the XPath query expression between double quotation marks
to specify the node on which the loop is based. In this example, it is "/store/goods/book".
6. In the XPath query fields of the Mapping table, enter the XPath query expressions between
double quotation marks to specify the nodes that hold the desired data.
• For the column store_name, enter the XPath query "../../name" relative to the name node.
• For the columns book_title, book_category, book_author, and book_price, enter the XPath query
expressions "title", "category", "author", and "price" relative to the four child nodes of the book
node respectively.
7. Press Ctrl+S to save the Job.
8. Press F6 to execute the Job.

As shown above, the store name and the book information are extracted from the source JSON
data and displayed in a flat table on the console.
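The component evaluates these XPath expressions against an internal document representation of the JSON data. To show how the relative query ../../name resolves from each loop node, the sketch below runs the same loop and relative queries on a hand-written XML equivalent of Store.json with the standard javax.xml.xpath API; it is a conceptual illustration only, not what the component executes.

// Loop XPath query "/store/goods/book" selects the loop nodes; "../../name"
// climbs from each <book> to <store> and reads its <name>, while "title" is
// resolved relative to the current loop node.
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPath {
    public static void main(String[] args) throws Exception {
        String xml = "<store><name>Sunshine Department Store</name><goods>"
                + "<book><title>Sayings of the Century</title><author>Nigel Rees</author></book>"
                + "<book><title>Sword of Honour</title><author>Evelyn Waugh</author></book>"
                + "</goods></store>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        NodeList books = (NodeList) xpath.evaluate("/store/goods/book", doc, XPathConstants.NODESET);
        for (int i = 0; i < books.getLength(); i++) {
            Node book = books.item(i);
            String storeName = xpath.evaluate("../../name", book);
            String title = xpath.evaluate("title", book);
            System.out.println(storeName + " | " + title);
        }
    }
}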

Extracting JSON data from a URL


In this scenario, tFileInputJSON retrieves data of the friends node from the JSON file facebook.json on
the Web that contains the data of a Facebook user and tExtractJSONFields extracts the data from the
friends node for flat data output.


The JSON file facebook.json is deployed on the Tomcat server, specifically, located in the folder
<tomcat path>/webapps/docs, and the content of the file is as follows:

{"user": {
"id": "9999912398",
"name": "Kelly Clarkson",
"friends": [
{
"name": "Tom Cruise",
"id": "55555555555555",
"likes": {"data": [
{
"category": "Movie",
"name": "The Shawshank Redemption",
"id": "103636093053996",
"created_time": "2012-11-20T15:52:07+0000"
},
{
"category": "Community",
"name": "Positiveretribution",
"id": "471389562899413",
"created_time": "2012-12-16T21:13:26+0000"
}
]}
},
{
"name": "Tom Hanks",
"id": "88888888888888",
"likes": {"data": [
{
"category": "Journalist",
"name": "Janelle Wang",
"id": "136009823148851",
"created_time": "2013-01-01T08:22:17+0000"
},
{
"category": "Tv show",
"name": "Now With Alex Wagner",
"id": "305948749433410",
"created_time": "2012-11-20T06:14:10+0000"
}
]}
}
]
}}
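When the Use Url check box is selected, the component retrieves the document over HTTP instead of reading it from the file system. A standalone sketch of that retrieval and of the friends query used below, assuming Java 11 or later and the Jayway json-path library (illustrative only):

// Fetches facebook.json over HTTP and counts the entries matched by the
// "$.user.friends[*]" query used in the Mapping table.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

import com.jayway.jsonpath.JsonPath;

public class FetchFriends {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:8080/docs/facebook.json")).build();
        String json = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        List<Object> friends = JsonPath.read(json, "$.user.friends[*]");
        System.out.println(friends.size() + " friend entries retrieved");
    }
}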

Adding and linking the components


Procedure
1. Create a new Job and add a tFileInputJSON component, a tExtractJSONFields component, and two
tLogRow components by typing their names in the design workspace or dropping them from the
Palette.
2. Link the tFileInputJSON component to the first tLogRow component using a Row > Main
connection.
3. Link the first tLogRow component to the tExtractJSONFields component using a Row > Main
connection.


4. Link the tExtractJSONFields component to the second tLogRow component using a Row > Main
connection.

Configuring the components


Procedure
1. Double-click the tFileInputJSON component to open its Basic settings view.

2. Select JsonPath without loop from the Read By drop-down list. Then select the Use Url check box
and in the URL field displayed enter the URL of the file facebook.json from which the data
will be retrieved. In this example, it is http://localhost:8080/docs/facebook.json.
3. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding one column friends of String type.

Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.
4. In the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends
column to retrieve the entire friends node from the source file.
5. Double-click tExtractJSONFields to open its Basic settings view.


6. Select Xpath from the Read By drop-down list.


7. In the Loop XPath query field, enter the XPath expression between double quotation marks to
specify the node on which the loop is based. In this example, it is "/likes/data".
8. Click the [...] button next to Edit schema and in the Schema dialog box define the schema by
adding five columns of String type, id, name, like_id, like_name, and like_category, which will hold
the data of relevant nodes under the JSON field friends.

Click OK to close the dialog box and accept the propagation prompted by the pop-up dialog box.
9. In the XPath query fields of the Mapping table, type in the XPath query expressions between
double quotation marks to specify the JSON nodes that hold the desired data. In this example:
• "../../id" (querying the "/friends/id" node) for the column id,
• "../../name" (querying the "/friends/name" node) for the column name,
• "id" for the column like_id,
• "name" for the column like_name, and
• "category" for the column like_category.
10. Double-click the second tLogRow component to open its Basic settings view.


In the Mode area, select Table (print values in cells of a table) for better readability of the result.

Executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the friends data in the JSON file specified using the URL is extracted and then
the data from the node friends is extracted and displayed in a flat table.


tFileInputLDIF
Reads an LDIF file row by row to split each row into fields and sends the fields as defined in the
schema to the next component using a Row connection.

tFileInputLDIF Standard properties


These properties are used to configure tFileInputLDIF running in the Standard Job framework.
The Standard tFileInputLDIF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file and/or variable to be processed.


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

add operation as prefix when the entry is modify type Select this check box to display the operation mode.

Value separator Type in the separator required for parsing data in the given
file. By default, the separator used is ",".

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

Use field options (for Base64 decode checked) Select this check box to specify the Base64-encoded
columns of the input flow. Once selected, this check box
activates the Decode Base64 encoding values table to enable
you to specify the columns to be decoded from Base64.

Note:
The columns to be handled by this check box must be of the
byte type, as defined in the input schema editor.
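In LDIF, a value that is not plain text is written after a double colon and Base64-encoded. The minimal Java sketch below shows the kind of decoding this option performs; the attribute line is hypothetical and this is not the component's actual code.

// Decodes an LDIF attribute of the form "attributeName:: <Base64 value>".
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class LdifBase64 {
    public static void main(String[] args) {
        String ldifLine = "description:: QSBCYXNlNjQtZW5jb2RlZCB2YWx1ZQ=="; // hypothetical line
        String[] parts = ldifLine.split("::", 2);
        byte[] decoded = Base64.getDecoder().decode(parts[1].trim());
        System.out.println(parts[0] + " = " + new String(decoded, StandardCharsets.UTF_8));
    }
}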

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read full rows in a voluminous LDIF
file. This component enables you to create a data flow,
using a Row > Main link, and to create a reject flow with
a Row > Reject link filtering the data whose type does
not match the defined type. For an example of usage, see
Procedure on page 1096 from tFileInputXML.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a related scenario, see Writing data from a database table into an LDIF file on page 1133.


tFileInputMail
Reads the standard key data of a given MIME or MSG email file.

tFileInputMail Standard properties


These properties are used to configure tFileInputMail running in the Standard Job framework.
The Standard tFileInputMail component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Specify the email file to read and extract data from.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for
this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Mail type Select a type of email from the drop-down list, either MIME
or MSG.

Attachment export directory Specify the directory to which you want to export email
attachments.

Mail parts Specify the header fields to extract from the MIME email file
specified in the File Name field.
• Column: The Column cells are automatically filled with
the column names defined in the schema.


• Mail part: Type in the names of the header fields or
body parts to be extracted from the email file in double
quotation marks. Refer to https://tools.ietf.org/html/
rfc4021 for a list of MIME mail header fields.
• Multi value: Select this check box to allow multiple
field values.
• Separator: Enter a character as the separators for
multiple field values.
This table appears only when MIME is selected from the
Mail type drop-down list.

MSG Mail parts Specify what to extract from the defined MSG email file for
each schema column.
• Column: The Column cells are automatically filled with
the column name defined in the schema.
• Mail part: Click each cell and then select an email part
to be extracted.
This table appears only when MSG is selected from the Mail
type drop-down list.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables EXPORTED_FILE_PATH: the directory to export mail
attachments. This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data and therefore
requires an output link. It is defined as an intermediary step.


Extracting key fields from an email


This Java scenario describes a two-component Job that extracts some key standard fields and displays
the values on the Run console.

Procedure
1. Drop a tFileInputMail and a tLogRow component from the Palette to the design workspace.
2. Connect the two components together using a Main Row link.
3. Double-click tFileInputMail to display its Basic settings view and define the component
properties.

4. Click the three-dot button next to the File Name field and browse to the mail file to be processed.
5. Set schema type to Built-in and click the three-dot button next to Edit schema to open a dialog
box where you can define the schema including all columns you want to retrieve on your output.
6. Click the plus button in the dialog box to add as many columns as you want to include in the
output flow. In this example, the schema has four columns: Date, Author, Object and Status.
7. Once the schema is defined, click OK to close the dialog box and propagate the schema into the
Mail parts table.
8. Click the three-dot button next to Attachment export directory and browse to the directory in
which you want to export email attachments, if any.
9. In the Mail part column of the Mail parts table, type in the actual header or body standard keys
that will be used to retrieve the values to be displayed.
10. Select the Multi Value check box next to any of the standard keys if more than one value for the
relative standard key is present in the input file.


11. If needed, define a separator for the different values of the relative standard key in the Separator
field.
12. Double-click tLogRow to display its Basic settings view and define the component properties in
order for the values to be separated by a carriage return. On Windows OS, type in \n between
double quotes.
13. Save your Job and press F6 to execute it and display the output flow on the console.

Results
The header key values are extracted as defined in the Mail parts table. Mail reception date, author,
subject and status are displayed on the console.
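For reference, the same standard MIME header fields can be read with a few lines of Java. The sketch below assumes the JavaMail API (javax.mail) and a hypothetical .eml file; it only illustrates the header lookup the component performs and is not the generated Job code.

// Reads standard header keys from a MIME email file.
import java.io.FileInputStream;
import java.util.Properties;

import javax.mail.Session;
import javax.mail.internet.MimeMessage;

public class ReadMimeHeaders {
    public static void main(String[] args) throws Exception {
        Session session = Session.getDefaultInstance(new Properties());
        try (FileInputStream in = new FileInputStream("E:/input.eml")) { // hypothetical file
            MimeMessage message = new MimeMessage(session, in);
            System.out.println("Date: " + message.getHeader("Date", null));
            System.out.println("From: " + message.getHeader("From", ",")); // multiple values joined by a comma
            System.out.println("Subject: " + message.getSubject());
            System.out.println("Status: " + message.getHeader("Status", null));
        }
    }
}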


tFileInputMSDelimited
Reads the data structures (schemas) of a multi-structured delimited file and sends the fields as
defined in the different schemas to the next components using Row connections.

tFileInputMSDelimited Standard properties


These properties are used to configure tFileInputMSDelimited running in the Standard Job framework.
The Standard tFileInputMSDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Multi Schema Editor The Multi Schema Editor helps to build and configure the
data flow in a multi-structure delimited file to associate
one schema per output.
For more information, see The Multi Schema Editor on page
1053.

Output Lists all the schemas you define in the Multi Schema
Editor, along with the related record type and the field
separator that corresponds to every schema, if different field
separators are used.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.

Advanced settings

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read multi-structured delimited files
and separate fields contained in these files using a defined
separator.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

The Multi Schema Editor


The Multi Schema Editor enables you to:
• set the path to the source file,

Warning: Use absolute path (instead of relative path) for this field to avoid possible errors.
• define the source file properties,
• define data structure for each of the output schemas.
When you define data structure for each of the output schemas in the Multi Schema Editor, column
names in the different data structures automatically appear in the input schema lists of the
components that come after tFileInputMSDelimited. However, you can still define data structures
directly in the Basic settings view of each of these components.
The Multi Schema Editor also helps to declare the schema that should act as the source schema
(primary key) from the incoming data to ensure its uniqueness. The editor uses this mapping to associate all
schemas processed in the delimited file to the source schema in the same file.
The editor opens with the first column, that usually holds the record type indicator, selected by
default. However, once the editor is open, you can select the check box of any of the schema columns
to define it as a primary key.
The below figure illustrates an example of the Multi Schema Editor.


For detailed information about the usage of the Multi Schema Editor, see Reading a multi structure
delimited file on page 1054.

Reading a multi structure delimited file


The following scenario creates a Java Job which aims at reading three schemas in a delimited file and
displaying their data structure on the Run Job console.
The delimited file processed in this example contains rows of three schema types (A, B and C).


Dropping and linking components


Procedure
1. Drop a tFileInputMSDelimited component and three tLogRow components from the Palette onto
the design workspace.
2. In the design workspace, right-click tFileInputMSDelimited and connect it to tLogRow1,
tLogRow2, and tLogRow3 using the row_A_1, row_B_1, and row_C_1 links respectively.

Configuring the components


Procedure
1. Double-click tFileInputMSDelimited to open the Multi Schema Editor.


2. Click Browse... next to the File name field to locate the multi schema delimited file you need to
process.
3. In the File Settings area:
-Select from the list the encoding type the source file is encoded in. This setting is meant to
ensure encoding consistency throughout all input and output files.
-Select the field and row separators used in the source file.

Note:
Select the Use Multiple Separator check box and define the fields that follow accordingly if
different field separators are used to separate schemas in the source file.

A preview of the source file data displays automatically in the Preview panel.


Note:
Column 0 that usually holds the record type indicator is selected by default. However, you can
select the check box of any of the other columns to define it as a primary key.

4. Click Fetch Codes to the right of the Preview panel to list the type of schema and records you
have in the source file. In this scenario, the source file has three schema types (A, B, C).
Click each schema type in the Fetch Codes panel to display its data structure below the Preview
panel.
5. Click in the name cells and set column names for each of the selected schema.
In this scenario, column names read as the following:
-Schema A: Type, DiscName, Author, Date,
-Schema B: Type, SongName,


-Schema C: Type, LibraryName.


You now need to set the primary key from the incoming data to ensure its uniqueness (DiscName in this
scenario). To do that:
6. In the Fetch Codes panel, select the schema holding the column you want to set as the primary
key (schema A in this scenario) to display its data structure.
7. Click in the Key cell that corresponds to the DiscName column and select the check box that
appears.

8. Click anywhere in the editor and the false in the Key cell will become true.
You now need to declare the parent schema by which you want to group the other "children"
schemas (DiscName in this scenario). To do that:
9. In the Fetch Codes panel, select schema B and click the right arrow button to move it to the right.
Then, do the same with schema C.

Note:
The Cardinality field is not compulsory. It helps you to define the number (or range) of fields
in "children" schemas attached to the parent schema. However, if you set the wrong number or
range and try to execute the Job, an error message will display.

10. In the Multi Schema Editor, click OK to validate all the changes you did and close the editor.
The three defined schemas along with the corresponding record types and field separators display
automatically in the Basic settings view of tFileInputMSDelimited.


The three schemas you defined in the Multi Schema Editor are automatically passed to the three
tLogRow components.
11. If needed, click the Edit schema button in the Basic settings view of each of the tLogRow
components to view the input and output data structures you defined in the Multi Schema Editor
or to modify them.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The multi schema delimited file is read row by row and the extracted fields are displayed on the
Run Job console as defined in the Multi Schema Editor.
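Conceptually, the component routes each row to the output flow whose record type matches the row's first field. The standalone Java sketch below shows that dispatching for the three schema types of this scenario; the semicolon separator and the file path are assumptions, and the real separator is whatever you selected in the Multi Schema Editor.

// Routes each row to a "schema" according to its record type indicator.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MultiSchemaDispatch {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("E:/multischema_delimited.csv"));
        for (String line : lines) {
            String[] fields = line.split(";");
            switch (fields[0]) {
                case "A": // Type, DiscName, Author, Date -> row_A_1 flow
                    System.out.println("schema A: " + String.join(" | ", fields));
                    break;
                case "B": // Type, SongName -> row_B_1 flow
                    System.out.println("schema B: " + String.join(" | ", fields));
                    break;
                case "C": // Type, LibraryName -> row_C_1 flow
                    System.out.println("schema C: " + String.join(" | ", fields));
                    break;
                default:
                    System.out.println("unknown record type: " + line);
            }
        }
    }
}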


tFileInputMSPositional
Reads the data structures (schemas) of a multi-structured positional file and sends the fields as
defined in the different schemas to the next components using Row connections.

tFileInputMSPositional Standard properties


These properties are used to configure tFileInputMSPositional running in the Standard Job framework.
The Standard tFileInputMSPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

File name/Stream Name of the file and/or the variable to be processed


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Header Field Position Start-end position of the schema identifier.

Records Schema: define as many schemas as needed.


Header value: value in the row that identifies a schema.
Pattern: string which represents the length of each column
of the schema, separated by commas. Make sure the values
defined in this field are relevant with the defined schema.
Reject incorrect row size: select the check boxes of the
schemas for which rows of incorrect size should be rejected.
Parent row: Select the parent row from the drop-down list.
By default, it is <Empty>.
Parent key column: Type in the parent key column name. If
the parent row is not <Empty>, this field must be filled with
a column name of the parent row schema.
Key column: Type in the key column name.

Skip from header Number of rows to be skipped in the beginning of file.

Skip from footer Number of rows to be skipped at the end of the file.


Limit Maximum number of rows to be processed. If Limit = 0, no
row is read or processed.

Die on parse error Let the component die if a parsing error occurs.

Die on unknown header type Select this check box to stop the execution of the Job when
a row is read whose header value does not match any of the
schemas defined in the Records table.

Advanced settings

Process long rows (needed for processing rows longer than 100,000 characters wide) Select this
check box to process long rows (this is necessary to process rows longer than 100,000 characters).

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component
or transferred to an output component. This is a Flow
variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is a
Flow variable and it returns an integer.
NB_LINE_UNKOWN_HEADER_TYPES: the number of rows
with unknown header type. This is a Flow variable and it
returns an integer.
NB_LINE_PARSE_ERRORS: the number of rows with parse
errors. This is a Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a multi-schema positional file
and separate fields using a position separator value. You
can also create a rejection flow using a Row > Reject link
to filter the data which does not correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.

Reading data from a positional file


The following scenario reads data from a positional file, which contains two schemas. The positional
file is shown below:
schema_1 (car_owner):schema_id;car_make;owner;age
schema_2 (car-insurance):schema_id;car_owner;age;car_insurance
1bmw John 45
1bench Mike 30
2John 45 yes
2Mike 50 No

Dropping the components


Procedure
1. Drop one tFileInputMSPositional and two tLogRow from the Palette to the design workspace.
2. Rename the two tLogRow components as car_owner and car_insurance.

Configuring the components


Procedure
1. Double-click the tFileInputMSPositional component to show its Basic settings view and define its
properties.


2. In the File name/Stream field, type in the path to the input file. Also, you can click the [...] button
to browse and choose the file.
3. In the Header Field Position field, enter the start-end position for the schema identifier in the
input file, 0-1 in this case as the first character in each row is the schema identifier.
4. Click the [+] button twice to add two rows in the Records table.
5. Click the cell under the Schema column to show the [...] button.
Click the [...] button to show the schema naming box.

6. Enter the schema name and click OK.


The schema name appears in the cell and the schema editor opens.


7. Define the schema car_owner, which has four columns: schema_id, car_make, owner and age.
8. Repeat the steps to define the schema car_insurance, which has four columns: schema_id,
car_owner, age and car_insurance.

9. Connect tFileInputMSPositional to the car_owner component with the Row > car_owner link, and
the car_insurance component with the Row > car_insurance link.
10. In the Header value column, type in the schema identifier value for the schema, 1 for the schema
car_owner and 2 for the schema car_insurance in this case.
11. In the Pattern column, type in the length of each field in the schema (that is, the number of
characters per field), 1,8,10,3 for the schema car_owner and 1,10,3,3 for the schema
car_insurance in this case.
12. In the Skip from header field, type in the number of beginning rows to skip, 2 in this case as the
two rows at the beginning just describe the two schemas instead of holding values.
13. Choose Table (print values in cells of a table) in the Mode area of the components car_owner and
car_insurance.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 or click Run on the Run tab to execute the Job.


The file is read row by row based on the length values defined in the Pattern field and output in
two tables with different schemas.
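The Pattern value is simply a comma-separated list of field widths. The standalone Java sketch below cuts rows by those widths and dispatches them on the header character at position 0-1, mirroring this scenario; the sample rows are illustrative and this is not the generated Job code.

// Cuts fixed-width rows according to a Pattern string and dispatches them on
// the schema identifier found at position 0-1.
public class PositionalParse {

    static String[] cut(String row, String pattern) {
        String[] widths = pattern.split(",");
        String[] fields = new String[widths.length];
        int offset = 0;
        for (int i = 0; i < widths.length; i++) {
            int len = Integer.parseInt(widths[i]);
            int end = Math.min(offset + len, row.length());
            fields[i] = row.substring(offset, end).trim();
            offset = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] rows = {
            "1bmw     John      45 ",  // schema car_owner, pattern 1,8,10,3
            "2John      45 yes"        // schema car_insurance, pattern 1,10,3,3
        };
        for (String row : rows) {
            if (row.startsWith("1")) {
                System.out.println("car_owner:     " + String.join(" | ", cut(row, "1,8,10,3")));
            } else if (row.startsWith("2")) {
                System.out.println("car_insurance: " + String.join(" | ", cut(row, "1,10,3,3")));
            }
        }
    }
}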


tFileInputMSXML
Reads the data structures (schemas) of a multi-structured XML file and sends the fields as defined in
the different schemas to the next components using Row connections.

tFileInputMSXML Standard properties


These properties are used to configure tFileInputMSXML running in the Standard Job framework.
The Standard tFileInputMSXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

File Name Name of the file and/or the variable to be processed.


For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for
this field to avoid possible errors.

Root XPath query The root of the XML tree, which the query is based on.

Enable XPath in column "Schema XPath loop" but lose the order Select this check box if you want to
define an XPath path in the Schema XPath loop field of the Outputs table while not
keeping the order of the data shown in the source XML file.

Warning:
This option takes effect only if you select the Dom4j
generation mode in the Advanced settings view.

Outputs Schema: Define as many schemas as needed.


Schema XPath loop: Enter the node of the XML tree or
XPath path which the loop is based on.
XPath Queries: Enter the fields to be extracted from the
structured input.
Create empty row: Select this check box if you want to
create empty rows for the empty field(s) in the schema.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows.

Advanced settings

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.


Ignore DTD file Select this check box to ignore the DTD file indicated in the
XML file being processed.

Generation mode Select the appropriate generation mode according to your


memory availability. The available modes are:
• Slow and memory-consuming (Dom4j)

Note:
This option allows you to use dom4j to process the
XML files of high complexity.

• Fast with low memory consumption (SAX)

Encoding Select the encoding type from the list or select CUSTOM
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Reading a multi-structure XML file


The following scenario describes a Job which reads a multi-structure XML file, extracts the desired
fields and displays them on the console.

Designing the Job


Procedure
1. Drop a tFileInputMSXML component from the Palette onto the design workspace and double-click
the component to open its Basic settings view in the Component tab.


2. Browse to the XML file you want to process. In this example, it is D:/Input/multischema_xml.xml,
which contains the following data:

<root>
<toy>Cat</toy>
<record>We Belong Together</record>
<book>As You Like It</book>
<book>All's Well That Ends Well</book>
<record>When You Believe</record>
<toy>Dog</toy>
</root>

3. In the Root XPath query field, enter the root of the XML tree, which the query will be based on. In
this example, it is "/root".
4. Select the Enable XPath in column "Schema XPath loop" but lose the order check box.
In this example, to extract the desired fields, you need to define an XPath path in the Schema
XPath loop field in the Outputs table for each output flow while not keeping the order of the data
shown in the source XML file.
5. Click the plus button to add lines in the Outputs table where you can define the output schemas,
record and book in this example.
6. In the Outputs table, click in the Schema cell and then click a three-dot button to display a dialog
box where you can define the schema name.
Enter a name for the output schema and click OK to close the dialog box.

7. The tFileInputMSXML schema editor appears.


Define the schema according to your need.


8. Do the same to define the output schema record.


9. In the Schema XPath loop cell, enter the node of the XML tree, which the loop is based on. In this
example, enter "/book" and "/record" respectively.
10. In the XPath Queries cell, enter the fields to be extracted from the structured XML input. In this
example, enter the XPath query ".".
11. In the design workspace, drop two tLogRow components from the Palette and connect
tFileInputMSXML to tLogRow1 and tLogRow2 using the book and record links respectively.
Rename the two tLogRow components as book and record respectively.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.
The multi-structure XML file is read row by row and the extracted fields are displayed on the
console. The first two fields are for the book schema, and the last two fields are for the record
schema.
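Assuming the sample XML file shown above, the console output resembles the following illustration
(the exact layout depends on the tLogRow settings you use):

As You Like It
All's Well That Ends Well
We Belong Together
When You Believe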


tFileInputPositional
Reads a positional file row by row, splits each row into fields based on a given pattern, and then sends
the fields as defined in the schema to the next component.

tFileInputPositional Standard properties


These properties are used to configure tFileInputPositional running in the Standard Job framework.
The Standard tFileInputPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

File name/Stream File name: Name and path of the file to be processed.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Stream: The data flow to be processed. The data must be


added to the flow in order for tFileInputPositional to fetch
these data via the corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component, for example, the INPUT_STREAM
variable of tFileFetch; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
In order to avoid the inconvenience of hand writing, you
could select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field on condition that
this variable has been properly defined.
Related topic to the available variables: see Talend Studio
User Guide.
Related scenario to the input stream, see Reading data from
a remote file in streaming mode on page 1020.
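For example, assuming a tFileFetch component instance named tFileFetch_1 that saves the fetched
resource to the cache, the File name/Stream field could reference its stream with an expression such
as the following (the instance name depends on your own Job):

((java.io.InputStream)globalMap.get("tFileFetch_1_INPUT_STREAM"))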

Row separator The separator used to identify the end of a row.

Use byte length as the cardinality Select this check box to enable the support of double-byte
characters in this component. JDK 1.6 is required for this
feature.

Customize Select this check box to customize the data format of the
positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.


Padding char: Enter, between double quotation marks, the


padding character you need to remove from the field. A space
by default.
Alignment: Select the appropriate alignment parameter.

Pattern Length values separated by commas, interpreted as a string


between quotes. Make sure the values entered in this field
are consistent with the schema defined.

Pattern Units The unit of the length values specified in the Pattern field.
• Bytes: With this option selected, the length values in
the Pattern field should be the count of bytes that
represent symbols in original encoding of the input file.
• Symbols: With this option selected, the length values
in the Pattern field should be the count of regular
symbols, not including surrogate pairs.
• Symbols (including rare): With this option selected,
the length values in the Pattern field should be the
count of symbols, including rare symbols such as
surrogate pairs, and each surrogate pair counts as a
single symbol. Considering the performance factor, it is
not recommended to use this option when your input
data consists of only regular symbols.
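For illustration, a hypothetical Pattern value of "5,6,8" with Pattern Units set to Symbols splits each
19-character row into three consecutive fields of 5, 6 and 8 characters, so a row such as

00001  8200   50330

yields the fields 00001, 8200 and 50330 (together with their padding spaces, unless you trim them).
With Pattern Units set to Bytes, the same values are interpreted as byte counts in the original
encoding of the file, which matters for multi-byte encodings such as UTF-8.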

Skip empty rows Select this check box to skip the empty rows.

Uncompress as zip file Select this check box to uncompress the input file.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Header Enter the number of rows to be skipped at the beginning of
the file.

Footer Number of rows to be skipped at the end of the file.

Limit Maximum number of rows to be processed. If Limit = 0, no


row is read or processed.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon


completion and choose this schema metadata again in


the Repository Content window.
This component must work with tSetDynamicSchema to
leverage the dynamic schema feature.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
flowcharts. Related topic: see Talend Studio User Guide.

Advanced settings

Needed to process rows longer than 100 000 characters Select this check box if the rows to be processed in the
input file are longer than 100 000 characters.

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Trim all column Select this check box to remove leading and trailing
whitespaces from defined columns.

Validate date Select this check box to check the date format strictly
against the input schema.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
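For example, once this component has finished executing, a hypothetical tJava component placed in a
following subJob could print the row count with an expression such as the one below, where
tFileInputPositional_1 stands for the instance name of the component in your Job:

System.out.println("Rows read: " + ((Integer)globalMap.get("tFileInputPositional_1_NB_LINE")));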


Usage

Usage rule Use this component to read a file and separate fields using
a position separator value. You can also create a rejection
flow using a Row > Reject link to filter the data which does
not correspond to the type defined. For an example of how
to use these two links, see Procedure on page 975.

Reading a Positional file and saving filtered results to XML


The following scenario describes a two-component Job, which aims at reading data from an input file
that contains contract numbers, customer references, and insurance numbers as shown below, and
outputting the selected data (according to the data position) into an XML file.
Contract CustomerRef InsuranceNr
00001 8200 50330
00001 8201 50331
00002 8202 50332
00002 8203 50333

Dropping and linking components


About this task

Procedure
1. Drop a tFileInputPositional component from the Palette to the design workspace.
2. Drop a tFileOutputXML component as well. This file is meant to receive the references in a
structured way.
3. Right-click the tFileInputPositional component and select Row > Main. Then drag it onto the
tFileOutputXML component and release when the plug symbol shows up.

Configuring data input


Procedure
1. Double-click the tFileInputPositional component to show its Basic settings view and define its
properties.


2. Define the Job Property type if needed. For this scenario, we use the built-in Property type.
As opposed to the Repository, this means that the properties are set for this Job only.
3. Fill in a path to the input file in the File Name field. This field is mandatory.
4. Define the Row separator identifying the end of a row if needed, by default, a carriage return.
5. If required, select the Use byte length as the cardinality check box to enable the support of
double-byte character.
6. Define the Pattern to delimit fields in a row. The pattern is a series of length values corresponding
to the values of your input files. The values should be entered between quotes, and separated by
a comma. Make sure the values you enter match the schema defined.
7. Fill in the Header, Footer and Limit fields according to your input file structure and your need. In
this scenario, we only need to skip the first row when reading the input file. To do this, fill the
Header field with 1 and leave the other fields as they are.
8. Next to Schema, select Repository if the input schema is stored in the Repository. In this use case,
we use a Built-In input schema to define the data to pass on to the tFileOutputXML component.
9. You can load and/or edit the schema via the Edit Schema function. For this schema, define three
columns, respectively Contract, CustomerRef and InsuranceNr matching the structure of the input
file. Then, click OK to close the Schema dialog box and propagate the changes.


Configuring data output


Procedure
1. Double-click tFileOutputXML to show its Basic settings view.

2. Enter the XML output file path.


3. Define the row tag that will wrap each row of data, in this use case ContractRef.
4. Click the three-dot button next to Edit schema to view the data structure, and click Sync columns
to retrieve the data structure from the input component if needed.
5. Switch to the Advanced settings tab view to define other settings for the XML output.

6. Click the plus button to add a line in the Root tags table, and enter a root tag (or more) to wrap
the XML output structure, in this case ContractsList.
7. Define parameters in the Output format table if needed. For example, select the As attribute
check box for a column if you want to use its name and value as an attribute for the parent XML
element, clear the Use schema column name check box for a column to reuse the column label
from the input schema as the tag label. In this use case, we keep all the default output format
settings as they are.
8. To group output rows according to the contract number, select the Use dynamic grouping check
box, add a line in the Group by table, select Contract from the Column list field, and enter an
attribute for it in the Attribute label field.


9. Leave all the other parameters as they are.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job to ensure that all the configured parameters take effect.
2. Press F6 or click Run on the Run tab to execute the Job.
The file is read row by row based on the length values defined in the Pattern field and output as
an XML file as defined in the output settings. You can open it using any standard XML editor.


tFileInputProperties
Reads a text file row by row and separates the fields according to the model key = value.

tFileInputProperties Standard properties


These properties are used to configure tFileInputProperties running in the Standard Job framework.
The Standard tFileInputProperties component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
For this component, the schema is read-only. It is made of
two columns, Key and Value, corresponding to the parameter
name and the parameter value to be copied.

File format Select from the list your file format, either: .properties or
.ini.

  .properties: data in the configuration file is written as key/value
pairs, structured in the following way: key = value.

  .ini: data in the configuration file is written as key/value pairs,
structured in the following way: key = value, and grouped in
sections.
Section Name: enter the section name on which the
iteration is based.
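For illustration, a hypothetical .properties input file contains lines such as:

MYSQL.HOST = localhost
MYSQL.PORT = 3306

while the equivalent hypothetical .ini file groups the same keys under a section, whose name
([mysql] below) is the value expected in the Section Name field:

[mysql]
HOST = localhost
PORT = 3306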

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Calculate MD5 Hash Select this check box to verify that the file to be processed
has been correctly downloaded.

Advanced settings

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a text file and separate data
according to the structure key = value.

Reading and matching the keys and the values of different .properties files and outputting the results
in a glossary
This four-component Job reads two .properties files, one in French and the other in English. The data
in the two input files is mapped to output a glossary matching the English and French terms.
The two input files used in this scenario hold localization strings for the tMysqlInput component in
Talend Studio.


The glossary displays on the console listing three columns holding: the key name in the first column,
the English term in the second, and the corresponding French term in the third.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette onto the design workspace: tFileInputProperties
(x2), tMap, and tLogRow.
2. Connect the components together using Row > Main links. The second properties file, FR, is used as
a lookup flow.

Configuring the components


Procedure
1. Double-click the first tFileInputProperties component to open its Basic settings view and define
its properties.


2. In the File Format field, select your file format.


3. In the File Name field, click the three-dot button and browse to the input .properties file you want
to use.
4. Do the same with the second tFileInputProperties and browse to the French .properties file this
time.

5. Double-click the tMap component to open the tMap editor.


6. Select all columns from the English_terms table and drop them to the output table.
Select the key column from the English_terms table and drop it to the key column in the
French_terms table.
7. In the glossary table in the lower right corner of the tMap editor, rename the value field to EN
because it will hold the values of the English file.
8. Click the plus button to add a line to the glossary table and rename it to FR.
9. In the Length field, set the maximum length to 255.
10. In the upper left corner of the tMap editor, select the value column in the French_terms table and
drop it to the FR column in the glossary table. When done, click OK to validate your changes
and close the map editor and propagate the changes to the next component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click the Run button from the Run tab to execute it.


tFileInputRaw
Reads all data in a raw file and sends it to a single output column for subsequent processing by
another component.

tFileInputRaw Standard properties


These properties are used to configure tFileInputRaw running in the Standard Job framework.
The Standard tFileInputRaw component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Filename The name of and path of the input file to be processed,


which you can enter manually between double quotes or
browse and select by clicking the [...] button.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Mode Read the file as a string: The content of the file is read as a
string.
Read the file as a bytes array: The content of the file is read
as a bytes array.
Stream the file: As soon as the first character is entered in
the source file, it is read immediately.


Encoding If you are using the Read the file as a string mode, select
the encoding type from the list or select Custom and define
it manually.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
To catch the FileNotFoundException, you also need to
select this check box.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to provide input data for Jobs that
require a single column of data or that require a whole file
to be read as a single column.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related Scenario
For a related use case, see:
• Uploading files to Dropbox on page 655


tFileInputRegex
Reads a file row by row, splits each row into fields using regular expressions, and sends the fields as
defined in the schema to the next component.
This powerful component can replace a number of other components of the File family. It requires
some advanced knowledge of regular expression syntax.

tFileInputRegex Standard properties


These properties are used to configure tFileInputRegex running in the Standard Job framework.
The Standard tFileInputRegex component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

File name/Stream File name: Name of the file and/or the variable to be
processed.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Stream: Data flow to be processed. The data must be added


to the flow so that it can be collected by tFileInputRegex via
the INPUT_STREAM variable in the autocompletion list
(Ctrl+Space).
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Row separator The separator used to identify the end of a row.

Regex Type in your Java regular expression including the


subpattern matching the fields to be extracted. This field
can contain multiple lines.
Note: Backslashes need to be doubled in regular expressions.

Warning:
• The regular expression needs to be in double
quotes.
• To extract all the desired strings, make sure the
regular expression contains the corresponding
subpatterns that match the strings. Also, each
subpattern in the regular expression needs to be in
a pair of parentheses.
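For example, a hypothetical expression extracting three fields (an identifier, a name and a date) from
rows such as 00125;Smith;2011-06-21 could be written as:

"^(\\d+);(\\w+);(\\d{4}-\\d{2}-\\d{2})$"

Each pair of parentheses defines one subpattern, matched in order against the columns of the schema,
and every backslash is doubled because the expression is entered as a Java string.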


Header Enter the number of rows to be skipped at the beginning of
the file.

Footer Number of rows to be skipped at the end of the file.

Limit Maximum number of rows to be processed. If Limit = 0, no


row is read or processed.

Ignore error message for the unmatched record Select this check box to avoid outputting error messages for
records that do not match the specified regex. This check
box is cleared by default.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Skip empty rows Select this check box to skip the empty rows.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
In the Map/Reduce version of tFileInputRegex, you need to
select the Custom encoding check box to display this list.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to read a file and separate fields
contained in this file according to the defined Regex. You
can also create a rejection flow using a Row > Reject link
to filter the data which doesn't correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.

Reading data using a Regex and outputting the result to a Positional file

The following scenario creates a two-component Job, reading data from an input file using a regular
expression and outputting delimited data into a positional file.

Dropping and linking the components


Procedure
1. Drop a tFileInputRegex component from the Palette to the design workspace.
2. Drop a tFileOutputPositional component the same way.
3. Right-click on the tFileInputRegex component and select Row > Main. Drag this main row link
onto the tFileOutputPositional component and release when the plug symbol displays.


Configuring the components


Procedure
1. Select the tFileInputRegex again so the Component view shows up, and define the properties:

2. The Property type is Built-in for this scenario. Hence, the properties are set for this Job only.
3. Fill in a path to the file in File Name field. This field is mandatory.
4. Define the Row separator identifying the end of a row.
5. Then define the Regular expression in order to delimit fields of a row, which are to be passed on
to the next component. You can type in a regular expression using Java code, and on multiple lines
if needed.

Warning:
Regex syntax requires double quotes.

6. In this expression, make sure you include all subpatterns matching the fields to be extracted.
7. In this scenario, ignore the header, footer and limit fields.
8. Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional
component.
9. You can load or create the schema through the Edit Schema function.
10. Then define the second component properties:


11. Enter the Positional file output path.


12. Enter the encoding standard the output file is to be encoded in. Note that, for the time being, the
encoding consistency verification is not supported.
13. Select the Schema type. Click on Sync columns to automatically synchronize the schema with the
Input file schema.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Now go to the Run tab, and click on Run to execute the Job.
The file is read row by row and split up into fields based on the Regular Expression definition. You
can open it using any standard file editor.


tFileInputXML
Reads an XML structured file row by row, splits it up into fields, and sends the fields as defined in
the schema to the next component.

tFileInputXML Standard properties


These properties are used to configure tFileInputXML running in the Standard Job framework.
The Standard tFileInputXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

File name/Stream File name: Name and path of the file to be processed.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Stream: The data flow to be processed. The data must be


added to the flow in order for tFileInputXML to fetch these
data via the corresponding representative variable.


This variable could be already pre-defined in your Studio or


provided by the context or the components you are using
along with this component, for example, the INPUT_STREAM
variable of tFileFetch; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
In order to avoid the inconvenience of hand writing, you
could select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field on condition that
this variable has been properly defined.

Related topic to the available variables: see Talend Studio


User Guide. Related scenario to the input stream, see
Reading data from a remote file in streaming mode on page
1020.

Loop XPath query Node of the tree, which the loop is based on.

Mapping Column: Columns to map. They reflect the schema as


defined in the Schema type field.
XPath Query: Enter the fields to be extracted from the
structured input.
Get nodes: Select this check box to recuperate the XML
content of all current nodes specified in the Xpath query
list, or select the check box next to specific XML nodes
to recuperate only the content of the selected nodes.
These nodes are important when the output flow from this
component needs to use the XML structure, for example, the
Document data type.
For further information about the Document type, see
Talend Studio User Guide.

Note:
The Get Nodes option functions in the DOM4j and SAX
modes, although in SAX mode namespaces are not
supported. For further information concerning the DOM4j
and SAX modes, please see the properties noted in the
Generation mode list of the Advanced Settings tab.
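For example, assuming a hypothetical input file of the following shape:

<customers>
  <customer>
    <id>1</id>
    <name>Griffith Paving</name>
  </customer>
</customers>

the Loop XPath query would be "/customers/customer", and the XPath Query cells of the Mapping
table would contain "id" and "name" for the schema columns holding the customer identifier and name
respectively.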

Limit Maximum number of rows to be processed. If Limit = 0,


no row is read or processed. If Limit = -1, all rows are read
and processed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Ignore DTD file Select this check box to ignore the DTD file indicated in the
XML file being processed.

Advanced separator (for number) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).


Thousands separator: define the separators to use for


thousands.
Decimal separator: define the separators to use for decimals.

Ignore the namespaces Select this check box to ignore namespaces.
Generate a temporary file: click the three-dot button to
browse to the XML temporary file and set its path in the
field.

Use Separator for mode Xerces Select this check box if you want to separate concatenated
children node values.

Note:
This field can only be used if the selected Generation
mode is Xerces.

The following field displays:


Field separator: Define the delimiter to be used to separate
the children node values.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Generation mode From the drop-down list select the generation mode for the
XML file, according to the memory available and the desired
speed:
• Slow and memory-consuming (Dom4j)

Note:
This option allows you to use dom4j to process the
XML files of high complexity.

• Memory-consuming (Xerces).
• Fast with low memory consumption (SAX)

Validate date Select this check box to check the date format strictly
against the input schema.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tFileInputXML is for use as an entry component. It allows


you to create a flow of XML data using a Row > Main link.
You can also create a rejection flow using a Row > Reject
link to filter the data which doesn't correspond to the type
defined. For an example of how to use these two links, see
Procedure on page 975.

Reading and extracting data from an XML structure


This scenario describes a basic Job that reads a defined XML directory and extracts specific
information and outputs it on the Run console via a tLogRow component.

Procedure
1. Drop tFileInputXML and tLogRow from the Palette to the design workspace.
2. Connect both components together using a Main Row link.
3. Double-click tFileInputXML to open its Basic settings view and define the component properties.


4. As the street dir file used as input file has been previously defined in the Metadata area, select
Repository as Property type. This way, the properties are automatically leveraged and the rest
of the properties fields are filled in (apart from Schema). For more information regarding the
metadata creation wizards, see Talend Studio User Guide.
5. Select the same way the relevant schema in the Repository metadata list. Edit schema if you want
to make any change to the schema loaded.
6. The Filename field shows the structured file to be used as input.
7. In Loop XPath query, change if needed the node of the structure where the loop is based.
8. On the Mapping table, fill the fields to be extracted and displayed in the output.
9. If the file is large, fill in a Limit of rows to be read.
10. Enter the encoding if needed then double-click on tLogRow to define the separator character.
11. Save your Job and press F6 to execute it.

Results

The fields defined in the input properties are extracted from the XML structure and displayed on the
console.

Extracting erroneous XML data via a reject flow


This Java scenario describes a three-component Job that reads an XML file and:
1. first, returns correct XML data in an output XML file,
2. and second, displays on the console erroneous XML data whose type does not correspond to the
type defined in the schema.

Procedure
1. Drop the following components from the Palette to the design workspace: tFileInputXML,
tFileOutputXML and tLogRow.
Right-click tFileInputXML and select Row > Main in the contextual menu and then click
tFileOutputXML to connect the components together.
Right-click tFileInputXML and select Row > Reject in the contextual menu and then click tLogRow
to connect the components together using a reject link.


2. Double-click tFileInputXML to display the Basic settings view and define the component
properties.

3. In the Property Type list, select Repository and click the three-dot button next to the field to
display the Repository Content dialog box where you can select the metadata relative to the input
file if you have already stored it in the File xml node under the Metadata folder of the Repository
tree view. The fields that follow are automatically filled with the fetched data. If not, select
Built-in and fill in the fields that follow manually.
For more information about storing schema metadata in the Repository tree view, see Talend
Studio User Guide.
4. In the Schema Type list, select Repository and click the three-dot button to open the dialog box
where you can select the schema that describes the structure of the input file if you have already
stored it in the Repository tree view. If not, select Built-in and click the three-dot button next to
Edit schema to open a dialog box where you can define the schema manually.


The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState
and id2.
5. Click the three-dot button next to the Filename field and browse to the XML file you want to
process.
6. In the Loop XPath query, enter between inverted commas the path of the XML node on which to
loop in order to retrieve data.
In the Mapping table, Column is automatically populated with the defined schema.
In the XPath query column, enter between inverted commas the node of the XML file that holds
the data you want to extract from the corresponding column.
7. In the Limit field, enter the number of lines to be processed, the first 10 lines in this example.
8. Double-click tFileOutputXML to display its Basic settings view and define the component
properties.

9. Click the three-dot button next to the File Name field and browse to the output XML file you want
to collect data in, customer_data.xml in this example.
In the Row tag field, enter between inverted commas the name you want to give to the tag that
will hold the recuperated data.
Click Edit schema to display the schema dialog box and make sure that the schema matches that
of the preceding component. If not, click Sync columns to retrieve the schema from the preceding
component.
10. Double-click tLogRow to display its Basic settings view and define the component properties.
Click Edit schema to open the schema dialog box and make sure that the schema matches that
of the preceding component. If not, click Sync columns to retrieve the schema of the preceding
component.


In the Mode area, select the Vertical option.


11. Save your Job and press F6 to execute it.

Results

The output file customer_data.xml holding the correct XML data is created in the defined path and
erroneous XML data is displayed on the console of the Run view.


tFileList
Iterates a set of files or folders in a given directory based on a filemask pattern.

Note: This component iterates over every file in a directory, including system files, hidden files, zero-
byte files, and so on, as long as the file meets the conditions set in the Files field.

tFileList Standard properties


These properties are used to configure tFileList running in the Standard Job framework.
The Standard tFileList component belongs to the File and the Orchestration families.
The component in this framework is available in all Talend products.

Basic settings

Directory Path to the directory where the files are stored.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

FileList Type Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.

Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Case Sensitive Set the case mode from the list to either create or not
create case sensitive filter on filenames.

Generate Error if no file found Select this check box to generate an error message if no
files or directories are found.

Use Glob Expressions as Filemask This check box is selected by default. It filters the results
using glob expressions (global expressions).

Files Click the plus button to add as many filter lines as needed:
Filemask: in the added filter lines, type in a filename or a
filemask using special characters or regular expressions.
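For example, hypothetical filemasks such as "*.csv" (all files with the .csv extension) or
"customers_??.txt" (customers_01.txt, customers_02.txt, and so on) can be entered here, each
between double quotation marks.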

Order by The folders are listed first of all, then the files. You can
choose to prioritise the folder and file order either:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical
order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent
to most recent.


Note:
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then file
name takes precedence. If ordering by modified date,
in the event of identical dates then file name takes
precedence.

Order action Select a sort order by clicking one of the following radio
buttons:
ASC: ascending order;
DESC: descending order;

Advanced settings

Use Exclude Filemask Select this check box to enable the Exclude Filemask field and
define an exclusion condition based on file type:
Exclude Filemask: Fill in the field with file types to be
excluded from the Filemasks in the Basic settings view.

Note:
File types in this field should be quoted with double
quotation marks and separated by commas.

Format file path to slash(/) style(useful on Windows) Select this check box to format the file path to slash(/) style
which is useful on Windows.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables CURRENT_FILE: the current file name. This is a Flow


variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
CURRENT_FILEEXTENSION: the extension of the current file.
This is a Flow variable and it returns a string.
CURRENT_FILEDIRECTORY: the current file directory. This is
a Flow variable and it returns a string.
NB_FILE: the number of files iterated upon so far. This is a
Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tFileList provides a list of files or folders from a defined


directory on which it iterates.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Iterating on a file directory


The following scenario creates a three-component Job, which aims at listing files from a defined
directory, reading each file by iteration, selecting delimited data and displaying the output in the Run
log console.

Dropping and linking the components


Procedure
1. Drop the following components from the Palette to the design workspace: tFileList,
tFileInputDelimited, and tLogRow.
2. Right-click the tFileList component, and pull an Iterate connection to the tFileInputDelimited
component. Then pull a Main row from the tFileInputDelimited to the tLogRow component.

Configuring the components


Procedure
1. Double-click tFileList to display its Basic settings view and define its properties.


2. Browse to the Directory that holds the files you want to process. To display the path on the Job
itself, use the label (__DIRECTORY__) that shows up when you put the pointer anywhere in the
Directory field. Type in this label in the Label Format field you can find if you click the View tab in
the Basic settings view.

3. In the Basic settings view and from the FileList Type list, select the source type you want to
process, Files in this example.
4. In the Case sensitive list, select a case mode, Yes in this example to create case sensitive filter on
file names.
5. Keep the Use Glob Expressions as Filemask check box selected if you want to use global
expressions to filter files, and define a file mask in the Filemask field.
6. Double-click tFileInputDelimited to display its Basic settings view and set its properties.

7. Enter the File Name field using a variable containing the current filename path, as you filled in
the Basic settings of tFileList. Press Ctrl+Space bar to access the autocomplete list of variables,
and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
This way, all files in the input directory can be processed.
8. Fill in all other fields as detailed in the tFileInputDelimited section. Related topic:
tFileInputDelimited on page 1015.
9. Select the last component, tLogRow, to display its Basic settings view and fill in the separator to
be used to distinguish field content displayed on the console. Related topic: tLogRow on page
1977.


Executing the Job


Press Ctrl + S to save your Job, and press F6 to run it.

The Job iterates on the defined directory, and reads all included files. Then delimited data is passed
on to the last component which displays it on the console.

Finding duplicate files between two folders


This scenario describes a Job that iterates on files in two folders, transforms the iteration results to
data flows to obtain a list of filenames, and then picks up all duplicates from the list and shows them
on the Run console, as a preparation step before merging the two folders, for example.


Dropping and linking the components


Procedure
1. From the Palette, drop two tFileList components, two tIterateToFlow components, two
tFileOutputDelimited components, a tFileInputDelimited component, a tUniqRow component, and
a tLogRow component onto the design workspace.
2. Link the first tFileList component to the first tIterateToFlow component using a Row > Iterate
connection, and then connect the first tIterateToFlow component to the first tFileOutputDelimited
component using a Row > Main connection to form the first subJob.
3. Link the second tFileList component to the second tIterateToFlow component using a Row
> Iterate connection, and then connect the second tIterateToFlow component to the second
tFileOutputDelimited component using a Row > Main connection to form the second subJob.
4. Link the tFileInputDelimited to the tUniqRow component using a Row > Main connection, and the
tUniqRow component to the tLogRow component using a Row > Duplicates connection to form
the third subJob.
5. Link the three subJobs using Trigger > On Subjob Ok connections so that they will be triggered
one after another, and label the components to better identify their roles in the Job.

Configuring the components


Procedure
1. In the Basic settings view of the first tFileList component, fill the Directory field with the path to
the first folder you want to read filenames from, E:/DataFiles/DI/images in this example, and leave
the other settings as they are.


2. Double-click the first tIterateToFlow component to show its Basic settings view.

3. Double-click the [...] button next to Edit schema to open the Schema dialog box and define the
schema of the text file the next component will write filenames to. When done, click OK to close
the dialog box and propagate the schema to the next component.
In this example, the schema contains only one column: Filename.


4. In the Value field of the Mapping table, press Ctrl+Space to access the autocomplete list of variables,
and select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILE"))
to read the name of each file in the input directory, which will be put into a data
flow to pass to the next component.
5. In the Basic settings view of the first tFileOutputDelimited component, fill the File Name field
with the path of the text file that will store the filenames from the incoming flow,
D:/temp/tempdata.csv in this example. This completes the configuration of the first subJob.

6. Repeat the steps above to complete the configuration of the second subJob, but:
• fill the Directory field in the Basic settings view of the second tFileList component with the
other folder you want to read filenames from, E:/DataFiles/DQ/images in this example.
• select the Append check box in the Basic settings view of the second tFileOutputDelimited
component so that the filenames previously written to the text file will not be overwritten.
7. In the Basic settings view of the tFileInputDelimited component, fill the File name/Stream
field with the path of the text file that stores the list of filenames, D:/temp/tempdata.csv in this
example, and define the file schema, which contains only one column in this example, Filename.


8. In the Basic settings view of the tUniqRow component, select the Key attribute check box for the
only column, Filename in this example.

9. In the Basic settings view of the tLogRow component, select the Table (print values in cells of a
table) option for better display effect.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Click Run or press F6 to run the Job.
All the duplicate files between the selected folders are displayed on the console.


Results
For other scenarios using tFileList, see tFileCopy on page 988.


tFileOutputARFF
Writes an ARFF file that holds data organized according to the defined schema.

tFileOutputARFF Standard properties


These properties are used to configure tFileOutputARFF running in the Standard Job framework.
The Standard tFileOutputARFF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a connection wizard and store the
file connection parameters you set in the component
Basic settings view.
For more information about setting up and storing file
connection parameters, see Talend Studio User Guide.

File name Name or path to the output file and/or the variable to be
used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Attribute Define Displays the schema you defined in the Edit schema dialog
box.
Column: Name of the column.
Type: Data type.
Pattern: Enter the data model (pattern), if necessary.

Relation Enter the name of the relation.

Append Select this check box to add the new rows at the end of the
file.

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:

• View schema: choose this option to view the schema only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: You can create the schema and store it locally for
this component. Related topic: see Talend Studio User Guide.

  Repository: You have already created and stored the


schema in the Repository. You can reuse it in various
projects and Job flowcharts. Related topic: see Talend Studio
User Guide.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if it does not exist.

Advanced settings

Don't generate empty file Select this check box if you do not want to generate empty
files.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component along with a Row link to collect data
from another component and to re-write the data to an
ARFF file.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Connections Outgoing links (from this component to another):


Row: Main.
Trigger: On Subjob Ok; On Subjob Error; Run if.

Incoming links (from one component to this one):


Row: Main; Reject; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error; Synchronize; Paralle
lize.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a tFileOutputARFF related scenario, see Displaying the content of an ARFF file on page 1011.


tFileOutputDelimited
Outputs the input data to a delimited file according to the defined schema.

tFileOutputDelimited Standard properties


These properties are used to configure tFileOutputDelimited running in the Standard Job framework.
The Standard tFileOutputDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Use Output Stream Select this check box to process the data flow of interest.
Once you have selected it, the Output Stream field displays
and you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing the variable by hand, you can select it from
the auto-completion list (Ctrl+Space) to fill the current
field, provided that the variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.

File Name Name or path to the output file and/or the variable to be
used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row Separator The separator used to identify the end of a row.

Field Separator Enter character, string or regular expression to separate


fields for the transferred data.


Append Select this check box to add the new rows at the end of the
file.

Include Header Select this check box to include the column header to the
file.

Compress as zip file Select this check box to compress the output file in zip
format.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

CSV options Select this check box to specify the following CSV
parameters:
• Escape char: enter the escape character between
double quotation marks.
• Text enclosure: enter the enclosure character (only
one character) between double quotation marks.
For example, """ needs to be entered when double
quotation marks (") are used as the enclosure character.
It is recommended to use the standard escape character,
that is, "\". Otherwise, you should set the same character
for Escape char and Text enclosure. For example, if the
escape character is set to "\", the text enclosure can be set
to any other character. However, if the escape character is
set to a character other than "\", it will be changed to the
same character as the text enclosure: for instance, if the
escape character is set to "#" and the text enclosure to "@",
the escape character actually used will be "@", not "#".

Create directory if not exists This check box is selected by default. It creates the directory
that holds the output delimited file, if it does not already
exist.

Split output in several files In case of very big output files, select this check box to
divide the output delimited file into several files.
Rows in each output file: set the number of lines in each of
the output files.

Custom the flush buffer size Select this check box to define the number of lines to write
before emptying the buffer.
Row Number: set the number of lines to write.

Output in row mode Select this check box to ensure atomicity of the flush so
that each row of data can remain consistent as a set and
incomplete rows of data are never written to a file.
This check box is mostly useful when using this component
in the multi-thread situation.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Don't generate empty file Select this check box if you do not want to generate empty
files.

Throw an error if the file already exist Select this check box to throw an exception if the output
file specified in the File Name field on the Basic settings
view already exists.
Clear this check box to overwrite the existing file.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
FILE_NAME: the name of the file being processed. This is a
Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.

A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
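As a minimal illustration of how these variables can be read, the following tJava code, run after
the subJob through an On Subjob Ok trigger, prints the number of rows written by a hypothetical
tFileOutputDelimited_1 component (the component name is an assumption; use the label shown in your
own Job):

// NB_LINE is an After variable, so it is only available once the subJob has finished.
System.out.println("Rows written: "
    + (Integer) globalMap.get("tFileOutputDelimited_1_NB_LINE"));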

Usage

Usage rule Use this component to write a delimited file and separate
fields using a field separator value.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Writing data in a delimited file


This scenario describes a three-component Job that extracts certain data from a file holding
information about clients (customers), and then writes the extracted data to a delimited file.
In the following example, we have already stored the input schema under the Metadata node in the
Repository tree view. For more information about storing schema metadata in the Repository, see
Talend Studio User Guide.

Dropping and linking components


Procedure
1. In the Repository tree view, expand Metadata and File delimited in succession and then browse to
your input schema, customers, and drop it on the design workspace. A dialog box displays where
you can select the component type you want to use.


2. Click tFileInputDelimited and then OK to close the dialog box. A tFileInputDelimited component
holding the name of your input schema appears on the design workspace.
3. Drop a tMap component and a tFileOutputDelimited component from the Palette to the design
workspace.
4. Link the components together using Row > Main connections.

Configuring the components


Configuring the input component

Procedure
1. Double-click tFileInputDelimited to open its Basic settings view. All its property fields are
automatically filled in because you defined your input file locally.

2. If you do not define your input file locally in the Repository tree view, fill in the details manually
after selecting Built-in in the Property type list.
3. Click the [...] button next to the File Name field and browse to the input file, customer.csv in this
example.


Warning:
If the path of the file contains some accented characters, you will get an error message when
executing your Job.

4. In the Row Separators and Field Separators fields, enter respectively "\n" and ";" as line and field
separators.
5. If needed, set the number of lines used as header and the number of lines used as footer in the
corresponding fields and then set a limit for the number of processed rows.
In this example, Header is set to 6 while Footer and Limit are not set.
6. In the Schema field, schema is automatically set to Repository and your schema is already defined
since you have stored your input file locally for this example. Otherwise, select Built-in and click
the [...] button next to Edit Schema to open the Schema dialog box where you can define the
input schema, and then click OK to close the dialog box.

Configuring the mapping component

Procedure
1. In the design workspace, double-click tMap to open its editor.


2. In the tMap editor, click the [+] button on top of the panel to the right to open the Add a new
output table dialog box.

3. Enter a name for the table you want to create, row2 in this example.
4. Click OK to validate your changes and close the dialog box.
5. In the table to the left, row1, select the first three lines (Id, CustomerName and CustomerAddress)
and drop them onto the table to the right.
6. In the Schema editor view situated in the lower left corner of the tMap editor, change the type of
RegisterTime to String in the table to the right.

7. Click OK to save your changes and close the editor.

Configuring the output component

Procedure
1. In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and
define the component properties.

2. In the Property Type field, set the type to Built-in and fill in the fields that follow manually.
3. Click the [...] button next to the File Name field and browse to the output file you want to write
data in, customerselection.txt in this example.
4. In the Row Separator and Field Separator fields, set "\n" and ";" respectively as row and field
separators.


5. Select the Include Header check box if you want to output the column headers as well.
6. Click Edit schema to open the schema dialog box and verify that the retrieved schema
corresponds to the input schema. If not, click Sync Columns to retrieve the schema from the
preceding component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.

The three specified columns Id, CustomerName and CustomerAddress are output in the defined
output file.

Utilizing Output Stream to save filtered data to a local file


Based on the preceding scenario, this scenario saves the filtered data to a local file using output
stream.

Dropping and linking components


Procedure
1. Drop tJava from the Palette to the design workspace.
2. Connect tJava to tFileInputDelimited using a Trigger > On Subjob OK connection.


Configuring the components


Procedure
1. Double-click tJava to open its Basic settings view.

2. In the Code area, type in the following command:

new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file",
    new java.io.FileOutputStream("C:/myFolder/customerselection.txt", false));

Note:
In this scenario, the command used in the Code area of tJava creates a new folder C:/myFolder
where the output file customerselection.txt will be saved. You can customize the command as
needed.

3. Double-click tFileOutputDelimited to open its Basic settings view.

4. Select the Use Output Stream check box to enable the Output Stream field, in which you can
define the output stream with a command.
Fill in the Output Stream field with the following command:

(java.io.OutputStream)globalMap.get("out_file")

Note:
You can fill in the Output Stream field by pressing Ctrl+Space to select a built-in variable from
the list, or by typing the command manually. In this scenario, the command in the Output Stream
field retrieves the java.io.OutputStream object stored in globalMap by the tJava component, so
the filtered data is written to the local file defined in its Code area.

5. Click Sync columns to retrieve the schema defined in the preceding component.


6. Leave the rest of the components as they were in the previous scenario.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The three specified columns Id, CustomerName and CustomerAddress are output in the defined
output file.
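The stream created in the tJava component stays referenced in globalMap after the subJob ends. If
you want to release it explicitly, a minimal sketch is to add one more tJava step, triggered with
On Subjob Ok after the output subJob (this extra step is not part of the original scenario):

// Hypothetical clean-up code: close the output stream opened in the first tJava.
((java.io.OutputStream) globalMap.get("out_file")).close();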


tFileOutputExcel
Writes an MS Excel file with separated data values according to a defined schema.

tFileOutputExcel Standard properties


These properties are used to configure tFileOutputExcel running in the Standard Job framework.
The Standard tFileOutputExcel component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Write excel 2007 file format (xlsx / xlsm) Select this check box to write the processed data into the
.xlsx or .xlsm format of Excel 2007.

Use Output Stream Select this check box to process the data flow of interest.
Once you have selected it, the Output Stream field displays
and you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid typing the variable by hand, you can select it from
the auto-completion list (Ctrl+Space) to fill the current
field, provided that the variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020.

File Name Name or path to the output file.


This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Sheet name Name of the Excel sheet.

Warning: If a subJob contains multiple tFileOutputExcel
components that write the same Excel file (that is,
the File Name options of these components point to
the same file), these components overwrite the same
Excel sheet and only the data of the tFileOutputExcel
component that is the last one to write the Excel
file remains. To avoid data loss, make sure that these
tFileOutputExcel components are in different subJobs.

Include header Select this check box to include a header row to the output
file.

Append existing file Select this check box to add the new lines at the end of the
file.
Append existing sheet: Select this check box to add the new
lines at the end of the Excel sheet.

Is absolute Y pos. Select this check box to add information in specified cells:
First cell X: cell position on the X-axis (X-coordinate or
Abscissa).
First cell Y: cell position on the Y-axis (Y-coordinate).
Keep existing cell format: select this check box to retain the
original layout and format of the cell you want to write into.

Font Select in the list the font you want to use.

Define all columns auto size Select this check box if you want the size of all your
columns to be defined automatically. Otherwise, select the
Auto size check boxes next to the column names you want
their size to be defined automatically.

Protect file Select this check box and enter the password in the
Password field to protect the file using a password.
This component supports agile encryption.
This option is available when Write excel2007 file
format(xlsx) is selected and Use Output Stream is not
selected.

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Advanced settings

Create directory if not exists This check box is selected by default. This option creates
the directory that will hold the output files if it does not
already exist.

Custom the flush buffer size Available when Write excel 2007 file format (xlsx / xlsm) is
selected in the Basic settings view.
Select this check box and set, in the Row number field, the
maximum number of rows allowed in the buffer.

Advanced separator (for numbers) Select this check box to modify the separators you want to
use for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select the check box to avoid the generation of an empty
file.

Recalculate formula Select this check box if you need to recalculate formula(s) in
the specified Excel file.
This check box appears only when you select all these three
check boxes: Write excel2007 file format(xlsx), Append
existing file, and Append existing sheet.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write an MS Excel file with data
passed on from other components using a Row link.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For a tFileOutputExcel related scenario, see tSugarCRMInput (deprecated).
For a scenario about the usage of the Use Output Stream check box, see Utilizing Output Stream to
save filtered data to a local file on page 1120.
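The Use Output Stream mechanism works the same way as for tFileOutputDelimited. As a minimal
sketch under assumed paths (C:/myFolder and report.xlsx are examples only), the stream could be
prepared in a tJava component placed before the Excel subJob:

// Hypothetical tJava code: create the target folder and register the stream
// that tFileOutputExcel will use when Use Output Stream is selected.
new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file",
    new java.io.FileOutputStream("C:/myFolder/report.xlsx", false));

The Output Stream field of tFileOutputExcel would then contain
(java.io.OutputStream)globalMap.get("out_file").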


tFileOutputJSON
Receives data and rewrites it in a JSON structured data block in an output file.

tFileOutputJSON Standard properties


These properties are used to configure tFileOutputJSON running in the Standard Job framework.
The Standard tFileOutputJSON component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path of the output file.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Generate an array json Select this check box to generate an array JSON file.

Name of data block Enter a name for the data block to be written, between
double quotation marks.
This field disappears when the Generate an array json check
box is selected.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.


Advanced settings

Create directory if not exists This check box is selected by default. This option creates
the directory that will hold the output files if it does not
already exist.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to rewrite received data in a JSON


structured output file.

Writing a JSON structured file


This is a two-component scenario in which a tRowGenerator component generates random data, which
a tFileOutputJSON component then writes to a JSON structured output file.

Procedure
1. Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the Palette.
2. Link the components using a Row > Main connection.
3. Double-click tRowGenerator to define its Basic Settings properties in the Component view.

4. Click [...] next to Edit Schema to display the corresponding dialog box and define the schema.

5. Click [+] to add the number of columns desired.


6. Under Columns type in the column names.
7. Under Type, select the data type from the list.
8. Click OK to close the dialog box.
9. Click [+] next to RowGenerator Editor to open the corresponding dialog box.


10. Under Functions, select pre-defined functions for the columns, if required, or select [...] to set
customized function parameters in the Function parameters tab.
11. Enter the number of rows to be generated in the corresponding field.
12. Click OK to close the dialog box.
13. Click tFileOutputJSON to set its Basic Settings properties in the Component view.

14. Click [...] to browse to where you want the output JSON file to be generated and enter the file
name.
15. Enter a name for the data block to be generated in the corresponding field, between double
quotation marks.
16. Select Built-In as the Schema type.
17. Click Sync Columns to retrieve the schema from the preceding component.
18. Press F6 to run the Job.

Results
The data from the input schema is written in a JSON structured data block in the output file.
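To quickly inspect the result, you can print the generated file, for example from a tJava
component. This is a hypothetical check that is not part of the scenario; replace the path with
the one set in the File Name field. With a data block named "results", the content generally has
the shape {"results":[{...},{...}]}.

// Hypothetical verification code: print the generated JSON file line by line.
java.io.BufferedReader reader =
    new java.io.BufferedReader(new java.io.FileReader("C:/out/result.json"));
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();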


tFileOutputLDIF
Writes or modifies an LDIF file with data separated in respective entries based on the schema defined,
or else deletes content from an LDIF file.
tFileOutputLDIF outputs data to an LDIF type of file which can then be loaded into an LDAP directory.

tFileOutputLDIF Standard properties


These properties are used to configure tFileOutputLDIF running in the Standard Job framework.
The Standard tFileOutputLDIF component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Specify the path to the LDIF output file.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Wrap Specify the number of characters at which the line will be


wrapped.

Change type Select a changetype that defines the operation you want to
perform on the entries in the output LDIF file.
• Add: the LDAP operation for adding the entry.
• Modify: the LDAP operation for modifying the entry.
• Delete: the LDAP operation for deleting the entry.
• Modrdn: the LDAP operation for modifying an entry's
RDN (Relative Distinguished Name).
• Default: the default LDAP operation.

Multi-Values / Modify Detail Specify the attributes for multi-value fields when Add or
Default is selected from the Change type list or provide the
detailed modification information when Modify is selected
from the Change type list.
• Column: The Column cells are automatically filled with
the defined schema column names.
• Operation: Select an operation to be performed on
the corresponding field. This column is available only
when Modify is selected from the Change type list.
• MultiValue: Select the check box if the corresponding
field is a multi-value field.
• Separator: Specify the value separator in the
corresponding multi-value field.
• Binary: Select the check box if the corresponding field
represents binary data.
• Base64: Select the check box if the corresponding
field should be base-64 encoded. The base-64
encoded data in the LDIF file is represented by the ::
symbol.
This table is available only when Add, Modify, or Default is
selected from the Change type list.


Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the Output component.

Append Select this check box to add the new rows at the end of the
file.

Advanced settings

Enforce safe base 64 conversion Select this check box to enable the safe base-64 encoding.
For more detailed information about the safe base-64
encoding, see https://www.ietf.org/rfc/rfc2849.txt.

Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.

Custom the flush buffer size Select this check box to specify the number of lines to
write before emptying the buffer.

Row number Type in the number of lines to write before emptying the
buffer.
This field is available only when the Custom the flush
buffer size check box is selected.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select this check box if you do not want to generate empty
files.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used to write an LDIF file with data
passed on from an input component using a Row > Main
connection.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Writing data from a database table into an LDIF file


This scenario describes a Job that loads data into a database table, and then extracts the data from
the table and writes it into a new LDIF file.


Adding and linking components


Procedure
1. Create a new Job and add the following components by typing their names in the design
workspace or dropping them from the Palette: a tFixedFlowInput component, a tMysqlOutput
component, a tMysqlInput component, and a tFileOutputLDIF component.
2. Link tFixedFlowInput to tMysqlOutput using a Row > Main connection.
3. Link tMysqlInput to tFileOutputLDIF using a Row > Main connection.
4. Link tFixedFlowInput to tMysqlInput using a Trigger > On Subjob Ok connection.

Configuring the components


Loading data into a database table

Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: dn, id_owners, registration, and make, all of String type.


3. Click OK to close the schema editor and accept the propagation prompted by the pop-up dialog
box.
4. In the Mode area, select Use Inline Content (delimited file), and then in the Content field
displayed, enter the following input data:

24;24;5382 KC 94;Volkswagen
32;32;9591 0E 79;Honda
35;35;3129 VH 61;Volkswagen
5. Double-click tMysqlOutput to open its Basic settings view.

6. Fill in the Host, Port, Database, Username, and Password fields with your MySQL database
connection details.
7. In the Table field, enter the name of the table into which the data will be written. In this example,
it is ldifdata.
8. Select Drop table if exists and create from the Action on table drop-down list.


Extracting data from the database table and writing it into an LDIF file

Procedure
1. Double-click tMysqlInput to open its Basic settings view.

2. Fill in the Host, Port, Database, Username, and Password fields with your MySQL database
connection details.
3. Click the [...] button next to Edit schema and in the pop-up window define the schema by adding
four columns: dn, id_owners, registration, and make, all of String type.
4. In the Table Name field, enter the name of the table from which the data will be read. In this
example, it is ldifdata.
5. Click the Guess Query button to fill in the Query field with the auto-generated query.
6. Double-click tFileOutputLDIF to open its Basic settings view.

7. In the File Name field, browse to or enter the path to the LDIF file to be generated. In this
example, it is E:/out.ldif.


8. Select the operation Add from the Change type list.


9. Click the Sync columns button to retrieve the schema from the preceding component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.

The LDIF file created contains the data from the database table and the change type for the
entries is set to add.
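As an illustration only (the exact attribute order in the generated file may differ), the entry
written for the first row of the sample data would look something like this in out.ldif:

dn: 24
changetype: add
id_owners: 24
registration: 5382 KC 94
make: Volkswagen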


tFileOutputMSDelimited
Creates a complex multi-structured delimited file, using data structures (schemas) coming from
several incoming Row flows.

tFileOutputMSDelimited Standard properties


These properties are used to configure tFileOutputMSDelimited running in the Standard Job
framework.
The Standard tFileOutputMSDelimited component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row Separator String (ex: "\n" on Unix) to distinguish rows.

Field Separator Character, string or regular expression to separate fields.

Use Multi Field Separators Select this check box to set a different field separator for
each of the schemas using the Field separator field in the
Schemas area.

Schemas The table gets automatically populated by schemas


coming from the various incoming rows connected to
tFileOutputMSDelimited. Fill out the dependency between
the various schemas:
Parent row: Type in the parent flow name (based on the
Row name transferring the data).
Parent key column: Type in the key column of the parent
row.
Key column: Type in the key column for the selected row.

Advanced settings

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

CSV options Select this check box to take into account all parameters
specific to CSV files, in particular Escape char and Text
enclosure parameters.


Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select this check box if you do not want to generate empty
files.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write a multi-schema delimited file


and separate fields using a field separator value.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileOutputMSPositional
Creates a complex multi-structured file, using data structures (schemas) coming from several
incoming Row flows.

tFileOutputMSPositional Standard properties


These properties are used to configure tFileOutputMSPositional running in the Standard Job
framework.
The Standard tFileOutputMSPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path to the file to be created and/or variable to
be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Schemas The table gets automatically populated by schemas


coming from the various incoming rows connected to
tFileOutputMSPositional. Fill out the dependency between
the various schemas:
Parent row: Type in the parent flow name (based on the
Row name transferring the data).
Parent key column: Type in the key column of the parent
row.
Key column: Type in the key column for the selected row.
Pattern: Type in the pattern that positions the fields for
each incoming row.
Padding char: Type in the padding character to be used.
Alignment: Select the relevant alignment parameter.

Advanced settings

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Create directory if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.


Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is a Flow
variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is a
Flow variable and it returns an integer.
NB_LINE_UNKOWN_HEADER_TYPES: the number of rows
with unknown header type. This is a Flow variable and it
returns an integer.
NB_LINE_PARSE_ERRORS: the number of rows with parse
errors. This is a Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write a multi-schema positional file


and separate fields using a position separator value.

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileOutputMSXML
Creates a complex multi-structured XML file, using data structures (schemas) coming from several
incoming Row flows.

tFileOutputMSXML Standard properties


These properties are used to configure tFileOutputMSXML running in the Standard Job framework.
The Standard tFileOutputMSXML component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name and path to the file to be created and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Configure XML tree Opens the dedicated interface to help you set the XML
mapping. For details about the interface, see Defining the
MultiSchema XML tree on page 1143.

Advanced settings

Create directory only if not exists This check box is selected by default. It creates the
directory that holds the output delimited file, if it does not
already exist.

Advanced separator (for numbers) Select this check box to modify the separators used for
numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

Don't generate empty file Select this check box if you do not want to generate empty
files.

Trim the whitespace characters Select this check box to remove leading and trailing
whitespace from the columns.

Escape text Select this check box to escape special characters.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Defining the MultiSchema XML tree


Double-click on the tFileOutputMSXML component to open the dedicated interface or click on the
three-dot button on the Basic settings vertical tab of the Component tab.

To the left of the mapping interface, under Linker source, the drop-down list includes all the input
schemas that should be added to the multi-schema output XML file (only if more than one input flow
is connected to the tFileOutputMSXML component).
Under Schema List, all the columns retrieved from the selected input data flow are listed.


The right part of the interface holds all the XML structures you want to create in the output XML
file.
You can create the XML structures manually or import them. Then map the input schema columns
onto each element of the XML tree, for each of the input schemas selected under
Linker source.

Importing the XML tree


The easiest and most common way to fill out the XML tree panel, is to import a well-formed XML file.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Import XML tree.
4. Browse to the file to import and click OK.
• You can import an XML tree from files in XML, XSD and DTD formats.
• When importing an XML tree structure from an XSD file, you can choose an element as the
root of your XML tree.
The XML Tree column is hence automatically filled out with the correct elements.
5. If you need to add or remove an element or sub-elements, right-click the relevant element of the
tree to display the contextual menu.
6. Select Delete to remove the selection from the tree or select the relevant option among: Add sub-
element, Add attribute, Add namespace to enrich the tree.

Creating the XML tree manually


If you don't have any XML structure defined as yet, you can create it manually.

Procedure
1. Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2. Right-click on the root tag to display the contextual menu.
3. On the menu, select Add sub-element to create the first element of the structure.
4. If you need to add an attribute or a child element to any element or remove any element, right-
click the left of the corresponding element name to display the contextual menu.
5. Right-click to the left of the element name to display the contextual menu.
6. On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace
or Delete.

Mapping XML data from multiple schema sources


Once your XML tree is ready, select the first input schema that you want to map.
You can map each input column with the relevant XML tree element or sub-element to fill out the
Related Column.


Procedure
1. Click on one of the Schema column name.
2. Drag it onto the relevant sub-element to the right.
3. Release the mouse button to implement the actual mapping.

A light blue link displays that illustrates this mapping. If available, use the Auto-Map button,
located to the bottom left of the interface, to carry out this operation automatically.
4. If you need to disconnect any mapping on any element of the XML tree, select the element and
right-click to the left of the element name to display the contextual menu
5. Select Disconnect link.
The light blue link disappears.

Defining the node status


Defining the XML tree and mapping the data is not sufficient. You also need to define the loop
element for each of the selected sources and, if required, the group element.

Define a loop element


The loop element allows you to define the iterating object. Generally the Loop element is also the
row generator.

About this task


To define an element as loop element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Loop Element.

Results
The Node Status column shows the newly added status.
There can only be one loop element at a time.


Define a group element


The group element is optional; it represents a constant element on which the group-by operation can
be performed. A group element can be defined only if a loop element has been defined before.

About this task


When using a group element, the rows should be sorted so that they can be grouped by the selected
node.
To define an element as group element:

Procedure
1. Select the relevant element on the XML tree.
2. Right-click to the left of the element name to display the contextual menu.
3. Select Set as Group Element.

Results
The Node Status column shows the newly added status, and any required group statuses are
defined automatically.
Click OK once the mapping is complete to validate the definition and continue the Job configuration
where needed.

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileOutputPositional
Writes a file row by row according to the length and the format of the fields or columns in a row.

tFileOutputPositional Standard properties


These properties are used to configure tFileOutputPositional running in the Standard Job framework.
The Standard tFileOutputPositional component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-In or Repository.

  Built-In: No property data stored centrally.

  Repository: Select the repository file where the properties


are stored.

Use Output Stream Select this check box to process the data flow of interest. Once
you have selected it, the Output Stream field displays and
you can type in the data flow of interest.
The data flow to be processed must be added to the flow
in order for this component to fetch these data via the
corresponding representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using
along with this component; otherwise, you could define it
manually and use it according to the design of your Job, for
example, using tJava or tJavaFlex.
To avoid the inconvenience of typing it by hand, you can
select the variable of interest from the auto-completion
list (Ctrl+Space) to fill the current field, provided that
this variable has been properly defined.
For further information about how to use a stream, see
Reading data from a remote file in streaming mode on page
1020. A minimal sketch of defining such a stream in tJava is
given after this table.

File Name Name or path to the file to be processed and/or the variable
to be used.
This field becomes unavailable once you have selected the
Use Output Stream check box.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.


Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Row separator The separator used to identify the end of a row.

Append Select this check box to add the new rows at the end of the
file.

Include header Select this check box to include the column header in the file.

Compress as zip file Select this check box to compress the output file in zip format.

Formats Customize the positional file data format and fill in the
columns in the Formats table.
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in between quotes the padding
characters used. A space by default.
Alignment: Select the appropriate alignment parameter.
Keep: If the data in the column or in the field are too long,
select the part you want to keep.
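The following is a minimal, hedged sketch of how such an output stream could be created in a tJava component placed before this one; the folder, file name and variable key (out_file) are assumptions for illustration, not values imposed by the component:

new java.io.File("C:/myFolder").mkdirs();
globalMap.put("out_file",
    new java.io.FileOutputStream("C:/myFolder/out.txt", false));

In the Output Stream field, the stream could then be retrieved with, for example, (java.io.OutputStream)globalMap.get("out_file"), provided that the tJava component runs before this one.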

Advanced settings

Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Use byte length as the cardinality Select this check box to add support of double-byte
character to this component. JDK 1.6 is required for this
feature.


Create directory if not exists This check box is selected by default. It creates a directory
to hold the output file if it does not exist.

Custom the flush buffer size Select this check box to define the number of lines to write
before emptying the buffer.
Row Number: set the number of lines to write.

Output in row mode Writes in row mode.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Don't generate empty file Select this check box if you do not want to generate empty
files.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
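For example, assuming this component is labelled tFileOutputPositional_1, a tJava component placed after it (for instance via an OnSubjobOk connection) could print the NB_LINE variable as follows; this is only an illustrative sketch:

System.out.println(globalMap.get("tFileOutputPositional_1_NB_LINE"));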

Usage

Usage rule Use this component to write a file row by row according to
the defined field lengths and formats.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic


settings view. Once a dynamic parameter is defined, the


Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For a related scenario, see Reading data using a Regex and outputting the result to Positional file on
page 1089.
For scenario about the usage of Use Output Stream check box, see Utilizing Output Stream to save
filtered data to a local file on page 1120.


tFileOutputProperties
Writes a configuration file, of the type .ini or .properties, containing text data organized according to
the model key = value.

tFileOutputProperties Standard properties


These properties are used to configure tFileOutputProperties running in the Standard Job framework.
The Standard tFileOutputProperties component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
For this component, the schema is read-only. It is made of
two columns, Key and Value, corresponding to the parameter
name and the parameter value to be copied.

File format Select the file format from the list: either .properties or .ini.

  .properties: data in the configuration file is written line by line and structured in the following way: key = value.

  .ini: data in the configuration file is written line by line, structured in the following way: key = value, and grouped in sections.
Section Name: enter the section name on which the
iteration is based.

File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.
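As a purely illustrative example (the keys and values below are invented), two schema rows with Key set to host and port and Value set to localhost and 3306 would produce content similar to the following, depending on the selected file format:

In .properties format:
host = localhost
port = 3306

In .ini format, with Section Name set to "connection":
[connection]
host = localhost
port = 3306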

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write files where data is organized
according to the structure key = value.

Related scenarios
For a related scenario, see Reading and matching the keys and the values of different .properties files
and outputting the results in a glossary on page 1080 of tFileInputProperties on page 1079.


tFileOutputRaw
Provides data coming from another component, in the form of a single column of output data.

tFileOutputRaw Standard properties


These properties are used to configure tFileOutputRaw running in the Standard Job framework.
The Standard tFileOutputRaw component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description, it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored remotely
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: No property data stored centrally.

  Repository: Select the repository file where the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Filename The name of and path to the output file to be processed,


which you can enter manually between double quotes or
browse and select by clicking the [...] button.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Encoding If the output is a string, select the encoding type from the
list or select Custom and define it manually.

Die on error Select this check box to stop the execution of the Job when
an error occurs. Clear the check box to skip the row on error
and complete the process for error-free rows. If needed, you
can collect the rows on error using a Row > Reject link.
To catch the FileNotFoundException, you also need to
select this check box.


Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use the tFileOutputRaw component to receive data coming


from a data source that provides its data in a single column.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).


tFileOutputXML
Writes an XML file with separated data values according to a defined schema.

tFileOutputXML Standard properties


These properties are used to configure tFileOutputXML running in the Standard Job framework.
The Standard tFileOutputXML component belongs to the File and the XML families.
The component in this framework is available in all Talend products.

Basic settings

File Name Name or path to the output file and/or the variable to be
used.
Related topic: see Defining variables from the Component
view section in Talend Studio User Guide

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Incoming record is a document Select this check box if the data from the preceding
component is in XML format.
When this check box is selected, a Column list appears
allowing you to select a Document type column of the
schema that holds the data, and the Row tag field disappears.
When this check box is selected, in the Advanced settings
view, only the check boxes Create directory if not exists,
Don't generate empty file, Trim data, tStatCatcher Statistics
and the list Encoding are available.

Row tag Specify the tag that will wrap data and structure per row.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the input component.
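As an indicative example only, with a two-column schema (id, name), Row tag set to row and the default root tag root (see Root tags in the Advanced settings), the output written by tFileOutputXML would resemble the following; the values shown are invented:

<root>
  <row>
    <id>1</id>
    <name>andy</name>
  </row>
</root>

If the As attribute check box is selected for a column in the Output format table, that column is written as an attribute of the parent element instead of a child element.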

Advanced settings

Split output in several files If the output is big, you can split the output into several
files, each containing the specified number of rows.
Rows in each output file: Specify the number of rows in each
output file.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output XML files if required.

Root tags Specify one or more root tags to wrap the whole output file
structure and data. The default root tag is root.

Output format Define the output format.


• Column: The columns retrieved from the input schema.
• As attribute: select check box for the column(s) you
want to use as attribute(s) of the parent element in the
XML output.

Note:
If the same column is selected in both the Output format
table as an attribute and in the Use dynamic grouping
setting as the criterion for dynamic grouping, only the
dynamic group setting will take effect for that column.

Use schema column name: By default, this check box is


selected for all columns so that the column labels from the
input schema are used as data wrapping tags. If you want
to use a different tag than from the input schema for any
column, clear this check box for that column and specify a
tag label between quotation marks in the Label field.

Use dynamic grouping Select this check box if you want to dynamically group the
output columns. Click the plus button to add one or more
grouping criteria in the Group by table.
Column: Select a column you want to use as a wrapping
element for the grouped output rows.
Attribute label: Enter an attribute label for the group
wrapping element, between quotation marks.

Custom the flush buffer size Select this check box to define the number of rows to buffer
before the data is written into the target file and the buffer
is emptied.
Row Number: Specify the number of rows to buffer.


Advanced separator (for numbers) Select this check box to change the separator used for
numbers. By default, the thousands separator is a comma (,)
and the decimal separator is a period (.).
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Don't generate empty file Select the check box to avoid the generation of an empty file.

Trim data Select this check box to remove the spaces at the beginning
and at the end of the text, and merge multiple consecutive
spaces into one within the text.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Use this component to write an XML file with data passed
on from other components using a Row link.

Related scenarios
For related scenarios using tFileOutputXML, see Reading a Positional file and saving filtered results to
XML on page 1075 and Using a SOAP message from an XML file to get country name information and
saving the information to an XML file on page 3454.


tFileProperties
Creates a single row flow that displays the main properties of the processed file.

tFileProperties Standard properties


These properties are used to configure tFileProperties running in the Standard Job framework.
The Standard tFileProperties component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description, it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. It describes the
main properties of the specified file. You can click the [...]
button next to Edit schema to view the predefined schema
which contains the following fields:
• abs_path: the absolute path of the file.
• dirname: the directory of the file.
• basename: the name of the file.
• mode_string: the access mode of the file, r and w for
read and write permissions respectively.
• size: the file size in bytes.
• mtime: the timestamp indicating when the file was
last modified, in milliseconds that have elapsed since
the Unix epoch (00:00:00 UTC, Jan 1, 1970).
• mtime_string: the date and time the file was last
modified.

File Name or path to the file to be processed and/or the variable


to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Calculate MD5 Hash Select this check box to check the MD5 of the downloaded
file.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this


check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):


Row: Main; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Displaying the properties of a processed file


This Java scenario describes a very simple Job that displays the properties of the specified file.

Procedure
1. Drop a tFileProperties component and a tLogRow component from the Palette onto the design
workspace.
2. Right-click on tFileProperties and connect it to tLogRow using a Main Row link.

3. In the design workspace, select tFileProperties.


4. Click the Component tab to define the basic settings of tFileProperties.


5. Set Schema type to Built-In.


6. If desired, click the Edit schema button to see the read-only columns.
7. In the File field, enter the file path or browse to the file you want to display the properties for.
8. In the design workspace, select tLogRow and click the Component tab to define its basic settings.
For more information, see tLogRow on page 1977.
9. Press F6 to execute the Job.

Results
The properties of the defined file are displayed on the console.


tFileRowCount
Opens a file and reads it row by row in order to determine the number of rows inside.

tFileRowCount Standard properties


These properties are used to configure tFileRowCount running in the Standard Job framework.
The Standard tFileRowCount component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Name or path to the file to be processed and/or the variable
to be used.
For further information about how to define and use a
variable in a Job, see Talend Studio User Guide.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Row separator String (for example, "\n" on Unix) used to distinguish rows
in the file to be read.

Ignore empty rows Select this check box to ignore the empty rows while the
component is counting the rows in the file.

Encoding Select the encoding type from the list or select Custom
and define it manually. This field is compulsory for DB data
handling.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables COUNT: the number of rows in a file. This is a Flow variable
and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule tFileRowCount is a standalone component; it must be used
with an OnSubjobOk connection to tJava.

Connections Outgoing links (from this component to another):


Row: Main; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Main; Reject; Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Writing a file to MySQL if the number of its records matches a reference value
In this scenario, tFileRowCount counts the number of records in a .txt file, which is compared against
a reference value through tJava. Once the two values match, the .txt file will be written to a MySQL
table.
The .txt file has two records:

1;andy
2;mike

Linking the components


Procedure
1. Drop tFileRowCount, tJava, tFileInputDelimited, and tMysqlOutput from the Palette onto the
design workspace.
2. Link tFileRowCount to tJava using an OnSubjobOk trigger.
3. Link tJava to tFileInputDelimited using a Run if trigger.
4. Link tFileInputDelimited to tMysqlOutput using a Row > Main connection.


Configuring the components


Procedure
1. Double-click tFileRowCount to open its Basic settings view.

2. In the File Name field, type in the full path of the .txt file. You can also click the [...] button to
browse for this file.
Select the Ignore empty rows check box.
3. Double-click tJava to open its Basic settings view.

In the Code box, enter the function to print out the number of rows in the file:

System.out.println(globalMap.get("tFileRowCount_1_COUNT"));

4. Click the if trigger connection to open its Basic settings view.


In the Condition box, enter the statement to judge if the number of rows is 2:

((Integer)globalMap.get("tFileRowCount_1_COUNT"))==2

This if trigger means that if the row count equals 2, the rows of the .txt file will be written to
MySQL.
5. Double-click tFileInputDelimited to open its Basic settings view.

In the File name/Stream field, type in the full path of the .txt file. You can also click the [...]
button to browse for this file.
6. Click the Edit schema button to open the schema editor.

7. Click the [+] button to add two columns, namely id and name, respectively of the integer and
string type.
8. Click the Yes button in the pop-up box to propagate the schema setup to the following
component.


9. Double-click tMysqlOutput to open its Basic settings view.

10. In the Host and Port fields, enter the connection details.
In the Database field, enter the database name.
In the Username and Password fields, enter the authentication details.
In the Table field, enter the table name, for instance "staff".
11. In the Action on table list, select Create table if not exists.
In the Action on data list, select Insert.

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the Job has been executed successfully and the number of rows in the .txt file
has been printed out.
3. Go to the MySQL GUI and open the table staff.

As shown above, the table has been created with the two records inserted.


tFileTouch
Creates an empty file or, if the specified file already exists, updates its date of modification and of last
access while keeping the contents unchanged.

tFileTouch Standard properties


These properties are used to configure tFileTouch running in the Standard Job framework.
The Standard tFileTouch component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

File Name Path and name of the file to be created and/or the variable
to be used.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output file if it does not exist.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Connections Outgoing links (from this component to another):


Row: Main.


Trigger: On Subjob Ok; On Subjob Error; Run if; On


Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Main; Reject; Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Related scenarios
No scenario is available for the Standard version of this component yet.


tFileUnarchive
Decompresses an archive file for further processing, in one of the following formats: *.tar.gz , *.tgz,
*.tar, *.gz and *.zip.

tFileUnarchive Standard properties


These properties are used to configure tFileUnarchive running in the Standard Job framework.
The Standard tFileUnarchive component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Archive file File path to the archive.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Extraction directory Folder where the unzipped file(s) will be put.

Warning: Use absolute path (instead of relative path) for


this field to avoid possible errors.

Use archive file name as root directory Select this check box to create a folder named after the
archive, if it does not exist, under the specified directory and
extract the zipped file(s) to that folder.

Check the integrity before unzip Select this check box to run an integrity check before
unzipping the archive.

Extract file paths Select this check box to reproduce the file path structure
zipped in the archive.

Need a password Select this check box and provide the correct decrypt
method and password if the archive to be unzipped is
password protected. Note that the encrypted archive must
be one created by the tFileArchive component; otherwise
you will see error messages or get nothing extracted even if
no error message is displayed.
Decrypt method: select the decrypt method from the list,
either Java Decrypt or Zip4j Decrypt.
Enter the password: enter the decryption password.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.


Global Variables

Global Variables CURRENT_FILE: the current file name. This is a Flow


variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component


but it can also be used within a Job as a Start component
using an Iterate link.

Connections Outgoing links (from this component to another):


Row: Iterate.
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Limitation
Warning:
Such files can be decompressed: *.tar.gz , *.tgz, *.tar, *.gz and
*.zip.

Related scenario
For tFileUnarchive related scenario, see tFileCompare on page 984.


tFilterColumns
Homogenizes schemas by ordering the columns, removing unwanted columns, or adding new
columns.

tFilterColumns Standard properties


These properties are used to configure tFilterColumns running in the Standard Job framework.
The Standard tFilterColumns component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the
previous component in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this


check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Related Scenario
For more information regarding the tFilterColumns component in use, see Cleaning up and filtering a
CSV file on page 3027.


tFilterRow
Filters input rows by setting one or more conditions on the selected columns.

tFilterRow Standard properties


These properties are used to configure tFilterRow running in the Standard Job framework.
The Standard tFilterRow component belongs to the Processing family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is built-in only.

Logical operator used to combine conditions Select a logical operator to combine simple conditions and
to combine the filter results of both modes if any advanced
conditions are defined.
And: returns the boolean value of true if all conditions are true;
otherwise false. For each two conditions combined using
a logical AND, the second condition is evaluated only if the
first condition is evaluated to be true.
Or: returns the boolean value of true if any condition is true;
otherwise false. For each two conditions combined using a
logical OR, the second condition is evaluated only if the first
condition is evaluated to be false.

Conditions Click the plus button to add as many simple conditions


as needed. Based on the logical operator selected, the
conditions are evaluated one after the other in sequential
order for each row. When evaluated, each condition returns
the boolean value of true or false.
Input column: Select the column of the schema the function
is to be operated on
Function: Select the function on the list
Operator: Select the operator to bind the input column with
the value
Value: Type in the filtered value, between quotes if needed.

Use advanced mode Select this check box when the operations you want to
perform cannot be carried out through the standard
functions offered, for example, different logical operations
in the same component. In the text field, type in the filter
expression (a Java boolean condition) as required.
If multiple advanced conditions are defined, use a logical
operator between two conditions:
&& (logical AND): returns the boolean value of true if both
conditions are true; otherwise false. The second condition is
evaluated only if the first condition is evaluated to be true.


|| (logical OR): returns the boolean value of true if either


condition is true; otherwise false. The second condition is
evaluated only if the first condition is evaluated to be false.
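For example, an advanced-mode condition that keeps only the rows whose Age column (assumed here for illustration) lies strictly between 10 and 80 could be written as follows, using the input_row prefix to access the columns of the incoming row:

input_row.Age > 10 && input_row.Age < 80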

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_OK: the number of rows matching the filter. This is
an After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Filtering a list of names using simple conditions


The following scenario shows a Job that uses simple conditions to filter a list of records. This scenario
will output two tables: the first will list all male persons with a last name shorter than nine characters
and aged between 10 and 80 years; the second will list all rejected records. An error message for each
rejected record will display in the same table to explain why such a record has been rejected.


Dropping and linking components


Procedure
1. Drop tFixedFlowInput, tFilterRow and tLogRow from the Palette onto the design workspace.
2. Connect the tFixedFlowInput to the tFilterRow, using a Row > Main link. Then, connect the
tFilterRow to the tLogRow, using a Row > Filter link.
3. Drop tLogRow from the Palette onto the design workspace and rename it as reject. Then, connect
the tFilterRow to the reject, using a Row > Reject link.
4. Label the components to better identify their roles in the Job.

Configuring the components


Procedure
1. Double-click tFixedFlowInput to display its Basic settings view and define its properties.
2. Click the [...] button next to Edit schema to define the schema for the input data. In this example,
the schema is made of the following four columns: LastName (type String), Gender (type String),
Age (type Integer) and City (type String).

When done, click OK to validate the schema setting and close the dialog box. A new dialog box
opens and asks you if you want to propagate the schema. Click Yes.


3. Set the row and field separators in the corresponding fields if needed. In this example, use the
default settings for both, namely the row separator is a carriage return and the field separator is a
semi-colon.
4. Select the Use Inline Content(delimited file) option in the Mode area and type in the input data in
the Content field.

The input data used in this example is shown below:

Van Buren;M;73;Chicago
Adams;M;40;Albany
Jefferson;F;66;New York
Adams;M;9;Albany
Jefferson;M;30;Chicago
Carter;F;26;Chicago
Harrison;M;40;New York
Roosevelt;F;15;Chicago
Monroe;M;8;Boston
Arthur;M;20;Albany
Pierce;M;18;New York
Quincy;F;83;Albany
McKinley;M;70;Boston
Coolidge;M;4;Chicago
Monroe;M;60;Chicago

5. Double-click tFilterRow to display its Basic settings view and define its properties.


6. In the Conditions table, add four conditions and fill in the filtering parameters.
• From the InputColumn list field of the first row, select LastName, from the Function list field,
select Length, from the Operator list field, select Lower than, and in the Value column, type in
9 to limit the length of last names to nine characters.
• From the InputColumn list field of the second row, select Gender, from the Operator list field,
select Equals, and in the Value column, type in M in double quotes to filter records of male
persons.

Warning:
In the Value field, you must type in your values between double quotes for all types of values, except for integer values, which do not need quotes.

• From the InputColumn list field of the third row, select Age, from the Operator list field, select
Greater than, and in the Value column, type in 10 to set the lower limit to 10 years.
• From the InputColumn list field of the fourth row, select Age, from the Operator list field, select
Lower than, and in the Value column, type in 80 to set the upper limit to 80 years.
7. To combine the conditions, select And so that only those records that meet all the defined
conditions are accepted.
8. In the Basic settings of tLogRow components, select Table (print values in cells of a table) in the
Mode area.
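For reference only, the four simple conditions defined above, combined with And, could also be expressed as a single advanced-mode condition similar to the following sketch:

input_row.LastName.length() < 9 && input_row.Gender.equals("M") && input_row.Age > 10 && input_row.Age < 80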

Executing the Job


Procedure
Save your Job and press F6 to execute it.

As shown above, the first table lists the records of male persons aged between 10 and 80 years,
whose last names are made up of less than nine characters, and the second table lists all the records
that do not match the filter conditions. Each rejected record has a corresponding error message that
explains the reason of rejection.


Filtering a list of names through different logical operations


Based on the previous scenario, this scenario further filters the input data so that only those records
of people from New York and Chicago are accepted. Without changing the filter settings defined in
the previous scenario, advanced conditions are added in this scenario to enable both logical AND and
logical OR operations in the same tFilterRow component.

Procedure
1. Double-click the tFilterRow component to show its Basic settings view.

2. Select the Use advanced mode check box, and type in the following expression in the text field:

input_row.City.equals("Chicago") || input_row.City.equals("New York")

This defines two conditions on the City column of the input data to filter records that contain the
cities of Chicago and New York, and uses a logical OR to combine the two conditions so that records satisfying either condition will be accepted.
3. Press Ctrl+S to save the Job and press F6 to execute it.


As shown above, the result list of the previous scenario has been further filtered, and only the
records containing the cities of New York and Chicago are accepted.


tFirebirdClose
Closes a transaction with a Firebird database.

tFirebirdClose Standard properties


These properties are used to configure tFirebirdClose running in the Standard Job framework.
The Standard tFirebirdClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tFirebirdConnection component in the list if more


than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Firebird


components, especially with tFirebirdConnection and
tFirebirdCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tFirebirdCommit
Commits a global transaction instead of doing so on every row or every batch, thus providing a gain in
performance.

tFirebirdCommit Standard properties


These properties are used to configure tFirebirdCommit running in the Standard Job framework.
The Standard tFirebirdCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tFirebirdConnection component in the list if more


than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tFirebirdCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other


tFirebird* components, especially with the tFirebirdConnection and tFirebirdRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an


environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tFirebirdCommit related scenario, see Inserting data in mother/daughter tables on page 2426


tFirebirdConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.

tFirebirdConnection Standard properties


These properties are used to configure tFirebirdConnection running in the Standard Job framework.
The Standard tFirebirdConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host name Database server IP address.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.


Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other


tFirebird* components, especially with the tFirebirdCommit
and tFirebirdRollback components.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For tFirebirdConnection related scenario, see tMysqlConnection on page 2425


tFirebirdInput
Executes a database query on a Firebird database with a strictly defined order, which must correspond
to the schema definition, and then passes on the field list to the next component via a Main row link.

tFirebirdInput Standard properties


These properties are used to configure tFirebirdInput running in the Standard Job framework.
The Standard tFirebirdInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of the DB server.

Database Name of the database


Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
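For example, with a hypothetical schema made of the columns id, name and city (in that order), the
query typed in the Query field should list the fields in the same order, such as:
"SELECT id, name, city FROM customers"
The table name customers is used here for illustration only.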

Advanced Settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.


QUERY: the query statement being processed. This is a Flow


variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Firebird
databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:


• Reading data from different MySQL databases using dynamically loaded connection parameters on
page 497.


tFirebirdOutput
Executes the action defined on the table in a Firebird database and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
tFirebirdOutput writes, updates, makes changes or suppresses entries in a database.

tFirebirdOutput Standard properties


These properties are used to configure tFirebirdOutput running in the Standard Job framework.
The Standard tFirebirdOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.


Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
You must specify at least one column as a primary key on
which the Update and Delete operations are based. You can
do that by clicking Edit Schema and selecting the check
box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings
view where you can simultaneously define primary keys for
the update and delete operations. To do that: Select the
Use field options check box and then in the Key in update
column, select the check boxes next to the column name on
which you want to base the update operation. Do the same
in the Key in delete column for the deletion operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. It allows you to call SQL functions to
perform actions on columns that are not insert, update, or
delete actions, or actions that require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.


  Reference column: Type in a reference column that the
component can use to place or replace the new or altered
column.
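As an illustration only (the column names and the expression below are hypothetical), to add a
created_at column filled by the database after the existing id column, you could set Name to
"created_at", SQL expression to "CURRENT_TIMESTAMP", Position to After and Reference column to
"id".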

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Support null in "SQL WHERE" statement Select this check box if you want to deal with the Null
values contained in a DB table.

Note:
Make sure the Nullable check box is selected for the corresponding columns in the schema.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use Batch check box
is selected.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a Firebird database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tFirebirdRollback
Cancels the transaction committed in the connected Firebird database.

tFirebirdRollback Standard properties


These properties are used to configure tFirebirdRollback running in the Standard Job framework.
The Standard tFirebirdRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tFirebirdConnection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other


tFirebird* components, especially with the tFirebirdConnection and tFirebirdCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic


settings and context variables, see Talend Studio User


Guide.

Related scenario
For tFirebirdRollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429.


tFirebirdRow
Executes the stated SQL query on the specified Firebird database.
Depending on the nature of the query and the database, tFirebirdRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL
statements.
tFirebirdRow is the specific component for this database query. The row suffix means the component
implements a flow in the Job design although it does not provide output.

tFirebirdRow Standard properties


These properties are used to configure tFirebirdRow running in the Standard Job framework.
The Standard tFirebirdRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased
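As a sketch (the table and column names are hypothetical), a parameterized statement entered in the
Query field could be:
"DELETE FROM customers WHERE id = ?"
with one line in the Set PreparedStatement Parameter table, for example Parameter Index set to 1,
Parameter Type set to Int and Parameter Value set to 1001.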

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tFixedFlowInput
Generates a fixed flow from internal variables.

tFixedFlowInput Standard properties


These properties are used to configure tFixedFlowInput running in the Standard Job framework.
The Standard tFixedFlowInput component belongs to the Misc family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component. The schema is either built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository, hence can be reused in various
projects and job designs. Related topic: see Talend Studio
User Guide.

Mode From the three options, select the mode that you want to
use.
Use Single Table: Enter the data that you want to generate
in the relevant value field.
Use Inline Table: Add the row(s) that you want to generate.
Use Inline Content: Enter the data that you want to
generate, separated by the separators that you have already
defined in the Row and Field Separator fields.

Number of rows Enter the number of lines to be generated.


Values Between inverted commas, enter the values corresponding


to the columns you defined in the schema dialog box via
the Edit schema button.
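For instance, with a hypothetical two-column schema (firstName, city), entering "John" as the value
of firstName and "Paris" as the value of city, and setting Number of rows to 3, generates three
identical rows holding these two values.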

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a start or intermediate


component and thus requires an output component.

Related scenarios
For related scenarios, see:
• Buffering output data on the webapp server on page 421.
• Iterating on a DB table and listing its column names on page 2419.
• Filtering a list of names using simple conditions on page 1173.


tFlowMeter
Counts the number of rows processed in the defined flow, so this number can be caught by the
tFlowMeterCatcher component for logging purposes.

tFlowMeter Standard properties


These properties are used to configure tFlowMeter running in the Standard Job framework.
The Standard tFlowMeter component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Use input connection name as label Select this check box to reuse the name given to the input
main row flow as label in the logged data.

Mode Select the type of values for the data measured:
Absolute: the actual number of rows is logged.
Relative: a ratio (%) of the number of rows is logged. When
this option is selected, a Connections List is displayed to let
you select a reference connection.

Thresholds Adds a threshold to watch proportions in the volumes
measured. You can decide that the normal flow has to be
between the low and top ends of a row-number range, and if
the flow falls below the low end, there is a bottleneck.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Cannot be used as a start component as it requires an input


flow to operate.

If you need logs, statistics, or other measurements of your data flows, see the Talend Studio User
Guide.


Related scenario
For related scenario, see Catching flow metrics from a Job on page 1205


tFlowMeterCatcher
Operates as a log function triggered by the use of a tFlowMeter component in the Job.
Based on a defined schema, the tFlowMeterCatcher catches the processing volumetrics from the
tFlowMeter component and passes them on to the output component.

tFlowMeterCatcher Standard properties


These properties are used to configure tFlowMeterCatcher running in the Standard Job framework.
The Standard tFlowMeterCatcher component belongs to the Logs & Errors family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the fields to be
processed and passed on to the next component. In this
particular case, the schema is read-only, as this component
gathers standard log information including:

  Moment: Processing time and date

  Pid: Process ID

  Father_pid: Process ID of the father Job if applicable. If not


applicable, Pid is duplicated.

  Root_pid: Process ID of the root Job if applicable. If not


applicable, pid of current Job is duplicated.

  System_pid: Process id generated by the system

  Project: Name of the project the Job belongs to.

  Job: Name of the current Job

  Job_repository_id: ID generated by the application.

  Job_version: Version number of the current Job

  Context: Name of the current context

  Origin: Name of the component if any

  Label: Label of the row connection preceding the


tFlowMeter component in the Job, and that will be analyzed
for volumetrics.

  Count: Actual number of rows being processed

  Reference: Number of rows passing the reference link.

  Thresholds: Only used when the relative mode is selected in


the tFlowMeter component.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is the start component of a secondary Job


which triggers automatically at the end of the main Job.

Limitation The use of this component cannot be separated from


the use of the tFlowMeter. For more information, see
tFlowMeter on page 1202

Catching flow metrics from a Job


The following basic Job aims at catching the number of rows being passed in the processed flow. The
measures are taken twice: once after the input component, that is, before the filtering step, and once
right after the filtering step, that is, before the output component.

• Drop the following components from the Palette to the design workspace: tMysqlInput,
tFlowMeter (x2), tMap, tLogRow, tFlowMeterCatcher and tFileOutputDelimited.
• Link components using row main connections and click on the label to give consistent names
throughout the Job, such as US_States from the input component and filtered_states for the output
from the tMap component, for example.
• Link the tFlowMeterCatcher to the tFileOutputDelimited component using a row main link as well,
since data is passed.


• On the tMysqlInput Component view, configure the connection properties as Repository if the
table metadata are stored in the Repository. Otherwise, set the Type to Built-in and configure the
connection and schema details manually if they are built-in for this Job.

• The 50 States of the USA are recorded in the table states. In order for all 50 entries of the table to
be selected, the query to run on the MySQL database is as follows:
select * from states.
• Select the relevant encoding type on the Advanced settings vertical tab.
• Then select the following component which is a tFlowMeter and set its properties.

• Select the check box Use input connection name as label, in order to reuse the label you chose in
the log output file (tFileOutputDelimited).
• The mode is Absolute as there is no reference flow to meter against, also no Threshold is to be set
for this example.
• Then launch the tMap editor to set the filtering properties.
• For this use case, drag and drop the ID and State columns from the Input area of the tMap towards
the Output area. No variable is used in this example.


• On the Output flow area (labelled filtered_states in this example), click the arrow & plus button to
activate the expression filter field.
• Drag the State column from the Input area (row2) towards the expression filter field and type in
the rest of the expression in order to filter the state labels starting with the letter M. The final
expression looks like: row2.State.startsWith("M")
• Click OK to validate the setting.
• Then select the second tFlowMeter component and set its properties.

• Select the check box Use input connection name as label.


• Select Relative as Mode and in the Reference connections list, select US_States as reference to be
measured against.
• Once again, no threshold is used for this use case.
• No particular setting is required in the tLogRow.
• Nor is any particular setting required in the tFlowMeterCatcher, as this component's properties are
limited to a preset schema that includes typical log information.
• Finally, configure the log output component (tFileOutputDelimited).

• Select the Append check box in order to log all tFlowMeter measures.


• Then save your Job and press F6 to execute it.

The Run view shows the filtered state labels as defined in the Job.

In the delimited CSV file, the number of rows shown in the count column varies between tFlowMeter1
and tFlowMeter2, as the filtering has been carried out in between. The reference column also shows this
difference.


tFlowToIterate
Reads data line by line from the input flow and stores the data entries in iterative global variables.

tFlowToIterate Standard properties


These properties are used to configure tFlowToIterate running in the Standard Job framework.
The Standard tFlowToIterate component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Use the default (key, value) in global variables When selected, the system uses the default value of the
global variable in the current Job.

Customize key: Type in a name for the new global variable. Press Ctrl
+Space to access all available variables either global or
user-defined.

  value: Click in the cell to access a list of the columns


attached to the defined global variable.
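For example, a value registered under the key "Name_of_File" (as in the scenario below) can then be
retrieved in a component placed after the Iterate link with the Java expression
((String)globalMap.get("Name_of_File")).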

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
CURRENT_ITERATION: the sequence number of the current
iteration. This is a Flow variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule You cannot use this component as a start component.


tFlowToIterate requires an output component.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: Run if; On Component Ok; On Component Error.


Incoming links (from one component to this one):


Row: Main;

For further information regarding connections, see Talend


Studio User Guide.

Transforming data flow to a list


The following scenario describes a Job that reads a list of files from a defined input file, iterates on
each of the files and displays their content row by row on the Run console.

Setting up the Job


Procedure
1. Drop the following components from the Palette onto the design workspace: two tFileInputDeli
mited components, a tFlowToIterate, and a tLogRow.
2. Connect the first tFileInputDelimited to tFlowToIterate using a Row > Main link, tFlowToIterate
to the second tFileInputDelimited using an Iterate link, and the second tFileInputDelimited to
tLogRow using a Row > Main link.

Configuring the Components


Procedure
1. Double-click the first tFileInputDelimited to display its Basic settings view.
2. Click the [...] button next to the File Name field to select the path to the input file.

Note:
The File Name field is mandatory.


The input file used in this scenario is Customers.txt. It is a text file that contains a list of names
of three other simple text files: Name.txt, E-mail.txt and Address.txt. The first text file, Name.txt,
is made of one column holding customers' names. The second text file, E-mail.txt, is made of
one column holding customers' e-mail addresses. The third text file, Address.txt, is made of one
column holding customers' postal addresses.
Fill in all other fields as needed. For more information, see tFileInputDelimited Standard
properties on page 1015. In this scenario, the header and the footer are not set and there is no
limit for the number of processed rows.
3. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is
made of one column, FileName.

4. Double-click tFlowToIterate to display its Basic settings view.

Click the plus button to add new parameter lines and define your variables, and click in
the key cell to enter the variable name as desired. In this scenario, one variable is defined:
"Name_of_File".
Alternatively, you can select the Use the default (key, value) in global variables check box to use
the default in global variables.
5. Double-click the second tFileInputDelimited to display its Basic settings view.


In the File name field, enter the directory of the files to be read, and then press Ctrl+Space to
select the global variable "Name_of_File". In this scenario, the syntax is as follows:

"C:/scenario/flow_to_iterate/"+((String)globalMap.get("Name_of_File"))

Click Edit schema to define the schema column name. In this scenario, it is RowContent.
Fill in all other fields as needed. For more information, see tFileInputDelimited Standard
properties on page 1015.
6. In the design workspace, select the last component, tLogRow, and click the Component tab to
define its basic settings.

Define your settings as needed. For more information, see tLogRow Standard properties on page
1977.

Saving and executing the Job


Procedure
1. Save your Job by pressing Ctrl+S.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


Results
Customers' names, customers' e-mails, and customers' postal addresses appear on the console
preceded by the schema column name.


tForeach
Creates a loop on a list for an iterate link.

tForeach Standard properties


These properties are used to configure tForeach running in the Standard Job framework.
The Standard tForeach component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Values Use the [+] button to add rows to the Values table. Then
click on the fields to enter the list values to be iterated
upon, between double quotation marks.
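For instance (the values are hypothetical), entering "file1", "file2" and "file3" makes the outgoing
Iterate link fire three times; at each iteration the current value can be retrieved through the
CURRENT_VALUE global variable described below, for example with
globalMap.get("tForeach_1_CURRENT_VALUE").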

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
CURRENT_VALUE: the value currently iterated upon. This is
a Flow variable and it returns a string.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tForeach is an input component and requires an Iterate link


to connect it to another component.

Iterating on a list and retrieving the values


This scenario describes a two-component Job in which a list is created and iterated upon in a tForeach
component. The values are then retrieved in a tJava component.


Setting up the Job


Procedure
1. Drop a tForeach and a tJava component onto the design workspace.
2. Link tForeach to tJava using a Row > Iterate connection.

Results

Configuring the components


Procedure
1. Double-click tForeach to open its Basic settings view:

2. Click the [+] button to add as many rows to the Values list as required.
3. Click on the Value fields to enter the list values, between double quotation marks.
4. Double-click tJava to open its Basic settings view:

5. Enter the following Java code in the Code area:
System.out.println(globalMap.get("tForeach_1_CURRENT_VALUE")+"_out");


Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to execute the Job.

Results
The tJava run view displays the list values retrieved from tForeach, each one suffixed with _out:


tFTPClose
Closes an active FTP connection to release the occupied resources.

tFTPClose Standard properties


These properties are used to configure tFTPClose running in the Standard Job framework.
The Standard tFTPClose component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Component list Select the component that opens the connection you need
to close from the list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is more commonly used with other FTP
components, especially with the tFTPConnection compon
ent.

Related scenarios
• Listing and getting files/folders on an FTP directory on page 1230
• Putting files onto an FTP server on page 1246
• Renaming a file located on an FTP server on page 1253


tFTPConnection
Opens an FTP connection to transfer files in a single transaction.

tFTPConnection Standard properties


These properties are used to configure tFTPConnection running in the Standard Job framework.
The Standard tFTPConnection component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/


HTTPS proxy. If you need a proxy, set a SOCKS proxy in
the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's


version is greater than 3, the encoding should be UTF-8, or


else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type


from the list or selecting CUSTOM and entering the encoding
type manually.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is typically used as a single-component


subJob. It is used along with other FTP components.

Related scenarios
• Listing and getting files/folders on an FTP directory on page 1230
• Putting files onto an FTP server on page 1246
• Renaming a file located on an FTP server on page 1253


tFTPDelete
Deletes files or folders in a specified directory on an FTP server.

tFTPDelete Standard properties


These properties are used to configure tFTPDelete running in the Standard Job framework.
The Standard tFTPDelete component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The directory where the files/folders to be deleted are


located.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.


Warning: This option does not work with an HTTP/


HTTPS proxy. If you need a proxy, set a SOCKS proxy in
the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the
Files field as file filters. This is useful when the name of
the file to be processed contains special characters such as
parentheses.
For more information about Perl5 regular expression syntax,
see Perl5 Regular Expression Syntax.

Files The names of the files/folders or the paths to the files/folders to be deleted. You can specify multiple files/folders in a line by using wildcards or a regular expression.

Target Type Select the type of the target to be deleted, either File or
Directory.


Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.
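
For example, assuming the component instance is named tFTPDelete_1, these variables could be read in a tJava component executed after this component, for instance through an OnSubjobOk link (a sketch for illustration, not part of the component reference):

  // number of files/folders that tFTPDelete_1 processed
  System.out.println("Deleted: " + ((Integer) globalMap.get("tFTPDelete_1_NB_FILE")));
  // last error message, or null if no error occurred
  System.out.println("Error: " + ((String) globalMap.get("tFTPDelete_1_ERROR_MESSAGE")));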


Usage

Usage rule This component is typically used as a single-component subJob but can also be used as an output or end object.

Related scenario
No scenario is available for this component yet.


tFTPFileExist
Checks if a file or a directory exists on an FTP server.

tFTPFileExist Standard properties


These properties are used to configure tFTPFileExist running in the Standard Job framework.
The Standard tFTPFileExist component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The remote directory under which the file or the directory
will be checked.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Target Type Select the type of the target to be checked, either File or
Directory.

File Name The name of the file or the path to the file to be checked.

This property is available only when File is selected from the Target Type list.

Directory Name The name of the directory or the path to the directory to be
checked.
This property is available only when Directory is
selected from the Target Type list.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

EXISTS The result of whether a specified file/directory exists. This is a Flow variable and it returns a boolean.

FILENAME The name of the file/directory processed. This is an After variable and it returns a string.
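
For example, the EXISTS variable is typically evaluated in the condition of a Run if trigger so that the next subJob runs only when the target was found. Assuming the component instance is named tFTPFileExist_1, the condition could be written as follows (a sketch for illustration):

  ((Boolean) globalMap.get("tFTPFileExist_1_EXISTS"))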

Usage

Usage rule This component is typically used as a single-component subJob but can also be used with other components.

Related scenario
No scenario is available for this component yet.


tFTPFileList
Lists all files and folders directly under a specified directory based on a filemask pattern.

tFTPFileList Standard properties


These properties are used to configure tFTPFileList running in the Standard Job framework.
The Standard tFTPFileList component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The remote directory where the files and folders to be listed
are located.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

File detail Select this check box to list the details of each file/folder.
The informative details include the file/folder permissions,
the name of the author, the name of the group of users
that have read/write permissions, the file size, and the last
modification date.

Files The names of the files/folders to be listed. You can specify multiple files/folders in a line by using wildcards or a regular expression.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.


Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

CURRENT_FILE The current file name. This is a Flow variable and it returns
a string.

CURRENT_FILEPATH The current file path. This is a Flow variable and it returns a
string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.
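
For example, inside an Iterate loop driven by this component, a tJava component could print each listed file, assuming the component instance is named tFTPFileList_1 (a sketch for illustration):

  // name and full path of the file currently being iterated
  System.out.println(((String) globalMap.get("tFTPFileList_1_CURRENT_FILE"))
      + " -> " + ((String) globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")));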

Usage

Usage rule This component is typically used as a single-component subJob but can also be used with other components.

Listing and getting files/folders on an FTP directory


Here is an example of using Talend FTP components to iterate over and list all files and folders in an FTP server directory, and then get only the text files from that directory to a local directory.


Creating a Job for listing and getting files/folders on an FTP directory


Create a Job that connects to an FTP server, iterates over and lists all files and folders in the FTP root directory, gets only the text files from the root directory to a local directory, and finally closes the connection to the server.

Before you begin


Prerequisites: To replicate this scenario, an FTP server must be started and a couple of files/folders
must be put onto the root directory of the FTP server.


Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPFileList component, a
tIterateToFlow component, a tLogRow component, a tFTPGet component, and a tFTPClose
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFTPFileList component to the tIterateToFlow component using a Row > Iterate
connection.
3. Link the tIterateToFlow component to the tLogRow component using a Row > Main connection.
4. Link the tFTPConnection component to the tFTPFileList component using a Trigger > OnSubjobOk
connection.
5. Do the same to link the tFTPFileList component to the tFTPGet component, and the tFTPGet
component to the tFTPClose component.
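
Based on the links described above, the resulting Job can be sketched as follows:

  tFTPConnection --OnSubjobOk--> tFTPFileList --OnSubjobOk--> tFTPGet --OnSubjobOk--> tFTPClose
                                      |
                                  (Iterate)
                                      v
                                tIterateToFlow --Main--> tLogRow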

Opening a connection to the FTP server


Configure the tFTPConnection component to open a connection to the FTP server.

Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.

Listing all files/folders on the FTP root directory


Configure the tFTPFileList component, the tIterateToFlow component, and the tLogRow component
to iterate all files and folders on the FTP root directory and display the names and paths of these files
and folders on the console of Talend Studio.

Procedure
1. Double-click the tFTPFileList component to open its Basic settings view.

2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.


3. In the Remote directory field, specify the FTP server directory on which the files and folders will
be iterated. In this example, it is /, which means the root directory of the FTP server.
4. Clear the Move to the current directory check box.
5. Double-click the tIterateToFlow component to open its Basic settings view.

6. Click the [...] button next to Edit schema to open the schema dialog box.

7. Click the [+] button to add two String type columns, filename and filepath, that will hold the names and paths of the iterated files respectively. When done, click OK to close the dialog box.
8. In the Mapping table, set the values for the filename and filepath columns. In this example, use the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")) for filename and the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) for filepath.
Note that you can fill the values by pressing Ctrl + Space to access the global variables list and
then selecting tFTPFileList_1_CURRENT_FILE and tFTPFileList_1_CURRENT_FILEPATH from the list.
9. Double-click the tLogRow component to open its Basic settings view, and then select Table (print
values in cells of a table) in the Mode area for better readability of the result.


Getting files on the FTP server directory to a local directory


Configure the tFTPGet component to get only the text files from the FTP root directory to a local directory.

Procedure
1. Double-click the tFTPGet component to open its Basic settings view.

2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Local directory field, specify the local directory to which the files and folders will be
downloaded. In this example, it is D:/FtpDownloads.
4. In the Remote directory field, specify the FTP server directory under which the files and folders
will be downloaded. In this example, it is /, which means the root directory of the FTP server.
5. In the Files table, click the [+] button to add a line and in the Filemask column field, enter *.txt
between double quotation marks to get only the text files from the FTP directory to the local directory.

Closing the connection to the FTP server


Configure the tFTPClose component to close the connection to the FTP server.

Procedure
1. Double-click the tFTPClose component to open its Basic settings view.


2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.

Executing the Job to list and get files/folders on the FTP directory
After setting up the Job and configuring the components used in the Job for listing and getting files/
folders on the FTP directory, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job.
2. Press F6 to execute the Job.

As shown above, the names and paths of the files and folders on the FTP server root directory are
displayed on the console, and only the text files are downloaded to the specified local directory.


tFTPFileProperties
Retrieves the properties of a specified file on an FTP server.

tFTPFileProperties Standard properties


These properties are used to configure tFTPFileProperties running in the Standard Job framework.
The Standard tFTPFileProperties component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. It describes the
main properties of the specified file. You can click the [...]
button next to Edit schema to view the predefined schema
which contains the following fields:
• abs_path: the absolute path of the file.
• dirname: the directory of the file.
• basename: the name of the file.
• size: the file size in bytes.
• mtime: the timestamp indicating when the file was last
modified, in milliseconds that have elapsed since the
Unix epoch (00:00:00 UTC, Jan 1, 1970).
• mtime_string: the date and time the file was last
modified.
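
For example, the mtime value can be turned into a readable date with standard Java in a downstream component; in the sketch below, row1.mtime is a hypothetical input column holding the mtime value:

  // convert milliseconds since the Unix epoch to a java.util.Date
  java.util.Date lastModified = new java.util.Date(row1.mtime);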

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.


Remote directory The path to the directory where the file is available.

File The name of the file or the path to the file whose properties
will be retrieved.

Transfer mode Select the transfer mode from the list, either ascii or binary.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.


Calculate MD5 Hash Select this check box to check the file's MD5.
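
If you want to cross-check the MD5 reported by the component against a local copy of the file, a plain Java snippet such as the following computes the same kind of digest (the local path is hypothetical and used only for illustration):

  try {
      byte[] content = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("D:/data/file.txt"));
      byte[] digest = java.security.MessageDigest.getInstance("MD5").digest(content);
      StringBuilder md5 = new StringBuilder();
      for (byte b : digest) {
          md5.append(String.format("%02x", b)); // two-digit lowercase hex per byte
      }
      System.out.println("Local MD5: " + md5);
  } catch (Exception e) {
      e.printStackTrace();
  }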

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component can be used as a standalone component.

Related scenario
Displaying the properties of a processed file on page 1159


tFTPGet
Downloads files to a local directory from an FTP directory.

tFTPGet Standard properties


These properties are used to configure tFTPGet running in the Standard Job framework.
The Standard tFTPGet component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory The local directory in which downloaded files will be saved.

Remote directory The FTP directory from which files will be downloaded.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Transfer mode Select the transfer mode from the list, either ascii or binary.

Overwrite file Select the action to be performed when the file already
exists.
• never: Never overwrite the file.

• always: Always overwrite the file.
• size different: Overwrite the file when the file size is different.
• overwrite: Overwrite the existing file.
• resume: Resume downloading the file from the point of
interruption.
• append: Add data to the end of the file without
overwriting data.
overwrite, resume, and append are available when the SFTP
Support check box is selected.

Append Select this check box to append data at the end of the file in
order to avoid overwriting data.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the Filemask or Files field as file filters. This is useful when the name of the file to be processed contains special characters such as parentheses.
For more information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.

Files The names of the files or the paths to the files to be downloaded. You can specify multiple files in a line by using wildcards or a regular expression.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Print message Select this check box to display the list of files downloaded
on the console.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.

TRANSFER_MESSAGES The file transfer information. This is an After variable and it returns a string.

Usage

Usage rule This component is typically used as a single-component subJob but can also be used as an output or end object.

Related scenario
Listing and getting files/folders on an FTP directory on page 1230


tFTPPut
Uploads files from a local directory to an FTP directory.

tFTPPut Standard properties


These properties are used to configure tFTPPut running in the Standard Job framework.
The Standard tFTPPut component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Local directory The local directory from which the files will be uploaded to
the FTP server.

Remote directory The FTP directory where the uploaded files will be placed.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Transfer mode Select the transfer mode from the list, either ascii or binary.

Overwrite file Select the action to be performed when the file already
exists.

• never: Never overwrite the file.
• always: Always overwrite the file.
• size different: Overwrite the file when the file size is different.
• overwrite: Overwrite the existing file.
• resume: Resume uploading the file from the point of interruption.
• append: Add data to the end of the file without
overwriting data.
overwrite, resume, and append are available when the SFTP
Support check box is selected.

Append Select this check box to append data at the end of the file in
order to avoid overwriting data.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.

This property is available only when the FTPS Support check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the Filemask or Files field as file filters. This is useful when the name of the file to be processed contains special characters such as parentheses.
For more information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.

Files Specify the files to be uploaded.
• Filemask: the file names or the paths to the files to be uploaded.
• New name: the name to give the file after the transfer.
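
For example, a hypothetical Files table could contain "*.csv" as the Filemask with the New name left empty to upload all CSV files under their original names, or "report.txt" as the Filemask with "report_backup.txt" as the New name to upload a single file under a different name.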

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.

This property is available only when the FTPS Support check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.

CURRENT_FILE_EXISTS The result of whether the current file exists. This is a Flow variable and it returns a boolean.

TRANSFER_MESSAGES The file transfer information. This is an After variable and it returns a string.

Usage

Usage rule This component is typically used as a single-component subJob but can also be used as an output component.

Putting files onto an FTP server


Here is an example of using Talend FTP components to put several files in a local directory onto an
FTP server.


Creating a Job for putting files onto an FTP server


Create a Job that connects to an FTP server, puts several local files onto the server, and finally closes the connection to the server.

Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPPut component, and a tFTPClose
component by typing their names in the design workspace or dropping them from the Palette.
2. Link the tFTPConnection component to the tFTPPut component using a Trigger > OnSubjobOk
connection.
3. Link the tFTPPut component to the tFTPClose component using a Trigger > OnSubjobOk
connection.
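
Based on the links described above, the Job is a simple linear chain:

  tFTPConnection --OnSubjobOk--> tFTPPut --OnSubjobOk--> tFTPClose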

Opening a connection to the FTP server


Configure the tFTPConnection component to open a connection to the FTP server.

Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.
4. From the Connection Mode drop-down list, select the FTP connection mode you want to use,
Active in this example.

Putting files onto the FTP server


Configure the tFTPPut component to put several local files onto the FTP server root directory.

Procedure
1. Double-click the tFTPPut component to open its Basic settings view.


2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Local directory field, specify the local directory that contains the files to be put onto the
FTP server. In this example, it is D:/components.
4. In the Remote directory field, specify the FTP server directory onto which the files will be put. In
this example, it is /, which means the root directory of the FTP server.
5. Clear the Move to the current directory check box.
6. In the Files table, click the [+] button twice to add two lines, and in the two Filemask column
fields, enter *.txt and *.png respectively, which means only the text and png files in the specified
local directory will be put onto the FTP server root directory.

Closing the connection to the FTP server


Configure the tFTPClose component to close the connection to the FTP server.

Procedure
1. Double-click the tFTPClose component to open its Basic settings view.

2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.


Executing the Job to put files on the FTP server


After setting up the Job and configuring the components used in the Job for putting files onto the FTP
server, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.
2. Connect to the FTP server to verify the result.

As shown above, only the text and png files in the local directory are put onto the FTP server.


tFTPRename
Renames files in an FTP directory.

tFTPRename Standard properties


These properties are used to configure tFTPRename running in the Standard Job framework.
The Standard tFTPRename component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The path to the FTP directory where the files to be renamed
are available.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

Overwrite file Select the action to be performed when the file already
exists.
• never: Never overwrite the file.
• always: Always overwrite the file.

• size different: Overwrite the file when the file size is different.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Files Specify the files to be renamed and their new names.
• Filemask: specify the file to be renamed by entering the filename or a filemask using wildcard characters or regular expressions.
• New name: enter the new name of the file.

Connection mode Select the connection mode from the list, either Passive or
Active.

Encoding Specify the encoding type by selecting an encoding type from the list or selecting CUSTOM and entering the encoding type manually.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any error and continue the Job
execution process.

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value
of 0 or any negative values will be ignored. In this case, the
default value (that is, 60000ms) will be used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.

Usage

Usage rule This component is generally used as a subJob with one component, but it can also be used as an output or end component.


Renaming a file located on an FTP server


Here is an example of using Talend FTP components to rename a file located on an FTP server.

Creating a Job for renaming a file on an FTP server


Create a Job that connects to an FTP server, renames a file on the server, and finally closes the connection to the server.

Before you begin


Prerequisites: To replicate this scenario, an FTP server must be started and a file must be put onto
the server. In this example, the file movies.json has been put into the folder movies under the root
directory of the FTP server.

Procedure
1. Create a new Job and add a tFTPConnection component, a tFTPRename component, and a
tFTPClose component by typing their names in the design workspace or dropping them from the
Palette.
2. Link the tFTPConnection component to the tFTPRename component using a Trigger >
OnSubjobOk connection.
3. Link the tFTPRename component to the tFTPClose component using a Trigger > OnSubjobOk
connection.
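
As in the previous scenario, the Job is a linear chain:

  tFTPConnection --OnSubjobOk--> tFTPRename --OnSubjobOk--> tFTPClose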


Opening a connection to the FTP server


Configure the tFTPConnection component to open a connection to the FTP server.

Procedure
1. Double-click the tFTPConnection component to open its Basic settings view.
2. In the Host and Port fields, enter the FTP server IP address and the listening port number
respectively.
3. In the Username and Password fields, enter the authentication details.

Renaming the file on the FTP server


Configure the tFTPRename component to rename the file on the FTP server.

Procedure
1. Double-click the tFTPRename component to open its Basic settings view.

2. Specify the connection details required to access the FTP server. In this example, select the Use
an existing connection check box and from the Component list drop-down list displayed, select
the connection component to reuse the connection details you have already defined.
3. In the Remote directory field, enter the directory on the FTP server where the file to be renamed
exists. In this example, it is /movies.
4. Clear the Move to the current directory check box.
5. In the Files table, click the [+] button to add a line, and then enter the existing file name in the
Filemask column field and the new file name in the New name column field. In this example, they
are movies.json and action_movies.json respectively.

Closing the connection to the FTP server


Configure the tFTPClose component to close the connection to the FTP server.

Procedure
1. Double-click the tFTPClose component to open its Basic settings view.


2. From the Component list drop-down list, select the tFTPConnection component that opens the
connection you need to close. In this example, only one tFTPConnection component is used and it
is selected by default.

Executing the Job to rename the file on the FTP server


After setting up the Job and configuring the components used in the Job for renaming the file on the
FTP server, you can then execute the Job and verify the Job execution result.

Procedure
1. Press Ctrl + S to save the Job and then F6 to execute the Job.
2. Connect to the FTP server to verify the result.

As shown above, the file on the FTP server has been renamed from movies.json to action_movies.json.


tFTPTruncate
Truncates files in an FTP directory.

tFTPTruncate Standard properties


These properties are used to configure tFTPTruncate running in the Standard Job framework.
The Standard tFTPTruncate component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Host The IP address or hostname of the FTP server.

Port The listening port number of the FTP server.

Username and Password The user authentication data to access the FTP server.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Remote directory The path to the FTP directory in which the files will be
truncated.

Move to the current directory Select this check box to change the directory to the one
specified in the Remote directory field. The next FTP
component in the Job will take this directory as the root of
the remote directory when using the same connection.
This property is available only when the Use an existing
connection check box is selected.

SFTP Support Select this check box to connect to the FTP server via an
SFTP connection.

Warning: This option does not work with an HTTP/HTTPS proxy. If you need a proxy, set a SOCKS proxy in the Advanced settings tab.

Authentication method Select the SFTP authentication method, either Public key or
Password.
• Public key: Enter the path to the private key and the
passphrase for the key in the Private key and Key
Passphrase fields correspondingly.
• Password: Enter the password required.
This property is available only when the SFTP Support
check box is selected.

Filename encoding Select this check box to set the encoding used to convert
file names from Strings to bytes. It should be the same
encoding used on the SFTP server. If the SFTP server's
version is greater than 3, the encoding should be UTF-8, or
else an error occurs.
This property is available only when the SFTP Support
check box is selected.

FTPS Support Select this check box to connect to the FTP server via an
FTPS connection.
If you are using an HTTP proxy, via a tSetProxy component
for example, you need to select this check box and set the
connection mode to Passive.

Keystore File The path to your keystore file, a password protected file
containing several keys and certificates.
This property is available only when the FTPS Support
check box is selected.

Keystore Password The password for your keystore file.


This property is available only when the FTPS Support
check box is selected.

Security Mode Select the security mode from the list, either Implicit or
Explicit.
This property is available only when the FTPS Support
check box is selected.

Use Perl5 Regex Expression as Filemask Select this check box to use Perl5 regular expressions in the Filemask or Files field as file filters. This is useful when the name of the file to be processed contains special characters such as parentheses.
For more information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.

Files The names of the files or the paths to the files to be truncated. You can specify multiple files in a line by using wildcards or a regular expression.

Connection mode Select the connection mode from the list, either Passive or
Active.


Encoding Specify the encoding type by selecting an encoding type from the list or by selecting CUSTOM and entering the encoding type manually.
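
For example, assuming the remote directory contains files named report(1).csv, report(2).csv, and so on (these file names are only an illustration), the following value in the Files field, entered as a Java string and therefore with doubled backslashes, matches all of them once the Use Perl5 Regex Expression as Filemask check box is selected:

    "report\\(\\d+\\)\\.csv"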

Advanced settings

Use Socks Proxy Select this check box if you are using a proxy, and
in the Proxy host, Proxy port, Proxy user and Proxy
password fields displayed, specify the proxy server settings
information.

Ignore Failure At Quit (FTP) Select this check box to ignore library closing errors or FTP
closing errors.

Data Channel Protection Level The data channel protection level with which data is
transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Protection Buffer Size The maximum size (in bytes) of the encoded data blocks to
be transferred between the client and the server. For more
information, see RFC 2228: FTP Security Extensions.
This property is available only when the FTPS Support
check box is selected.

Connection timeout Specify the timeout value (in ms) for the connection. A value of 0 or any negative value is ignored; in this case, the default value of 60000 ms is used.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

NB_FILE The number of files processed. This is an After variable and it returns an integer.

CURRENT_STATUS The execution result of the component. This is a Flow variable and it returns a string.
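
These After variables can be read from a later subJob, for example in a tJava component linked with an OnSubjobOk trigger. A minimal sketch, assuming the component instance is named tFTPTruncate_1 (adjust the prefix to your actual instance name):

    // Read the After variables exposed by tFTPTruncate_1 (instance name assumed).
    String errorMessage = (String) globalMap.get("tFTPTruncate_1_ERROR_MESSAGE");
    Integer nbFile = (Integer) globalMap.get("tFTPTruncate_1_NB_FILE");
    System.out.println("Files truncated: " + nbFile);
    if (errorMessage != null) {
        System.out.println("Truncate error: " + errorMessage);
    }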

Usage

Usage rule This component is typically used as a single-component subJob but can also be used with other components.

Related scenario
No scenario is available for this component yet.


tFuzzyMatch
Compares a column from the main flow with a reference column from the lookup flow and outputs
the main flow data displaying the distance.

tFuzzyMatch Standard properties


These properties are used to configure tFuzzyMatch running in the Standard Job framework.
The Standard tFuzzyMatch component belongs to the Data Quality family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
Two read-only columns, Value and Match, are added to the output schema automatically.

  Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused in various projects and Job designs. Related topic: see Talend Studio User Guide.

Matching type Select the relevant matching algorithm among:
Levenshtein: Based on the edit distance theory. It calculates the number of insertions, deletions, or substitutions required for an entry to match the reference entry.
Metaphone: Based on a phonetic algorithm for indexing entries by their pronunciation. It first loads the phonetics of all entries of the lookup reference and then checks all entries of the main flow against the entries of the reference flow. It does not support Chinese characters.
Double Metaphone: An improved version of the Metaphone phonetic algorithm that produces more accurate results than the original algorithm. It can return both a primary and a secondary code for a string. This accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. It does not support Chinese characters.

Min distance (Levenshtein only) Set the minimum number of changes allowed to match the reference. If set to 0, only perfect matches are returned.

Max distance (Levenshtein only) Set the maximum number of changes allowed to match the reference.


Matching column Select the column of the main flow that needs to be checked against the reference (lookup) key column.

Unique matching Select this check box if you want to get the best match
possible, in case several matches are available.

Matching item separator If several matches are available, all of them are displayed unless the Unique matching check box is selected. Define the delimiter to be used between the matches.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a component, while an After variable functions after the execution of the component.
To fill in a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it requires two input components and an output component.

Checking the Levenshtein distance of 0 in first names


This scenario describes a four-component Job that checks the edit distance between the First Name column of an input file and the data of a reference input file. The output of this Levenshtein check is displayed, along with the content of the main flow, in a table.


Setting up the Job


Procedure
1. Drag and drop the following components from the Palette to the design workspace: tFileInputDelimited (x2), tFuzzyMatch, tLogRow.
2. Link the first tFileInputDelimited component to the tFuzzyMatch component using a Row > Main
connection.
3. Link the second tFileInputDelimited component to the tFuzzyMatch using a Row > Main
connection (which appears as a Lookup row on the design workspace).
4. Link the tFuzzyMatch component to the standard output tLogRow using a Row > Main connection.

Configuring the components


Procedure
1. Define the first tFileInputDelimited component in its Basic settings view. Browse your system to the input file to be analyzed.
2. Define the schema of the component. In this example, the input schema has two columns,
firstname and gender.
3. Define the second tFileInputDelimited component the same way.

Warning:
Make sure the reference column is set as key column in the schema of the lookup flow.


4. Double-click the tFuzzyMatch component to open its Basic settings view, and check its schema.
The Schema should match the Main input flow schema in order for the main flow to be checked
against the reference.

Note that two columns, Value and Matching, are added to the output schema. These are standard
matching information and are read-only.
5. Select the method to be used to check the incoming data. In this scenario, Levenshtein is the
Matching type to be used.
6. Then set the distance. In this method, the distance is the number of character changes (insertions, deletions or substitutions) that need to be carried out for the entry to fully match the reference.

In this use case, we set both the minimum distance and the maximum distance to 0. This means
only the exact matches will be output.
7. Also, clear the Case sensitive check box.
8. Check that the matching column and look up column are correctly selected.
9. Leave the other parameters as default.

Executing the Job


Procedure
Save the Job and press F6 to execute the Job.


Results
As the edit distance has been set to 0 (min and max), the output shows the result of a regular join
between the main flow and the lookup (reference) flow, hence only full matches with Value of 0 are
displayed.
A more obvious example is with a minimum distance of 1 and a maximum distance of 2; see Procedure on page 1263.
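
To make the notion of edit distance more concrete, the following standalone Java sketch computes the Levenshtein distance as it is commonly defined (the minimum number of single-character insertions, deletions, or substitutions); it only illustrates the metric and is not the component's internal code:

    // Classic dynamic-programming Levenshtein distance (illustration only).
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // i deletions
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // j insertions
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1; // substitution
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

For instance, levenshtein("Brad", "Brad") returns 0 and levenshtein("Brad", "Brady") returns 1, which is why the 0/0 setting above behaves like an exact join, while the 1/2 setting of the next scenario also returns near matches.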

Checking the Levenshtein distance of 1 or 2 in first names


This scenario is based on the scenario described above. Only the minimum and maximum distance
settings in the tFuzzyMatch component are modified, which will change the output displayed.

Procedure
1. In the Component view of the tFuzzyMatch component, change the minimum distance from 0 to 1. This immediately excludes the exact matches (which would show a distance of 0).
2. Also change the maximum distance to 2. The output will provide all matching entries showing a discrepancy of 2 characters at most.

No other changes are required.


3. Make sure the Matching item separator is defined, as several references might match the main flow entry.
4. Save the new Job and press F6 to run it.

As the edit distance has been set to 2, some entries of the main flow match more than one
reference entry.

Results
You can also use another method, Metaphone, to assess the distance between the main flow and the reference, as described in the next scenario.

Checking the Metaphonic distance in first name


This scenario is based on the scenario described above.

Procedure
1. Change the Matching type to Metaphone. There is neither a minimum nor a maximum distance to set, as the matching method is based on the discrepancies with the phonetics of the reference.

2. Save the Job and press F6. The phonetics value is displayed along with the possible matches.


tGoogleDataprocManage
Creates or deletes a Dataproc cluster in the Global region on Google Cloud Platform.

tGoogleDataprocManage Standard properties


These properties are used to configure tGoogleDataprocManage running in the Standard Job
framework.
The Standard tGoogleDataprocManage component belongs to the Cloud family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Project identifier Enter the ID of your Google Cloud Platform project.
If you are not certain about your project ID, check it in the Manage Resources page of your Google Cloud Platform services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Provide Google Credentials in file Leave this check box clear when you launch your Job from a given machine on which the Google Cloud SDK has been installed and authorized to use your user account credentials to access Google Cloud Platform. In this situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or visit
Google Cloud Platform Auth Guide.

Action Select the action you want tGoogleDataprocManage to perform on your cluster:
• Start to create a cluster
• Stop to destroy a cluster

Version Select the version of the image to be used to create a Dataproc cluster.

Region From this drop-down list, select the Google Cloud region to
be used.

Zone Select the geographic zone in which the computing resources are used and your data is stored and processed.
The available zones vary depending on the region you have selected from the Region drop-down list.
In Google Cloud terms, a zone is an isolated location within a region, another geographical term employed by Google Cloud.

Instance configuration Enter the parameters to determine how many masters and
workers to be used by the Dataproc cluster to be created
and the performance of these masters and workers.


Advanced settings

Wait for cluster ready Select this check box to keep this component running until the cluster is completely set up.
When you clear this check box, this component stops running immediately after sending the creation command.

Master disk size Enter a number without quotation marks to determine the size of the disk of each master instance.

Master local SSD Enter a number without quotation marks to determine the number of local solid-state drive (SSD)
storage devices to be added to each master instance.
According to Google, these local SSDs are suitable only for temporary storage such as caches,
processing space or low value data. It is recommended to store important data to durable storage
options of Google. For further information about the Google storage options, see Durable storage
options.

Worker disk size Enter a number without quotation marks to determine the size of the disk of each worker instance.

Worker local SSD Enter a number without quotation marks to determine the number of local solid-state drive (SSD)
storage devices to be added to each worker instance.
According to Google, these local SSDs are suitable only for temporary storage such as caches,
processing space or low value data. It is recommended to store important data to durable storage
options of Google. For further information about the Google storage options, see Durable storage
options.

Network or Subnetwork Select either check box to use a Google Compute Engine network or subnetwork for the cluster to be created, in order to enable intra-cluster communications.
As Google does not allow a network and a subnetwork to be used concurrently, selecting one check box hides the other check box.
For further information about Google Dataproc cluster network configuration, see Dataproc Network.

Initialization action In this table, select the initialization actions that are available in the shared bucket on Google Cloud
Storage to run on all the nodes in your Dataproc cluster immediately after this cluster is set up.
If you need to use custom initialization scripts, upload them to this shared Google bucket so that
tGoogleDataprocManage can read them.
• In the Executable file column, enter the Google Cloud Storage URI to these scripts to be used,
for example gs://dataproc-initialization-actions/MyScript
• In the Executable timeout column, enter the amount of time within double quotation marks
to determine the duration of the execution. If the executable is not completed at the end of
this timeout, an explanatory error message is returned. The value is a string with up to nine
fractional digits, for example, "3.5s" for 3.5 seconds.
For further information about this shared bucket and the initialization actions, see Initialization
actions.

tStatCatcher Statistics Select this check box to collect log data at the component level.

Usage

Usage rule This component is used standalone in a subJob.


tGoogleDriveConnection
Opens a Google Drive connection that can be reused by other Google Drive components.

tGoogleDriveConnection Standard properties


These properties are used to configure tGoogleDriveConnection running in the Standard Job
framework.
The Standard tGoogleDriveConnection component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Application Name The application name required by Google Drive to get access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.

Access Token The access token generated through Google Developers OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.


Client ID and Client Secret The client ID and client secret.
These two properties are available only when Installed Application (Id & Secret) is selected from the OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other configuration related to the Installed Application authentication changes, you need to delete this credential file manually before running your Job again.

This property is available only when Installed Application (Id & Secret) or Installed Application (JSON) is selected from the OAuth Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is more commonly used with other Google
Drive components. In a Job design, it is usually used to
open a Google Drive connection that can be reused by other
Google Drive components.

OAuth methods for accessing Google Drive


Talend provides the following four OAuth methods to access Google Drive using Google Drive
components and metadata wizard.
• Installed Application (Id & Secret)
• Installed Application (JSON)
• Service Account
• Access Token (deprecated)

How to access Google Drive using client ID and secret


To use client ID and client secret to access Google Drive, you need to first generate the client ID and
client secret by completing the following steps using Google Chrome.

Before you begin


A Google account has already been signed up for using Google Drive.

Procedure
1. Go to Google API Console and select an existing project or create a new one. In this example, we
create a new project TalendProject.

2. Go to the Library page and in the right panel, find Google Drive API and enable the Google Drive
API that allows you to access resources from Google Drive.


3. Go to the Credentials page, click OAuth consent screen in the right panel and set a product name
in the Product name shown to users field. In this example, it is TalendProduct. When done,
click Save.


4. Click Create credentials > OAuth client ID, and in the Create client ID page, create a new client ID
TalendApplication with Application type set to Other.


5. Click Create. You will be shown your client ID and client secret that can be used by Google Drive
components and metadata wizard to access Google Drive using the OAuth method Installed
Application (Id & Secret).

How to access Google Drive using a client secret JSON file


To use a client secret JSON file to access Google Drive, you need to first download the client secret
JSON file from Google API Console by completing the following steps using Google Chrome.


Before you begin


The client ID and client secret have been created in Google API Console. For more information, see
How to access Google Drive using client ID and secret on page 1270.

Procedure
1. Go to Google API Console.
2. Go to the Credentials page.
3. Click the Download JSON button to download the client secret JSON file and securely store it in a
local folder. This JSON file can then be used by Google Drive components and metadata wizard to
access Google Drive via the OAuth method Installed Application (JSON).

How to access Google Drive using a service account JSON file


To use a service account JSON file to access Google Drive, you need to first create a service account in
Google API Console, then download the service account JSON file by completing the following steps
using Google Chrome.

Before you begin


1. A Google account has already been signed up for using Google Drive.
2. In Google API Console, your project has been created, the Google Drive API has been enabled, and the product name has been set. For more information about how to complete this configuration, see How to access Google Drive using client ID and secret on page 1270.

Procedure
1. Go to Google API Console.
2. Open the Service accounts page. If prompted, select your project.


3. Click CREATE SERVICE ACCOUNT.


4. In the Create service account window, type a name for the service account, select Furnish a new
private key and then the key type JSON.


5. Click Create. In the pop-up window, choose a folder and click Save to store your service account
JSON file securely. This JSON file can then be used by Google Drive components and metadata
wizard to access Google Drive via the OAuth method Service Account.
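
The Google Drive components perform this authentication for you once the Service Account JSON path is set in their Basic settings. Purely as an illustration of what the Service Account method corresponds to in the Google Drive Java client library (the library version, class names, and the file path below are assumptions, not part of the components), a connection could be opened like this:

    import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
    import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
    import com.google.api.client.json.jackson2.JacksonFactory;
    import com.google.api.services.drive.Drive;
    import com.google.api.services.drive.DriveScopes;
    import java.io.FileInputStream;
    import java.util.Collections;

    public class DriveServiceAccountSketch {
        public static void main(String[] args) throws Exception {
            // Authenticate with the service account JSON file downloaded above.
            GoogleCredential credential = GoogleCredential
                    .fromStream(new FileInputStream("/path/to/service_account.json"))
                    .createScoped(Collections.singleton(DriveScopes.DRIVE));
            // The application name plays the same role as the Application Name property.
            Drive drive = new Drive.Builder(GoogleNetHttpTransport.newTrustedTransport(),
                    JacksonFactory.getDefaultInstance(), credential)
                    .setApplicationName("TalendApplication")
                    .build();
            // List a few files to verify that the service account can see the Drive.
            drive.files().list().setPageSize(10).execute().getFiles()
                    .forEach(f -> System.out.println(f.getName() + " (" + f.getId() + ")"));
        }
    }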

How to access Google Drive using an access token (deprecated)


To use an access token to access Google Drive, you need to first generate the access token by
completing the following steps using Google Developers OAuth Playground.

Before you begin


1. A Google account has already been signed up for using Google Drive.
2. The client ID and client secret have been created in Google API Console. For more information,
see How to access Google Drive using client ID and secret on page 1270.

Procedure
1. Go to Google Developers OAuth Playground.
2. Click OAuth 2.0 Configuration, select the Use your own OAuth credentials check box, and enter the OAuth client ID and client secret you have already created in the OAuth Client ID and OAuth Client secret fields respectively.


3. In OAuth 2.0 Playground Step 1, select the scope https://www.googleapis.com/auth/drive under Drive API v3 for the Google Drive API and click Authorize APIs, then click Allow to generate the authorization code.


4. In OAuth 2.0 Playground Step 2, click Exchange authorization code for tokens to generate the
OAuth access token.

The OAuth access token is displayed in the right panel, as shown in the figure below. It can be used by Google Drive components and metadata wizard to access Google Drive via the OAuth method Access Token.


Note that the access token expires every 3600 seconds. You can click Refresh access token in OAuth 2.0 Playground Step 2 to refresh it.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDriveCopy
Creates a copy of a file/folder in Google Drive.

tGoogleDriveCopy Standard properties


These properties are used to configure tGoogleDriveCopy running in the Standard Job framework.
The Standard tGoogleDriveCopy component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Copy Mode Select the type of the item to be copied.


• File: Select this option when you need to copy a file.
• Folder: Select this option when you need to copy a
folder.

Source The name or ID of the source file/folder to be copied.

Source Access Mode Select the method by which the source file/folder is
specified, either by Name or by Id.

Destination Folder Name The name or ID of the destination folder in which the copy
of the source file/folder will be saved.

Destination Access Mode Select the method by which the destination folder is
specified, either by Name or by Id.

Rename (set new title) Select this check box to rename the copy of the file/folder
in the destination folder. In the Destination Name field
displayed, enter the name for the file/folder after being
copied to the destination folder.


Remove Source File Select this check box to remove the source file after it is
copied to the destination folder.
This property is available only when File is selected from
the Copy Mode drop-down list.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• sourceId: The ID of the source file/folder.
• destinationId: The ID of the destination file/folder.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

SOURCE_ID The ID of the source file/folder. This is an After variable and


it returns a string.

DESTINATION_ID The ID of the destination file/folder. This is an After variable


and it returns a string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDriveCreate
Creates a new folder in Google Drive.

tGoogleDriveCreate Standard properties


These properties are used to configure tGoogleDriveCreate running in the Standard Job framework.
The Standard tGoogleDriveCreate component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Parent Folder The name or ID of the parent folder in which a new folder
will be created.

Access Method Select the method by which the parent folder is specified,
either by Name or by Id.

New Folder Name The name of the new folder to be created.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• parentFolderId: the ID of the parent folder.
• newFolderId: the ID of the new folder.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.


Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

PARENT_FOLDER_ID The ID of the parent folder. This is an After variable and it


returns a string.

NEW_FOLDER_ID The ID of the new folder. This is an After variable and it


returns a string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDriveDelete
Deletes a file/folder in Google Drive.

tGoogleDriveDelete Standard properties


These properties are used to configure tGoogleDriveDelete running in the Standard Job framework.
The Standard tGoogleDriveDelete component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

File/Folder The name or ID of the file/folder to be deleted.

Delete Mode Select the method by which the file/folder to be deleted is


specified, either by Name or by Id.

Use Trash Select this check box to move the file/folder to be deleted
to the trash.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema with only one field named fileId which describes
the ID of the file/folder.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.


Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

FILE_ID The ID of the file/folder. This is an After variable and it


returns a string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
No scenario is available for this component yet.


tGoogleDriveGet
Gets a file's content and downloads the file to a local directory.

tGoogleDriveGet Standard properties


These properties are used to configure tGoogleDriveGet running in the Standard Job framework.
The Standard tGoogleDriveGet component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

File The name or ID of the file to be downloaded.

Access Method Select the method by which the file to be downloaded is


specified, either by Name or by Id.

Save as File Select this check box to save the file to a local directory.
In the Save to field displayed, browse to or enter the path
where you want to save the file to be downloaded.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema with only one field named content which describes
the content of the file to be downloaded.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.


Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Export Google Doc as Select the type for the Google Doc to be exported.

Export Google Draw as Select the type for the Google Draw to be exported.

Export Google Presentation as Select the type for the Google Presentation to be exported.

Export Google Spreadsheet as Select the type for the Google Spreadsheet to be exported.

Add extension Select this check box to add an extension to the exported file.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

FILE_ID The ID of the file. This is an After variable and it returns a


string.

Usage

Usage rule This component can be used as a standalone component or


as a start component of a Job or subJob.

Related scenario
No scenario is available for this component yet.


tGoogleDriveList
Lists all files, or folders, or both files and folders in a specified Google Drive folder.

tGoogleDriveList Standard properties


These properties are used to configure tGoogleDriveList running in the Standard Job framework.
The Standard tGoogleDriveList component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

Folder Name The name or ID of the folder in which the files/folders will
be listed.

Access Method Select the method by which the folder is specified, either by
Name or by Id.

FileList Type Select the type of data you want to list.


• Files: Only files.
• Directories: Only folders.
• Both: Both files and folders.

Include SubDirectories Select this check box to also list the files/folders in the subdirectories.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• id: The ID of the file/folder.
• name: The name of the file/folder.


• mimeType: The MIME type of the file/folder.
• modifiedTime: The last modification date of the file/folder.
• size: The file size in bytes.
• kind: The kind of the resource.
• trashed: Whether the file has been trashed.
• parents: The ID of the parent folder.
• webViewLink: A link for opening the file in a Google editor or viewer in a browser.
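
Downstream components can use these fields directly. As a minimal illustration (the row name row1 and the downstream component are assumptions), a tMap or tJava expression could single out folders by their MIME type:

    // Google Drive folders are reported with this MIME type.
    boolean isFolder = "application/vnd.google-apps.folder".equals(row1.mimeType);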

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other


configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed


Application (Id & Secret) or Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Include trashed files Select this check box to also take into account files and
folders that have been removed from the specified path.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is usually used as a start component of a


Job or subJob and it always needs an output link.

Related scenario
Managing files with Google Drive on page 1297


tGoogleDrivePut
Uploads data from a data flow or a local file to Google Drive.

tGoogleDrivePut Standard properties


These properties are used to configure tGoogleDrivePut running in the Standard Job framework.
The Standard tGoogleDrivePut component belongs to the Cloud family.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when a connection component is selected from the Connection Component drop-down list.

Connection Component Select the component that opens the Google Drive connection to be reused by this component.

Application Name The application name required by Google Drive to get


access to its APIs.

OAuth Method Select an OAuth method used to access Google Drive from
the drop-down list.
• Access Token (deprecated): uses an access token to
access Google Drive.
• Installed Application (Id & Secret): uses the client ID
and client secret created through Google API Console
to access Google Drive. For more information about
this method, see Google Identity Platform > Installed
applications .
• Installed Application (JSON): uses the client secret
JSON file that is created through Google API Console
and contains the client ID, client secret, and other
OAuth 2.0 parameters to access Google Drive.
• Service Account: uses a service account JSON file
created through Google API Console to access Google
Drive. For more information about this method, see
Google Identity Platform > Service accounts.
For more detailed information about how to access Google
Drive using each method, see OAuth methods for accessing
Google Drive.


Access Token The access token generated through Google Developers


OAuth 2.0 Playground.
This property is available only when Access Token is
selected from the OAuth Method drop-down list.

Client ID and Client Secret The client ID and client secret.


These two properties are available only when Installed
Application (Id & Secret) is selected from the
OAuth Method drop-down list.

Client Secret JSON The path to the client secret JSON file.
This property is available only when Installed
Application (JSON) is selected from the OAuth
Method drop-down list.

Service Account JSON The path to the service account JSON file.
This property is available only when Service Account is
selected from the OAuth Method drop-down list.

Use Proxy Select this check box when you are working behind a proxy.
With this check box selected, you need to specify the value
for the following parameters:
• Host: The IP address of the HTTP proxy server.
• Port: The port number of the HTTP proxy server.

Use SSL Select this check box if an SSL connection is used to access
Google Drive. With this check box selected, you need to
specify the value for the following parameters:
• Algorithm: The name of the SSL cryptography
algorithm.
• Keystore File: The path to the certificate TrustStore file
that contains the list of certificates the client trusts.
• Password: The password used to check the integrity of
the TrustStore data.

File Name The name for the file after being uploaded.

Destination Folder The name or ID of the folder in which uploaded data will be
stored.

Access Method Select the method by which the destination folder is specified, either by Name or by Id.

Replace if Existing Select this check box to overwrite any existing file with the
newly uploaded one.

Upload Mode Select one of the following upload modes from the drop-
down list:
• Upload Incoming content as File: Select this option
to upload data from an input flow of the preceding
component.
• Upload Local File: Select this option to upload data
from a local file. In the File field displayed, specify the
path to the file to be uploaded.
• Expose As OutputStream: Select this option to expose
output stream of this component, which can be used
by other components to write the file content. For example, you can use the Use Output Stream feature of the tFileOutputDelimited component to feed a given tGoogleDrivePut's exposed output stream. For more information, see tFileOutputDelimited on page 1113.

Schema and Edit schema A schema is a row description, and it defines the fields to be
processed and passed on to the next component.
The schema of this component is read-only. You can click
the [...] button next to Edit schema to view the predefined
schema which contains the following fields:
• content: The content of the uploaded data.
• parentFolderId: The ID of the parent folder.
• fileId: The ID of the file.

Advanced settings

DataStore Path The path to the credential file that stores the refresh token.

Note: When your client ID, client secret, or any other configuration related to the Installed Application
authentication changes, you need to delete this
credential file manually before running your Job again.

This property is available only when Installed Application (Id & Secret) or Installed Application (JSON) is selected from the OAuth Method drop-down list.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an error occurs. This is an After variable and it returns a string.

PARENT_FOLDER_ID The ID of the parent folder. This is an After variable and it returns a string.

FILE_ID The ID of the file. This is an After variable and it returns a string.
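
For illustration, the sketch below shows how a downstream tJava component could read these After variables from the globalMap once the upload has finished. It assumes the component's unique name is tGoogleDrivePut_1 (adapt the key prefix to the name shown in your own Job); it is a minimal example, not generated code.

// tJava sketch, assuming the upload component is named tGoogleDrivePut_1.
// After variables are exposed in the globalMap as "<uniqueName>_<VARIABLE>".
String fileId = (String) globalMap.get("tGoogleDrivePut_1_FILE_ID");
String parentFolderId = (String) globalMap.get("tGoogleDrivePut_1_PARENT_FOLDER_ID");
String errorMessage = (String) globalMap.get("tGoogleDrivePut_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("Upload failed: " + errorMessage);
} else {
    System.out.println("Uploaded file " + fileId + " into folder " + parentFolderId);
}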

Usage

Usage rule This component can be used as a standalone component to upload a local file to Google Drive or an end component to upload data from an input flow of the preceding component to Google Drive.

Managing files with Google Drive


This scenario describes a Job that uploads two files to an empty folder Talend in the root directory
of Google Drive, then creates a new folder Talend Backup in the root directory and copies one of the two files to the new folder Talend Backup, and finally lists and displays all files and folders in
the root directory of Google Drive on the console.

Creating a Job for managing files with Google Drive


Procedure
1. Create a new Job and add a tGoogleDriveConnection component, two tGoogleDrivePut
components, a tFileInputRaw component, a tGoogleDriveCreate component, a tGoogleDriveCopy
component, a tGoogleDriveList component, and five tLogRow components to the Job.


2. Link the first tGoogleDrivePut component to the first tLogRow component using a Row > Main
connection.
3. Do the same to link the tFileInputRaw component to the second tGoogleDrivePut component,
the second tGoogleDrivePut component to the second tLogRow component, the tGoogleDriveCr
eate component to the third tLogRow component, the tGoogleDriveCopy component to the fourth
tLogRow component, the tGoogleDriveList component to the fifth tLogRow component.
4. Link the tGoogleDriveConnection component to the first tGoogleDrivePut component using a
Trigger > On Subjob Ok connection.
5. Do the same to link the first tGoogleDrivePut component to the tFileInputRaw component,
the tFileInputRaw component to the tGoogleDriveCreate component, the tGoogleDriveCreate
component to the tGoogleDriveCopy component, and the tGoogleDriveCopy component to the
tGoogleDriveList component.

Opening a connection to Google Drive


Configure the tGoogleDriveConnection component to connect to Google Drive using a client secret
JSON file.


Before you begin


• The client secret JSON file has been downloaded into a local folder through Google API Console.
For more information, see How to access Google Drive using a client secret JSON file on page
1273.
• An empty folder Talend has been created in the root directory of Google Drive.

Procedure
1. Double-click the tGoogleDriveConnection component to open its Basic settings view in the
Component tab.

2. In the Application Name field, enter the application name required by Google Drive to get access
to its API. In this example, it is TalendProject.
3. Select Installed Application (JSON) from the OAuth Method drop-down list.
4. In the Client Secret JSON field, specify the path to the client secret JSON file you have generated,
D:/client_secret.json in this example.

Uploading files to Google Drive


Procedure
1. Double-click the first tGoogleDrivePut component to open its Basic settings view in the
Component tab.


2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. Select by Name from the Access Method drop-down list and in the Destination Folder field, enter
the name of the folder in which the file will be uploaded, Talend in this example.

Note: When accessing a Google Drive resource by its name, if the name matches more than one
resource, an error will be thrown because the resource cannot be identified precisely. In this
case, you can specify the Google Drive resource using a pseudo path hierarchy, like /Talend/
Documentation. This example specifies a folder named Documentation under the folder
Talend under the Google Drive root folder.

4. In the File Name field, enter the name for the file after being uploaded. In this example, it is
Talend Customers.csv.
5. Select Upload Local File from the Upload Mode drop-down list and in the File field, browse
to or enter the path to the file to be uploaded. In this example, it is D:/Downloads/Talend
Customers.csv.
6. Double-click the tFileInputRaw component and on its Basic settings view, select Read the
file as a bytes array in the Mode area and specify the path to the file whose content will
be uploaded in the Filename field, D:/Downloads/Talend Products.txt in this example.
7. Double-click the second tGoogleDrivePut component to open its Basic settings view in the
Component tab.

8. Repeat step 2 on page 1301 to step 3 on page 1301 to configure this component.
9. In the File Name field, enter the name for the file after being uploaded. In this example, it is
Talend Products.txt.
10. Select Upload Incoming content as File from the Upload Mode drop-down list.


Creating a new folder in Google Drive


Procedure
1. Double-click tGoogleDriveCreate to open its Basic settings view in the Component tab.

2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. In the Parent Folder field, enter the name of the folder in which a new folder will be created. In
this example, it is root.
4. In the New Folder Name field, enter the name of the folder to be created. In this example, it is
Talend Backup.
5. Double-click the third tLogRow component to open its Basic settings view in the Component tab.
6. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.

Copying a file to the newly created folder


Procedure
1. Double-click the tGoogleDriveCopy component to open its Basic settings view in the Component
tab.

2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. Select File from the Copy Mode drop-down list.
4. In the Source field, enter the name of the file to be copied. In this example, it is Talend
Customers.csv.
5. In the Destination Folder Name field, enter the name of the folder to which the file will be copied.
In this example, it is Talend Backup.
6. Select the Rename (set new title) check box and in the Destination Name field, enter a new
name for the file after being copied to the destination folder. In this example, it is Talend
Customers v1.0.csv.


7. Double-click the fourth tLogRow component to open its Basic settings view in the Component tab.
8. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.

Listing files and folders in Google Drive


Procedure
1. Double-click the tGoogleDriveList component to open its Basic settings view in the Component
tab.

2. Select the component that will create the Google Drive connection from the Connection
Component drop-down list, tGoogleDriveConnection_1 in this example.
3. In the Folder Name field, enter the name of the folder in which the files/folders will be listed. In
this example, it is the root directory of Google Drive and you can use the alias root to refer to it.
4. Select Both from the FileList Type drop-down list to list both files and folders in the root
directory.
5. Select the Include SubDirectories check box to list also the files/folders in the subdirectories.
6. Double-click the fifth tLogRow component to open its Basic settings view in the Component tab.

7. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.

Saving and executing the Job


Procedure
1. Press Ctrl + S to save the Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


As shown above, two files Talend Products.txt and Talend Customers.csv were
uploaded to the folder Talend, then a new folder Talend Backup was created in the root
folder and the file Talend Customers.csv was copied to the new folder and renamed to
Talend Customers v1.0.csv, and finally all files and folders in the root directory are listed
on the console.


tGPGDecrypt
Calls the gpg -d command to decrypt a GnuPG-encrypted file and saves the decrypted file in the
specified directory.

tGPGDecrypt Standard properties


These properties are used to configure tGPGDecrypt running in the Standard Job framework.
The Standard tGPGDecrypt component belongs to the File family.
The component in this framework is available in all Talend products.

Basic settings

Input encrypted file File path to the encrypted file.

Output file File path to the output decrypted file.

GPG binary path File path to the GPG command.

Passphrase Enter the passphrase used in encrypting the specified input file.
To enter the passphrase, click the [...] button next to the
passphrase field, and then in the pop-up dialog box enter
the passphrase between double quotes and click OK to save
the settings.

No TTY Terminal Select this check box to specify that no TTY terminal is
used by adding the --no-tty option to the decryption
command.

Advanced settings

tStatCatcher Statistics Select this check box to gather the processing metadata at
the Job level as well as at each component level.

Global Variables

Global Variables FILE: the name of the output file. This is a Flow variable and
it returns a string.
FILEPATH: the path of the output file. This is a Flow variable
and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.
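
To make the wrapped command concrete, here is a rough Java sketch (for example in a tJava) of the kind of gpg call this component issues. It is a hedged approximation only, not the component's generated code; the binary path, file paths and passphrase are placeholders, and recent GnuPG versions may additionally require --pinentry-mode loopback for a non-interactive passphrase.

// Hedged sketch of the underlying call; paths and passphrase are placeholders.
ProcessBuilder pb = new ProcessBuilder(
        "gpg",                               // GPG binary path
        "--batch", "--yes", "--no-tty",      // non-interactive; --no-tty matches the No TTY Terminal option
        "--passphrase", "myPassphrase",      // Passphrase
        "-o", "D:/data/decrypted.csv",       // Output file
        "-d", "D:/data/encrypted.csv.gpg");  // Input encrypted file
pb.redirectErrorStream(true);
try {
    Process p = pb.start();
    int exitCode = p.waitFor();              // 0 means gpg decrypted the file successfully
    System.out.println("gpg exit code: " + exitCode);
} catch (Exception e) {
    e.printStackTrace();
}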

Decrypting a GnuPG-encrypted file and displaying its content


The following scenario describes a three-component Job that decrypts a GnuPG-encrypted file and
displays the content of the decrypted file on the Run console.

Dragging and linking the components


Procedure
1. Drop a tGPGDecrypt component, a tFileInputDelimited component, and a tLogRow component
from the Palette to the design workspace.
2. Connect the tGPGDecrypt component to the tFileInputDelimited component using a Trigger >
OnSubjobOk link, and connect the tFileInputDelimited component to the tLogRow component
using a Row > Main link.

Configuring the components


Procedure
1. Double-click the tGPGDecrypt to open its Component view and set its properties:


2. In the Input encrypted file field, browse to the file to be decrypted.


3. In the Output decrypted file field, enter the path to the decrypted file.

Warning:
If the file path contains accented characters, you will get an error message when running the
Job.

4. In the GPG binary path field, browse to the GPG command file.
5. In the Passphrase field, enter the passphrase used when encrypting the input file.
6. Double-click the tFileInputDelimited component to open its Component view and set its
properties:

7. In the File name/Stream field, define the path to the decrypted file, which is the output path you
have defined in the tGPGDecrypt component.
8. In the Header, Footer and Limit fields, define respectively the number of rows to be skipped in the
beginning of the file, at the end of the file and the number of rows to be processed.
9. Use a Built-In schema. This means that it is available for this Job only.
10. Click Edit schema and edit the schema for the component. Click twice the [+] button to add two
columns that you will call idState and labelState.
11. Click OK to validate your changes and close the editor.


12. Double-click the tLogRow component and set its properties:

13. Use a Built-In schema for this scenario.


14. In the Mode area, define the console display mode according to your preference. In this scenario,
select Table (print values in cells of a table).

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run from the Run tab to run it.


Results
The specified file is decrypted and the defined number of rows of the decrypted file are printed on the
Run console.


tGreenplumBulkExec
Improves performance when loading data in a Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
statement used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec component, detailed in a separate section. The advantage of using a two step process is
that it makes it possible to transform data before it is loaded in the database.
tGreenplumBulkExec performs an Insert action on the data.
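
As a rough illustration of the two-step idea (and not of the code this component actually generates), the JDBC sketch below asks the server to load a previously generated delimited file in a single statement, using PostgreSQL-style COPY, which Greenplum supports. The host, credentials, table and file path are placeholders, and, as noted for the Filename property, the file must already sit on the database server.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class GreenplumBulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // Greenplum is reachable through the PostgreSQL JDBC driver; connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-host:5432/mydb", "gpadmin", "secret");
             Statement stmt = conn.createStatement()) {
            // Second step of the pattern: load the file produced by the first step in one server-side statement.
            stmt.execute("COPY public.customers FROM '/data/customers.csv' DELIMITER ';' CSV");
        }
    }
}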

tGreenplumBulkExec Standard properties


These properties are used to configure tGreenplumBulkExec running in the Standard Job framework.
The Standard tGreenplumBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Filename Name of the file to be loaded.

Warning:
This file is located on the machine specified by the URI
in the Host field so it should be on the same machine as
the database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).


Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Action on data Select the operation you want to perform: Bulk insert or Bulk update. The details required differ according to the action chosen.

Copy the OID for each row Retrieve the ID item for each row.

Contains a header line with the names of each column in the file Specify that the table contains a header.

File type Select the file type to process.

Null string String displayed to indicate that the value is null.

Fields terminated by Character, string or regular expression to separate fields.

Escape char Character of the row to be escaped.

Text enclosure Character used to enclose text.

Force not null for columns Define the columns nullability


Force not null: Select the check box next to the column you
want to define as not null.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is generally used with a tGreenplumOutputBulk component. Used together they offer gains in performance while feeding a Greenplum database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed


independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For more information about tGreenplumBulkExec, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.
• Truncating and inserting file data into an Oracle database on page 2681.


tGreenplumClose
Closes a connection to the Greenplum database.

tGreenplumClose Standard properties


These properties are used to configure tGreenplumClose running in the Standard Job framework.
The Standard tGreenplumClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tGreenplumConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Greenplum components, especially with tGreenplumConnection and tGreenplumCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tGreenplumCommit
Commits a global transaction in one go instead of repeating the operation for every row or every batch,
and thus provides a gain in performance.
tGreenplumCommit validates the data processed through the Job into the connected DB. This
component uses a unique connection.

tGreenplumCommit Standard properties


These properties are used to configure tGreenplumCommit running in the Standard Job framework.
The Standard tGreenplumCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tGreenplumConnection component in the list if more than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tGreenplumCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tGreenplum* components, especially with the tGreenplumConnection and tGreenplumRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tGreenplumCommit related scenarios, see:
• Mapping data using a simple implicit join on page 686.
• Inserting data in mother/daughter tables on page 2426.


tGreenplumConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tGreenplumConnection opens a connection to the database for a current transaction.

tGreenplumConnection Standard properties


These properties are used to configure tGreenplumConnection running in the Standard Job framework.
The Standard tGreenplumConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the tRunJob component. Using a shared connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.
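
The Auto Commit option described above follows standard JDBC transaction semantics. The fragment below is only a sketch of that behavior (not Talend-generated code) and assumes an open java.sql.Connection named conn:

// Auto Commit cleared: nothing is visible until an explicit commit,
// which is the role played by the corresponding commit component.
conn.setAutoCommit(false);
try (java.sql.Statement stmt = conn.createStatement()) {
    stmt.executeUpdate("INSERT INTO public.demo VALUES (1, 'a')");
    stmt.executeUpdate("INSERT INTO public.demo VALUES (2, 'b')");
    conn.commit();      // both rows become visible together
} catch (Exception e) {
    conn.rollback();    // the role of the corresponding rollback component
    throw e;
}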

Usage

Usage rule This component is more commonly used with other tGreenplum* components, especially with the tGreenplumCommit and tGreenplumRollback components.

Related scenarios
For tGreenplumConnection related scenarios, see:
• Mapping data using a simple implicit join on page 686.
• tMysqlConnection on page 2425.


tGreenplumGPLoad
Bulk loads data into a Greenplum table either from an existing data file, an input flow, or directly from
a data flow in streaming mode through a named-pipe.
tGreenplumGPLoad inserts data into a Greenplum database table using Greenplum's gpload utility.

tGreenplumGPLoad Standard properties


These properties are used to configure tGreenplumGPLoad running in the Standard Job framework.
The Standard tGreenplumGPLoad component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Host Database server IP address.

Port Listening port number of the DB server.

Database Name of the Greenplum database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table into which the data is to be inserted.

Action on table On the table defined, you can perform one of the following
operations before loading the data:
None: No operation is carried out.
Clear table: The table content is deleted before the data is
loaded.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not
exist.
Drop and create table: The table is removed and created
again.
Drop table if exists and create: The table is removed if it
already exists and created again.


Truncate table: The table content is deleted. You do not


have the possibility to rollback the operation.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries.
Merge: Updates or adds data to the table.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Merge operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to set
as primary key(s). To define the Update/Merge options,
select in the Match Column column the check boxes
corresponding to the column names that you want to use as
a base for the Update and Merge operations, and select in
the Update Column column the check boxes corresponding
to the column names that you want to update. To define
the Update condition, type in the condition that will be
used to update the data.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Data file Full path to the data file to be used. If this component is
used in standalone mode, this is the name of an existing
data file to be loaded into the database. If this component
is connected with an input flow, this is the name of the file
to be generated and written with the incoming data to later
be used with gpload to load into the database. This field is
hidden when the Use named-pipe check box is selected.


Use named-pipe Select this check box to use a named-pipe. This option is
only applicable when the component is connected with an
input flow. When this check box is selected, no data file is
generated and the data is transferred to gpload through a
named-pipe. This option greatly improves performance in
both Linux and Windows.

Note:
This component on named-pipe mode uses a JNI
interface to create and write to a named-pipe on any
Windows platform. Therefore the path to the associated
JNI DLL must be configured inside the java library path.
The component comes with two DLLs for both 32 and
64 bit operating systems that are automatically provided
in the Studio with the component.

Named-pipe name Specify a name for the named-pipe to be used. Ensure that
the name entered is valid.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use existing control file (YAML formatted) Select this check box to provide a control file to be used
with the gpload utility instead of specifying all the options
explicitly in the component. When this check box is
selected, Data file and the other gpload related options no
longer apply. Refer to Greenplum's gpload manual for de
tails on creating a control file.

Control file Enter the path to the control file to be used, between
double quotation marks, or click [...] and browse to the
control file. This option is passed on to the gpload utility
via the -f argument.

CSV mode Select this check box to include CSV specific parameters
such as Escape char and Text enclosure.

Field separator Character, string, or regular expression used to separate fields.

Warning:
This is gpload's delim argument. The default value is |. To
improve performance, use the default value.

Escape char Character of the row to be escaped.

Text enclosure Character used to enclose text.

Header (skips the first row of data file) Select this check box to skip the first row of the data file.

Additional options Set the gpload arguments in the corresponding table. Click
[+] as many times as required to add arguments to the
table. Click the Parameter field and choose among the arguments from the list. Then click the corresponding Value field and enter a value between quotation marks.

  LOCAL_HOSTNAME: The host name or IP address of the local machine on which gpload is running. If this machine
is configured with multiple network interface cards (NICs),
you can specify the host name or IP of each individual NIC
to allow network traffic to use all NICs simultaneously. By
default, the local machine's primary host name or IP is used.

  PORT (gpfdist port): The specific port number that the gpfdist file distribution program should use. You can also
specify a PORT_RANGE to select an available port from
the specified range. If both PORT and PORT_RANGE are
defined, then PORT takes precedence. If neither PORT or
PORT_RANGE is defined, an available port between 8000
and 9000 is selected by default. If multiple host names are
declared in LOCAL_HOSTNAME, this port number is used for
all hosts. This configuration is desired if you want to use all
NICs to load the same file or set of files in a given directory
location.

  PORT_RANGE: Can be used instead of PORT (gpfdist port) to specify a range of port numbers from which gpload can
choose an available port for this instance of the gpfdist file
distribution program.

  NULL_AS: The string that represents a null value. The default is \N (backslash-N) in TEXT mode, and an empty
value with no quotation marks in CSV mode. Any source
data item that matches this string will be considered a null
value.

  FORCE_NOT_NULL: In CSV mode, processes each specified column as though it were quoted and hence not a NULL
value. For the default null string in CSV mode (nothing
between two delimiters), this causes missing values to be
evaluated as zero-length strings.

  ERROR_LIMIT (2 or higher): Enables single row error isolation mode for this load operation. When enabled and the error limit count is not reached on any Greenplum segment instance during input processing, all good rows will be loaded and input rows that have format errors will be
discarded or logged to the table specified in ERROR_TABLE
if available. When the error limit is reached, input rows
that have format errors will cause the load operation to
abort. Note that single row error isolation only applies to
data rows with format errors, for example, extra or missing
attributes, attributes of a wrong data type, or invalid client
encoding sequences. Constraint errors, such as primary
key violations, will still cause the load operation to abort
if encountered. When this option is not enabled, the load
operation will abort on the first error encountered.

ERROR_TABLE: When ERROR_LIMIT is declared, specifies an error table where rows with formatting errors will be logged
when running in single row error isolation mode. You can
then examine this error table to see error rows that were
not loaded (if any).


Log file Browse to or enter the access path to the log file in your directory.

Encoding Define the encoding type manually in the field.

Specify gpload path Select this check box to specify the full path to the gpload
executable. You must check this option if the gpload path is
not specified in the PATH environment variable.

Full path to gpload executable Full path to the gpload executable on the machine in
use. It is advisable to specify the gpload path in the PATH
environment variable instead of selecting this option.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
GPLOAD_OUTPUT: the output information when the gpload utility is executed. This is an After variable and it
returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
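
For example, a tJava placed after the load (linked with an OnSubjobOk trigger) could log the gpload report and the row count. The sketch below assumes the component's unique name is tGreenplumGPLoad_1; adjust the key prefix to match your Job.

// tJava sketch, assuming the load component is named tGreenplumGPLoad_1.
Integer rows = (Integer) globalMap.get("tGreenplumGPLoad_1_NB_LINE");
String report = (String) globalMap.get("tGreenplumGPLoad_1_GPLOAD_OUTPUT");
System.out.println("Rows processed: " + rows);
System.out.println("gpload report:");
System.out.println(report);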

Usage

Usage rule This component is mainly used when no particular transformation is required on the data to be loaded on to the database.
This component can be used as a standalone or an output
component.

Limitation Due to license incompatibility, one or more JARs required to use this component are not provided. You can install the missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).


Related scenario
For a related use case, see Inserting data in bulk in MySQL database on page 2489.


tGreenplumInput
Reads a database and extracts fields based on a query.
tGreenplumInput executes a DB query with a strictly defined order which must correspond to the
schema definition and then it passes on the field list to the next component via a Main row link.

tGreenplumInput Standard properties


These properties are used to configure tGreenplumInput running in the Standard Job framework.
The Standard tGreenplumInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.


  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to sequencing the fields properly so that they match the schema definition (see the example at the end of these Basic settings).

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Guess schema Click the Guess schema button to retrieve the table schema.
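
For example, for a hypothetical two-column schema (id of type Integer, name of type String), the Query field could contain a statement such as the one below, typed as a Java string; the table and column names are placeholders and the columns must follow the same order as the schema.

// Query field content (a Java string); columns appear in the same order as the schema: id, name
"SELECT id, name FROM public.customers ORDER BY id"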

Advanced settings

Use cursor When selected, helps to decide the row set to work with at a
time and thus optimize performance.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a component while an After variable functions after the execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Greenplum databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see:
• Mapping data using a simple implicit join on page 686.
See also related topic: Reading data from different MySQL databases using dynamically loaded
connection parameters on page 497.


tGreenplumOutput
Executes the action defined on the table and/or on the data of a table, according to the input flow
from the previous component.
tGreenplumOutput writes, updates, modifies or deletes the data in a database.

tGreenplumOutput Standard properties


These properties are used to configure tGreenplumOutput running in the Standard Job framework.
The Standard tGreenplumOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column , select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database connection (that is, the component selected from the Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Commit every Enter the number of rows to be completed before committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, which are not
insert, nor update or delete actions, or action that require
particular preprocessing.

  Name: Type in the name of the schema column to be altered or inserted as a new column.

  SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the action to be performed on the reference column.

  Reference column: Type in a column of reference that the tDBOutput can use to place or replace the new or altered column.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each batch.
This field appears only when the Use batch mode check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.


NB_LINE_INSERTED: the number of rows inserted. This is an


After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
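
As an illustration (not taken from this guide), a downstream tJava component can read these
variables from the globalMap once this component has finished. The instance name
tGreenplumOutput_1 below is an assumption and must match the label of the component in your
own Job:

// Hypothetical tJava code; the component instance name is assumed.
Integer inserted = (Integer) globalMap.get("tGreenplumOutput_1_NB_LINE_INSERTED");
Integer updated = (Integer) globalMap.get("tGreenplumOutput_1_NB_LINE_UPDATED");
System.out.println("Inserted: " + inserted + ", updated: " + updated);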

Usage

Usage rule This component covers all possible SQL queries for
Greenplum databases. It allows you to carry out actions on
a table or on the data of a table in a Greenplum database.
It enables you to create a reject flow, with a Row > Rejects
link filtering the data in error. For a usage example, see
Retrieving data in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related scenarios, see:


• Mapping data using a simple implicit join on page 686.


• Inserting a column and altering data using tMysqlOutput on page 2466.


tGreenplumOutputBulk
Prepares the file to be used as parameter in the INSERT query to feed the Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec
component, detailed in a separate section. The advantage of using a two-step process is that it makes
it possible to transform data before it is loaded into the database.
tGreenplumOutputBulk writes a file with columns based on the defined delimiter and the Greenplum standards.

tGreenplumOutputBulk Standard properties


These properties are used to configure tGreenplumOutputBulk running in the Standard Job framework.
The Standard tGreenplumOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file to be generated.

Warning:
This file is generated on the local machine or a shared
folder on the LAN.

Append Select this check box to add the new rows at the end of the
records.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Row separator String (ex: "\n" on Unix) to distinguish rows.

Field separator Character, string or regular expression to separate fields.

Include header Select this check box to include the column header.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component is to be used along with the
tGreenplumBulkExec component. Used together they offer
gains in performance while feeding a Greenplum database.

Component family Databases/Greenplum

Related scenarios
For use cases in relation with tGreenplumOutputBulk, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tGreenplumOutputBulkExec
Provides performance gains during Insert operations to a Greenplum database.
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two-step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the tGreenplumOutputBulkExec
component.
tGreenplumOutputBulkExec executes the action on the data provided.

tGreenplumOutputBulkExec Standard properties


These properties are used to configure tGreenplumOutputBulkExec running in the Standard Job
framework.
The Standard tGreenplumOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host Database server IP address.


Currently, only localhost, 127.0.0.1 or the exact IP address of
the local machine is allowed for proper functioning. In other
words, the database server must be installed on the same
machine where the Studio is installed or where the Job using
tGreenplumOutputBulkExec is deployed.

Port Listening port number of DB server.

Database name Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.


Table Name of the table to be written.


Note that only one table can be written at a time and that
the table must exist for the insert operation to succeed.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted. You have the
possibility to roll back the operation.

File Name Name of the file to be generated and loaded.

Warning:
This file is generated on the machine specified by
the URI in the Host field so it should be on the same
machine as the database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon


completion and choose this schema metadata again in


the Repository Content window.

Advanced settings

Action on data Select the operation you want to perform: Bulk insert or
Bulk update. The information to provide differs according to
the action chosen.

Copy the OID for each row Retrieve the ID item for each row.

Contains a header line with the names of each column in the file Specify that the table contains a header.

File type Select the file type to process.

Null string String displayed to indicate that the value is null.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Fields terminated by Character, string or regular expression to separate fields.

Escape char Character of the row to be escaped.

Text enclosure Character used to enclose text.

Force not null for columns Define the columns' nullability.


Force not null: Select the check box next to the column you
want to define as not null.

tStatCatcherStatistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is mainly used when no particular
transformation is required on the data to be loaded into the
database.

Limitation The database server must be installed on the same machine


where the Studio is installed or where the Job using
tGreenplumOutputBulkExec is deployed, so that the
component functions properly.

Related scenarios
For use cases in relation with tGreenplumOutputBulkExec, see the following scenarios:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tGreenplumRollback
Avoids committing part of a transaction involuntarily.
tGreenplumRollback cancels the transaction committed in the connected DB.

tGreenplumRollback Standard properties


These properties are used to configure tGreenplumRollback running in the Standard Job framework.
The Standard tGreenplumRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tGreenplumConnection component in the list if
more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with


other tGreenplum* components, especially with the
tGreenplumConnection and tGreenplumCommit
components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different


MySQL databases using dynamically loaded connection


parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tGreenplumRollback related scenario, see Rollback from inserting data in mother/daughter tables
on page 2429.


tGreenplumRow
Acts on the actual DB structure or on the data (although without handling data), depending on the
nature of the query and the database.
The SQLBuilder tool helps you easily write your SQL statements.
tGreenplumRow is the specific component for this database query. It executes the stated SQL query
on the specified database. The row suffix means the component implements a flow in the Job design
although it doesn't provide output.

tGreenplumRow Standard properties


These properties are used to configure tGreenplumRow running in the Standard Job framework.
The Standard tGreenplumRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.

Schema Exact name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name Name of the table to be read.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Query Enter your DB query, paying particular attention to properly
sequencing the fields in order to match the schema
definition.
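
For instance, assuming a hypothetical employee table with id and name columns, the Query field
could contain a statement such as the following, entered as a double-quoted string as is usual in the
Studio:

"UPDATE employee SET name = 'unknown' WHERE id = 0"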


Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
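
For example, with a hypothetical employee table, the Query field could contain
"UPDATE employee SET name = ? WHERE id = ?" and the Set PreparedStatement Parameter table
could be filled in as follows (illustrative values):

  Parameter Index: 1   Parameter Type: String   Parameter Value: "unknown"
  Parameter Index: 2   Parameter Type: Int      Parameter Value: 42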

Commit every Number of rows to be completed before committing
batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and, above all, better
performance at execution.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related scenarios, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tGreenplumSCD
Addresses Slowly Changing Dimension needs by regularly reading a source of data and logging the
changes into a dedicated SCD table.
tGreenplumSCD reflects and tracks changes in a dedicated Greenplum SCD table.

tGreenplumSCD Standard properties


These properties are used to configure tGreenplumSCD running in the Standard Job framework.
The Standard tGreenplumSCD component belongs to the Business Intelligence and the Databases
families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Connection type Select the relevant driver on the list.

Host Database server IP address.


Port Listening port number of DB server.

Database Name of the database.

Schema Name of the DB schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to maximize system performance.

Source keys include Null Select this check box to allow the source key columns to
have Null values.

Warning:
Special attention should be paid to the uniqueness of the
source key(s) value when this option is selected.


Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step during
processing entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as Output component. It requires an


Input component and Row main link as input.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.


The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenario
For related scenarios, see tMysqlSCD on page 2508.


tGroovy
tGroovy broadens the functionality of the Talend Job, using the Groovy language, which is a simplified
Java syntax.
tGroovy allows you to enter customized code which you can integrate into the Talend program. The
code is run only once.

tGroovy Standard properties


These properties are used to configure tGroovy running in the Standard Job framework.
The Standard tGroovy component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Groovy Script Enter the Groovy code you want to run.

Variables This table has two columns.


Name: Name of the variable called in the code.
Value: Value associated with the variable.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used alone or as a subJob along


with one other component.

Limitation Knowledge of the Groovy language is required.


Related scenarios
• For a scenario using the Groovy code, see Calling a file which contains Groovy code on page
1355.
• For a functional example, see Printing out a variable content on page 1823.


tGroovyFile
Broadens the functionality of Talend Jobs using the Groovy language which is a simplified Java
syntax.
tGroovyFile allows you to call an existing Groovy script.

tGroovyFile Standard properties


These properties are used to configure tGroovyFile running in the Standard Job framework.
The Standard tGroovyFile component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Groovy File Name and path of the file containing the Groovy code.

Variables This table contains two columns.


Name: Name of the variable called in the code.
Value: Value associated with this variable.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used alone or as a subJob along


with another component.

Limitation Knowledge of the Groovy language is required.


Calling a file which contains Groovy code


This scenario uses tGroovyFile, on its own. The Job calls a file containing Groovy code in order to
display the file information in the Console.

Setting up the Job


Open the Custom_Code folder in the Palette and drop a tGroovyFile component onto the workspace.

Configuring the tGroovyFile component


Procedure
1. Double-click the component to display the Component view.

2. In the Groovy File field, enter the path to the file containing the Groovy code, or browse to the
file in your directory. In this example, it is D:/Input/Ageducapitaine.txt, and the file contains the
following Groovy code:

println("The captain is " + age + " years old")

3. In the Variables table, add a line by clicking the [+] button.


4. In the Name column, enter "age", and then in the Value column, enter 50.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The Console displays the message defined in the input file, with the value of the variable
inserted.
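
With the values used in this example, the Console output is:

The captain is 50 years old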


tGSBucketCreate
Creates a new bucket which you can use to organize data and control access to data in Google Cloud
Storage.

tGSBucketCreate Standard properties


These properties are used to configure tGSBucketCreate running in the Standard Job framework.
The Standard tGSBucketCreate component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Specify the name of the bucket which you want to create.
Note that the bucket name must be unique across the
Google Cloud Storage system.
For more information about the bucket naming convention,
see https://developers.google.com/storage/docs/bucketnaming.

Special configure Select this check box to provide the additional configuration
for the bucket to be created.

Project ID Specify the project ID to which the new bucket belongs.

Location Select from the list the location where the new bucket
will be created. Currently, Europe and US are available. By
default, the bucket location is in the US.
Note that once a bucket is created in a specific location, it
cannot be moved to another location.


Acl Select from the list the desired access control list (ACL) for
the new bucket.
Depending on the ACL on the bucket, the access requests
from users may be allowed or rejected. If you do not specify
a predefined ACL for the new bucket, the predefined
project-private ACL applies.
For more information about ACL, see https://developers.google.com/storage/docs/accesscontrol?hl=en.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with the


tGSBucketList component to check if a new bucket is created
successfully.

Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.


tGSBucketDelete
Deletes an empty bucket in Google Cloud Storage so as to release occupied resources.
Note that bucket deletion cannot be undone, so you need to back up any data that you want to keep
before the deletion.

tGSBucketDelete Standard properties


These properties are used to configure tGSBucketDelete running in the Standard Job framework.
The Standard tGSBucketDelete component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Specify the name of the bucket that you want to delete.
Make sure that the bucket to be deleted is empty.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable


and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with the


tGSBucketList component to check if the specified bucket is
deleted successfully.

Related scenarios
No scenario is available for the Standard version of this component yet.


tGSBucketExist
Checks the existence of a bucket in Google Cloud Storage so as to make further operations.

tGSBucketExist Standard properties


These properties are used to configure tGSBucketExist running in the Standard Job framework.
The Standard tGSBucketExist component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Specify the name of the bucket for which you want to
perform a check to confirm it exists in Google Cloud Storage.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables BUCKET_EXIST: the existence of a specified bucket. This is a


Flow variable and it returns a boolean.
BUCKET_NAME: the name of a specified bucket. This is a
Flow variable and it returns a string.


ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.


tGSBucketList
Retrieves a list of buckets from all projects or one specific project in Google Cloud Storage.
tGSBucketList iterates on all buckets within all projects or one specific project in Google Cloud
Storage.

tGSBucketList Standard properties


These properties are used to configure tGSBucketList running in the Standard Job framework.
The Standard tGSBucketList component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Specify project ID Select this check box and in the Project ID field specify a
project ID from which you want to retrieve a list of buckets.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CURRENT_BUCKET_NAME: the current bucket name. This is


a Flow variable and it returns a string.
NB_BUCKET: the number of buckets. This is an After variable
and it returns an integer.


ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule The tGSBucketList component can be used as a standalone


component or as a start component of a process.

Related scenario
For related topics, see Verifying the absence of a bucket, creating it and listing all the S3 buckets on
page 3176.


tGSClose
Closes an active connection to Google Cloud Storage in order to release the occupied resources.

tGSClose Standard properties


These properties are used to configure tGSClose running in the Standard Job framework.
The Standard tGSClose component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Component List Select the tGSConnection component in the list if more than
one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Google Cloud
Storage components, particularly tGSConnection.

Related scenario
For a scenario in which tGSClose is used, see Managing files with Google Cloud Storage on page
1378.


tGSConnection
Provides the authentication information for making requests to the Google Cloud Storage system and
enables the reuse of the connection it creates to Google Cloud Storage.

tGSConnection Standard properties


These properties are used to configure tGSConnection running in the Standard Job framework.
The Standard tGSConnection component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component is generally used with other Google Cloud
Storage components, particularly tGSClose.

Related scenario
For a scenario in which tGSConnection is used, see Managing files with Google Cloud Storage on page
1378.


tGSCopy
Copies or moves objects within a bucket or between buckets in Google Cloud Storage.
tGSCopy streamlines processes by automating the copy tasks.

tGSCopy Standard properties


These properties are used to configure tGSCopy running in the Standard Job framework.
The Standard tGSCopy component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Source bucket name Specify the name of the bucket from which you want to
copy or move objects.

Source object key Specify the key of the object to be copied.

Source is folder Select this check box if the source object is a folder.

Target bucket name Specify the name of the bucket to which you want to copy
or move objects.

Target folder Specify the target folder to which the objects will be copied
or moved.

Action Select the action that you want to perform on objects from
the list.
• Copy: copies objects from the source bucket or folder
to the target bucket or folder.


• Move: moves objects from the source bucket or folder


to the target bucket or folder.

Rename Select this check box and in the New name field enter a new
name for the object to be copied or moved.
The Rename check box will not be available if you select
the Source is folder check box.
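
For example (hypothetical names), to move an object stored under the key data/input.csv from a
bucket named my-source-bucket into the archive folder of my-target-bucket, you could set Source
bucket name to "my-source-bucket", Source object key to "data/input.csv", Target bucket name to
"my-target-bucket", Target folder to "archive" and Action to Move.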

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables SOURCE_BUCKET: the source bucket name. This is an After


variable and it returns a string.
SOURCE_OBJECTKEY: the key of a source object. This is an
After variable and it returns a string.
DESTINATION_BUCKETNAME: the destination bucket name.
This is an After variable and it returns a string.
DESTINATION_FOLDER: the destination folder. This is an
After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used as a standalone component.

Related scenario
For a scenario in which tGSCopy is used, see Managing files with Google Cloud Storage on page 1378.


tGSDelete
Deletes the objects which match the specified criteria in Google Cloud Storage so as to release the
occupied resources.

tGSDelete Standard properties


These properties are used to configure tGSDelete running in the Standard Job framework.
The Standard tGSDelete component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Key prefix Specify the prefix to delete only objects whose keys begin
with the specified prefix.

Delimiter Specify the delimiter in order to delete only those objects


with key names up to the delimiter.

Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to delete objects.

Delete object from bucket list Select this check box and complete the Bucket table to
delete objects in the specified buckets.
• Bucket name: type in the name of the bucket from
which you want to delete objects.
• Key prefix: type in the prefix to delete objects whose
keys begin with the specified prefix in the specified
bucket.
• Delimiter: type in the delimiter to delete those objects
with key names up to the delimiter in the specified
bucket.


If you select the Delete object from bucket list check box,
the Key prefix and Delimiter fields as well as the Specify
project ID check box will not be available.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with the tGSList
component to check if the objects which match the
specified criteria are deleted successfully.

Related scenario
For a scenario in which tGSDelete is used, see Managing files with Google Cloud Storage on page
1378.


tGSGet
Retrieves objects which match the specified criteria from Google Cloud Storage and outputs them to a
local directory.

tGSGet Standard properties


These properties are used to configure tGSGet running in the Standard Job framework.
The Standard tGSGet component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Key prefix Specify the prefix to download only objects whose keys begin with the specified prefix.

Delimiter Specify the delimiter in order to download only those objects with key names up to the delimiter.

Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to obtain objects.

Use keys Select this check box and complete the Keys table to define
the criteria for objects to be downloaded from Google Cloud
Storage.
• Bucket name: type in the name of the bucket from
which you want to download objects.
• Key: type in the key of the object to be downloaded.
• New name: type in a new name for the object to be
downloaded.
If you select the Use keys check box, the Key prefix and
Delimiter fields as well as the Specify project ID check box
and the Get files from bucket list check box will not be
available.

Get files from bucket list Select this check box and complete the Bucket table to
define the criteria for objects to be downloaded from
Google Cloud Storage.
• Bucket name: type in the name of the bucket from
which you want to download objects.
• Key prefix: type in the prefix to download objects
whose keys start with the specified prefix from the
specified bucket.
• Delimiter: specify the delimiter to download those
objects with key names up to the delimiter from the
specified bucket.
If you select the Get files from bucket list check box, the
Key prefix and Delimiter fields as well as the Specify project
ID check box and the Use keys check box will not be
available.

Output directory Specify the directory where you want to store the
downloaded objects.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is usually used together with other Google
Cloud Storage components, particularly tGSPut.


Related scenarios
No scenario is available for the Standard version of this component yet.


tGSList
Retrieves a list of objects from Google Cloud Storage one by one.
tGSList iterates on a list of objects which match the specified criteria in Google Cloud Storage.

tGSList Standard properties


These properties are used to configure tGSList running in the Standard Job framework.
The Standard tGSList component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Key prefix Specify the key prefix so that only the objects whose keys
begin with the specified string will be listed.

Delimiter Specify the delimiter in order to list only those objects with
key names up to the delimiter.

Specify project ID Select this check box and in the Project ID field enter the
project ID from which you want to retrieve a list of objects.

List objects in bucket list Select this check box and complete the Bucket table to
retrieve objects in the specified buckets.
• Bucket name: type in the name of the bucket from
which you want to retrieve objects.
• Key prefix: type in the prefix to list only objects whose
keys begin with the specified string in the specified
bucket.
• Delimiter: type in the delimiter to list only those
objects with key names up to the delimiter.


If you select the List objects in bucket list check box, the
Key prefix and Delimiter fields as well as the Specify project
ID check box will not be available.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables CURRENT_BUCKET: the current bucket name. This is a Flow variable and it returns a string.
CURRENT_KEY: the current key. This is a Flow variable and
it returns a string.
NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule The tGSList component can be used as a standalone component or as a start component of a process.

Related scenario
For a scenario in which tGSList is used, see Managing files with Google Cloud Storage on page 1378.


tGSPut
Uploads files from a local directory to Google Cloud Storage so that you can manage them with
Google Cloud Storage.

tGSPut Standard properties


These properties are used to configure tGSPut running in the Standard Job framework.
The Standard tGSPut component belongs to the Big Data and the Cloud families.
The component in this framework is available in all Talend products.

Basic settings

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Access Key and Secret Key Type in the authentication information obtained from
Google for making requests to Google Cloud Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the project
from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.
The Access Key and Secret Key fields will be available only
if you do not select the Use an existing connection check
box.

Bucket name Type in the name of the bucket into which you want to
upload files.

Local directory Type in the full path of or browse to the local directory
where the files to be uploaded are located.

Google Storage directory Type in the Google Storage directory to which you want to
upload files.

Use files list Select this check box and complete the Files table.
• Filemask: enter the filename or filemask using
wildcard characters (*) or regular expressions.
• New name: enter a new name for the file after being
uploaded.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used together with other components, particularly the tGSGet component.

Managing files with Google Cloud Storage


The scenario describes a Job which uploads files from the local directory to a bucket in Google Cloud
Storage, then performs copy, move and delete operations on those files, and finally lists and displays
the files in relevant buckets on the console.


Prerequisites: You have purchased a Google Cloud Storage account and created three buckets under
the same Google Storage directory. In this example, the buckets created are bighouse, bed_room, and
study_room.

Dropping and linking the components


About this task
To design the Job, proceed as follows:

Procedure
1. Drop the following components from the Palette to the design workspace: one tGSConnection
component, one tGSPut component, two tGSCopy components, one tGSDelete component, one
tGSList component, one tIterateToFlow component, one tLogRow component and one tGSClose
component.
2. Connect tGSConnection to tGSPut using a Trigger > On Subjob Ok link.
3. Connect tGSPut to the first tGSCopy using a Trigger > On Subjob Ok link.
4. Do the same to connect the first tGSCopy to the second tGSCopy, connect the second tGSCopy to
tGSDelete, connect tGSDelete to tGSList, and connect tGSList to tGSClose.
5. Connect tGSList to tIterateToFlow using a Row > Iterate link.
6. Connect tIterateToFlow to tLogRow using a Row > Main link.

Configuring the components


Opening a connection to Google Cloud Storage

Procedure
1. Double-click the tGSConnection component to open its Basic settings view in the Component tab.

2. Navigate to the Google APIs Console in your web browser to access the Google project hosting
the Cloud Storage services you need to use.
3. Click Google Cloud Storage > Interoperable Access to open its view, and copy the access key and
secret key.
4. In the Component view of the Studio, paste the access key and secret key to the corresponding
fields respectively.

Uploading files to Google Cloud Storage

Procedure
1. Double-click the tGSPut component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Bucket name field, enter the name of the bucket into which you want to upload files. In this
example, bighouse.
4. In the Local directory field, browse to the directory from which the files will be uploaded, D:/Input/
House in this example.


The files under this directory are shown below:

5. Leave other settings as they are.

Copying all files from one bucket to another bucket

Procedure
1. Double-click the first tGSCopy component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to copy files,
bighouse in this example.
4. Select the Source is a folder check box. All files from the bucket bighouse will be copied.
5. In the Target bucket name field, enter the name of the bucket into which you want to copy files,
bed_room in this example.
6. Select Copy from the Action list.

Moving a file from one bucket to another bucket and renaming it

Procedure
1. Double-click the second tGSCopy component to open its Basic settings view in the Component
tab.


2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to move files,
bighouse in this example.
4. In the Source object key field, enter the key of the object to be moved, computer_01.txt in this
example.
5. In the Target bucket name field, enter the name of the bucket into which you want to move files,
study_room in this example.
6. Select Move from the Action list. The specified source file computer_01.txt will be moved from the
bucket bighouse to study_room.
7. Select the Rename check box. In the New name field, enter a new name for the moved file. In this
example, the new name is laptop.txt.
8. Leave other settings as they are.

Deleting a file in one bucket

Procedure
1. Double-click the tGSDelete component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.


3. Select the Delete object from bucket list check box. Fill in the Bucket table with the file
information that you want to delete.
In this example, the file computer_03.csv will be deleted from the bucket bed_room whose files are
copied from the bucket bighouse.

Listing all files in the three buckets

Procedure
1. Double-click the tGSList component to open its Basic settings view in the Component tab.

2. Select the Use an existing connection check box and then select the connection you have
configured earlier.
3. Select the List objects in bucket list check box. In the Bucket table, enter the name of the three
buckets in the Bucket name column, bighouse, study_room, and bed_room.
4. Double-click the tIterateToFlow component to open its Basic settings view in the Component tab.

5. Click Edit schema to define the data to pass on to tLogRow.


In this example, add two columns bucketName and key, and set their types to Object.


6. The Mapping table will be populated with the defined columns automatically.
In the Value column, enter globalMap.get("tGSList_2_CURRENT_BUCKET") for the bucketName
column and globalMap.get("tGSList_2_CURRENT_KEY") for the key column. You can also press Ctrl +
Space and then choose the appropriate variable (see the sketch after this procedure).
7. Double-click the tLogRow component to open its Basic settings view in the Component tab.
8. Select Table (print values in cells of a table) for a better view of the results.
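For reference, the following minimal sketch shows the two Value expressions entered in step 6. Because the bucketName and key columns were defined with the Object type, the globalMap.get(...) calls can be used as is; the commented cast is only an assumption you would need if you typed the columns as String instead.

globalMap.get("tGSList_2_CURRENT_BUCKET")   // Value expression for the bucketName column
globalMap.get("tGSList_2_CURRENT_KEY")      // Value expression for the key column
// (String) globalMap.get("tGSList_2_CURRENT_BUCKET")   // cast needed only for a String-typed column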

Closing the connection to Google Cloud Storage

Procedure
1. Double-click the tGSClose component to open its Basic settings view in the Component tab.
2. Select the connection you want to close from the Component List.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


The files in the three buckets are displayed. As expected, at first, the files from the bucket
bighouse are copied to the bucket bed_room, then the file computer_01.txt from the bucket
bighouse is moved to the bucket study_room and renamed to be laptop.txt, finally the file
computer_03.csv is deleted from the bucket bed_room.


tHashInput
Reads from the cache memory data loaded by tHashOutput to offer high-speed data feed, facilitating
transactions involving a large amount of data.
The components of the Technical family are normally hidden from the Palette by default. For more
information about how to show them on the Palette, see Talend Studio User Guide.

tHashInput Standard properties


These properties are used to configure tHashInput running in the Standard Job framework.
The Standard tHashInput component belongs to the Technical family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this component only. Related topic: see the Talend Studio User Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: see the Talend Studio User Guide.

Link with a tHashOutput Select this check box to connect to a tHashOutput component. It is always selected by default.

Component list Drop-down list of available tHashOutput components.

Clear cache after reading Select this check box to clear the cache after reading the
data loaded by a certain tHashOutput component. This way,
the following tHashInput components, if any, will not be
able to read the cached data loaded by that tHashOutput
component.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used along with tHashOutput. It reads from the cache memory data loaded by tHashOutput. Together, these twin components offer high-speed data access to facilitate transactions involving a massive amount of data.

Reading data from the cache memory for high-speed data access
The following Job reads from the cache memory a huge amount of data loaded by two tHashOutput
components and passes it to a tFileOutputDelimited. The goal of this scenario is to show the speed
at which mass data is read and written. In practice, data feed generated in this way can be used as
lookup table input for some use cases where a big amount of data needs to be referenced.

Dropping and linking the components


Procedure
1. Drag and drop the following components from the Palette to the workspace: tFixedFlowInput (X2),
tHashOutput (X2), tHashInput and tFileOutputDelimited.
2. Connect the first tFixedFlowInput to the first tHashOutput using a Row > Main link.
3. Connect the second tFixedFlowInput to the second tHashOutput using a Row > Main link.
4. Connect the first subJob (from tFixedFlowInput_1) to the second subJob (to tFixedFlowInput_2)
using an OnSubjobOk link.
5. Connect tHashInput to tFileOutputDelimited using a Row > Main link.


6. Connect the second subJob to the last subJob using an OnSubjobOk link.

Configuring the components


Configuring data inputs and hash cache

Procedure
1. Double-click the first tFixedFlowInput component to display its Basic settings view.

2. Select Built-In from the Schema drop-down list.

Note:
You can select Repository from the Schema drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored in the Repository. For more information
about Metadata, see the Talend Studio User Guide.


3. Click Edit schema to define the data structure of the input flow. In this case, the input has two
columns: ID and ID_Insurance, and then click OK to close the dialog box.

4. Fill in the Number of rows field to specify the entries to output, e.g. 50000.
5. Select the Use Single Table check box. In the Values table and in the Value column, assign values
to the columns, e.g. 1 for ID and 3 for ID_Insurance.
6. Perform the same operations for the second tFixedFlowInput component, with the only difference
in the values. That is, 2 for ID and 4 for ID_Insurance in this case.
7. Double-click the first tHashOutput to display its Basic settings view.

8. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. Select Keep all from the Keys management drop-down list and
keep the Append check box selected.
9. Perform the same operations for the second tHashOutput component, and select the Link with a
tHashOutput check box.

Configuring data retrieval from hash cache and data output

Procedure
1. Double-click tHashInput to display its Basic settings view.


2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
which is the same as that of tHashOutput.
3. Select tHashOutput_1 from the Component list drop down list.
4. Double-click tFileOutputDelimited to display its Basic settings view.

5. Select Built-In from the Property Type drop-down list. In the File Name field, enter the full path
and name of the file, e.g. "E:/Allr70207V5.0/Talend-All-r70207-V5.0.0NB/workspace/out.csv".
6. Select the Include Header check box and click Sync columns to retrieve the schema from the
previous component.

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6, or click Run on the Run tab to execute the Job.

Results

You can see that the large volume of entries is written and read very rapidly.


Clearing the memory before loading data to it in case an iterator exists in the same subJob
In this scenario, the usage of the Append option of tHashOutput is demonstrated as it helps remove
repetitive or unwanted data in case an iterator exists in the same subJob as tHashOutput.
To build the Job, do the following:

Dropping and linking the components


Procedure
1. Drag and drop the following components from the Palette to the workspace: tLoop,
tFixedFlowInput, tHashOutput, tHashInput and tLogRow.
2. Connect tLoop to tFixedFlowInput using a Row > Iterate link.
3. Connect tFixedFlowInput to tHashOutput using a Row > Main link.
4. Connect tHashInput to tLogRow using a Row > Main link.
5. Connect tLoop to tHashInput using an OnSubjobOk link.

Configuring the components


Configuring data input and hash cache

Procedure
1. Double-click the tLoop component to display its Basic settings view.

2. Select For as the loop type. Type in 1, 2 and 1 in the From, To and Step fields respectively. Keep the
Values are increasing check box selected.
3. Double-click the tFixedFlowInput component to display its Basic settings view.


4. Select Built-In from the Schema drop-down list.

Note:
You can select Repository from the Schema drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored in the Repository. For more information
about Metadata, see the Talend Studio User Guide.

5. Click Edit schema to define the data structure of the input flow. In this case, the input has one
column: Name.

6. Click OK to close the dialog box.


7. Fill in the Number of rows field to specify the entries to output, for example 1.
8. Select the Use Single Table check box. In the Values table, assign a value to the Name field, e.g.
Marx.
9. Double-click tHashOutput to display its Basic settings view.


10. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. Select Keep all from the Keys management drop-down list and
deselect the Append check box.

Configuring data retrieval from hash cache and data output

Procedure
1. Double-click tHashInput to display its Basic settings view.

2. Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure,
which is the same as that of tHashOutput.
3. Select tHashOutput_2 from the Component list drop-down list.
4. Double-click tLogRow to display its Basic settings view.

5. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema
from the previous component. In the Mode area, select Table (print values in cells of a table).

Saving and executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6, or click Run on the Run tab to execute the Job.
You can find that only one row was output although two rows were generated by tFixedFlowInput.


tHashOutput
Loads data to the cache memory to offer high-speed access, facilitating transactions involving a large
amount of data.
Note that loading data can consume a lot of memory, because each stored record carries an overhead. The number of input entries also impacts memory usage.
The components of the Technical family are normally hidden from the Palette by default. For more
information about how to show them on the Palette, see Talend Studio User Guide.

tHashOutput Standard properties


These properties are used to configure tHashOutput running in the Standard Job framework.
The Standard tHashOutput component belongs to the Technical family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either built-in or remotely stored
in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

  Built-in: The schema is created and stored locally for this component only. Related topic: see the Talend Studio User Guide.

  Repository: The schema already exists and is stored in the Repository, hence can be reused. Related topic: see the Talend Studio User Guide.

Link with a tHashOutput Select this check box to connect to a tHashOutput component.


Note:
If multiple tHashOutput components are linked in this
way, the data loaded to the cache by all of them can be
read by a tHashInput component that is linked with any
of them.

Component list Drop-down list of available tHashOutput components.

Data write model Drop-down list of available data write modes.

Keys management Drop-down list of available keys management modes.


• Keep all: writes all the data received to the cache
memory.
• Keep first: writes only the first record to the cache
memory if multiple records received have the same key
value.

Append Selected by default, this option is designed to append data to the memory in case an iterator exists in the same subJob.
If it is unchecked, tHashOutput will clear the memory before
loading data to it.

Note:
If Link with a tHashOutput is selected, this check box will
be hidden but is always enabled.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component writes data to the cache memory and is closely related to tHashInput. Together, these twin components offer high-speed data access to facilitate transactions involving a massive amount of data.

Related scenarios
For related scenarios, see:
• Reading data from the cache memory for high-speed data access on page 1387.
• Clearing the memory before loading data to it in case an iterator exists in the same subJob on
page 1391.


tHBaseClose
Closes an HBase connection you have established in your Job.
tHBaseClose closes an active connection to an HBase database.

tHBaseClose Standard properties


These properties are used to configure tHBaseClose running in the Standard Job framework.
The Standard tHBaseClose component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Component list Select the tHBaseConnection component in the list if more than one connection is planned for the current Job.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with HBase components, especially with tHBaseConnection.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.


• Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.
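For the -Djava.library.path argument described above, a minimal example of the VM argument, reusing the MAPR_INSTALL and VERSION placeholders from this section (the actual path depends on where your MapR client is installed), is:

-Djava.library.path="MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native"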

Related scenario
For a scenario in which tHBaseClose is used, see Exchanging customer data with HBase on page
1411.


tHBaseConnection
Establishes an HBase connection to be reused by other HBase components in your Job.
tHBaseConnection opens a connection to an HBase database.

tHBaseConnection Standard properties


These properties are used to configure tHBaseConnection running in the Standard Job framework.
The Standard tHBaseConnection component belongs to the Big Data and the Databases NoSQL
families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


- Built-in : no property data stored centrally.
- Repository : select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your

connection accordingly. However, because of the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.
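For reference only, the Zookeeper quorum, client port and optional znode parent entered in these fields correspond to the standard HBase client configuration keys shown in the sketch below. The host names, port and znode value are assumptions, and the sketch illustrates the underlying HBase API rather than the code generated by the component.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseZookeeperSketch {
    // Illustrative values only; replace them with your own cluster settings.
    public static Configuration build() {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zkhost1,zkhost2,zkhost3");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // Corresponds to the Set Zookeeper znode parent check box.
        conf.set("zookeeper.znode.parent", "/hbase");
        return conf;
    }
}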

Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.


If you want to use certain parameters such as the Kerberos parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>talend.kerberos.authentication</name>
        <value>kinit</value>
        <description>Set the Kerberos authentication method to use. Valid values are: kinit or keytab.</description>
    </property>
    <property>
        <name>talend.kerberos.keytab.principal</name>
        <value>user@EXAMPLE.COM</value>
        <description>Set the keytab's principal name.</description>
    </property>
    <property>
        <name>talend.kerberos.keytab.path</name>
        <value>/kdc/user.keytab</value>
        <description>Set the keytab's path.</description>
    </property>
    <property>
        <name>talend.encryption</name>
        <value>none</value>
        <description>Set the encryption method to use. Valid values are: none or ssl.</description>
    </property>
    <property>
        <name>talend.ssl.trustStore.path</name>
        <value>ssl</value>
        <description>Set SSL trust store path.</description>
    </property>
    <property>
        <name>talend.ssl.trustStore.password</name>
        <value>ssl</value>
        <description>Set SSL trust store password.</description>
    </property>
</configuration>


The parameters read from these configuration files override the default ones used by the Studio. When a parameter
does not exist in these configuration files, the default one is
used.

Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Advanced settings

Properties If you need to use custom configuration for your HBase, complete this table with the property or properties to be
customized. Then at runtime, the customized property or
properties will override those corresponding ones defined
earlier for your HBase.
For example, you need to define the value of the
dfs.replication property as 1 for the HBase configuration.
Then you need to add one row to this table using the plus
button and type in the name and the value of this property
in this row.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is an After variable and it returns a string. This variable functions only if the Die on error check box is cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other HBase components, particularly tHBaseClose.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For a scenario in which tHBaseConnection is used, see Exchanging customer data with HBase on page
1411.


tHBaseInput
Reads data from a given HBase database and extracts columns of selection.
HBase is a distributed, column-oriented database that hosts very large, sparsely populated tables on
clusters.
tHBaseInput extracts columns corresponding to schema definition. Then it passes these columns to
the next component via a Main row link.

HBase filters
The following list presents the HBase filters available in Talend Studio and the parameters required by each filter.

• Single Column Value Filter (parameters: Filter column, Filter family, Filter operation, Filter value, Filter comparator type): compares the values of a given column against the value defined for the Filter value parameter. If the filtering condition is met, all columns of the row will be returned.
• Family filter (parameters: Filter family, Filter operation, Filter comparator type): returns the columns of the family that meets the filtering condition.
• Qualifier filter (parameters: Filter column, Filter operation, Filter comparator type): returns the columns whose column qualifiers match the filtering condition.
• Column prefix filter (parameters: Filter column, Filter family): returns all columns of which the qualifiers have the prefix defined for the Filter column parameter.
• Multiple column prefix filter (parameters: Filter column with multiple prefixes separated by commas, for example id,id_1,id_2; Filter family): works the same way as a Column prefix filter does but allows specifying multiple prefixes.
• Column range filter (parameters: Filter column with the two ends of the range separated by a comma; Filter family): allows intra-row scanning and returns all matching columns of a scanned row.
• Row filter (parameters: Filter operation, Filter value, Filter comparator type): filters on row keys and returns all rows that match the filtering condition.
• Value filter (parameters: Filter operation, Filter value, Filter comparator type): returns only columns that have a specific value.


The use of the listed HBase filters explained above is subject to revisions made by Apache in its
Apache HBase project; therefore, in order to fully understand how to use these HBase filters, we
recommend reading Apache's HBase documentation.
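For readers who want to relate these filter types to the underlying HBase Java API, the sketch below shows roughly what a Single Column Value Filter wrapped in a FilterList corresponds to; MUST_PASS_ALL matches the And logical operation described in the component properties. This is an illustration of the HBase API only, not the code generated by the component, and the family, qualifier and value names are assumptions.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterSketch {
    // Builds a Scan that keeps only rows whose "info:age" column is greater than 30
    // (family, qualifier and value are illustrative assumptions).
    public static Scan buildScan() {
        SingleColumnValueFilter ageFilter = new SingleColumnValueFilter(
                Bytes.toBytes("info"), Bytes.toBytes("age"),
                CompareOp.GREATER, Bytes.toBytes("30"));
        ageFilter.setFilterIfMissing(true); // skip rows that do not have the column at all

        // MUST_PASS_ALL corresponds to the And logical operation; MUST_PASS_ONE to Or.
        FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        filters.addFilter(ageFilter);

        Scan scan = new Scan();
        scan.setFilter(filters);
        return scan;
    }
}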

tHBaseInput Standard properties


These properties are used to configure tHBaseInput running in the Standard Job framework.
The Standard tHBaseInput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Click this icon to open a database connection wizard and store the database connection parameters you set in the component Basic settings view.
For more information about setting up and storing database connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.

• If this cluster is a MapR cluster of the version 5.0.0 or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Set table Namespace mappings Enter the string to be used to construct the mapping
between an Apache HBase table and a MapR table.
For the valid syntax you can use, see http://doc.mapr.com/
display/MapR40x/Mapping+Table+Namespace+Between
+Apache+HBase+Tables+and+MapR+Tables.


Table name Type in the name of the table from which you need to
extract columns.

Define a row selection Select this check box and then in the Start row and the
End row fields, enter the corresponding row keys to specify
the range of the rows you want the current component to
extract.
Unlike the filters you can set using Is by filter, which
require all records to be loaded before filtering out the ones
to be used, this feature allows you to directly select only the
rows to be used.
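
The row selection corresponds to a range scan in the native HBase client API. The sketch below only illustrates that behavior; the table name customer and the row keys are placeholders.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RowSelectionSketch {
    public static void main(String[] args) throws Exception {
        try (Connection connection =
                ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("customer"))) {
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("1"));  // Start row (inclusive)
            scan.setStopRow(Bytes.toBytes("5"));   // End row (exclusive)
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}

Only the rows whose keys fall within the range are read, which is what distinguishes this option from the Is by filter feature.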

Mapping Complete this table to map the columns of the table to be


used with the schema columns you have defined for the
data flow to be processed.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Properties If you need to use custom configuration for your database,


complete this table with the property or properties to be
customized. Then at runtime, the customized property or
properties will override the corresponding ones used by the
Studio.
For example, you need to define the value of the
dfs.replication property as 1 for the database configuration.
Then you need to add one row to this table using the plus
button and type in the name and the value of this property
in this row.

Note:
This table is not available when you are using an
existing connection by selecting the Using an existing
connection check box in the Basic settings view.
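
As a hedged illustration of how such an override behaves in the underlying configuration API (the dfs.replication value mirrors the example above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CustomPropertySketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create(); // loads default values
        conf.set("dfs.replication", "1");                 // customized value wins
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
    }
}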

Is by filter Select this check box to use filters to perform fine-grained


data selection from your database, such as selection of keys,
or values, based on regular expressions.
Once you select it, the Filter table used to define filtering
conditions becomes available.
This feature leverages the filters provided by HBase and is
subject to the constraints explained in the Apache HBase
documentation. Therefore, advanced knowledge of HBase is
required to make full use of these filters.

Logical operation Select the operator you need to use to define the logical
relation between filters. The available operators are:


• And: every defined filtering condition must be
satisfied. It represents the relationship
FilterList.Operator.MUST_PASS_ALL.
• Or: at least one of the defined filtering conditions
must be satisfied. It represents the relationship
FilterList.Operator.MUST_PASS_ONE.

Filter Click the button under this table to add as many rows as
required, each row representing a filter. The parameters you
may need to set for a filter are:
• Filter type: the drop-down list presents pre-existing
filter types that are already defined by HBase. Select
the type of the filter you need to use.
• Filter column: enter the column qualifier on which you
need to apply the active filter. This parameter becomes
mandatory depending on the type of the filter and
of the comparator you are using. For example, it is
not used by the Row Filter type but is required by the
Single Column Value Filter type.
• Filter family: enter the column family on which you
need to apply the active filter. This parameter becomes
mandatory depending on the type of the filter and
of the comparator you are using. For example, it is
not used by the Row Filter type but is required by the
Single Column Value Filter type.
• Filter operation: select from the drop-down list the
operation to be used for the active filter.
• Filter Value: enter the value on which you want to use
the operator selected from the Filter operation drop-
down list.
• Filter comparator type: select the type of the
comparator to be combined with the filter you are
using.
Depending on the Filter type you are using, some or all of
the parameters become mandatory. For further information,
see HBase filters on page 1405.
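
For illustration, the sketch below builds equivalent filters with the native HBase API: a Row Filter (which uses no column or family) and a Single Column Value Filter, combined with the And (MUST_PASS_ALL) operator. The family, column and value names are placeholders, and an HBase 1.x client is assumed.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseFilterSketch {
    public static Scan buildScan() {
        FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        // Row Filter type: no Filter column/family, a regex comparator on row keys
        filters.addFilter(new RowFilter(
                CompareOp.EQUAL, new RegexStringComparator("^1.*")));
        // Single Column Value Filter type: family2:name must equal "Christophe"
        filters.addFilter(new SingleColumnValueFilter(
                Bytes.toBytes("family2"), Bytes.toBytes("name"),
                CompareOp.EQUAL, Bytes.toBytes("Christophe")));
        Scan scan = new Scan();
        scan.setFilter(filters);
        return scan;
    }
}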

Retrieve timestamps Select this check box to load the timestamps of an HBase
column into the data flow.
• Retrieve from an HBase column: select the HBase
column which is tracked for changes in order to
retrieve its corresponding timestamps.
• Write to a schema column: select the column you
have defined in the schema to store the retrieved
timestamps.
The type of this column must be Long.
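
For illustration, the hedged sketch below shows where such a timestamp comes from in the native HBase client; the row key, family and qualifier are placeholders.

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampSketch {
    public static long readTimestamp(Table table) throws Exception {
        Result result = table.get(new Get(Bytes.toBytes("1")));
        // Latest cell of the tracked HBase column (may be null if absent)
        Cell cell = result.getColumnLatestCell(
                Bytes.toBytes("family2"), Bytes.toBytes("name"));
        // This long value is what ends up in the mapped Long schema column
        return cell.getTimestamp();
    }
}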

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
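
As an illustration only, these variables are typically read in a tJava component linked to this component with an OnSubjobOk trigger; the component name tHBaseInput_1 used below is an assumption.

// Snippet for a tJava component placed after the subJob
Integer nbLine = (Integer) globalMap.get("tHBaseInput_1_NB_LINE");
String errorMessage = (String) globalMap.get("tHBaseInput_1_ERROR_MESSAGE");
System.out.println("Rows read: " + nbLine);
if (errorMessage != null) {
    System.out.println("Last error: " + errorMessage);
}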

Usage

Usage rule This component is a start component of a Job and always


needs an output link.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Exchanging customer data with HBase


This scenario applies only to Talend products with Big Data.
In this scenario, a six-component Job is used to exchange customer data with a given HBase.


The six components are:


• tHBaseConnection: creates a connection to your HBase database.
• tFixedFlowInput: creates the data to be written into your HBase. In the real use case, this
component could be replaced by the other input components like tFileInputDelimited.
• tHBaseOutput: writes the data it receives from the preceding component into your HBase.
• tHBaseInput: extracts the columns of interest from your HBase.
• tLogRow: presents the execution result.
• tHBaseClose: closes the transaction.
To replicate this scenario, proceed as the following sections illustrate.

Note:
Before starting the replication, your HBase and Zookeeper services must be correctly
installed and configured. This scenario explains only how to use the Talend solution to exchange
data with a given HBase.

Dropping and linking the components


About this task
To do this, proceed as follows:

Procedure
1. Drop tHBaseConnection, tFixedFlowInput, tHBaseOutput, tHBaseInput, tLogRow and tHBaseClose
from Palette onto the Design workspace.
2. Right-click tHBaseConnection to open its contextual menu and select the Trigger > On Subjob Ok
link from this menu to connect this component to tFixedFlowInput.


3. Do the same to create the OnSubjobOk link from tFixedFlowInput to tHBaseInput and then to
tHBaseClose.
4. Right-click tFixedFlowInput and select the Row > Main link to connect this component to
tHBaseOutput.
5. Do the same to create the Main link from tHBaseInput to tLogRow.

Results
The components to be used in this scenario are all placed and linked. You then need to configure
them successively.

Configuring the connection


About this task
To configure the connection to your Zookeeper service and thus to the HBase of interest, proceed as
follows:

Procedure
1. On the Design workspace of your Studio, double-click the tHBaseConnection component to open
its Component view.

2. Select Hortonworks Data Platform 1.0 from the HBase version list.
3. In the Zookeeper quorum field, type in the name or the URL of the Zookeeper service you are
using. In this example, the name of the service in use is hbase.
4. In the Zookeeper client port field, type in the number of the client listening port. In this example, it is
2181.
5. If the Zookeeper znode parent location has been defined in the Hadoop cluster you are
connecting to, you need to select the Set zookeeper znode parent check box and enter the value
of this property in the field that is displayed.

Configuring the process of writing data into the HBase


About this task
To do this, proceed as follows:


Procedure
1. On the Design workspace, double-click the tFixedFlowInput component to open its Component
view.

2. In this view, click the three-dot button next to Edit schema to open the schema editor.

3. Click the plus button three times to add three rows and in the Column column, rename the three
rows respectively as: id, name and age.
4. In the Type column, click each of these rows and from the drop-down list, select the data type of
every row. In this scenario, they are Integer for id and age, String for name.
5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
6. In the Mode area, select the Use Inline Content (delimited file) to display the fields for editing.


7. In the Content field, type in the delimited data to be written into the HBase, separated with the
semicolon ";". In this example, they are:

1;Albert;23
2;Alexandre;24
3;Alfred-Hubert;22
4;Andre;40
5;Didier;28
6;Anthony;35
7;Artus;32
8;Catherine;34
9;Charles;21
10;Christophe;36
11;Christian;67
12;Danniel;54
13;Elisabeth;58
14;Emile;32
15;Gregory;30

8. Double-click tHBaseOutput to open its Component view.

Note: If this component does not have the same schema as the preceding component, a
warning icon appears. In this case, click the Sync columns button to retrieve the schema from
the preceding component; once done, the warning icon disappears.

9. Select the Use an existing connection check box and then select the connection you have
configured earlier. In this example, it is tHBaseConnection_1.
10. In the Table name field, type in the name of the table to be created in the HBase. In this example,
it is customer.
11. In the Action on table field, select the action of interest from the drop-down list. In this scenario,
select Drop table if exists and create. This way, if a table named customer exists already in the
HBase, it will be disabled and deleted before creating this current table.
12. Click the Advanced settings tab to open the corresponding view.


13. In the Family parameters table, add two rows by clicking the plus button, rename them as family1
and family2 respectively and then leave the other columns empty. These two column families will
be created in the HBase using the default family performance options.

Note: The Family parameters table is available only when the action you have selected in the
Action on table field is to create a table in HBase. For further information about this Family
parameters table, see tHBaseOutput on page 1419.

14. In the Families table of the Basic settings view, enter the family names in the Family name
column, each corresponding to the column this family contains. In this example, the id and the
age columns belong to family1 and the name column to family2.

Note: These column families should already exist in the HBase to be connected to; if not, you
need to define them in the Family parameters table of the Advanced settings view for creating
them at runtime.
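
For illustration, the hedged sketch below, written with the native HBase client rather than with the component, shows the cells that the first sample row (1;Albert;23) becomes once id and age are mapped to family1 and name to family2; the string byte encoding used here is an assumption and may differ from the component's own encoding.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CustomerRowSketch {
    public static void writeFirstRow(Connection connection) throws Exception {
        try (Table table = connection.getTable(TableName.valueOf("customer"))) {
            Put put = new Put(Bytes.toBytes("1"));                   // row key
            put.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("id"),
                    Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("age"),
                    Bytes.toBytes("23"));
            put.addColumn(Bytes.toBytes("family2"), Bytes.toBytes("name"),
                    Bytes.toBytes("Albert"));
            table.put(put);
        }
    }
}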

Configuring the process of extracting data from the HBase


About this task
To do this, perform the following operations:

Procedure
1. Double-click tHBaseInput to open its Component view.


2. Select the Use an existing connection check box and then select the connection you have
configured earlier. In this example, it is tHBaseConnection_1.
3. Click the three-dot button next to Edit schema to open the schema editor.

4. Click the plus button three times to add three rows and rename them as id, name and age
respectively in the Column column. This means that you extract these three columns from the
HBase.
5. Select the types for each of the three columns. In this example, Integer for id and age, String for
name.
6. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
7. In the Table name field, type in the table from which you extract the columns of interest. In this
scenario, the table is customer.
8. In the Mapping table, the Column column has already been filled in automatically since the schema
was defined, so simply enter the name of every family in the Column family column, each
corresponding to the column it contains.
9. Double-click tHBaseClose to open its Component view.


10. In the Component List field, select the connection you need to close. In this example, this
connection is tHBaseConnection_1.

Executing the Job


To execute this Job, press F6.
Once done, the Run view is opened automatically, where you can check the execution result.

The columns of interest are extracted and you can process them according to your needs.
Log in to your HBase database to check the customer table this Job has created.


tHBaseOutput
Writes columns of data into a given HBase database.
tHBaseOutput receives data from its preceding component, creates a table in a given HBase database
and writes the received data into this HBase table.

tHBaseOutput Standard properties


These properties are used to configure tHBaseOutput running in the Standard Job framework.
The Standard tHBaseOutput component belongs to the Big Data and the Databases NoSQL families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add


other required jar files which the base distribution does


not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HBase version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Hadoop version of the distribution This list is displayed only when you have selected Custom
from the distribution list to connect to a cluster not yet
officially supported by the Studio. In this situation, you need
to select the Hadoop version of this custom cluster, that is
to say, Hadoop 1 or Hadoop 2.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.


Use kerberos authentication If the database to be used is running with Kerberos security,
select this check box, then, enter the principal names in the
displayed fields. You should be able to find the information
in the hbase-site.xml file of the cluster to be used.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
If you need to use a Kerberos keytab file to log in, select
Use a keytab to authenticate. A keytab file contains pairs
of Kerberos principals and encrypted keys. You need to
enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field. This
keytab file must be stored in the machine in which your Job
actually runs, for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are


not enclosed within quotation marks. If they are, you must


remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Set table Namespace mappings Enter the string to be used to construct the mapping
between an Apache HBase table and a MapR table.
For the valid syntax you can use, see http://doc.mapr.com/
display/MapR40x/Mapping+Table+Namespace+Between
+Apache+HBase+Tables+and+MapR+Tables.

Table name Type in the name of the HBase table you need to create.

Action on table Select the action you need to take for creating an HBase
table.

Custom Row Key Select this check box to use the customized row keys. Once
selected, the corresponding field appears. Then type in the
user-defined row key to index the rows of the HBase table
being created.
For example, you can type in "France"+Numer
ic.sequence("s1",1,1) to produce the row key series:
France1, France2, France3 and so on.
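
As a minimal sketch of the idea behind that expression (the AtomicLong below stands in for the Talend Numeric.sequence routine; it is not the routine itself):

import java.util.concurrent.atomic.AtomicLong;

public class CustomRowKeySketch {
    private static final AtomicLong SEQUENCE = new AtomicLong(0);

    public static String nextRowKey() {
        return "France" + SEQUENCE.incrementAndGet();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(nextRowKey()); // France1, France2, France3
        }
    }
}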

Families Complete this table to map the columns of the table to be


used with the schema columns you have defined for the
data flow to be processed.
The Column column of this table is automatically filled
once you have defined the schema; in the Family name
column, enter the column families you want to create
or use to group the columns in the Column column. For
further information about a column family, see Apache
documentation at Column families.

Custom timestamp column Select a Long column from your schema to provide
timestamps for the HBase columns to be created or updated
by tHBaseOutput.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Use batch mode Select this check box to activate the batch mode for data
processing.

Batch size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.
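
For illustration, the hedged sketch below shows what batching looks like with the native HBase client: writes are buffered and flushed per batch instead of being sent row by row. The batch size of 1000 and the table name are placeholders.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public class BatchWriteSketch {
    public static void writeInBatches(Connection connection, List<Put> puts)
            throws Exception {
        int batchSize = 1000;
        try (Table table = connection.getTable(TableName.valueOf("customer"))) {
            List<Put> buffer = new ArrayList<>();
            for (Put put : puts) {
                buffer.add(put);
                if (buffer.size() >= batchSize) { // flush one full batch
                    table.put(buffer);
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {              // flush the remaining rows
                table.put(buffer);
            }
        }
    }
}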

Properties If you need to use custom configuration for your database,


complete this table with the property or properties to be
customized. Then at runtime, the customized property or
properties will override the corresponding ones used by the
Studio.


For example, you need to define the value of the


dfs.replication property as 1 for the database configuration.
Then you need to add one row to this table using the plus
button and type in the name and the value of this property
in this row.

Note:
This table is not available when you are using an
existing connection by selecting the Using an existing
connection check box in the Basic settings view.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Family parameters Type in the names and, if need be, the custom
performance options of the column families to be created.
These options are all attributes defined by the HBase data
model, so for further explanation about these options, see
Apache's HBase documentation.

Note: The parameter Compression type allows you to


select the format for output data compression.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is normally an end component of a Job and


always needs an input link.

Prerequisites Before starting, ensure that you have met the Loopback IP
prerequisites expected by your database.
The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.


According to MapR's documentation, the library or


libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For related scenario to the Standard version of tHBaseOutput, see Exchanging customer data with
HBase on page 1411.


tHCatalogInput
Reads data from an HCatalog managed Hive database and sends data to the component that follows.
The tHCatalogInput component reads data from the specified HCatalog managed database and sends
data in the data flow to the console or to a specified local file by connecting this component to a
proper component.

tHCatalogInput Standard properties


These properties are used to configure tHCatalogInput running in the Standard Job framework.
The Standard tHCatalogInput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:


• If available in this Distribution drop-down list, the


Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.


Templeton hostname Fill this field with the URL of the Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the Templeton Webservice.
By default, this value is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
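
For illustration, the hedged sketch below calls one WebHCat (Templeton) endpoint over plain HTTP; the host name, the default port 50111 and the user name are placeholders.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHCatSketch {
    public static void main(String[] args) throws Exception {
        // Lists the databases visible through WebHCat for the given user
        URL url = new URL("http://templeton-host:50111/templeton/v1/"
                + "ddl/database?user.name=hdfs");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // JSON list of database names
            }
        } finally {
            conn.disconnect();
        }
    }
}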

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the


principal to be used is guest; in this situation, ensure that


user1 has the right to read the keytab file to be used.

Database The database in which the HCatalog managed tables are


placed. By default, this database is the Hive one named
default.

Table Fill this field to operate on one or multiple tables in the


specified database.

Partition Fill this field to specify one or more partitions for the
partition operation on a specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.

Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.

Username Fill this field with the username for the Hive database
authentication.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Row separator The separator used to identify the end of a row.

Field separator Enter a character, a string, or a regular expression to separate


fields for the transferred data.

Custom encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the


documentation you want. For demonstration purposes, the


links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Fill this field with the path to which log files are stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

Error Output Folder Fill this field with the path to which error log files are
stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is commonly used as the starting


component in a Job.
HCatalog is built on top of the Hive metastore to provide
a read and write interface for Pig and MapReduce, so that the
latter systems can use the metadata of Hive to easily read
and write data in HDFS.


For further information, see Apache documentation about


HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation When Use kerberos authentication is selected, the


component cannot work with IBM JVM.

Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.


tHCatalogLoad
Reads data directly from HDFS and writes this data into an established HCatalog managed table.

tHCatalogLoad Standard properties


These properties are used to configure tHCatalogLoad running in the Standard Job framework.
The Standard tHCatalogLoad component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the


configuration zip corresponding to your distribution


from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Templeton hostname Fill this field with the URL of the Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the Templeton Webservice.
By default, this value is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.


• If this cluster is a MapR cluster of the version 5.0.0


or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Database Enter the name of the database you need to write data in.
This database must already exist.

Table Enter the name of the table you need to write data in. This
table must already exist.

Partition Fill this field to specify one or more partitions for the
partition operation on the specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.

Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.

Username Fill this field with the username for the DB authentication.

File location Enter the absolute path pointing to the HDFS location from
which data is read.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Fill this field with the path to which log files are stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

Error Output Folder Fill this field with the path to which error log files are
stored.

Note:
This field is enabled only when you select the Retrieve the
HCatalog logs check box.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used in a single-component subJob.


HCatalog is built on top of the Hive metastore to provide
a read and write interface for Pig and MapReduce, so that the
latter systems can use the metadata of Hive to easily read
and write data in HDFS.
For further information, see Apache documentation about
HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.


According to MapR's documentation, the library or


libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation When Use kerberos authentication is selected, the


component cannot work with IBM JVM.

Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.


tHCatalogOperation
Prepares the HCatalog managed database/table/partition to be processed.
tHCatalogOperation manages the data stored in an HCatalog managed Hive database/table/partition.

tHCatalogOperation Standard properties


These properties are used to configure tHCatalogOperation running in the Standard Job framework.
The Standard tHCatalogOperation component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of


the ongoing evolution of the different Hadoop-


related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Templeton hostname Fill this field with the URL of the Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the Templeton Webservice.
By default, the value for this field is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.
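For illustration, a minimal Java sketch that checks whether a Templeton (WebHCat) service answers on the hostname and port entered in these two fields; the IP address is hypothetical and 50111 is the default port mentioned above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHCatStatusCheck {
    public static void main(String[] args) throws Exception {
        // WebHCat exposes a status endpoint under /templeton/v1/status.
        URL url = new URL("http://192.168.0.131:50111/templeton/v1/status");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        System.out.println("HTTP " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // typically {"status":"ok","version":"v1"}
            }
        }
        conn.disconnect();
    }
}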

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Operation on Select an object from the list for the DB operation as
follows:
Database: The HCatalog managed database in HDFS.
Table: The HCatalog managed table in HDFS.
Partition: The partition specified by the user.

Operation Select an action from the list for the DB operation. For
further information about the DB operation in HDFS, see
https://cwiki.apache.org/Hive/.

Create the table only it doesn't exist already Select this check box to avoid creating a duplicate table when
you create a table.

Note:
This check box is enabled only when you have selected
Table from the Operation on list.

Database Fill this field with the name of the database in which the
HCatalog managed tables are placed.

Table Fill this field to operate on one or multiple tables in a
database or on a specified HDFS location.

Note:
This field is enabled only when you have selected Table
from the Operation on list. For further information about
the operation on Table, see https://cwiki.apache.org/Hiv
e/.

Partition Fill this field to specify one or more partitions for the
partition operation on a specified table, for example
"match_age=26". When you specify multiple partitions,
separate them with commas and enclose the whole partition
string in double quotation marks.
If you are reading a non-partitioned table, leave this field
empty.

Note:
This field is enabled only when you select Partition from
the Operation on list. For further information about the
operation on Partition, see https://cwiki.apache.org/Hiv
e/.

Username Fill this field with the username for the DB authentication.

Database location Fill this field with the location of the database file in HDFS.

Note:
This field is enabled only when you select Database from
the Operation on list.

Database description The description for the database to be created.

Note:
This field is enabled only when you select Database from
the Operation on list.

Create an external table Select this check box to create an external table in an alternative
path defined in the Set HDFS location field in the Advanced
settings view. For further information about creating
external table, see https://cwiki.apache.org/Hive/.

Note:
This check box is enabled only when you select Table
from the Operation on list and Create/Drop and create/
Drop if exist and create from the Operation list.

Format Select a file format from the list to specify the format of the
external table you want to create:
TEXTFILE: Plain text files.
RCFILE: Record Columnar files. For further information
about RCFILE, see https://cwiki.apache.org/confluence/
display/Hive/RCFile.

Note:
RCFILE is only available starting with Hive 0.6.0. This
list is enabled only when you select Table from the
Operation on list and Create/Drop and create/Drop if
exist and create from the Operation list.

Set partitions Select this check box to set the partition schema by clicking
the Edit schema button to the right of the Set partitions check box.
The partition schema is either built-in or remote in the
Repository.

Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list.
You must follow the rules of using partition schema in
HCatalog managed tables. For more information about
the rules in using partition schema, see https://cwiki.
apache.org/confluence/display/Hive/HCatalog.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Set the user group to use Select this check box to specify the user group.

Note:
This check box is enabled only when you select
Drop/Drop if exist/Drop and create/Drop if exist and
create from the Operation list. By default, the value for
this field is root. For more information about the user
group in the server, contact your system administrator.

Option Select a clause when you drop a database.

Note:
This list is enabled only when you select Database from
the Operation on list and Drop/Drop if exist/Drop and
create/Drop if exist and create from the Operation list.
For more information about Drop operation on database,
see https://cwiki.apache.org/Hive/.

Set the permissions to use Select this check box to specify the permissions needed by
the operation you select from the Operation list.

Note:
This check box is enabled only when you select
Drop/Drop if exist/Drop and create/Drop if exist and
create from the Operation list. By default, the value for
this field is rwxrw-r-x. For more information on user
permissions, contact your system administrator.

Set File location Enter the directory in which partitioned data is stored.

Note:
This check box is enabled only when you select
Partition from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list.
For further information about storing partitioned data in
HDFS, see https://cwiki.apache.org/Hive/.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

Comment Fill this field with the comment for the table you want to
create.

Note:
This field is enabled only when you select Table from
the Operation on list and Create/Drop and create/Drop
if exist and create from the Operation list in the Basic
settings view.

Set HDFS location Select this check box to specify an HDFS location to which
the table you want to create is saved. Deselect it to save the
table you want to create in the warehouse directory defined
in the key hive.metastore.warehouse.dir in Hive configuration
file hive-site.xml.

Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list
in the Basic settings view. For further information about
saving data in HDFS, see https://cwiki.apache.org/Hive/.

Set row format(terminated by) Select this check box to use and define the row formats
when you want to create a table:
Field: Select this check box to use Field as the row format.
The default value for this field is "\u0001". You can also
specify a customized char in this field.
Collection Item: Select this check box to use Collection
Item as the row format. The default value for this field is
"\u0002". You can also specify a customized char in this
field.
Map Key: Select this check box to use Map Key as the row
format. The default value for this field is "\u0003". You can
also specify a customized char in this field.
Line: Select this check box to use Line as the row format.
The default value for this field is "\n". You can also specify a
customized char in this field.

Note:
This check box is enabled only when you select
Table from the Operation on list and Create/Drop and
create/Drop if exist and create from the Operation list
in the Basic settings view. For further information about
row formats in the HCatalog managed table, see https://
cwiki.apache.org/Hive/.
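For illustration only, a hypothetical configuration that produces a semicolon-delimited table with one record per line keeps the default Collection Item and Map Key values and sets:

Field: ";"
Line: "\n"

The values are entered as Java string literals, so non-printable delimiters are written with escape sequences such as "\u0001" or "\t".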

Properties Click [+] to add one or more lines to define table properties.
The table properties allow you to tag the table definition
with your own metadata key/value pairs. Make sure that the
values in both the Key row and the Value row are enclosed in
double quotation marks.

Note:
This table is enabled only when you select
Database/Table from the Operation on list and
Create/Drop and create/Drop if exist and create from
the Operation list in the Basic settings view. For further
information about table properties, see https://cwiki.
apache.org/Hive/.

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Browse to, or enter the directory where the log files are
stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.

Error Output Folder Browse to, or enter the directory where the error log files
are stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
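For illustration, a sketch of how such a variable is typically read in Java code, for example in a tJava component triggered after this one; the component name tHCatalogOperation_1 is an assumption and depends on the label of the component in your Job:

// Hypothetical tJava code reading the After variable of tHCatalogOperation_1.
// globalMap is the map in which Talend Jobs expose component variables.
String errorMessage = (String) globalMap.get("tHCatalogOperation_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("HCatalog operation failed: " + errorMessage);
}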

Usage

Usage rule This component is commonly used in a single-component
subJob.
HCatalog is built on top of the Hive metastore to provide a
read and write interface for Pig and MapReduce, so that these
systems can use the metadata of Hive to easily read
and write data in HDFS.
For further information, see Apache documentation about
HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-
VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation When Use kerberos authentication is selected, the
component cannot work with IBM JVM.

Managing HCatalog tables on Hortonworks Data Platform


This scenario applies only to Talend products with Big Data.
This scenario describes a six-component Job that includes the common operations for the HCatalog
table management on Hortonworks Data Platform. Sub-sections in this scenario cover DB operations
including:
• Creating a table to the database in HDFS;
• Writing data to the HCatalog managed table;
• Writing data to the partitioned table using tHCatalogLoad;
• Reading data from the HCatalog managed table;
• Outputting the data read from the table in HDFS.

Note:
Knowledge of Hive Data Definition Language and HCatalog Data Definition Language is required.
For further information about Hive Data Definition Language, see https://cwiki.apache.org/
confluence/display/Hive/LanguageManual+DDL. For further information about HCatalog Data
Definition Language, see https://cwiki.apache.org/confluence/display/HCATALOG/Design
+Document+-+Java+APIs+for+HCatalog+DDL+Commands.

Setting up the Job


Procedure
1. Drop the following components from the Palette to the design workspace: tHCatalogOperation,
tHCatalogLoad, tHCatalogInput, tHCatalogOutput, tFixedFlowInput, and tFileOutputDelimited.

2. Right-click tHCatalogOperation to connect it to the tFixedFlowInput component using a
Trigger > OnSubjobOk connection.


3. Right-click tFixedFlowInput to connect it to tHCatalogOutput using a Row > Main connection.


4. Right-click tFixedFlowInput to connect it to tHCatalogLoad using a Trigger > OnSubjobOk
connection.
5. Right-click tHCatalogLoad to connect it to the tHCatalogInput component using a Trigger >
OnSubjobOk connection.
6. Right-click tHCatalogInput to connect it to tFileOutputDelimited using a Row > Main connection.

Creating a table in HDFS


Procedure
1. Double-click tHCatalogOperation to open its Basic settings view.

2. Click Edit schema to define the schema for the table to be created.


3. Click [+] to add at least one column to the schema and click OK when you finish setting the
schema. In this scenario, the columns added to the schema are: name, country and age.
4. Fill the Templeton hostname field with the URL of the Templeton webservice you are using. In this
scenario, fill this field with "192.168.0.131".
5. Fill the Templeton port field with the port for the Templeton hostname. By default, the value for this
field is "50111".
6. Select Table from the Operation on list and Drop if exist and create from the Operation list to
create a table in HDFS.
7. Fill the Database field with an existing database name in HDFS. In this scenario, the database
name is "talend".
8. Fill the Table field with the name of the table to be created. In this scenario, the table name is
"Customer".
9. Fill the Username field with the username for the DB authentication.
10. Select the Set the user group to use check box to specify the user group. The default user group is
"root"; specify the value for this field according to your actual setup.
11. Select the Set the permissions to use check box to specify the user permission. The default value
for this field is "rwxrwxr-x".
12. Select the Set partitions check box to enable the partition schema.
13. Click the Edit schema button next to the Set partitions check box to define the partition schema.
14. Click [+] to add one column to the schema and click OK when you finish setting the schema. In
this scenario, the column added to the partition schema is: match_age.

Writing data to the existing table


Procedure
1. Double-click tFixedFlowInput to open its Basic settings view.

2. Click Edit schema to define the same schema as the one you defined in tHCatalogOperation.
3. Fill the Number of rows field with the integer 8.


4. Select Use Inline Table in the Mode area.


5. Click [+] to add new lines in the inline table.
6. Double-click tHCatalogOutput to open its Basic settings view.

7. Click Sync columns to retrieve the schema defined in the preceding component.
8. Fill the NameNode URI field with the URI to the NameNode. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.
9. Fill the File name field with the HDFS location of the file you write data to. In this scenario, the
file location is "/user/hdp/Customer/Customer.csv".
10. Select Overwrite from the Action list.
11. Fill the Templeton hostname field with the URL of the Templeton webservice you are using. In this
scenario, fill this field with "192.168.0.131".
12. Fill the Templeton port field with the port for the Templeton hostname. By default, the value for this
field is "50111".
13. Fill the Database field, the Table field, and the Username field with the same values you specified in
tHCatalogOperation.
14. Fill the Partition field with "match_age=27".
15. Fill the File location field with the HDFS location to which the table will be saved. In this
example, use "hdfs://192.168.0.131:8020/user/hdp/Customer".

Writing data to the partitioned table using tHCatalogLoad


Procedure
1. Double-click tHCatalogLoad to open its Basic settings view.


2. Fill the Partition field with "match_age=26".


3. Do the rest of the settings in the same way as configuring tHCatalogOperation.

Reading data from the table in HDFS


Procedure
1. Double-click tHCatalogInput to open its Basic settings view.

2. Click Edit schema to define the schema of the table to be read from the database.


3. Click [+] to add at least one column to the schema. In this scenario, the columns added to the
schema are age and name.
4. Fill the Partition field with "match_age=26".
5. Do the rest of the settings in the same way as configuring tHCatalogOperation.

Outputting the data read from the table in HDFS to the console
Procedure
1. Double-click tLogRow to open its Basic settings view.

2. Click Sync columns to retrieve the schema defined in the preceding component.
3. Select Table from the Mode area.

Job execution
Press CTRL+S to save your Job and F6 to execute it.


The data of the restricted table read from HDFS is displayed on the console.
Type http://talend-hdp:50075/browseDirectory.jsp?dir=/user/hdp/Customer&namenodeInfoPort=50070
into the address bar of your browser to view the table you created:


Click the Customer.csv link to view the content of the table you created.


tHCatalogOutput
Receives data from its incoming flow and writes this data into an HCatalog managed table.

tHCatalogOutput Standard properties


These properties are used to configure tHCatalogOutput running in the Standard Job framework.
The Standard tHCatalogOutput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).


Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

HCatalog version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

File name Browse to, or enter the location of the file which you write
data to. This file is created automatically if it does not exist.

Action Select a DB operation in HDFS:


Create: Creates a file with data using the file name defined
in the File Name field.
Overwrite: Overwrites the data in the file specified in the
File Name field.
Append: Inserts the data into the file specified in the File
Name field. The specified file is created automatically if it
does not exist.

Templeton hostname Fill this field with the URL of Templeton Webservice.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Templeton port Fill this field with the port of the URL of the Templeton Webservice.
By default, this value is 50111.

Note:
Templeton is a webservice API for HCatalog. It has
been renamed to WebHCat by the Apache community.
This service facilitates the access to HCatalog and
the related Hadoop elements such as Pig. For further
information about Templeton (WebHCat), see https://
cwiki.apache.org/confluence/display/Hive/WebHCat
+UsingWebHCat.

Database Fill this field to specify an existing database in HDFS.

Table Fill this field to specify an existing table in HDFS.

Partition Fill this field to specify one or more partitions for the
partition operation on the specified table. When you specify
multiple partitions, use commas to separate every two
partitions and use double quotation marks to quote the
partition string.
If you are reading a non-partitioned table, leave this field
empty.

Note:
For further information about Partition, see https://cwiki.
apache.org/Hive/.

Username Fill this field with the username for the DB authentication.

File location Fill this field with the path where the source data file is
stored.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

Row separator The separator used to identify the end of a row.

Field separator Enter a character, string, or regular expression to separate
fields in the transferred data.

Custom encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
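For instance, to lower the block replication factor for the files this Job writes, you could add a row with "dfs.replication" as the key and "1" as the value; this property pair is given as an indication only, and both entries must be entered as double-quoted strings.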

Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.

Standard Output Folder Browse to, or enter the directory where the log files are
stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.

Error Output Folder Browse to, or enter the directory where the error log files
are stored.

Note:
This field is enabled only when the Retrieve the
HCatalog logs check box is selected.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is commonly used together with an input
component.
HCatalog is built on top of the Hive metastore to provide a
read and write interface for Pig and MapReduce, so that these
systems can use the metadata of Hive to easily read
and write data in HDFS.
For further information, see Apache documentation about
HCatalog: https://cwiki.apache.org/confluence/display/Hive/
HCatalog.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-
VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For a related scenario, see Managing HCatalog tables on Hortonworks Data Platform on page 1444.


tHDFSCompare
Compares two files in HDFS and, based on the read-only schema, generates a row flow that presents
the comparison information.
tHDFSCompare helps to control the quality of the data processed.

tHDFSCompare Standard properties


These properties are used to configure tHDFSCompare running in the Standard Job framework.
The Standard tHDFSCompare component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
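As an indication, this token endpoint usually follows the pattern https://login.microsoftonline.com/<tenant-ID>/oauth2/token, where <tenant-ID> is a placeholder for your Azure Active Directory tenant; always copy the exact value displayed in your own Azure portal.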

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.

Comparison mode Select the mode to be applied on the comparison.

File to compare Browse, or enter the path to the file in HDFS you need to
check for quality control.

Reference file Browse, or enter the path to the file in HDFS the comparison
is based on.

If differences detected, display and If no differences Type in a message to be displayed in the Run console based
detected, display on the result of the comparison.

Print to console Select this check box to display the message in the Run
console.

Advanced settings

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables DIFFERENCE: the result of the comparison. This is a Flow
variable and it returns a boolean.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
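For illustration, a sketch of how the DIFFERENCE variable can be read to branch a Job, for example in the condition of a Run if trigger or in a tJava component; the component name tHDFSCompare_1 is an assumption, and the meaning of the returned boolean should be checked against the behavior of your own Job:

// Hypothetical Run if condition based on the comparison result:
((Boolean) globalMap.get("tHDFSCompare_1_DIFFERENCE"))

// Equivalent read in a tJava component:
Boolean comparisonResult = (Boolean) globalMap.get("tHDFSCompare_1_DIFFERENCE");
System.out.println("tHDFSCompare DIFFERENCE = " + comparisonResult);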

Usage

Usage rule tHDFSCompare can be a standalone component or send the
information it generates to the following component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
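As an illustration of this mechanism, assume the Job contains two connection components named tHDFSConnection_1 and tHDFSConnection_2 (hypothetical labels). You could define a String context variable, for example hdfs_connection, enter the expression below in the Code field, and supply the name of the connection component to use when launching the Job:

// Expression entered in the Code column of the Dynamic settings table:
context.hdfs_connection

// Possible values passed for the context variable at execution time:
//   hdfs_connection = "tHDFSConnection_1"   (for instance, a test cluster)
//   hdfs_connection = "tHDFSConnection_2"   (for instance, a production cluster)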


Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-
VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHDFSConnection
Connects to a given HDFS so that the other Hadoop components can reuse the connection it creates
to communicate with this HDFS.
tHDFSConnection provides connection to the Hadoop distributed file system (HDFS) of interest at
runtime.

tHDFSConnection Standard properties


These properties are used to configure tHDFSConnection running in the Standard Job framework.
The Standard tHDFSConnection component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.
If you want to use certain parameters such as the Kerberos
parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>talend.kerberos.authentication</name>
    <value>kinit</value>
    <description>Set the Kerberos authentication method to use. Valid values are: kinit or keytab.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.principal</name>
    <value>[email protected]</value>
    <description>Set the keytab's principal name.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.path</name>
    <value>/kdc/user.keytab</value>
    <description>Set the keytab's path.</description>
  </property>
  <property>
    <name>talend.encryption</name>
    <value>none</value>
    <description>Set the encryption method to use. Valid values are: none or ssl.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.path</name>
    <value>ssl</value>
    <description>Set SSL trust store path.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.password</name>
    <value>ssl</value>
    <description>Set SSL trust store password.</description>
  </property>
</configuration>
The parameters read from these configuration files override
the default ones used by the Studio. When a parameter
does not exist in these configuration files, the default one is
used.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file


itself in the Keytab field. This keytab file must be stored in


the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
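For illustration only, assuming the principal guest from the example above and a keytab stored at a
hypothetical path on the machine that runs the Job, the two fields could be filled in as follows (both
values are entered as quoted strings, like the other fields of the Studio):

Principal: "guest"
Keytab: "/home/user1/guest.keytab"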

User name User authentication name of HDFS.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
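For example, to lower the block replication factor on a test cluster, you could add a row such as the
following to the Hadoop properties table; dfs.replication is a standard HDFS property, and the value
shown here is only an illustration (both cells are entered as quoted strings):

Property: "dfs.replication"    Value: "1"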

Use datanode hostname Select the Use datanode hostname check box to allow the
Job to access datanodes via their hostnames. This actually
sets the dfs.client.use.datanode.hostname property to true.

Setup HDFS encryption configurations If the HDFS transparent encryption has been enabled
in your cluster, select the Setup HDFS encryption
configurations check box and in the HDFS encryption key
provider field that is displayed, enter the location of the
KMS proxy.
For further information about the HDFS transparent
encryption and its KMS proxy, see Transparent Encryption in
HDFS.
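As an illustration, the KMS proxy location generally follows the Hadoop KMS URI syntax. Assuming a
hypothetical KMS server named kmshost listening on the default port 16000, the HDFS encryption key
provider field could contain:

"kms://http@kmshost:16000/kms"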


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
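For example, the ERROR_MESSAGE variable can be read in a tJava component placed after this component.
The sketch below assumes that the connection component instance is named tHDFSConnection_1; adapt the
name to your Job:

// Read the ERROR_MESSAGE After variable of tHDFSConnection_1 from the globalMap;
// it stays null when the connection succeeded.
String errorMessage = (String) globalMap.get("tHDFSConnection_1_ERROR_MESSAGE");
if (errorMessage != null) {
    System.err.println("HDFS connection failed: " + errorMessage);
}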

Usage

Usage rule This component is generally used with other Hadoop


components.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.


Limitations JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHDFSCopy
Copies a source file or folder into a target directory in HDFS and removes this source if required.

tHDFSCopy Standard properties


These properties are used to configure tHDFSCopy running in the Standard Job framework.
The Standard tHDFSCopy component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.


In Talend Exchange, members of Talend community


have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.


Ensure that the application to be used has appropriate


permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,


the user name of the machine hosting the Studio will be


used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Source file or directory Browse to, or enter the path pointing to the data to be used
in the file system.

Target location Browse to, or enter the directory in HDFS to which you need
to copy the data.

Rename To rename the file or folder copied to the target location,


select this check box to display the New name field, then,
enter the new name.

Copy merge Select this check box to merge the part files generated at
the end of a MapReduce computation.
Once you select it, you need to enter the name of the final
merged file in the Merge name field.

Remove source Select this check box to remove the source file or folder
once this source is copied to the target location.

Override target file (This option does not override the directory) Select this check box to override
the file already existing in the target location. This option does not override the folder.
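As a purely illustrative configuration (all paths and names below are hypothetical), moving a file into
an archive directory and renaming it could use values such as:

Source file or directory: "/user/talend/in/data.csv"
Target location: "/user/talend/archive"
Rename: selected, with New name: "data_archived.csv"
Remove source: selected, so that the source file is deleted once it has been copied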

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables DESTINATION_FILEPATH: the destination file path. This is


an After variable and it returns a string.
SOURCE_FILEPATH: the source file path. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
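For example, to log the transfer once the copy is done, a tJava component connected with an OnSubjobOk
link could read both After variables. The sketch below assumes the component instance is named
tHDFSCopy_1; adapt the name to your Job:

// Hypothetical instance name tHDFSCopy_1; both variables are After variables.
String source = (String) globalMap.get("tHDFSCopy_1_SOURCE_FILEPATH");
String target = (String) globalMap.get("tHDFSCopy_1_DESTINATION_FILEPATH");
System.out.println("Copied " + source + " to " + target);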

Usage

Usage rule tHDFSCopy is a standalone component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
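As a sketch, assuming the Job contains two connection components named tHDFSConnection_1 and
tHDFSConnection_2 and a context variable named hdfs_connection, you could type
context.hdfs_connection in the Code field and set the variable to "tHDFSConnection_1" in your test
context and to "tHDFSConnection_2" in your production context; the component then uses the connection
whose name matches the value at run time.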

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenario
Related topic, see Procedure on page 990
Related topic, see Iterating on a HDFS directory on page 1523


tHDFSDelete
Deletes a file located on a given Hadoop distributed file system (HDFS).

tHDFSDelete Standard properties


These properties are used to configure tHDFSDelete running in the Standard Job framework.
The Standard tHDFSDelete component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.


In Talend Exchange, members of Talend community


have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.


Ensure that the application to be used has appropriate


permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name User authentication name of HDFS.


Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File or Directory Path Browse to, or enter the path to the file or folder to be
deleted on HDFS.

Advanced settings

Hadoop properties If you need to use custom configuration for the Hadoop of
interest, complete this table with the property or properties
to be customized. Then at runtime, the customized property
or properties will override those corresponding ones
defined earlier for the same Hadoop.
For further information about the properties required by
Hadoop, see the Hadoop documentation.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables DELETE_PATH: the path to the deleted file or folder. This is
an After variable and it returns a string.
CURRENT_STATUS: the execution result of the component.
This is an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used to compose a single-component Job


or subJob.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .


The Dynamic settings table is available only when the Use


an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHDFSExist
Checks whether a file exists in a specific directory in HDFS.

tHDFSExist Standard properties


These properties are used to configure tHDFSExist running in the Standard Job framework.
The Standard tHDFSExist component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file


should contain the libraries of the different Hadoop


elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of


the application that the current Job you are developing


uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

File name or relative path Enter the name of the file whose existence you want to check. If need be,
browse to the file or enter the path to the file, relative to the directory you entered in HDFS
directory.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables EXISTS: the result of whether a specified file exists. This is a
Flow variable and it returns a boolean.
FILENAME: the name of the file processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable


and it returns a string. This variable functions only if the


Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSExist is a standalone component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the
library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further
information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the


Run/Debug view in the Preferences dialog box in the


Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Checking the existence of a file in HDFS


This scenario applies only to Talend products with Big Data.
In this scenario, the two-component Job checks whether a specific file exists in HDFS and returns a
message to indicate the result of the verification.
In real-world practice, you can take further action to process the checked file according to the
verification result, using the other HDFS components provided with the Studio.

Launch the Hadoop distribution in which you want to check the existence of a particular file. Then,
proceed as follows:

Linking the components


Procedure
1. In the Integration perspective of Talend Studio , create an empty Job, named hdfsexist_file for
example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tHDFSExist and tMsgBox onto the workspace.
3. Connect them using the Trigger > Run if link.

Configuring the connection to HDFS


Procedure
1. Double-click tHDFSExist to open its Component view.


2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In real-world practice, you may use tHDFSConnection to create a connection and reuse it from
the current component. For further information, see tHDFSConnection on page 1466.
4. In the HDFS Directory field, browse to, or enter the path to the folder where the file to be checked
is. In this example, browse to /user/ychen/data/hdfs/out/dest.
5. In the File name or relative path field, enter the name of the file whose existence you want to
check. For example, output.csv.

Defining the message to be returned


Procedure
1. Double-click tMsgBox to open its Component view.

2. In the Title field, enter the title to be used for the pop-up message box to be created.
3. In the Buttons list, select OK. This defines the button to be displayed on the message box.
4. In the Icon list, select Icon information.
5. In the Message field, enter the message you want to display once the file check is done. In this
example, enter "This file does not exist!".


Defining the condition


Procedure
1. Click the If link to open the Basic settings view, where you are able to define the condition for
checking the existence of this file.

2. In the Condition box, press Ctrl+Space to access the variable list and select the global variable
EXISTS. Type an exclamation mark before the variable to negate the meaning of the variable.
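Assuming the tHDFSExist component of this Job is named tHDFSExist_1, the resulting condition reads as
follows; it evaluates to true, and therefore triggers tMsgBox, only when the file does not exist:

!((Boolean)globalMap.get("tHDFSExist_1_EXISTS"))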

Executing the Job


Procedure
Press F6 to execute this Job.

Results
Once done, a message box pops up to indicate that this file called output.csv does not exist in the
directory you defined earlier.

If you browse to the specified directory in the HDFS where the file's existence was checked, you can
see that this file does not exist.


tHDFSGet
Copies files from the Hadoop distributed file system (HDFS), pastes them in a user-defined directory
and, if need be, renames them.
tHDFSGet connects to Hadoop distributed file system, helping to obtain large-scale files with
optimized performance.

tHDFSGet Standard properties


These properties are used to configure tHDFSGet running in the Standard Job framework.
The Standard tHDFSGet component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add


other required jar files which the base distribution does


not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:


• In the Client ID and the Client key fields, enter,


respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the


principal to be used is guest; in this situation, ensure that


user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

Local directory Browse to, or enter the local directory to store the files
obtained from HDFS.

Overwrite file Options to overwrite or not the existing file with the new
one.

Append Select this check box to add the new rows at the end of the
records.

Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from HDFS.
Regular expression is available.
- New name: give a new name to the obtained file.
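For instance, to retrieve every CSV file from the HDFS directory while keeping the original names, the
Files table could contain a single row such as the following; the mask is illustrative (regular
expressions are also accepted, as noted above), and leaving New name empty is assumed here to keep the
original file names:

File mask: "*.csv"    New name: ""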

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the


documentation you want. For demonstration purposes, the


links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
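For illustration only, a minimal sketch of two customized rows for this table, assuming
you want DataNode access by hostname and a replication factor of 1 (the property names
are standard HDFS properties; the values shown are assumptions, not Studio defaults):

    Property                                 Value
    "dfs.client.use.datanode.hostname"       "true"
    "dfs.replication"                        "1"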

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After


variable and it returns an integer.
CURRENT_STATUS: the execution result of the component.
This is a Flow variable and it returns a string.
TRANSFER_MESSAGES: file transferred information. This is
an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
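As a minimal sketch, assuming this component is labeled tHDFSGet_1, the After variables
listed above can be read in a Java expression of a downstream component once the subJob
has completed; the label tHDFSGet_1 is an assumption and must match your own Job:

    // Number of files transferred by tHDFSGet_1 (After variable, Integer)
    ((Integer)globalMap.get("tHDFSGet_1_NB_FILE"))
    // Transfer details reported by tHDFSGet_1 (After variable, String)
    ((String)globalMap.get("tHDFSGet_1_TRANSFER_MESSAGES"))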

Usage

Usage rule This component combines HDFS connection and data
extraction, and is thus used as a single-component subJob to
move data from HDFS to a user-defined local directory.
Unlike the tHDFSInput and the tHDFSOutput components, it
runs standalone and does not generate an input or output
flow for the other components.
It is often connected to the rest of the Job using an
OnSubjobOk or OnComponentOk link, depending on the context.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the


Component List box in the Basic settings view becomes


unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
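As an illustration only, assume the Job contains two connection components,
tHDFSConnection_1 and tHDFSConnection_2, and a String context variable named
hdfsConnection whose value is the name of the connection to use
("tHDFSConnection_1" in one context, "tHDFSConnection_2" in another); the Code
field of the dynamic parameter would then simply read:

    context.hdfsConnection

All the names in this sketch are assumptions; adapt them to the components and
contexts defined in your own Job.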

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
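For example, the -Djava.library.path argument mentioned above could be set as
follows (a sketch only; the actual path is an assumption and depends on where
the MapR client is installed and on its version):

    -Djava.library.path="C:\opt\mapr\hadoop\hadoop-2.7.0\lib\native"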
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Computing data with Hadoop distributed file system


This scenario applies only to Talend products with Big Data.
The following scenario describes a simple Job that creates a file in a defined directory, puts it into
HDFS, gets it back out of HDFS into another local directory, and reads it at the end of the Job.

Setting up the Job


Procedure
1. Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tFileOutputDelimited, tHDFSPut, tHDFSGet, tFileInputDelimited and tLogRow.
2. Connect tFixedFlowInput to tFileOutputDelimited using a Row > Main connection.


3. Connect tFileInputDelimited to tLogRow using a Row > Main connection.


4. Connect tFixedFlowInput to tHDFSPut using an OnSubjobOk connection.
5. Connect tHDFSPut to tHDFSGet using an OnSubjobOk connection.
6. Connect tHDFSGet to tFileInputDelimited using an OnSubjobOk connection.

Configuring the input component


Procedure
1. Double-click tFixedFlowInput to define the component in its Basic settings view.
2. Set the Schema to Built-In and click the three-dot [...] button next to Edit Schema to describe the
data structure you want to create from internal variables. In this scenario, the schema contains
one column: content.


3. Click the plus button to add the parameter line.


4. Click OK to close the dialog box and accept to propagate the changes when prompted by the
studio.
5. In Basic settings, define the corresponding value in the Mode area using the Use Single Table
option. In this scenario, the value is "Hello world!".

Configuring the tFileOutputDelimited component


Procedure
1. Double-click tFileOutputDelimited to define the component in its Basic settings view.

2. Click the [...] button next to the File Name field and browse to the output file you want to write
data in, in.txt in this example.

Loading the data from the local file


Procedure
1. Double-click tHDFSPut to define the component in its Basic settings view.


2. Select, for example, Apache 0.20.2 from the Hadoop version list.
3. In the NameNode URI, the Username and the Group fields, enter the connection parameters to
the HDFS. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
4. Next to the Local directory field, click the three-dot [...] button to browse to the folder with
the file to be loaded into the HDFS. In this scenario, the directory has been specified while
configuring tFileOutputDelimited: C:/hadoopfiles/putFile/.
5. In the HDFS directory field, type in the intended location in HDFS to store the file to be loaded. In
this example, it is /testFile.
6. Click the Overwrite file field to expand the drop-down list.
7. From the menu, select always.
8. In the Files area, click the plus button to add a row in which you define the file to be loaded.
9. In the File mask column, replace the default newLine value with *.txt, keeping the quotation marks,
and leave the New name column as it is. This allows you to select all the .txt files in the specified
directory without changing their names. In this example, the file is in.txt.

Getting the data from the HDFS


Procedure
1. Double-click tHDFSGet to define the component in its Basic settings view.


2. Select, for example, Apache 0.20.2 from the Hadoop version list.
3. In the NameNode URI, the Username, the Group fields, enter the connection parameters to the
HDFS. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
4. In the HDFS directory field, type in the location storing the loaded file in HDFS. In this example,
it is /testFile.
5. Next to the Local directory field, click the three-dot [...] button to browse to the folder intended to
store the files that are extracted out of the HDFS. In this scenario, the directory is: C:/hadoopfiles/
getFile/.
6. Click the Overwrite file field to expand the drop-down list.
7. From the menu, select always.
8. In the Files area, click the plus button to add a row in which you define the file to be extracted.
9. In the File mask column, replace the default newLine value with *.txt, keeping the quotation marks,
and leave the New name column as it is. This allows you to extract all the .txt files from the specified
directory in the HDFS without changing their names. In this example, the file is in.txt.

Reading data from the HDFS and saving the data locally
Procedure
1. Double-click tFileInputDelimited to define the component in its Basic settings view.


2. Set property type to Built-In.


3. Next to the File Name/Stream field, click the three-dot button to browse to the file you have
obtained from the HDFS. In this scenario, the directory is C:/hadoopfiles/getFile/in.txt.
4. Set Schema to Built-In and click Edit schema to define the data to pass on to the tLogRow
component.

5. Click the plus button to add a new column.


6. Click OK to close the dialog box and accept to propagate the changes when prompted by the
studio.

Executing the Job


Save the Job and press F6 to execute it.
The in.txt file is created and loaded into the HDFS.


The file is also extracted from the HDFS by tHDFSGet and is read by tFileInputDelimited.


tHDFSInput
Extracts the data in an HDFS file for other components to process it.
tHDFSInput reads a file located on a given Hadoop distributed file system (HDFS) and puts the data of
interest from this file into a Talend schema. Then it passes the data to the component that follows.

tHDFSInput Standard properties


These properties are used to configure tHDFSInput running in the Standard Job framework.
The Standard tHDFSInput component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.


Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.


Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
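For illustration only, a sketch of these ADLS Gen1 parameters with placeholder
values (none of the values below are real; replace them with the ID, secret and
tenant of your own Azure application):

    Client ID:      "0a1b2c3d-4e5f-6789-abcd-ef0123456789"
    Client key:     "<client-secret-generated-on-the-Azure-portal>"
    Token endpoint: "https://login.microsoftonline.com/<tenant-id>/oauth2/token"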

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the


Kerberos principal name for the NameNode in the field


displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File Name Browse to, or enter the path pointing to the data to be used
in the file system.
If the path you set points to a folder, this component will
read all of the files stored in that folder. Furthermore, if
sub-folders exist in that folder and you need to read files in
the sub-folders, select the Include sub-directories if path is
directory check box in the Advanced settings view.

Type Select the type of the file to be processed. The type of the
file may be:
• Text file.
• Sequence file: a Hadoop sequence file consists of
binary key/value pairs and is suitable for the Map/
Reduce framework. For further information, see http://
wiki.apache.org/hadoop/SequenceFile.


Once you select the Sequence file format, the Key


column list and the Value column list appear to allow
you to select the keys and the values of that Sequence
file to be processed.

Row separator The separator used to identify the end of a row.


This field is not available for a Sequence file.

Field separator Enter character, string or regular expression to separate


fields for the transferred data.
This field is not available for a Sequence file.

Header Enter the number of header rows to ignore in the transferred
data. For example, enter 0 if the data has no header, or 1 if
the header occupies the first row.
This field is not available for a Sequence file.

Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.

Compression Select the Uncompress the data check box to uncompress the
input data.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.
This option is not available for a Sequence file.

Advanced settings

Include sub-directories if path is directory Select this check box to read not only the folder you have
specified in the File name field but also the sub-folders in
that folder.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are


using or see Apache's Hadoop documentation on http://


hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to collect log data at the component
level. Note that this check box is not available in the Map/
Reduce version of the component.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component needs an output link.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Using HDFS components to work with Azure Data Lake Storage (ADLS)
This scenario describes how to use the HDFS components to read data from and write data to Azure
Data Lake Storage.
This scenario applies only to Talend products with Big Data.
• tFixedFlowInput: it provides sample data to the Job.
• tHDFSOutput: it writes sample data to Azure Data Lake Store.
• tHDFSInput: it reads sample data from Azure Data Lake Store.
• tLogRow: it displays the output of the Job on the console of the Run view of the Job.

Grant your application access to your ADLS Gen2


Before you begin
An Azure subscription is required.

Procedure
1. Create your Azure Data Lake Storage Gen2 account if you do not have it yet.


• For more details, see Create an Azure Data Lake Storage Gen2 account from the Azure
documentation.
2. Create an Azure Active Directory application on your Azure portal. For more details about how to
do this, see the "Create an Azure Active Directory application" section in Azure documentation:
Use portal to create an Azure Active Directory application.
3. Obtain the application ID, object ID and the client secret of the application to be used from the
portal.
a) On the list of the registered applications, click the application you created and registered in
the previous step to display its information blade.
b) Click Overview to open its blade, and from the top section of the blade, copy the Object ID and
the application ID displayed as Application (client) ID. Keep them somewhere safe for later
use.
c) Click Certificates & secrets to open its blade and then create the authentication key (client
secret) to be used on this blade in the Client secrets section.
4. Back to the Overview blade of the application to be used, click Endpoints on the top of this blade,
copy the value of OAuth 2.0 token endpoint (v1) from the endpoint list that appears and keep it
somewhere safe for later use.
5. Set the read and write permissions to the ADLS Gen2 filesystem to be used for the service
principal of your application.
It is very likely that the administrator of your Azure system has included your account and your
applications in the group that has access to a given ADLS Gen2 storage account and a given ADLS
Gen2 filesystem. In this case, ask your administrator to ensure that you have the proper access and
then ignore this step.
a) Start your Microsoft Azure Storage Explorer and find your ADLS Gen2 storage account on the
Storage Accounts list.
If you have not installed Microsoft Azure Storage Explorer, you can download it from the
Microsoft Azure official site.
b) Expand this account and the Blob Containers node under it; then click the ADLS Gen2
hierarchical filesystem to be used under this node.


Example

The filesystem in this image is for demonstration purposes only. Create the filesystem to be
used under the Blob Containers node in your Microsoft Azure Storage Explorer, if you do not
have one yet.
c) On the blade that is opened, click Manage Access to open its wizard.
d) At the bottom of this wizard, add the object ID of your application to the Add user or group
field and click Add.
e) Select the object ID just added from the Users and groups list and select all the permissions for
Access and Default.
f) Click Save to validate these changes and close this wizard.

Creating an HDFS Job in the Studio


Procedure
1. On the Integration perspective, drop the following components from the Palette onto the design
workspace: tFixedFlowInput, tHDFSOutput, tHDFSInput and tLogRow.
2. Connect tFixedFlowInput to tHDFSOutput using a Row > Main link.
3. Do the same to connect tHDFSInput to tLogRow.
4. Connect tFixedFlowInput to tHDFSInput using a Trigger > OnSubjobOk link.


Results

Configuring the HDFS components to work with Azure Data Lake Storage
Procedure
1. Double-click tFixedFlowInput to open its Component view to provide sample data to the Job.
The sample data to be used contains only one row with two columns: id and name.
2. Click the [...] button next to Edit schema to open the schema editor.
3. Click the [+] button to add the two columns and rename them to id and name.
4. Click OK to close the schema editor and validate the schema.
5. In the Mode area, select Use single table.
The id and the name columns automatically appear in the Value table and you can enter the
values you want within double quotation marks in the Value column for the two schema values.
6. Double-click tHDFSOutput to open its Component view.


Example

7. In the Version area, select Hortonworks or Cloudera depending on the distribution you are using.
In the Standard framework, only these two distributions with ADLS are supported by the HDFS
components.
8. From the Scheme drop-down list, select ADLS. The ADLS related parameters appear in the
Component view.
9. In the URI field, enter the URI of the NameNode service. The location of this service is actually the
address of your Data Lake Store.
For example, if your Data Lake Storage name is data_lake_store_name, the NameNode URI
to be used is adl://data_lake_store_name.azuredatalakestore.net.
10. In the Client ID and the Client key fields, enter, respectively, the authentication ID and the
authentication key generated upon the registration of the application that the current Job you are
developing uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate permissions to access Azure Data Lake.
You can check this on the Required permissions view of this application on Azure. For further
information, see Azure documentation Assign the Azure AD application to the Azure Data Lake
Storage account file or folder.
This application must be the one to which you assigned permissions to access your Azure Data
Lake Storage in the previous step.
11. In the Token endpoint field, copy-paste the OAuth 2.0 token endpoint that you can obtain from
the Endpoints list accessible on the App registrations page on your Azure portal.


12. In the File name field, enter the directory to be used to store the sample data on Azure Data Lake
Storage.
13. From the Action drop-down list, select Create if the directory to be used does not exist yet on
Azure Data Lake Storage; if this folder already exists, select Overwrite.
14. Do the same configuration for tHDFSInput.
15. If you run your Job on Windows, follow this procedure to add the winutils.exe program to your Job
(see the note after this procedure).
16. Press F6 to run your Job.
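Note on step 15: as a minimal sketch of a typical Windows setup (the folder C:/hadoop is an
assumption; use the folder where you actually place the program), put winutils.exe under
C:/hadoop/bin and pass the corresponding system property as a JVM argument of the Job, for
example in the Advanced settings tab of the Run view:

    -Dhadoop.home.dir=C:/hadoop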


tHDFSList
tHDFSList retrieves a list of files or folders based on a filemask pattern and iterates over each of them.

tHDFSList Standard properties


These properties are used to configure tHDFSList running in the Standard Job framework.
The Standard tHDFSList component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.


In Talend Exchange, members of Talend community


have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.


Ensure that the application to be used has appropriate


permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,


the user name of the machine hosting the Studio will be


used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS Directory Browse to, or enter the path pointing to the data to be used
in the file system.

FileList Type Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.

Include subdirectories Select this check box if the selected input source type
includes sub-directories.

Case Sensitive Set the case mode from the list to either create or not
create case sensitive filter on filenames.

Use Glob Expressions as Filemask This check box is selected by default. It filters the results
using glob expressions. Clear this check box to use regular
expressions as filemasks instead.

Files Click the plus button to add as many filter lines as needed:
Filemask: in the added filter lines, type in a filename or a
filemask using special characters or regular expressions.
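For example (assumed values): "*.txt" matches all text files when Use Glob
Expressions as Filemask is selected, whereas "log_[0-9]{4}\\.csv" could be used
as a regular expression when that check box is cleared.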

Order by The folders are listed first, then the files. You can choose
how to order the folders and files:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical
order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent
to most recent.

Note:
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then file
name takes precedence. If ordering by modified date,
in the event of identical dates then file name takes
precedence.

Order action Select a sort order by clicking one of the following radio
buttons:
ASC: ascending order;
DESC: descending order;

Advanced settings

Use Exclude Filemask Select this check box to enable the Exclude Filemask field,
which excludes files from the results based on file type:


Exclude Filemask: Fill in the field with file types to be


excluded from the Filemasks in the Basic settings view.

Note:
File types in this field should be quoted with double
quotation marks and separated by commas.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables CURRENT_FILE: the current file name. This is a Flow


variable and it returns a string.
CURRENT_FILEDIRECTORY: the current file directory. This is
a Flow variable and it returns a string.
CURRENT_FILEEXTENSION: the extension of the current file.
This is a Flow variable and it returns a string.
CURRENT_FILEPATH: the current file path. This is a Flow
variable and it returns a string.
NB_FILE: the number of files iterated upon so far. This is a
Flow variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSList provides a list of files or folders from a defined


HDFS directory on which it iterates.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate.
Trigger: Run if; On Subjob Ok; On Subjob Error; On
component Ok; On Component Error; Synchronize; Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Iterating on an HDFS directory


This scenario applies only to Talend products with Big Data.
This scenario uses a two-component Job to iterate over a specified directory in HDFS and retrieve the
selected files into a local directory.

Preparing the data to be used


Procedure
Create the files to be iterated on in the HDFS you want to use. In this scenario, two files are created in
the directory: /user/ychen/data/hdfs/out.


You can design a Job in the Studio to create the two files. For further information, see tHDFSPut on
page 1548 or tHDFSOutput on page 1528.

Linking the components


Procedure
1. In the Integration perspective of Talend Studio , create an empty Job, named HDFSList for
example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tHDFSList and tHDFSGet onto the workspace.
3. Connect them using the Row > Iterate link.

Configuring the iteration


Procedure
1. Double-click tHDFSList to open its Component view.


2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may use tHDFSConnection to create a connection and reuse it from
the current component. For further information, see tHDFSConnection on page 1466.
4. In the HDFS Directory field, enter the path to the folder where the files to be iterated on are. In
this example, as presented earlier, the directory is /user/ychen/data/hdfs/out/.
5. In the FileList Type field, select File.
6. In the Files table, click the plus button to add one row and enter * between the quotation marks to
iterate over all existing files.

Selecting the files


Procedure
1. Double-click tHDFSGet to open its Component view.


2. In the Version area, select the Hadoop distribution you are connecting to and its version.
3. In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may have used tHDFSConnection to create a connection; then you
can reuse it from the current component. For further information, see tHDFSConnection on page
1466.
4. In the HDFS directory field, enter the path to the folder holding the files to be retrieved.
To do this with the auto-completion list, place the mouse pointer in this field, then, press Ctrl
+Space to display the list and select the tHDFSList_1_CURRENT_FILEDIRECTORY variable to reuse
the directory you have defined in tHDFSList. In this variable, tHDFSList_1 is the label of the
component. If you label it differently, select the variable accordingly.
Once selecting this variable, the directory reads, for example, ((String)globalMap.get("tHDF
SList_1_CURRENT_FILEDIRECTORY")) in this field.
For further information about how to label a component, see the Talend Studio User Guide.
5. In the Local directory field, enter the path, or browse to the folder you want to place the selected
files in. This folder will be created if it does not exist. In this example, it is C:/hdfsFiles.
6. In the Overwrite file field, select always.
7. In the Files table, click the plus button to add one row and enter * between the quotation marks in
the Filemask column in order to get all existing files.

Executing the Job


Procedure
Press F6 to execute this Job.


Results
Once done, you can check the files created in the local directory.


tHDFSOutput
Writes data flows it receives into a given Hadoop distributed file system (HDFS).

tHDFSOutput Standard properties


These properties are used to configure tHDFSOutput running in the Standard Job framework.
The Standard tHDFSOutput component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Built-In: You create and store the schema locally for this
component only.

Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component


you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.


Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.
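For illustration only, the ADLS Gen1 fields described above
are typically filled with double-quoted strings of the
following form; the placeholders in angle brackets stand for
values from your own Azure application and tenant:

Client ID: "<application-id>"
Client key: "<application-secret>"
Token endpoint: "https://login.microsoftonline.com/<tenant-id>/oauth2/token"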

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.
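As an illustration (the host and realm below are
placeholders), the NameNode principal usually follows the
service/host@REALM pattern and is entered as a double-quoted
string, for example:

"nn/masternode.example.com@EXAMPLE.COM"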

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.
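For illustration, reusing the hypothetical names from the
example above, the two fields could be filled as follows,
with the keytab path pointing to a file that the user
running the Job is allowed to read:

Principal: "guest@EXAMPLE.COM"
Keytab: "/home/user1/keytabs/guest.keytab"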

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File Name Browse to, or enter the location of the file which you write
data to. This file is created automatically if it does not exist.

Type Select the type of the file to be processed. The type of the
file may be:
• Text file.
• Sequence file: a Hadoop sequence file consists of
binary key/value pairs and is suitable for the Map/
Reduce framework. For further information, see http://
wiki.apache.org/hadoop/SequenceFile.
Once you select the Sequence file format, the Key
column list and the Value column list appear to allow
you to select the keys and the values of that Sequence
file to be processed.

Action Select an operation in HDFS:


Create: Creates a file with data using the file name defined
in the File Name field.
Overwrite: Overwrites the data in the file specified in the
File Name field.
Append: Inserts the data into the file specified in the File
Name field. The specified file is created automatically if it
does not exist.

Row separator The separator used to identify the end of a row.


This field is not available for a Sequence file.

Field separator Enter the character, string, or regular expression used to
separate fields in the transferred data.
This field is not available for a Sequence file.
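For example (illustrative values), you can enter ";" for
semicolon-separated fields or "\t" for tab-separated fields;
the value is typed as a double-quoted string, so escape
sequences such as \t are allowed.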

Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.

Compression Select the Compress the data check box to compress the
output data.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.
Note that when the type of the file to be written is
Sequence File, the compression algorithm is embedded
within the container files (the part- files) of this sequence
file. These files can be read by a Talend component
such as tHDFSInput within MapReduce Jobs and other
applications that understand the sequence file format.
Alternatively, when the type is Text File, the output files
can be accessed with standard compression utilities that
understand the bzip2 or gzip container files.

Include header Select this check box to output the header of the data.
This option is not available for a Sequence file.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:

• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
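As an illustrative customization (not a required setting),
you could lower the replication factor of the files written
by this component by adding the following row to the Hadoop
properties table, with both columns typed as double-quoted
strings:

Property: "dfs.replication"        Value: "1"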

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component needs an input component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
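As a sketch of how this can be set up (the variable and
component names below are hypothetical), you could define a
context variable such as hdfsConn, enter context.hdfsConn in
the Code field of the Dynamic settings table, and give the
variable the name of the connection component to be used in
each execution context, for example:

hdfsConn = "tHDFSConnection_1"    (value in a development context)
hdfsConn = "tHDFSConnection_2"    (value in a production context)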

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.

• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR (an illustrative value for this argument is
given below).
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.
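For illustration (the installation path below is a
placeholder that depends on your MapR client version and
operating system), the -Djava.library.path argument
described above typically looks like the following on Linux;
on Windows, point it to the directory containing
MapRClient.dll:

-Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native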

Limitations JRE 1.6+ is required.

Related scenario
• Related topic, see Writing data in a delimited file on page 1116.
• Related topic, see Computing data with Hadoop distributed file system on page 1498.


tHDFSOutputRaw
Transfers data of different formats such as hierarchical data in the form of a single column into a
given HDFS file system.
tHDFSOutputRaw receives a single-column input flow and writes the data into HDFS.

tHDFSOutputRaw Standard properties


These properties are used to configure tHDFSOutputRaw running in the Standard Job framework.
The Standard tHDFSOutputRaw component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files that you can download from this Hadoop
configuration list and use directly in your
connection. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
in this list; in that case, it is recommended to use the
Import from existing version option to take an existing
distribution as a base and add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be


• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.

This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

Use Datanode hostname Select the Use datanode hostname check box to allow the
Job to access datanodes via their hostnames. This actually
sets the dfs.client.use.datanode.hostname property to true.
When connecting to an S3N filesystem, you must select this
check box.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File Name Browse to, or enter the location of the file which you write
data to. This file is created automatically if it does not exist.

Action Select an operation in HDFS:


Create: Creates a file with data using the file name defined
in the File Name field.
Overwrite: Overwrites the data in the file specified in the
File Name field.
Append: Inserts the data into the file specified in the File
Name field. The specified file is created automatically if it
does not exist.

Custom encoding You may encounter encoding issues when you process
the stored data. In that situation, select this check box to
display the Encoding list.
Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.
This option is not available for a Sequence file.

Compression Select the Compress the data check box to compress the
output data.

Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.
Note that when the type of the file to be written is
Sequence File, the compression algorithm is embedded
within the container files (the part- files) of this sequence
file. These files can be read by a Talend component
such as tHDFSInput within MapReduce Jobs and other
applications that understand the sequence file format.
Alternatively, when the type is Text File, the output files
can be accessed with standard compression utilities that
understand the bzip2 or gzip container files.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables FILENAME_PATH: the path of the input file. This is an After
variable and it returns a string.

ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component needs an input component that provides
the data of a single column. This column must be labeled
content and its type must be Object.
For example, you can:
• use tConvertType to convert a column from String to
Object, or
• use tJavaRow to add the data to be processed into the
globalMap object so that this data becomes available
as a global variable for the other components such
as tFixedFlowInput to construct this required single
column (see the sketch below).
For further information about tConvertType, see
tConvertType on page 504.
For further information about tJavaRow, see tJavaRow on
page 1845.
For further information about tFixedFlowInput, see
tFixedFlowInput on page 1200.
For further information about how to use a global variable,
see the section describing how to use contexts and
variables in Talend Studio User Guide.
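As a minimal sketch of the tJavaRow approach mentioned in
the list above (the column name payload and the key
rawContent are hypothetical; input_row, output_row and
globalMap are provided by the Java code that the Studio
generates around a tJavaRow), you could write:

// tJavaRow: keep the incoming payload available as a global variable.
globalMap.put("rawContent", input_row.payload);
output_row.payload = input_row.payload;

A following tFixedFlowInput whose schema contains a single
Object column named content could then use an expression
such as globalMap.get("rawContent") to build the flow sent
to tHDFSOutputRaw.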

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic

settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related Scenario
Once you have properly configured the connection to HDFS for this component, this component works
exactly the same way as tFileOutputRaw.
For further information about tFileOutputRaw, see tFileOutputRaw on page 1153.


tHDFSProperties
Creates a single row flow that displays the properties of a file processed in HDFS.

tHDFSProperties Standard properties


These properties are used to configure tHDFSProperties running in the Standard Job framework.
The Standard tHDFSProperties component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.

In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files that you can download from this Hadoop
configuration list and use directly in your
connection. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
in this list; in that case, it is recommended to use the
Import from existing version option to take an existing
distribution as a base and add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.

Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,

the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
The schema of this component is read-only. You can click
Edit schema to view the schema.

File Browse to, or enter the path pointing to the data to be used
in the file system.

Get file checksum Select this check box to generate and output the MD5
information of the file processed.
Note that this is an HDFS only checksum and not a true
MD5 hash that can be compared with the MD5 value
obtained, for example, from tFileInputProperties. For further
information about this component, see tFileInputProperties
on page 1079.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSProperties can be a standalone component or send the
information it generates to its following component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.

Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenario
Related topic, see Procedure on page 1159
Related topic, see Iterating on a HDFS directory on page 1523


tHDFSPut
Connects to Hadoop distributed file system to load large-scale files into it with optimized
performance.
tHDFSPut copies files from a user-defined directory, pastes them into a given Hadoop distributed
file system (HDFS) and, if need be, renames these files.

tHDFSPut Standard properties


These properties are used to configure tHDFSPut running in the Standard Job framework.
The Standard tHDFSPut component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of the Talend community
have shared some ready-for-use configuration zip
files that you can download from this Hadoop
configuration list and use directly in your
connection. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
in this list; in that case, it is recommended to use the
Import from existing version option to take an existing
distribution as a base and add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:

• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of
the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://masternode:portnumber.
If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the

principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.

User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user
under which the HDFS instances were started. This field is
available depending on the distribution you are using.

Local directory The local directory where the files to be loaded into HDFS
are stored.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

Overwrite file Select whether or not to overwrite the existing file with the
new one.

Use Perl5 Regex Expression as Filemask Select this check box if you want to use Perl5 regular
expressions in the Files field as file filters. This is useful
when the name of the file to be used contains special
characters such as parentheses.
For information about Perl5 regular expression syntax, see
Perl5 Regular Expression Syntax.

Files In the Files area, the fields to be completed are:


- File mask: type in the file name to be selected from the
local directory. Regular expression is available.
- New name: give a new name to the loaded file.
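For illustration (hypothetical file names), with Use Perl5
Regex Expression as Filemask selected, the Files table could
be filled as follows; the values are typed as double-quoted
Java strings, so backslashes in the regular expression are
doubled:

File mask: "daily_report_\\d{8}\\.csv"        New name: "daily_report.csv"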

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Global Variables

Global Variables NB_FILE: the number of files processed. This is an After
variable and it returns an integer.
TRANSFER_MESSAGES: file transferred information. This is
an After variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component combines HDFS connection and data
loading, and is thus usually used as a single-component
subJob to move data from a user-defined local directory to
HDFS.
Different from the tHDFSInput and the tHDFSOutput
components, it runs standalone and does not generate input
or output flow for the other components.
It is often connected to the Job using OnSubjobOk or
OnComponentOk link, depending on the context.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.

For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is
\lib\native\MapRClient.dll in the MapR client jar file.
For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitations JRE 1.6+ is required.

Related scenario
For a related scenario, see Computing data with Hadoop distributed file system on page 1498.


tHDFSRename
Renames the selected files or specified directory on HDFS.

tHDFSRename Standard properties


These properties are used to configure tHDFSRename running in the Standard Job framework.
The Standard tHDFSRename component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository


Built-in: No property data stored centrally.
Repository: Select the repository file in which the
properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file


should contain the libraries of the different Hadoop


elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the
authentication key generated upon the registration of


the application that the current Job you are developing


uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

HDFS directory Browse to, or enter the path pointing to the data to be used
in the file system.

Overwrite file Select an option to specify whether to overwrite the existing
file with the new one.

Files Click the [+] button to add the lines you want to use as
filters:
Filemask: enter the filename or filemask using wildcard
characters (*) or regular expressions.
New name: name to give to the HDFS file after the transfer.

Die on error This check box is selected by default. Clear the check box to
skip the row in error and complete the process for error-free
rows.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.
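
As a purely illustrative example (not taken from the original property table), if you want HDFS clients to address datanodes by their hostnames, you could add the property dfs.client.use.datanode.hostname with the value true to this table; this property is documented in the hdfs-default.xml reference mentioned above.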


Global Variables

Global Variables NB_FILE: the number of files processed. This is an After


variable and it returns an integer.
CURRENT_STATUS: the execution result of the component.
This is a Flow variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
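
For illustration only, these variables can also be used in a Run if trigger condition (a Java boolean expression). The sketch below is an assumption-based example: it supposes the component is labeled tHDFSRename_1 and that the Die on error check box is cleared, so ERROR_MESSAGE may hold a value after execution.

// Hypothetical Run if condition: follow this trigger only when
// tHDFSRename_1 reported an error message.
globalMap.get("tHDFSRename_1_ERROR_MESSAGE") != null
    && !"".equals(globalMap.get("tHDFSRename_1_ERROR_MESSAGE"))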

Usage

Usage rule This component is used to compose a single-component Job


or subJob.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for


Windows is \lib\native\MapRClient.dll in the MapR client
jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenario
For a related scenario, see Computing data with Hadoop distributed file system on page 1498.


tHDFSRowCount
Reads a file in HDFS row by row in order to determine the number of rows this file contains.
tHDFSRowCount counts the number of rows in a file in HDFS. If the file to be processed is a Hadoop
sequence file type or a large dataset, it is recommended to use a tAggregateRow to count the records.

tHDFSRowCount Standard properties


These properties are used to configure tHDFSRowCount running in the Standard Job framework.
The Standard tHDFSRowCount component belongs to the Big Data and the File families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property Type Built-In: You create and store the schema locally for this
component only.
Repository: You have already created the schema and stored
it in the Repository. You can reuse it in various projects and
Job designs.

Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse
the connection details already defined.
Note that when a Job contains the parent Job and the
child Job, Component List presents only the connection
components in the same Job level.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.


2. Select Import from zip to import the configuration zip


for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Scheme Select the URI scheme of the file system to be used from
the Scheme drop-down list. This scheme could be
• HDFS
• WebHDFS. WebHDFS with SSL is not supported yet.
• ADLS. Only Azure Data Lake Storage Gen1 is supported.
The schemes present on this list vary depending on the
distribution you are using and only the scheme that appears
on this list with a given distribution is officially supported
by Talend.
Once a scheme is selected, the corresponding syntax such
as webhdfs://localhost:50070/ for WebHDFS is
displayed in the field for the NameNode location of your
cluster.
If you have selected ADLS, the connection parameters to be
defined become:
• In the Client ID and the Client key fields, enter,
respectively, the authentication ID and the


authentication key generated upon the registration of


the application that the current Job you are developing
uses to access Azure Data Lake Storage.
Ensure that the application to be used has appropriate
permissions to access Azure Data Lake. You can
check this on the Required permissions view of this
application on Azure. For further information, see Azure
documentation Assign the Azure AD application to the
Azure Data Lake Storage account file or folder.
• In the Token endpoint field, copy-paste the OAuth 2.0
token endpoint that you can obtain from the Endpoints
list accessible on the App registrations page on your
Azure portal.
For a video demonstration, see Configure and use Azure in a
Job.

NameNode URI Type in the URI of the Hadoop NameNode, the master
node of a Hadoop system. For example, we assume that
you have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS
with SSL is not supported yet.

Use kerberos authentication If you are accessing the Hadoop cluster running with
Kerberos security, select this check box, then, enter the
Kerberos principal name for the NameNode in the field
displayed. This enables you to use your user name to
authenticate against the credentials stored in Kerberos.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab file.
A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used
in the Principal field and the access path to the keytab file
itself in the Keytab field. This keytab file must be stored in
the machine in which your Job actually runs, for example, on
a Talend Jobserver.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the
right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the
principal to be used is guest; in this situation, ensure that
user1 has the right to read the keytab file to be used.


User name The User name field is available when you are not using
Kerberos to authenticate. In the User name field, enter the
login user name for your distribution. If you leave it empty,
the user name of the machine hosting the Studio will be
used.

Group Enter the membership including the authentication user


under which the HDFS instances were started. This field is
available depending on the distribution you are using.

File name Browse to, or enter the path pointing to the data to be used
in the file system.

Row separator The separator used to identify the end of a row.

Ignore empty rows Select this check box to skip the empty rows.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for database
data handling. The supported encodings depend on the
JVM that you are using. For more information, see https://
docs.oracle.com.

Compression Select the Uncompress the data check box to uncompress
the input data.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up data
transfer. When reading a compressed file, the Studio needs
to uncompress it before being able to feed it to the input
flow.

Advanced settings

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.


tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables COUNT: the number of rows in a file. This is a Flow variable
and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule tHDFSRowCount is a standalone component; it must be
used with an OnSubjobOk connection to tJava in order to
return the row count.
The valid code for tJava to get this count could be:

System.out.print(((Integer)globalMap.get("tHDFSRowCount_1_COUNT")));

In this example, tHDFSRowCount_1 is the label of this


component in a Job, so it may vary among different use
cases; COUNT is the global variable of tHDFSRowCount,
representing the integer flow of the row count.
For further information about how to label a component or
how to use a global variable in a Job, see the Talend Studio
User Guide.
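
As a complementary sketch (not part of the original guide), the count can also be retrieved defensively in tJava before being reused; tHDFSRowCount_1 is again an assumed label.

// Minimal tJava sketch: read the row count and only use it when present.
Integer rowCount = (Integer) globalMap.get("tHDFSRowCount_1_COUNT");
if (rowCount != null) {
    System.out.println("The file contains " + rowCount + " rows.");
}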

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your HDFS
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
files in different HDFS systems or different distributions,
especially when you are working in an environment where
you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of
Talend Studio .
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.


For examples on using dynamic parameters, see Reading


data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Limitation JRE 1.6+ is required.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHiveClose
Closes connection to a Hive database.
tHiveClose closes an active connection to a database.

tHiveClose Standard properties


These properties are used to configure tHiveClose running in the Standard Job framework.
The Standard tHiveClose component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products.

Basic settings

Component list If there is more than one connection used in the Job, select
tHiveConnection from the list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with other Hive


components, especially with tHiveConnection, which
opens a connection for the transaction that is underway.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an


environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
No scenario is available for the Standard version of this component yet.


tHiveConnection
Establishes a Hive connection to be reused by other Hive components in your Job.
tHiveConnection opens a connection to a Hive database.

tHiveConnection Standard properties


These properties are used to configure tHiveConnection running in the Standard Job framework.
The Standard tHiveConnection component belongs to the Big Data, the Databases and the ELT
families.
The component in this framework is available in all Talend products.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If you leave this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If you leave this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.


Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket Because a Talend Job needs its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
them at execution time.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a machine on which the Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information


of the Microsoft HD Insight cluster to be used. For
example, the address could be your_hdinsight
_cluster_name.azurehdinsight.net and the
authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.


In the Username field, enter the name of the Azure


Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as HortonWorks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that supports
concurrent connections from multiple clients better than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.
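
For orientation only (this is not a field of the component), a HiveServer2 standalone connection defined with these Host, Port, Database, Username and Password values generally corresponds to a Hive JDBC URL of the following form; the host, port and database below are hypothetical, with 10000 being the usual HiveServer2 default port.

// Illustrative Hive JDBC URL shape for a HiveServer2 (Hive 2) connection.
String hiveUrl = "jdbc:hive2://hiveserver.example.com:10000/mydb";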

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.


The values of the following parameters can be found in


the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:


• Hortonworks Data Platform 2.0 +


• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to an S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.


For further information about the Hadoop Map/Reduce


framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if


you have sufficient Hadoop experience to handle any


issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Inspect the classpath for configurations Select this check box to allow the component to check the
configuration files in the directory you have set with the
$HADOOP_CONF_DIR variable and directly read parameters
from these files in this directory. This feature allows you to
easily change the Hadoop configuration for the component
to switch between different environments, for example,
from a test environment to a production environment.
In this situation, the fields or options used to configure
Hadoop connection and/or Kerberos security are hidden.
If you want to use certain parameters such as the Kerberos
parameters but these parameters are not included in these
Hadoop configuration files, you need to create a file called
talend-site.xml and put this file into the same directory
defined with $HADOOP_CONF_DIR. This talend-site.xml file
should read as follows:

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>talend.kerberos.authentication</name>
    <value>kinit</value>
    <description>Set the Kerberos authentication method to use. Valid values are: kinit or keytab.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.principal</name>
    <value>user@EXAMPLE.COM</value>
    <description>Set the keytab's principal name.</description>
  </property>
  <property>
    <name>talend.kerberos.keytab.path</name>
    <value>/kdc/user.keytab</value>
    <description>Set the keytab's path.</description>
  </property>
  <property>
    <name>talend.encryption</name>
    <value>none</value>
    <description>Set the encryption method to use. Valid values are: none or ssl.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.path</name>
    <value>ssl</value>
    <description>Set SSL trust store path.</description>
  </property>
  <property>
    <name>talend.ssl.trustStore.password</name>
    <value>ssl</value>
    <description>Set SSL trust store password.</description>
  </property>
</configuration>
The parameters read from these configuration files override
the default ones used by the Studio. When a parameter
does not exist in these configuration files, the default one is
used.
Note that this option is available only in Hive Standalone
mode with Hive 2.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to


the relevant Tez libraries via the Advanced settings view of


this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, by default,
registered along with your Hive installation.

Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.


• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce You can tune the map and reduce computations by selecting
memory mb the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.


Path separator in server Leave the default value of the Path separator in server field
as it is, unless the separator used by your Hadoop
distribution's host machine for its PATH variable is not a
colon (:). In that situation, you must change this value to
the separator you are using on that host.

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the component when an error occurs. This is
an After variable and it returns a string. This variable functions only if the Die on error check box is
cleared, if the component has this check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Hive components, particularly tHiveClose.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the
following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Connecting to a custom Hadoop distribution


As explained in the properties table, when you select the Custom option from the Distribution drop-
down list, you are connecting to a Hadoop distribution different from any of the Hadoop distributions
provided on that Distribution list in the Studio.

After selecting this Custom option, click the button to display the Import custom definition dialog
box and proceed as follows:

Procedure
1. Depending on your situation, select Import from existing version or Import from zip to configure
the custom Hadoop distribution to be connected to.
• If you have the zip file of the custom Hadoop distribution you need to connect to, select
Import from zip. Talend community provides this kind of zip files that you can download
from http://www.talendforge.org/exchange/index.php.
• Otherwise, select Import from existing version to import an officially supported Hadoop
distribution as base so as to customize it by following the wizard.

Note that the check boxes in the wizard allow you to select the Hadoop element(s) you need to
import. Not all of the check boxes are displayed in every wizard; which ones appear depends on the context in
which you are creating the connection. For example, if you are creating this connection for a Hive
component, only the Hive check box appears.
2. Whether you have selected Import from existing version or Import from zip, verify that each check
box next to the Hadoop element you need to import has been selected.
3. Click OK and then in the pop-up warning, click Yes to accept overwriting any custom setup of jar
files previously implemented.


Once done, the Custom Hadoop version definition dialog box becomes active.

This dialog box lists the Hadoop elements and their jar files you are importing.
4. If you have selected Import from zip, click OK to validate the imported configuration.
If you have selected Import from existing version as base, you may still need to add more jar
files to customize that version. Then from the tab of the Hadoop element you need to customize,
for example, the HDFS/HCatalog tab, click the [+] button to open the Select libraries dialog box.
5. Select the External libraries option to open its view.
6. Browse to and select any jar file you need to import.
7. Click OK to validate the changes and to close the Select libraries dialog box.
Once done, the selected jar file appears on the list in the tab of the Hadoop element being
configured.
Note that if you need to share the custom Hadoop setup with another Studio, you can
export this custom connection from the Custom Hadoop version definition window using the button.
8. In the Custom Hadoop version definition dialog box, click OK to validate the customized
configuration. This brings you back to the Distribution list in the Basic settings view of the
component.

Results
Now that the configuration of the custom Hadoop version has been set up and you are back to the
Distribution list, you are able to continue to enter other parameters required by the connection.
If the custom Hadoop version you need to connect to contains YARN and you want to use it, select the
Use YARN check box next to the Distribution list.
A video is available in the following link to demonstrate, by taking HDFS as example, how to set up
the connection to a custom Hadoop cluster, also referred to as an unsupported Hadoop distribution:
How to add an unsupported Hadoop distribution to the Studio.


Creating a partitioned Hive table


This scenario illustrates how to use tHiveConnection, tHiveCreateTable and tHiveLoad to create a
partitioned Hive table and write data in it.
Note that tHiveCreateTable and tHiveLoad are available only when you are using one of the Talend
solutions with Big Data.

The sample data to be used in this scenario is employee information of a company, reading as follows:

1;Lyndon;Fillmore;21-05-2008;US
2;Ronald;McKinley;15-08-2008
3;Ulysses;Roosevelt;05-10-2008
4;Harry;Harrison;23-11-2007
5;Lyndon;Garfield;19-07-2007
6;James;Quincy;15-07-2008
7;Chester;Jackson;26-02-2008
8;Dwight;McKinley;16-07-2008
9;Jimmy;Johnson;23-12-2007
10;Herbert;Fillmore;03-04-2008

The information contains some employees' names and the dates when they were registered in an HR
system. Since these employees work for the US subsidiary of the company, you will create a US
partition for this sample data.
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to
access the Hive database to be used.
Note that if you are using the Windows operating system, you have to create a tmp folder at the root
of the disk where the Studio is installed.
Then proceed as follows:

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job from the Job Designs node in
the Repository tree view.


For further information about how to create a Job, see the chapter describing how to design a
Job in Talend Studio User Guide.
2. Drop tHiveConnection, tHiveCreateTable and tHiveLoad onto the workspace.
3. Connect them using the Trigger > On Subjob OK link.

Configuring the connection to Hive


About this task
Configuring tHiveConnection

Procedure
1. Double-click tHiveConnection to open its Component view.

2. From the Property type list, select Built-in. If you have created the connection to be used in
Repository, then select Repository, click the button to open the Repository content dialog
box and select that connection. This way, the Studio will reuse that set of connection information
for this Job.
For further information about how to create a Hadoop connection in Repository, see the chapter
describing the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide .
3. In the Version area, select the Hadoop distribution to be used and its version. If you cannot find
from the list the distribution corresponding to yours, select Custom so as to connect to a Hadoop
distribution not officially supported in the Studio.
For a step-by-step example about how to use this Custom option, see Connecting to a custom
Hadoop distribution on page 1579.
4. In the Connection area, enter the connection parameters to the Hive database to be used.


5. In the Name node field, enter the location of the master node, the NameNode, of the distribution
to be used. For example, hdfs://talend-hdp-all:8020. If you are using WebHDFS, the location
should be webhdfs://masternode:portnumber; WebHDFS with SSL is not supported yet.
6. In the Job tracker field, enter the location of the JobTracker of your distribution. For example,
talend-hdp-all:50300.
Note that the word Job in the term JobTracker designates the MR (MapReduce) jobs
described in Apache's documentation on http://hadoop.apache.org/.


Creating the Hive table


Defining the schema

Procedure
1. Double-click tHiveCreateTable to open its Component view.

2. Select the Use an existing connection check box and from Component list, select the connection
configured in the tHiveConnection component you are using for this Job.
3. Click the button next to Edit schema to open the schema editor.
4. Click the button four times to add four rows and in the Column column, rename them to Id,
FirstName, LastName and Reg_date, respectively.


Note that you cannot use the Hive reserved keywords to name the columns, such as location or
date.
5. In the Type column, select the type of the data in each column. In this scenario, Id is of the Integer
type, Reg_date is of the Date type and the others are of the String type.
6. In the DB type column, select the Hive type of each column corresponding to the data type you
have defined. For example, Id is of the INT type and Reg_date is of the TIMESTAMP type.
7. In the Data pattern column, define the pattern corresponding to that of the raw data. In this
example, use the default one.
8. Click OK to validate these changes.

Defining the table settings

Procedure
1. In the Table name field, enter the name of the Hive table to be created. In this scenario, it is
employees.
2. From the Action on table list, select Create table if not exists.
3. From the Format list, select the data format for which this Hive table is created. In this
scenario, it is TEXTFILE.
4. Select the Set partitions check box to add the US partition as explained at the beginning of this
scenario. To define this partition, click the button next to Edit schema that appears.
5. Leave the Set file location check box clear to use the default path for the Hive table.
6. Select the Set Delimited row format check box to display the available options of row format.
7. Select the Field check box and enter a semicolon (;) as the field separator in the field that appears.
8. Select the Line check box and leave the default value as the line separator.
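
With these settings, tHiveCreateTable issues a CREATE TABLE statement roughly equivalent to the following
HiveQL; this is a simplified sketch and the exact statement generated by the Studio may differ:

CREATE TABLE IF NOT EXISTS employees (
  Id INT,
  FirstName STRING,
  LastName STRING,
  Reg_date TIMESTAMP
)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ';'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;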

Writing data to the table


About this task
Configuring tHiveLoad


Procedure
1. Double-click tHiveLoad to open its Component view.

2. Select the Use an existing connection check box and from Component list, select the connection
configured in the tHiveConnection component you are using for this Job.
3. From the Load action field, select LOAD to write data from the file holding the sample data that is
presented at the beginning of this scenario.
4. In the File path field, enter the directory where the sample data is stored. In this example, the
data is stored in the HDFS system to be used.
In the real-world practice, you can use tHDFSOutput to write data into the HDFS system and you
need to ensure that the Hive application has the appropriate rights and permissions to read or
even move the data.
For further information about tHDFSOutput, see tHDFSOutput on page 1528.
For further information about the related rights and permissions, see the documentation or
contact the administrator of the Hadoop cluster to be used.
Note that if you need to read data from a local file system other than the HDFS system, ensure that the
data to be read is stored in the local file system of the machine in which the Job is run and then
select the Local check box in this Basic settings view. For example, when the connection mode to
Hive is Standalone, the Job is run in the machine where the Hive application is installed and thus
the data should be stored in that machine.
5. In the Table name field, enter the name of the target table you need to load data in. In this
scenario, it is employees.
6. From the Action on file list, select APPEND.
7. Select the Set partitions check box and in the field that appears, enter the partition you need to
add data to. In this scenario, this partition is country='US'.
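
With these settings, tHiveLoad runs a statement roughly equivalent to the following HiveQL; this is a
sketch only, the HDFS path is a hypothetical example and the exact statement generated by the Studio may
differ:

LOAD DATA INPATH '/user/talend/employees.csv' INTO TABLE employees PARTITION (country='US');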


Executing the Job


Then you can press F6 to run this Job.
Once done, the Run view is opened automatically, where you can check the execution process.
You can as well verify the results in the web console of the Hadoop distribution used.


If you need to obtain more details about the Job, it is recommended to use the web console of the
Jobtracker provided by the Hadoop distribution you are using.
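
You can also check the loaded partition from a Hive client, for example with a query such as the
following, using the table and partition of this scenario:

SELECT * FROM employees WHERE country='US';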

Creating a JDBC Connection to Azure HDInsight Hive


This scenario illustrates how to use tHiveConnection, tHiveInput and tHiveClose to create a JDBC
Connection to HDInsight Hive.


Prerequisites
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to
access the Hive database to be used.

Configuring a DataBase Connection to Hive


About this task
This example uses version 3.6 of Azure HDInsight.

Procedure
1. In the Repository view, expand the Metadata node.
2. Right-click Db Connections, and then click Create Connection.
3. Give a name to your connection.

4. Click Next.
5. Set up the connection configuration as shown in the following table:


DB Type Select Hive.

Hadoop Cluster Select None.

Distribution Select Hortonworks.
HDInsight leverages the Hortonworks distribution on the backend. This allows you to use Hortonworks
libraries to connect to HDInsight.


Version Select Hortonworks Data Platform V2.6.0.3-8 [Built in].

Hive Model Select Standalone.

Login, Password and Server Fill in these fields as required.

Port Enter 443.
You will be able to communicate through the proxy port since the HDInsight cluster sits behind a proxy by
default.

Database Leave the default value.

Additional JDBC Setting Enter transportMode=http;ssl=true;httpPath=/hive2, where:
• transportMode=http sets the transport mode to HTTP instead of the default Hive JDBC transport mode.
• ssl=true enables SSL.
• httpPath=/hive2 sets the HTTP endpoint.

6. Click Test Connection to ensure the Talend Studio connects successfully to the cluster.
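
Putting these settings together, the Hive JDBC URL built for this connection should look roughly like the
following; the cluster name is a placeholder and the exact URL may differ in your environment:

jdbc:hive2://your_hdinsight_cluster_name.azurehdinsight.net:443/default;transportMode=http;ssl=true;httpPath=/hive2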

Building the Job


Procedure
1. From the Repository view of the Talend Studio, right-click Job Designs, and then click Create
Standard Job.
2. Give a name to your Job.
3. Click Finish.


4. Add a tPreJob component to your workspace.


5. Add a tHiveConnection component to your workspace.
6. Double-click the tHiveConnection component and choose Repository as the Property Type and the
Database Connection created above.

7. Right-click the tPreJob component.


8. Select Trigger > On Component Ok and connect the tPreJob to the tHiveConnection.


9. Add a tHiveInput component to your workspace.


10. Select it and check the box Use an existing connection, then select the tHiveConnection
component in the Component List drop-down menu.
11. In the Query field, input show tables to run a query displaying the available tables in the
database.

12. Add a tLogRow component to your workspace.


13. Right-click the tHiveInput component and select Row > Main.
14. Click the tLogRow component to connect both components. They will display the information
from the query above.
15. From the Component tab of the tLogRow, select Table (print values in cells of a table).

16. Add a tPostJob component to your workspace.


17. Add a tHiveClose component to your workspace.


18. Connect the tPostJob component to the tHiveClose component using an On Component Ok
connection to close the connection that was opened.

19. From the Run tab, click Run to run the Job and verify that the connection to Hive on
HDInsight succeeds and that the table data can be read.


tHiveCreateTable
Creates Hive tables that fit a wide range of Hive data formats.
A proper Hive data format such as RC or ORC allows you to obtain better performance when processing
data with Hive.
tHiveCreateTable connects to the Hive database to be used and creates a Hive table that is dedicated
to data of the format you specify.

tHiveCreateTable Standard properties


These properties are used to configure tHiveCreateTable running in the Standard Job framework.
The Standard tHiveCreateTable component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.


Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine in which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information of the Microsoft HD Insight
cluster to be used. For example, the address could be your_hdinsight_cluster_name.azurehdinsight.net and
the authentication information is your Azure account name: ychen. The Studio uses this service to submit
the Job to the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.


In the Username field, enter the name of the Azure Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used such as HortonWorks Data
Platform V1.2.0 (Bimota) supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.

The values of the following parameters can be found in the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:


• Hortonworks Data Platform 2.0 +


• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.

For further information about the Hadoop Map/Reduce framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the properties are stored. The fields that follow are
completed automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
requires specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insightcluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.

2. Select Import from zip to import the configuration zip for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored it in the Repository. You can reuse it in
various projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Table Name Name of the table to be created.

Action on table Select the action to be carried out for creating a table.

Format Select the data format to which the table to be created is dedicated.
The available data formats vary depending on the version of
the Hadoop distribution you are using.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Inputformat class and Outputformat class These fields appear only when you have selected
INPUTFORMAT and OUTPUTFORMAT from the Format list.
These fields allow you to enter the name of the jar files to
be used for the data formats not available in the Format list.

Storage class Enter the name of the storage handler to be used for
creating a non-native table (Hive table stored and managed
in other systems than Hive, for example, Cassandra or
MongoDB).
This field is available only when you have selected
STORAGE from the Format list.
For further information about a storage handler, see https://
cwiki.apache.org/confluence/display/Hive/StorageHandlers.
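
For illustration, the storage handler entered here is the class name that appears in the STORED BY clause
of the generated CREATE TABLE statement, for example as in the following sketch, which assumes the standard
HBase storage handler shipped with Hive and a hypothetical table:

CREATE TABLE hbase_backed_table (key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:value');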

Set partitions Select this check box to add partition columns to the table
to be created. Once selecting it, you need to define the
schema of the partition columns you need to add.


Set file location If you want to create a Hive table in a directory other
than the default one, select this check box and enter the
directory in HDFS you want to use to hold the table content.
This is typically useful when you need to create an external Hive table by selecting the Create an external
table check box in the Advanced settings tab.

Use S3 endpoint The Use S3 endpoint check box is displayed when you
have selected the Set file location check box to create an
external Hive table.
Once this Use S3 endpoint check box is selected, you need
to enter the following parameters in the fields that appear:
• S3 bucket: enter the name of the bucket in which you
need to create the table.
• Bucket name: enter the name of the bucket in which
you want to store the dependencies of your Job. This
bucket must already exist on S3.
• Temporary resource folder: enter the directory in
which you want to store the dependencies of your Job.
For example, enter temp_resources to write the
dependencies in the /temp_resources folder in the
bucket.
If this folder already exists at runtime, its contents are
overwritten by the upcoming dependencies; otherwise,
this folder is automatically created.
• Access key and Secret key: enter the authentication
information required to connect to the Amazon S3
bucket to be used.
To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog box
enter the password between double quotes and click
OK to save the settings.
Note that the format of the S3 file is S3N (S3 Native
Filesystem).
Since a Hive table created in S3 is actually an external table, this Use S3 endpoint check box must be used
with the Create an external table check box selected.

Advanced settings

Like table Select this check box and enter the name of the Hive table
you want to copy. This allows you to copy the definition of
an existing table without copying its data.
For further information about the Like parameter, see
Apache's information about Hive's Data Definition
Language.
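
The Like table option corresponds to the LIKE clause of HiveQL, for example, with hypothetical table names:

CREATE TABLE employees_copy LIKE employees;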

Create an external table Select this check box to make the table to be created an
external Hive table. This kind of Hive table leaves the raw
data where it is if the data is in HDFS.
An external table is usually the better choice for accessing
shared data existing in a file system.
For further information about an external Hive table, see
Apache's documentation about Hive.


Table comment Enter the description you want to use for the table to be created.

As select Select this check box and enter the As select statement for creating a Hive table that is based
on a Select statement.

Set clustered_by or skewed_by statement Enter the Clustered by statement to cluster the data of
a table or a partition into buckets, or/and enter the Skewed
by statement to allow Hive to extract the heavily skewed
data and put it into separate files. This is typically used for
obtaining better performance during queries.
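
For illustration, the As select and the Clustered by or Skewed by options described above correspond to
HiveQL clauses such as the following; these are sketches with hypothetical table and column names:

CREATE TABLE us_employees AS SELECT * FROM employees WHERE country='US';

CREATE TABLE employees_bucketed (Id INT, LastName STRING)
CLUSTERED BY (Id) INTO 4 BUCKETS;

CREATE TABLE employees_skewed (Id INT, LastName STRING)
SKEWED BY (LastName) ON ('McKinley', 'Fillmore');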

SerDe properties If you are using the SerDe row format, you can add any
custom SerDe properties to override the default ones used
by the Hadoop engine of the Studio.

Table properties Add any custom Hive table properties you want to override
the default ones used by the Hadoop engine of the Studio.

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.

• If you need to use Tez to run your Hive Job, add hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce computations by
selecting the Set memory check box to set proper memory allocations for the computations to be performed by
the Hadoop system.
In that situation, you need to enter the values you need in the Mapred job map memory mb and the Mapred job
reduce memory mb fields, respectively. By default, the values are both 1000, which are normally appropriate
for running the computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.

Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone.

If the Studio used to connect to a Hive database is operated on Windows, you must manually create a folder
called tmp in the root of the disk where this Studio is installed.

Row format Set Delimited row format

Set SerDe row format

Die on error

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure
but in different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the
following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.

For further information about how to install a Hadoop distribution, see the manuals corresponding to the
Hadoop distribution you are using.

Related scenario
For a related scenario, see Creating a partitioned Hive table on page 1582.


tHiveInput
Extracts data from Hive and sends the data to the component that follows.
tHiveInput is the dedicated component to the Hive database (the Hive data warehouse system). It can
execute a given HiveQL query in order to extract the data from Hive.
When ACID is enabled on the Hive side, a Spark Job cannot delete or update a table and unless data is
compacted, this Job cannot correctly read aggregated data from a Hive table, either. This is a known
limitation described in the Spark bug tracking system: https://issues.apache.org/jira/browse/SPARK-15348.

tHiveInput Standard properties


These properties are used to configure tHiveInput running in the Standard Job framework.
The Standard tHiveInput component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Access Key and Secret Key Enter the authentication information obtained from
Google for tHiveInput to read temporary data from
Google Storage.
These keys can be consulted on the Interoperable Access
tab view under the Google Cloud Storage tab of the
project from the Google APIs Console.
To enter the secret key, click the [...] button next to the
secret key field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.
For more information about the access key and secret
key, go to https://developers.google.com/storage/docs
/reference/v1/getting-startedv1?hl=en/ and see the
description about developer keys.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine in which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information of the Microsoft HD Insight
cluster to be used. For example, the address could be your_hdinsight_cluster_name.azurehdinsight.net and
the authentication information is your Azure account name: ychen. The Studio uses this service to submit
the Job to the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.

• The Password is defined when creating your HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as HortonWorks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the
Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it
graphically using SQLBuilder

  Repository: Select the relevant query stored in the
Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Guess schema Click this button to retrieve the schema from the table.

This query uses Parquet objects When available, select this check box to indicate that the
table to be handled uses the PARQUET format and thus
make the component call the required jar file.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Query Enter your DB query, paying particular attention to
properly sequencing the fields so that they match the schema
definition. A sample query is shown after the note below.
For further information about the Hive query language, see
https://cwiki.apache.org/confluence/display/Hive/Languag
eManual.

Note: Compressed data in the form of Gzip or Bzip2 can
be processed through the query statements. For details,
see https://cwiki.apache.org/confluence/display/Hive/
CompressedStorage.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up
data transfer. When reading a compressed file, the Studio
needs to uncompress it before being able to feed it to
the input flow.
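
As an illustration only, with a hypothetical customers table whose schema columns are id,
name and city, the query entered in this field could read as follows; in the Studio, the
statement is typically typed between double quotes as a string:

    select id, name, city
    from customers
    where city = 'Paris'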

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

1617
tHiveInput

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined
columns.

Note:
Clear the Trim all the String/Char columns check box to
enable Trim column in this field.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks (see the example below).
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
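
As a sketch only: adding the property hive.execution.engine with the value Tez to this table
has the same effect as issuing the following statement in a Hive session; the value is usually
written in lowercase, so check the accepted values for your Hive version:

    -- switch the Hive execution engine from MapReduce to Tez
    set hive.execution.engine=tez;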

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce
computations by selecting the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.

Path separator in server Leave the default value of the Path separator in server as
it is, unless the separator used by your Hadoop
distribution's host machine for its PATH variable is not a
colon (:). In that situation, you must change this value to
the one you are using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component
or transferred to an output component. This is an After
variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries
and covers all possible Hive QL queries.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

HBase Configuration Store by HBase

Note:
Available only when the Use an existing connection
check box is clear

Zookeeper quorum

Zookeeper client port

Define the jars to register for HBase

Register jar for HBase

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For a scenario about how an input component is used in a Job, see Writing columns from a MySQL
database to an output file using tMysqlInput on page 2440.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker,
when configuring this component since the component needs to connect to a Hadoop distribution.

tHiveLoad
Writes data of different formats into a given Hive table or exports data from a Hive table to a
directory.
tHiveLoad connects to a given Hive database and copies or moves data into an existing Hive table or
a directory you specify.
The tHiveLoad component first prepares the lines to be written to Hive before eventually writing
them to Hive. This approach is more efficient with regard to Hive than the line-by-line approach
typically employed by an output component. For this reason, tHiveOutput does not exist in a Job
designed in the Standard framework.

tHiveLoad Standard properties


These properties are used to configure tHiveLoad running in the Standard Job framework.
The Standard tHiveLoad component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.

If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket As a Talend Job expects its dependent jar files for
execution, specify the Google Storage directory to which
these jar files are transferred so that your Job can access
these files at execution.
The directory to be entered must end with a slash (/). If
the directory does not exist, it is created on the fly, but the
bucket to be used must already exist.

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job
from a given machine on which Google Cloud SDK has
been installed and authorized to use your user account
credentials to access Google Cloud Platform. In this
situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information
of the Microsoft HD Insight cluster to be used. For
example, the address could be your_hdinsight
_cluster_name.azurehdinsight.net and the
authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.
In the Hostname field, enter the Primary Blob Service
Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop
distribution to be used, such as HortonWorks Data
Platform V1.2.0 (Bimota), supports HiveServer2. It allows
you to select HiveServer2 (Hive 2), the server that better
supports concurrent connections of multiple clients than
HiveServer (Hive 1).
For further information about HiveServer2, see https://
cwiki.apache.org/confluence/display/Hive/Setting+Up
+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave
both the Force MapR ticket authentication check box
and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore
.kerberos.principal. This is the service principal of the
Hive Metastore.
2. HiveServer2 local user principal uses the value of
hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of
hive.server2.authentication.kerberos.keytab
4. Metastore URL uses the value of javax.jdo.opti
on.ConnectionURL. This is the JDBC connection string
to the Hive Metastore.
5. Driver class uses the value of javax.jdo.opti
on.ConnectionDriverName. This is the name of the
driver for the JDBC connection.
6. Username uses the value of javax.jdo.opti
on.ConnectionUserName. This, as well as the
Password parameter, is the user credential for
connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.opti
on.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog
box enter the password between double quotes and
click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use
.datanode.hostname property to true. When
connecting to a S3N filesystem, you must select this
check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you
have chosen a machine called masternode as the
NameNode, then the location is hdfs://mastern
ode:portnumber. If you are using WebHDFS, the
location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to
a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Load action Select the action you need to carry out for writing data into the
specified destination.
• When you select LOAD, you are moving or copying data
from a directory you specify.
• When you select INSERT, you are moving or copying
data based on queries.
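
As a rough sketch of the HiveQL that each action corresponds to (the table, directory and
query below are hypothetical examples, not values generated by the component):

    -- LOAD: move or copy the files of a directory into a Hive table
    LOAD DATA INPATH '/user/talend/input/customers' INTO TABLE customers;

    -- INSERT: move or copy data selected by a query
    INSERT INTO TABLE customers_eu
    SELECT * FROM customers WHERE region = 'EU';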

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to perform the INSERT
action.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend.
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/con
fluence/display/Hive/Hive+on+Tez. Some examples are
presented there to show how Tez can be used to gain
performance over MapReduce.

Target type This drop-down list appears only when you have selected
INSERT from the Load action list.
Select from this list the type of the location you need to
write data in.
• If you select Table as destination, you can still choose
to append data to or overwrite the contents in the
specified table.
• If you select Directory as destination, you are
overwriting the contents in the specified directory

Table name Enter the name of the Hive table you need to write data in.
Note that with the INSERT action, this field is available only
when you have selected Table from the Target type list.

File path Enter the directory you need to read data from or write data
in, depending on the action you have selected from the
Load action list.
• If you have selected LOAD: this is the path to the data
you want to copy or move into the specified Hive table.
• If you have selected INSERT: this is the directory to
which you want to export data from a Hive table. With
this action, the File path field is available only when
you have selected Directory from the Target type list.

The target table uses the Parquet format If the table in which you need to write data is a PARQUET
table, select this check box.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).
Then from the Compression list that appears, select the
compression mode you need to use to handle the PARQUET
file. The default mode is Uncompressed.

Action on file Select the action to be carried out for writing data.
This list is available only when the target is a Hive
table; if the target is a directory, the action to be used is
automatically OVERWRITE.

Query This field appears when you have selected INSERT from the
Load action list.
Enter the appropriate query for selecting the data to be
exported to the specified Hive table or directory.

Local Select this check box to use the Hive LOCAL statement for
accessing a local directory. Note that this local directory is
actually in the machine in which the Job is run. Therefore,
when the connection mode to Hive is Standalone, the Job is
run in the machine where the Hive application is installed
and thus this local directory is in that machine.
This statement is used along with the directory you have
defined in the File path field. Therefore, this Local check
box is available only when the File path field is available.
• If you are using the LOAD action, tHiveLoad copies the
local data to the target table.
• If you are using the INSERT action, tHiveLoad copies
data to a local directory.
• If you leave this Local check box clear, the directory
defined in the File path field is assumed to be in the
HDFS system to be used and data will be moved to the
target location.
For further information about this LOCAL statement, see
Apache's documentation about Hive's Language.
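
For illustration only, with hypothetical table and path names, the LOCAL statement changes
the generated HiveQL roughly as follows:

    -- LOAD action with Local selected: read from the file system of the machine running the Job
    LOAD DATA LOCAL INPATH '/tmp/customers.csv' INTO TABLE customers;

    -- INSERT action with Local selected: export the query result to a local directory
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/export/customers'
    SELECT * FROM customers;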

Set partitions Select this check box to use the Hive Partition clause in
loading or inserting data in a Hive table. You need to enter
the partition keys and their values to be used in the field
that appears.
For example, enter country='US', state='CA'. This makes a
partition clause reading Partition (country='US',
state='CA'), that is to say, a US and CA partition.
Also, it is recommended to select the Create partition if not
exist check box that appears to ensure that you will not
create a duplicate partition.
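
As a sketch, with the example keys above and a hypothetical partitioned table and input
directory, the clause produced for a LOAD action would read roughly as follows:

    LOAD DATA INPATH '/user/talend/input/us_ca'
    INTO TABLE customers
    PARTITION (country='US', state='CA');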

Die on error Select this check box to kill the Job when an error occurs.

Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://
hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-
dist/hadoop-hdfs/hdfs-default.xml.
• Apache also provides a page to list the Hive-related
properties: https://cwiki.apache.org/confluence/display/
Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information for
Hive dedicated properties, see https://cwiki.apache.org/con
fluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.

Mapred job map memory mb and Mapred job reduce memory mb You can tune the map and reduce
computations by selecting the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are
both 1000 which are normally appropriate for running the
computations.
The memory parameters to be set are Map (in Mb), Reduce
(in Mb) and ApplicationMaster (in Mb). These fields allow
you to dynamically allocate memory to the map and the
reduce computations and the ApplicationMaster of YARN.

Path separator in server Leave the default value of the Path separator in server as
it is, unless the separator used by your Hadoop
distribution's host machine for its PATH variable is not a
colon (:). In that situation, you must change this value to
the one you are using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone and supports writing a
wide range of data formats such as RC, ORC or AVRO.
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to acces
s database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to
guarantee the interaction with Talend Studio. The following
list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\ha
doop-VERSION\lib\native. For example, the library for
Windows is \lib\native\MapRClient.dll in the MapR clien
t jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
For a related scenario, see Creating a partitioned Hive table on page 1582

tHiveRow
Acts on the actual DB structure or on the data without handling data itself, depending on the nature
of the query and the database.
tHiveRow executes the HiveQL query stated in the specified database. The row suffix means the
component implements a flow in the Job design although it does not provide output.
The SQLBuilder tool helps you write your HiveQL statements easily.
This component can also perform queries in a HBase database once the Store by HBase check box is
available and you have selected this check box.
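
For illustration only, a typical statement executed by tHiveRow is a DDL or maintenance
query such as the following; the table name and columns are hypothetical:

    CREATE TABLE IF NOT EXISTS customers (
      id INT,
      name STRING,
      city STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ';';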

tHiveRow Standard properties


These properties are used to configure tHiveRow running in the Standard Job framework.
The Standard tHiveRow component belongs to the Big Data and the Databases families.
The component in this framework is available in all Talend products.

Basic settings
Connection configuration:
• When you use this component with Qubole on AWS:

API Token Click the ... button next to the API Token field to enter
the authentication token generated for the Qubole user
account to be used. For further information about how to
obtain this token, see Manage Qubole account from the
Qubole documentation.
This token allows you to specify the user account you
want to use to access Qubole. Your Job automatically uses
the rights and permissions granted to this user account in
Qubole.

Cluster label Select the Cluster label check box and enter the name of
the Qubole cluster to be used. If leaving this check box
clear, the default cluster is used.
If you need details about your default cluster, ask the
administrator of your Qubole service. You can also read
this article from the Qubole documentation to find more
information about configuring a default Qubole cluster.

Change API endpoint Select the Change API endpoint check box and select
the region to be used. If leaving this check box clear, the
default region is used.
For further information about the Qubole Endpoints
supported on QDS-on-AWS, see Supported Qubole
Endpoints on Different Cloud Providers.

• When you use this component with Google Dataproc:

Project identifier Enter the ID of your Google Cloud Platform project.


If you are not certain about your project ID, check it in the
Manage Resources page of your Google Cloud Platform
services.

Cluster identifier Enter the ID of your Dataproc cluster to be used.

Region From this drop-down list, select the Google Cloud region
to be used.

Google Storage staging bucket Since a Talend Job expects its dependent jar files to be available at execution, specify the Google Storage directory to which these jar files are transferred so that your Job can access them at execution time.
The directory to be entered must end with a slash (/). If
not existing, the directory is created on the fly but the
bucket to be used must already exist.
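For example, such a staging directory could look like the following (a hypothetical bucket and folder, not values from your project; note the trailing slash):
gs://my-talend-bucket/dataproc/jars/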

Database Fill this field with the name of the database.

Provide Google Credentials in file Leave this check box clear when you launch your Job from a machine in which the Google Cloud SDK has been installed and authorized to use your user account credentials to access Google Cloud Platform. In this situation, this machine is often your local machine.
For further information about this Google Credentials file,
see the administrator of your Google Cloud Platform or
visit Google Cloud Platform Auth Guide.

• When you use this component with HDInsight:

WebHCat configuration Enter the address and the authentication information


of the Microsoft HD Insight cluster to be used. For example, the address could be your_hdinsight_cluster_name.azurehdinsight.net and the authentication information is your Azure account name:
ychen. The Studio uses this service to submit the Job to
the HD Insight cluster.
In the Job result folder field, enter the location in which
you want to store the execution result of a Job in the
Azure Storage to be used.

HDInsight configuration • The Username is the one defined when creating your
cluster. You can find it in the SSH + Cluster login
blade of your cluster.
• The Password is defined when creating your
HDInsight cluster for authentication to this cluster.

Windows Azure Storage configuration Enter the address and the authentication information
of the Azure Storage account to be used. In this
configuration, you do not define where to read or write
your business data but define where to deploy your Job
only. Therefore always use the Azure Storage system for
this configuration.
In the Container field, enter the name of the container to
be used. You can find the available containers in the Blob
blade of the Azure Storage account to be used.
In the Deployment Blob field, enter the location in which
you want to store the current Job and its dependent
libraries in this Azure Storage account.


In the Hostname field, enter the Primary Blob Service


Endpoint of your Azure Storage account without the
https:// part. You can find this endpoint in the Properties
blade of this storage account.
In the Username field, enter the name of the Azure
Storage account to be used.
In the Password field, enter the access key of the Azure
Storage account to be used. This key can be found in the
Access keys blade of this storage account.

Database Fill this field with the name of the database.

• When you use the other distributions:

Connection mode Select a connection mode from the list. The options vary
depending on the distribution you are using.

Hive server Select the Hive server through which you want the Job
using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that supports concurrent connections from multiple clients better than HiveServer (Hive 1).
For further information about HiveServer2, see https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.

Note:
This field is not available when you select Embedded
from the Connection mode list.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter
the password between double quotes and click OK to
save the settings.

Use kerberos authentication If you are accessing a Hive Metastore running with
Kerberos security, select this check box and then enter
the relevant parameters in the fields that appear.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a
security-enabled MapR on page 1646.
Keep in mind that this configuration generates a
new MapR security ticket for the username defined
in the Job in each execution. If you need to reuse an
existing ticket issued for the same username, leave


both the Force MapR ticket authentication check box


and the Use Kerberos authentication check box clear,
and then MapR should be able to automatically find
that ticket on the fly.
The values of the following parameters can be found in
the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the service principal of the Hive Metastore.
2. HiveServer2 local user principal uses the value of hive.server2.authentication.kerberos.principal.
3. HiveServer2 local user keytab uses the value of hive.server2.authentication.kerberos.keytab.
4. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the JDBC connection string to the Hive Metastore.
5. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the name of the driver for the JDBC connection.
6. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as the Password parameter, is the user credential for connecting to the Hive Metastore.
7. Password uses the value of javax.jdo.option.ConnectionPassword.
For the other parameters that are displayed, please
consult the Hadoop configuration files they belong to.
For example, the Namenode principal can be found in
the hdfs-site.xml file or the hdfs-default.xml file of the
distribution you are using.
This check box is available depending on the Hadoop
distribution you are connecting to.
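As an illustration only, the first and third of these parameters typically appear in hive-site.xml as properties of the following form (the principal and keytab values below are placeholders, not values from your cluster):
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>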

Use a keytab to authenticate Select the Use a keytab to authenticate check box to log
into a Kerberos-enabled system using a given keytab
file. A keytab file contains pairs of Kerberos principals
and encrypted keys. You need to enter the principal to
be used in the Principal field and the access path to the
keytab file itself in the Keytab field. This keytab file must
be stored in the machine in which your Job actually runs,
for example, on a Talend Jobserver.
Note that the user that executes a keytab-enabled Job
is not necessarily the one a principal designates but
must have the right to read the keytab file being used.
For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the
keytab file to be used.

Use SSL encryption Select this check box to enable the SSL or TLS encrypted
connection.
Then in the fields that are displayed, provide the
authentication information:
• In the Trust store path field, enter the path, or
browse to the TrustStore file to be used. By default,
the supported TrustStore types are JKS and PKCS 12.
• To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog


box enter the password between double quotes and


click OK to save the settings.
This feature is available only to the HiveServer2 in the
Standalone mode of the following distributions:
• Hortonworks Data Platform 2.0 +
• Cloudera CDH4 +
• Pivotal HD 2.0 +
• Amazon EMR 4.0.0 +

Set Resource Manager Select this check box and in the displayed field, enter the
location of the ResourceManager of your distribution. For
example, tal-qa114.talend.lan:8050.
Then you can continue to set the following parameters
depending on the configuration of the Hadoop cluster to
be used (if you leave the check box of a parameter clear,
then at runtime, the configuration about this parameter in
the Hadoop cluster to be used will be ignored ):
1. Select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the
field that appears.
2. Select the Set jobhistory address check box and
enter the location of the JobHistory server of the
Hadoop cluster to be used. This allows the metrics
information of the current Job to be stored in that
JobHistory server.
3. Select the Set staging directory check box and
enter this directory defined in your Hadoop cluster
for temporary files created by running programs.
Typically, this directory can be found under the
yarn.app.mapreduce.am.staging-dir property in the
configuration files such as yarn-site.xml or mapred-
site.xml of your distribution.
4. Allocate proper memory volumes to the Map and the
Reduce computations and the ApplicationMaster of
YARN by selecting the Set memory check box in the
Advanced settings view.
5. Select the Set Hadoop user check box and enter the
user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific
owner with appropriate read or write rights, this field
allows you to execute the Job directly under the user
name that has the appropriate rights to access the
file or directory to be processed.
6. Select the Use datanode hostname check box
to allow the Job to access datanodes via their
hostnames. This actually sets the dfs.client.use.datanode.hostname property to true. When connecting to an S3N filesystem, you must select this check box.
For further information about these parameters, see
the documentation or contact the administrator of the
Hadoop cluster to be used.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.

Set NameNode URI Select this check box and in the displayed field, enter
the URI of the Hadoop NameNode, the master node
of a Hadoop system. For example, assuming that you


have chosen a machine called masternode as the


NameNode, then the location is hdfs://masternode:portnumber. If you are using WebHDFS, the location should be webhdfs://masternode:portnumber;
WebHDFS with SSL is not supported yet.
For further information about the Hadoop Map/Reduce
framework, see the Map/Reduce tutorial in Apache's
Hadoop documentation on http://hadoop.apache.org.
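For illustration only, assuming the common default ports of a Hadoop 2.x cluster (check the fs.defaultFS and dfs.namenode.http-address settings of your own cluster), the location could be:
hdfs://masternode:8020
webhdfs://masternode:50070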

The other properties:

Property type Either Built-In or Repository.


Built-In: No property data stored centrally.
Repository: Select the repository file where the properties
are stored.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add


other required jar files which the base distribution does


not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Hive version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Execution engine Select this check box and from the drop-down list, select
the framework you need to use to run the Job.
This list is available only when you are using the Embedded
mode for the Hive connection and the distribution you are
working with is:
• Custom: this option allows you to connect to a
distribution supporting Tez but not officially supported
by Talend .
Before using Tez, ensure that the Hadoop cluster you are
using supports Tez. You will need to configure the access to
the relevant Tez libraries via the Advanced settings view of
this component.
For further information about Hive on Tez, see Apache's
related documentation in https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez. Some examples are


presented there to show how Tez can be used to gain


performance over MapReduce.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

This query uses Parquet objects When available, select this check box to indicate that the
table to be handled uses the PARQUET format and thus make the component call the required jar file.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.


This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Query Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.
For further information about the Hive query language, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual.

Note: Compressed data in the form of Gzip or Bzip2 can


be processed through the query statements. For details,
see https://cwiki.apache.org/confluence/display/Hive/
CompressedStorage.
Hadoop provides different compression formats that help
reduce the space needed for storing files and speed up
data transfer. When reading a compressed file, the Studio
needs to uncompress it before being able to feed it to
the input flow.
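For example, assuming a hypothetical Hive table named employee whose schema defines the columns id, name and salary in that order, the query could be:
select id, name, salary from employee where salary > 50000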

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Store by HBase Select this check box to display the parameters to be set to
allow the Hive components to access HBase tables:
• Once this access is configured, you will be able to use,
in tHiveRow and tHiveInput, the Hive QL statements to
read and write data in HBase.
• If you are using the Kerberos authentication, you
need to define the HBase related principals in the
corresponding fields that are displayed.
For further information about this access involving Hive and
HBase, see Apache's Hive documentation about Hive/HBase
integration.
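For illustration, once this access is configured, a Hive QL statement such as the following (a hypothetical table and column mapping that relies on the standard Hive/HBase storage handler) could be executed through tHiveRow to create an HBase-backed table:
CREATE TABLE hbase_customer(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "customer");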

Zookeeper quorum Type in the name or the URL of the Zookeeper service you
use to coordinate the transaction between your Studio and
your database. Note that when you configure the Zookeeper,
you might need to explicitly set the zookeeper.znode.parent
property to define the path to the root znode that contains
all the znodes created and used by your database; then
select the Set Zookeeper znode parent check box to define
this property.

Zookeeper client port Type in the number of the client listening port of the
Zookeeper service you are using.

Define the jars to register for HBase Select this check box to display the Register jar for HBase
table, in which you can register any missing jar file required
by HBase, for example, the Hive Storage Handler, which is by default registered along with your Hive installation.

Register jar for HBase Click the [+] button to add rows to this table, then, in the
Jar name column, select the jar file(s) to be registered and
in the Jar path column, enter the path(s) pointing to that or
those jar file(s).


Advanced settings

Tez lib Select how the Tez libraries are accessed:


• Auto install: at runtime, the Job uploads and deploys
the Tez libraries provided by the Studio into the
directory you specified in the Install folder in HDFS
field, for example, /tmp/usr/tez.
If you have set the tez.lib.uris property in the properties
table, this directory overrides the value of that
property at runtime. But the other properties set in the
properties table are still effective.
• Use exist: the Job accesses the Tez libraries already
deployed in the Hadoop cluster to be used. You need
to enter the path pointing to those libraries in the Lib
path (folder or file) field.
• Lib jar: this table appears when you have selected Auto
install from the Tez lib list and the distribution you are
using is Custom. In this table, you need to add the Tez
libraries to be uploaded.

Temporary path If you do not want to set the Jobtracker and the
NameNode when you execute the query select * from
your_table_name, you need to set this temporary path.
For example, /C:/select_all in Windows.

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component is
usually followed by tParseRecordSet.

Hadoop properties Talend Studio uses a default configuration for its engine to
perform operations in a Hadoop distribution. If you need to
use a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
For further information about the properties required by
Hadoop and its related systems such as HDFS and Hive,
see the documentation of the Hadoop distribution you are
using or see Apache's Hadoop documentation on http://hadoop.apache.org/docs and then select the version of the
documentation you want. For demonstration purposes, the
links to some properties are listed below:
• Typically, the HDFS-related properties can be found in
the hdfs-default.xml file of your distribution, such as
http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml.


• Apache also provides a page to list the Hive-related


properties: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties.

Hive properties Talend Studio uses a default configuration for its engine to
perform operations in a Hive database. If you need to use
a custom configuration in a specific situation, complete
this table with the property or properties to be customized.
Then at runtime, the customized property or properties will
override those default ones. For further information about Hive-dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
• If you need to use Tez to run your Hive Job, add
hive.execution.engine to the Properties column and
Tez to the Value column, enclosing both of these
strings in double quotation marks.
• Note that if you are using the centrally stored metadata
from the Repository, this table automatically inherits
the properties defined in that metadata and becomes
uneditable unless you change the Property type from
Repository to Built-in.
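For example, the corresponding row of the Hive properties table could look as follows (note that Hive itself generally expects the engine name in lowercase):
Properties: "hive.execution.engine"
Value: "tez"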

Mapred job map memory mb and Mapred job reduce You can tune the map and reduce computations by selecting
memory mb the Set memory check box to set proper memory allocations
for the computations to be performed by the Hadoop
system.
In that situation, you need to enter the values you need in
the Mapred job map memory mb and the Mapred job reduce
memory mb fields, respectively. By default, the values are both 1000, which is normally appropriate for running the computations.

Path separator in server Leave the default value of the Path separator in server as
it is, unless you have changed the separator used by your
Hadoop distribution's host machine for its PATH variable
or in other words, that separator is not a colon (:). In that
situation, you must change this value to the one you are
using in that host.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Hive QL queries.
tHiveRow can capture the Application_ID values and write
them in the Job logs once you have activated Log4j and
set the Log4j output level to Info for your Job involving
tHiveRow.
• For further information about how to define the Log4j
output level at an individual Job level, search for
customizing log4j output level at runtime on Talend
Help Center (https://help.talend.com).
• For further information about how to configure Log4j
at the Studio level so as to apply the configuration to
all Jobs, search for configuring Log4j on Talend Help
Center (https://help.talend.com).
If the Studio used to connect to a Hive database is operated
on Windows, you must manually create a folder called tmp
in the root of the disk where this Studio is installed.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Connecting to a security-enabled MapR


When designing a Job, set up the authentication configuration in the component you are using
depending on how your MapR cluster is secured.
MapR supports the two following methods of authenticating a user and generating a MapR security
ticket for this user: a username/password pair and Kerberos.
For further information about the MapR security mechanism, see MapR security architecture.
For a scenario about how to secure a MapR cluster, see Getting started with MapR security.
The different security scenarios you may face with your MapR cluster:
• When your MapR cluster is secured with Kerberos only, you only need to set up the typical
Hadoop Kerberos configuration for your Job in the Studio.
• When your MapR cluster is secured with both the Kerberos mechanism and the MapR ticket
security mechanism, you need to accordingly set up the configuration for both of them in your Job
in the Studio.
For details about how to configure the MapR ticket security mechanism in the Studio, see Setting
up the MapR ticket authentication on page 1646.
• When your MapR cluster is secured with the MapR ticket security mechanism only, proceed
as explained in Setting up the MapR ticket authentication on page 1646 to set up the MapR
authentication configuration for your Job in the Studio.
For an example of how to configure Kerberos authentication for a Talend Job, see How to use
Kerberos in Talend Studio with Big Data.
Although this example uses Cloudera for demonstration, the operations it describes are generic and
thus applicable to MapR as well.

Setting up the MapR ticket authentication


Before you begin
• The MapR distribution you are using is from version 4.0.1 onwards and you have selected it as the
cluster to connect to in the component to be configured.
• The MapR cluster has been properly installed and is running.


• Ensure that you have installed the MapR client in the machine where the Studio is, and added the
MapR client library to the PATH variable of that machine. According to MapR's documentation,
the library or libraries of a MapR client corresponding to each OS version can be found under
MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no
MapRClient in java.library.path.
• This section explains only the authentication parameters to be used to connect to MapR. You still
need to define the other parameters required by your Job.
For further information, see the documentation about each component you are using.

About this task


In a Standard Job, you need to set up this configuration in the Basic settings tab of a Hadoop-related
component to be used by your Job.
In the tab, you need to proceed as follows:

Procedure
1. Select the Force MapR ticket authentication check box to display the related parameters to be
defined.
2. In the Username field, enter the username to be authenticated and in the Password field, specify
the password used by this user.
To enter the password, click the [...] button next to the password field, and then in the pop-up
dialog box enter the password between double quotes and click OK to save the settings.
A MapR security ticket is generated for this user by MapR and stored in the machine where the Job
you are configuring is executed.
3. If the Group field is available in this tab, you need to enter the name of the group to which the
user to be authenticated belongs.
4. In the Cluster name field, enter the name of the MapR cluster you want to use this username to
connect to.
This cluster name can be found in the mapr-clusters.conf file located in /opt/mapr/conf of the
cluster.
5. In the Ticket duration field, enter the length of time (in seconds) during which the ticket is valid.

Setting the environment variable for a custom MapR ticket location (optional)

If the default MapR ticket location, /tmp/maprticket_<uid>, has been changed, set
MAPR_TICKETFILE_LOCATION environment variable accordingly in the machine in which your Job is
executed.
As MapR does not provide any API to specify a MapR ticket, setting the environment variable is the
only way to use a custom MapR ticket location in your Job. For further information about this issue,
see this post from the MapR forum.
This procedure is necessary only when you are storing the MapR tickets in a custom location. If you
use the default MapR ticket location, skip this procedure.


Setting the environment variable for a custom MapR ticket location on Mac (optional)

About this task


This procedure is relevant only when you are storing the MapR tickets in a custom location and you
are using Mac to run your Studio.

Procedure
1. In the machine in which your Job is executed, add these lines to ~/.bashrc:

Example

export MAPR_TICKETFILE_LOCATION=/Users/$USER/maprticket_$UID
launchctl setenv MAPR_TICKETFILE_LOCATION /Users/$USER/maprticket_$UID

2. Shut down your Studio if it is open. Then, each time you boot your Mac workstation, open a terminal session before starting the Studio.
Setting the environment variable for a custom MapR ticket location on other operating systems
(optional)

About this task


This procedure is relevant only when you are storing the MapR tickets in a custom location and you
are not using Mac to run your Studio. If you use the default MapR ticket location, skip this procedure.

Procedure
1. In the machine in which your Job is executed, run the following command in a command-line terminal to set the MAPR_TICKETFILE_LOCATION variable in memory.

Example

set MAPR_TICKETFILE_LOCATION=<your_custom_location>

2. Shut down your Studio if it is open and use the same terminal to restart your Studio.
If you use a Talend JobServer to run your Job, use the same terminal to restart this JobServer.
This way, your Job retrieves this custom location from memory.

Using a custom MapR security configuration in the mapr.login.conf file (optional)

If the default security configuration of your MapR cluster has been changed, you need to configure the
Job to be executed to take this custom security configuration into account.
MapR specifies its security configuration in the mapr.login.conf file located in /opt/mapr/conf of the
cluster. For further information about this configuration file and the Java service it uses behind, see
mapr.login.conf and JAAS.
If no change has been made in the mapr.login.conf file, skip this procedure.

About this task


To configure your Job, you need to define the related parameters in the Basic settings tab and the
Advanced settings tab of the Component view of the component you want your Job to use to connect
to MapR.
Proceed as follows to do the configuration:


Procedure
1. Verify what has been changed about this mapr.login.conf file.
You should be able to obtain the related information from the administrator or the developer of
your MapR cluster.
2. If the location of the MapR configuration files has been changed to somewhere else in the
cluster, that is to say, the MapR Home directory has been changed, select the Set the MapR Home
directory check box and enter the new Home directory. Otherwise, leave this check box clear and
the default Home directory is used.
3. If the login module to be used in the mapr.login.conf file has been changed, select the Specify the
Hadoop login configuration check box and enter the module to be called from the mapr.login.conf
file. Otherwise, leave this check box clear and the default login module is used.
For example, enter kerberos to call the hadoop_kerberos module or hybrid to call the hadoop_hybrid
module.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker,
when configuring this component since the component needs to connect to a Hadoop distribution.


tHSQLDbInput
Executes a DB query with a strictly defined order which must correspond to the schema definition and
then it passes on the field list to the next component via a Main row link.
tHSQLDbInput reads a database and extracts fields based on a query.

tHSQLDbInput Standard properties


These properties are used to configure tHSQLDbInput running in the Standard Job framework.
The Standard tHSQLDbInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Running Mode Select from the list the server mode corresponding to your DB setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, HSQLDb In Memory.

Use TLS/SSL sockets Select this check box to enable the secured mode if required.

Host Database server IP address.

Port Listening port number of DB server.

Database Alias Alias name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

DB path Specify the directory to the database you want to connect


to. This field is available only to the HSQLDb In Process
Persistent running mode.


Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view

Db name Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. When the running mode is
HSQLDb In Process Persistent , you can set the connection
property ifexists=true to allow connection to an existing
database only and avoid creating a new database.
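For example, entering the following value restricts the connection to databases that already exist; HSQLDB appends such properties to the connection URL after a semicolon, so the resulting URL is of the form jdbc:hsqldb:file:/path/to/mydb;ifexists=true (the path shown here is only an illustration):
ifexists=true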

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.


tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: Indicates the number of lines processed. This is an


After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is a Flow
variable and it returns a string.
For further information about variables, see Talend Studio
User Guide.

Note:
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.

Usage

Usage rule This component covers all possible SQL queries for
HSQLDb databases.

Connections Outgoing links (from this component to another):


Row: Main; Iterate
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

Incoming links (from one component to this one):


Row: Iterate;
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:


tHSQLDbOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tHSQLDbOutput writes, updates, makes changes or suppresses entries in a database.

tHSQLDbOutput Standard properties


These properties are used to configure tHSQLDbOutput running in the Standard Job framework.
The Standard tHSQLDbOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Running Mode Select from the list the server mode corresponding to your DB setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, HSQLDb In Memory.

Use TLS/SSL sockets Select this check box to enable the secured mode if required.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

DB path Specify the directory to the database you want to connect


to. This field is available only to the HSQLDb In Process
Persistent running mode.


Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view

Db name Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next


component. When you create a Spark Job, avoid the reserved


word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. When the running mode is
HSQLDb In Process Persistent , you can set the connection
property ifexists=true to allow connection to an existing
database only and avoid creating a new database.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns, provided these are neither insert, update nor delete actions, nor actions that require particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.
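As a hypothetical illustration (the column names below are examples only), a FULLNAME column computed from two existing columns could be added as follows:
Name: FULLNAME
SQL expression: FIRSTNAME || ' ' || LASTNAME
Position: After
Reference column: LASTNAME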

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

Usage

Usage rule This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of


a table in an HSQLDb database. It also allows you to create a


reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Connections Outgoing links (from this component to another):


Row: Main; Reject
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

Incoming links (from one component to this one):


Row: Main;
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see
• Inserting a column and altering data using tMysqlOutput on page 2466.


tHSQLDbRow
Acts on the actual DB structure or on the data (although without handling data), depending on the
nature of the query and the database.
The SQLBuilder tool helps you write your SQL statements easily.
tHSQLDbRow is the specific component for this database query. It executes the SQL query stated in the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.

tHSQLDbRow Standard properties


These properties are used to configure tHSQLDbRow running in the Standard Job framework.
The Standard tHSQLDbRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Running Mode Select from the list the server mode corresponding to your DB setup, among the four options:
HSQLDb Server, HSQLDb WebServer, HSQLDb In Process
Persistent, HSQLDb In Memory.

Use TLS/SSL sockets Select this check box to enable the secured mode if required.

Host Database server IP address

Port Listening port number of DB server.

Database Alias Alias name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

DB path Specify the directory to the database you want to connect


to. This field is available only to the HSQLDb In Process
Persistent running mode.


Note:
By default, if the database you specify in this field does
not exist, it will be created automatically. If you want
to change this default setting, modify the connection
parameter set in the Additional JDBC parameter field in
the Advanced settings view

Database Enter the database name that you want to connect to. This
field is available only to the HSQLDb In Process Persistent
running mode and the HSQLDb In Memory running mode.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. When the running mode is
HSQLDb In Process Persistent , you can set the connection
property ifexists=true to allow connection to an existing
database only and avoid creating a new database.
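
As an illustration, the connection opened in HSQLDb In Process Persistent mode with this parameter
roughly corresponds to the following sketch. The DB path /data/hsql/mydb and the default SA user
are assumptions, the HSQLDB JDBC driver is assumed to be on the classpath, and this is not the
component's generated code:

import java.sql.Connection;
import java.sql.DriverManager;

public class HsqlDbIfExistsSketch {
    public static void main(String[] args) throws Exception {
        // ";ifexists=true" refuses to create the database and only connects
        // to an existing one, as described above.
        String url = "jdbc:hsqldb:file:/data/hsql/mydb;ifexists=true";
        try (Connection conn = DriverManager.getConnection(url, "SA", "")) {
            System.out.println("Connected to the existing database.");
        }
    }
}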

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: Indicates the query to be processed. This is a Flow


variable and it returns a string.
For further information about variables, see Talend Studio
User Guide.

Note:
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Connections Outgoing links (from this component to another):


Row: Main; Reject; Iterate
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

Incoming links (from one component to this one):


Row: Main; Iterate
Trigger: Run if; On Component Ok; On Component Error; On
Subjob Ok; On Subjob Error.

For further information regarding connections, see Talend


Studio User Guide.

Limitation Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tHttpRequest
Sends an HTTP request to the server and outputs the response information locally.
tHttpRequest sends an HTTP request to the server end and gets the corresponding response
information from the server end.

tHttpRequest Standard properties


These properties are used to configure tHttpRequest running in the Standard Job framework.
The Standard tHttpRequest component belongs to the Internet family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number
of fields to be processed and passed on to the next
component. The schema is either Built-in or stored
remotely in the Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User
Guide.

  Repository: You have already created the schema and


stored it in the Repository. You can reuse it in various
projects and Job designs. Related topic: see Talend Studio
User Guide.

Sync columns Click this button to retrieve the schema from the preceding
component.

URI Type in the Uniform Resource Identifier (URI) that identifies


the data resource on the server. A URI is similar to a URL,
but more general.

Method Select an HTTP method to define the action to be


performed:
Post: Sends data (for example HTML form data) to the server
end.


Get: Retrieves data from the server end.

Post parameters from file Browse to, or enter the path to the file that is used to
provide parameters (request body) to the POST method.

Write response content to file Select this check box to save the HTTP response to a local
file. You can either type in the file path in the input field or
click the three-dot button to browse to the file path.

Create directory if not exists Select this check box to create the directory defined in the
Write response content to file field if it does not exist.
This check box appears only when the Write response
content to file check box is selected and is cleared by
default.

Headers Type in the name-value pair(s) for HTTP headers to define


the parameters of the requested HTTP operation.
Key: Fill in the name of the header field of an HTTP header.
Value: Fill in the content of the header field of an HTTP
header.
For more information about definition of HTTP headers,
please refer to:
en.wikipedia.org/wiki/List_of_HTTP_headers.

Need authentication Select this check box to fill in a user name and a password
in the corresponding fields if authentication is needed:
user: Fill in the user name for the authentication.
password: Fill in the password for the authentication.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.

Advanced settings

Set timeout Select this check box to specify the connect and read
timeout values in the following two fields:
• Connect timeout(s): Enter the connect timeout value in
seconds. An exception will occur if the timeout expires
before the connection can be established. The value of
0 indicates an infinite time out. By default, the connect
timeout value is 30.
• Read timeout(s): Enter the read timeout value in
seconds. An exception will occur if the timeout expires
before there is data available for read. By default, the
read timeout value is 0, which indicates an infinite
time out.
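
As an illustration of these two settings, a plain-Java sketch using java.net.HttpURLConnection (with
a hypothetical URL; this is not the code the Studio generates) would set them as follows, converted
to milliseconds:

import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutSketch {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL("http://example.com/resource").openConnection();
        conn.setConnectTimeout(30_000); // Connect timeout(s) = 30 -> 30 000 ms
        conn.setReadTimeout(0);         // Read timeout(s) = 0 -> wait indefinitely
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}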

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level and at each component level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
CONNECTED: the result of whether a connection to the
server was established. This is an After variable and it returns a
boolean.
RESPONSE_CODE: the response code returned by the remote
HTTP server. This is an After variable and it returns an
integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component can be used to send HTTP requests to
a server and save the response information. This component
can be used as a standalone component.

Sending an HTTP request to the server and saving the
response information to a local file
This scenario describes a two-component Job that uses the GET method to retrieve information from
the server end and writes the response to a local file as well as to the console.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create a Job from the Job Designs node in the
Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop the following components from the Palette onto the design workspace: tHttpRequest and
tLogRow.

3. Connect the tHttpRequest component to the tLogRow component using a Row > Main connection.


Configuring the GET request


Procedure
1. Double-click the tHttpRequest component to open its Basic settings view and define the
component properties.

2. Fill in the URI field with "http://192.168.0.63:8081/testHttpRequest/build.xml". Note that this URI is
for demonstration purposes only and it is not a live address.
3. From the Method list, select GET.
4. Select the Write response content to file check box and fill in the input field on the right with the
file path by manual entry, D:/test.txt for this use case.
5. Select the Need authentication check box and fill in the user and password, both tomcat in this
use case.
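
Conceptually, this configuration amounts to the following plain-Java sketch, reusing the
demonstration URI, file path and credentials above. It is an illustration of the GET request with
basic authentication, not the code the Studio generates:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

public class GetRequestSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://192.168.0.63:8081/testHttpRequest/build.xml");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // Need authentication: user and password, both "tomcat" in this use case.
        String token = Base64.getEncoder()
                .encodeToString("tomcat:tomcat".getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        // Write response content to file: D:/test.txt.
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get("D:/test.txt"), StandardCopyOption.REPLACE_EXISTING);
        }
        conn.disconnect();
    }
}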

Executing the Job


About this task
Then you can run this Job.
The tLogRow component is used to present the execution result of the Job.

Procedure
1. If you want to configure how the result is presented by tLogRow, double-click the component to
open its Component view and in the Mode area, select the Table (print values in cells of a table)
check box.
2. Press F6 to run this Job.

Results
Once done, the response information from the server is saved and displayed.


Sending a POST request from a local JSON file


In this scenario, a four-component Job is used to read a parameter from a given JSON file and send it in
a POST request to a website.

The JSON file to be used reads as follows:

{"echo":
[
{
"data":"e=hello"
}
]
}

From that file, tFileInputJSON reads the e parameter and its value hello and tHttpRequest sends
the pair to http://echo.itcuties.com/, a URL provided for demonstration by an online programming
community, www.itcuties.com.
Note that the e parameter is required by http://echo.itcuties.com/.

Linking the components


Procedure
1. In the Integration perspective of the Studio, create an empty Job, named httpRequestPostDemo
for example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2. Drop tFileInputJSON, tFileOutputDelimited, tHttpRequest and tLogRow onto the workspace.
3. Connect tFileInputJSON to tHttpRequest using the Trigger > On Subjob Ok link.
4. Connect the other components using the Row > Main link.


Reading the JSON file


Procedure
1. Double-click tFileInputJSON to open its Component view.

2. Select JsonPath without loop from the Read By drop-down list.


3. Click the [...] button next to Edit schema to open the schema editor.

4. Click the [+] button to add one row and name it, for example, data.
5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog
box.
6. In the Filename field, browse, or enter the path to the source JSON file in which the parameter to
be sent is stored.
7. In the Mapping table, the data column you defined in the previous step in the component schema
has been automatically added. In the JSONPath query column of this table, enter the JSON path,
in double quotation marks, to extract the parameter to be sent. In this scenario, the path is
echo[0].data.
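
To make the extraction concrete, the JSONPath echo[0].data applied to the sample file yields the
string e=hello. The following minimal sketch performs the same extraction using the Jackson library
(assumed to be available; the file path is hypothetical, and this is not the component's own JsonPath
engine):

import java.io.File;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonExtractionSketch {
    public static void main(String[] args) throws Exception {
        JsonNode root = new ObjectMapper().readTree(new File("C:/tmp/echo.json"));
        // Equivalent of the JSONPath query echo[0].data
        String data = root.get("echo").get(0).get("data").asText();
        System.out.println(data); // prints: e=hello
    }
}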

Writing the parameter to a flat file


Procedure
1. Double-click tFileOutputDelimited to open its Component view.


2. In the File name field, browse, or enter the path to the flat file in which you want to write the
extracted parameter. This file will be created if it does not exist. In this example, it is C:/tmp/
postParamsFile.txt.

Posting the parameter


Procedure
1. Double-click tHttpRequest to open its Component view.

2. In the URI field, enter the server address to which the parameter is to be sent. In this scenario, it is
http://echo.itcuties.com/.
3. From the Method list, select POST.
4. In the Post parameters from file field, browse, or enter the path to the flat file that contains the
parameter to be used. As defined earlier with the tFileOutputDelimited component, this path is C:/
tmp/postParamsFile.txt.
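
Conceptually, the POST step performed here can be sketched in plain Java as follows. The request
body is the content of the flat file produced above ("e=hello" in this scenario); this is an
illustration, not the code the Studio generates:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PostRequestSketch {
    public static void main(String[] args) throws Exception {
        // Post parameters from file: the body read from the flat file ("e=hello").
        byte[] body = Files.readAllBytes(Paths.get("C:/tmp/postParamsFile.txt"));
        HttpURLConnection conn =
            (HttpURLConnection) new URL("http://echo.itcuties.com/").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}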

Executing the Job


Press F6 to run this Job.
The tLogRow component is used to present the execution result of the Job.
Once done, the Run view is opened automatically, where you can check the execution result.


You can see that the site receiving the parameter returns its answer.


tImpalaClose
Closes connection to an Impala database.
tImpalaClose closes an active connection to a given Impala database.

tImpalaClose Standard properties


These properties are used to configure tImpalaClose running in the Standard Job framework.
The Standard tImpalaClose component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Component list If there is more than one connection used in the Job, select
the relevant tImpalaConnection component from the list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is to be used along with the other Impala
components, especially with tImpalaConnection as
tImpalaConnection allows you to open a connection for the
transaction which is underway.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
No scenario is available for the Standard version of this component yet.


tImpalaConnection
Establishes an Impala connection to be reused by other Impala components in your Job.
tImpalaConnection opens a connection to an Impala database.

tImpalaConnection Standard properties


These properties are used to configure tImpalaConnection running in the Standard Job framework.
The Standard tImpalaConnection component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your


connection accordingly. However, because of


the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port DB server listening port.

Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.


Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used with other Impala


components, particularly tImpalaClose.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.


Related scenario
This component is used in the similar way as a tHiveConnection component is. For further
information, see Creating a partitioned Hive table on page 1582.


tImpalaCreateTable
Creates Impala tables that fit a wide range of Impala data formats.
tImpalaCreateTable connects to the Impala database to be used and creates an Impala table that is
dedicated to data of the format you specify.

tImpalaCreateTable Standard properties


These properties are used to configure tImpalaCreateTable running in the Standard Job framework.
The Standard tImpalaCreateTable component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that
cluster in the areas that are displayed. For detailed
explanation about these parameters, search for


configuring the connection manually on Talend Help


Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.

Database Fill this field with the name of the database.


Username and Password DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Table Name Name of the table to be created.

Action on table Select the action to be carried out for creating a table.


Format Select the data format to which the table to be created is


dedicated.
The available data formats vary depending on the version of
the Hadoop distribution you are using.
Note that when the file format to be used is PARQUET, you
might be prompted to find the specific PARQUET jar file and
install it into the Studio.
• When the connection mode to Hive is Embedded,
the Job is run in your local machine and calls this jar
installed in the Studio.
• When the connection mode to Hive is Standalone,
the Job is run in the server hosting Hive and this jar
file is sent to the HDFS system of the cluster you are
connecting to. Therefore, ensure that you have properly
defined the NameNode URI in the corresponding field
of the Basic settings view.
This jar file can be downloaded from Apache's site. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Set partitions Select this check box to add partition columns to the table
to be created. Once you select it, you need to define the
schema of the partition columns you need to add.

Set file location If you want to create an Impala table in a directory other
than the default one, select this check box and enter the
directory in HDFS you want to use to hold the table content.
This is typically useful when you need to create an external
Impala table by selecting the Create an external table
check box in the Advanced settings tab.

Use S3 endpoint The Use S3 endpoint check box is displayed when you
have selected the Set file location check box to create an
external Impala table.
Once this Use S3 endpoint check box is selected, you need
to enter the following parameters in the fields that appear:
• S3 bucket: enter the name of the bucket in which you
need to create the table.
• Bucket name: enter the name of the bucket in which
you want to store the dependencies of your Job. This
bucket must already exist on S3.
• Temporary resource folder: enter the directory in
which you want to store the dependencies of your Job.
For example, enter temp_resources to write the
dependencies in the /temp_resources folder in the
bucket.
If this folder already exists at runtime, its contents are
overwritten by the upcoming dependencies; otherwise,
this folder is automatically created.
• Access key and Secret key: enter the authentication
information required to connect to the Amazon S3
bucket to be used.
To enter the password, click the [...] button next to
the password field, and then in the pop-up dialog box
enter the password between double quotes and click
OK to save the settings.


Note that the format of the S3 file is S3N (S3 Native


Filesystem).
Since an Impala table created in S3 is actually an external
table, this Use S3 endpoint check box must be used with the
Create an external table check box selected.

Advanced settings

Like table Select this check box and enter the name of the Impala
table you want to copy. This allows you to copy the
definition of an existing table without copying its data.
For further information about the Like parameter, see
Cloudera's information about Impala's Data Definition
Language.

Create an external table Select this check box to make the table to be created an
external Impala table. This kind of Impala table leaves the
raw data where it is if the data is in HDFS.
An external table is usually the better choice for accessing
shared data existing in a file system.
For further information about an external Impala table, see
Cloudera's documentation about Impala.

Table comment Enter the description you want to use for the table to be
created.

As select Select this check box and enter the As select statement
for creating an Impala table that is based on a Select
statement.
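
To make the Like table, Create an external table and As select options concrete, the statements they
roughly correspond to are sketched below. The table names, columns and HDFS location are
hypothetical, and the exact DDL the component issues may differ:

public class ImpalaCreateTableSketch {
    public static void main(String[] args) {
        // Create an external table + Set file location:
        String external =
            "CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (ip STRING, hits INT) "
          + "LOCATION '/user/talend/web_logs'";
        // Like table: copy only the definition of an existing table, not its data:
        String like = "CREATE TABLE web_logs_copy LIKE web_logs";
        // As select: create and populate the table from a query:
        String asSelect =
            "CREATE TABLE top_ips AS SELECT ip, SUM(hits) AS total FROM web_logs GROUP BY ip";
        for (String ddl : new String[] { external, like, asSelect }) {
            System.out.println(ddl + ";");
        }
    }
}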

Table properties Add any custom Impala table properties you want to
override the default ones used by the Hadoop engine of the
Studio.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component works standalone.

Row format Set Delimited row format.

Die on error Select this check box to stop the execution of the Job when an
error occurs.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.


Related scenario
This component is used in the similar way as a tHiveCreateTable component is. For further
information, see Creating a partitioned Hive table on page 1582.


tImpalaInput
Executes the select queries to extract the corresponding data and sends the data to the component
that follows.
tImpalaInput is the component dedicated to the Impala database (the Impala data warehouse system).
It executes the given Impala SQL query in order to extract the data of interest from Impala. It provides
the SQLBuilder tool to help you write your Impala SQL statements easily.

tImpalaInput Standard properties


These properties are used to configure tImpalaInput running in the Standard Job framework.
The Standard tImpalaInput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster


and the Windows Azure Storage service of that


cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder


  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Guess schema Click this button to retrieve the schema from the table.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
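
For example, with a schema defined as ip (String), then hits (Integer), and a hypothetical web_logs
table set in the Table Name field, the query could look like the string in the following minimal
sketch (an illustration, not code generated by the Studio or by Guess Query):

public class ImpalaInputQuerySketch {
    public static void main(String[] args) {
        // Columns are listed in the same order as the schema definition.
        String query = "SELECT ip, hits FROM web_logs WHERE hits > 100";
        System.out.println(query);
    }
}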

Advanced settings

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

Note:
Clear the Trim all the String/Char columns check box to
enable Trim column in this field.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Impala SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for


example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to


guarantee the interaction with Talend Studio . The following
list presents MapR related information for example.
• Ensure that you have installed the MapR client in the
machine where the Studio is, and added the MapR
client library to the PATH variable of that machine.
According to MapR's documentation, the library or
libraries of a MapR client corresponding to each OS
version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native.
For example, the library for Windows is \lib\native\MapRClient.dll in the
MapR client jar file. For further information, see the following link
from MapR: http://www.mapr.com/blog/basic-notes-
on-configuring-eclipse-as-a-hadoop-development-
environment-for-mapr.
Without adding the specified library or libraries, you
may encounter the following error: no MapRClient
in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides to the Studio
the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For a scenario about how an input component is used in a Job, see Writing columns from a MySQL
database to an output file using tMysqlInput on page 2440.


tImpalaLoad
Writes data of different formats into a given Impala table or exports data from an Impala table to a
directory.
tImpalaLoad connects to a given Impala database and copies or moves data into an existing Impala
table or a directory you specify.

tImpalaLoad Standard properties


These properties are used to configure tImpalaLoad running in the Standard Job framework.
The Standard tImpalaLoad component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster
and the Windows Azure Storage service of that


cluster in the areas that are displayed. For detailed


explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend .
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend . Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to


a custom distribution and share this connection, see
Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Load action Select the action you need to carry out for writing data into the
specified destination.
• When you select LOAD, you are moving or copying data
from a directory you specify.
• When you select INSERT, you are moving or copying
data based on queries.
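
As an illustration, the two actions roughly correspond to the statements sketched below. The table
names and HDFS path are hypothetical, and the exact statements the component issues may differ:

public class ImpalaLoadActionSketch {
    public static void main(String[] args) {
        // LOAD: move or copy the files of a directory into the table's storage.
        String load =
            "LOAD DATA INPATH '/user/talend/staging/logs' INTO TABLE web_logs";
        // INSERT: move or copy data selected by a query (APPEND keeps existing
        // rows, OVERWRITE replaces them).
        String insertAppend = "INSERT INTO web_logs SELECT * FROM web_logs_raw";
        String insertOverwrite = "INSERT OVERWRITE web_logs SELECT * FROM web_logs_raw";
        System.out.println(load + ";");
        System.out.println(insertAppend + ";");
        System.out.println(insertOverwrite + ";");
    }
}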

Target type This drop-down list appears only when you have selected
INSERT from the Load action list.
Select from this list the type of the location you need to
write data in.
• If you select Table as destination, you can still choose
to append data to or overwrite the contents in the
specified table. This is the only option in the current
release.

Action Select whether you want to OVERWRITE the old data


already existing in the destination or only APPEND the new
data to the existing one.

Table name Enter the name of the Impala table you need to write data in.
Note that with the INSERT action, this field is available only
when you have selected Table from the Target type list.

File path Enter the directory you need to read data from.

Query This field appears when you have selected INSERT from the
Load action list.
Enter the appropriate query for selecting the data to be
exported to the specified Impala table or directory.
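For example, with INSERT selected as the load action, a hypothetical selection query entered in this field (as a Java string) could be:
"SELECT id, name, country, state FROM staging_sales"
The table and column names here are assumptions; the columns returned by the query must match the structure of the destination table or directory.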

Set partitions Select this check box to use the Impala Partition clause
in loading or inserting data in an Impala table. You need to
enter the partition keys and their values to be used in the
field that appears.
For example, enter country='US', state='CA'. This makes a partition clause reading Partition (country='US', state='CA'), that is to say, a US and CA partition.
Also, it is recommended to select the Create partition if
not exist check box that appears to ensure that you will not
create a duplicate partition.
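As a sketch of the kind of Impala statement these settings correspond to (the directory, table name and partition values below are hypothetical, for illustration only):
LOAD DATA INPATH '/user/talend/sales_us_ca' INTO TABLE sales PARTITION (country='US', state='CA');
With the INSERT load action, a similar PARTITION (country='US', state='CA') clause applies to the generated INSERT statement.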

Die on error Select this check box to kill the Job when an error occurs.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component works standalone.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic settings and context variables, see Talend Studio User Guide.
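For example, assuming the Job defines a context variable named impalaConnection whose value at runtime is the name of the connection component to use (such as tImpalaConnection_1 or tImpalaConnection_2), you could enter context.impalaConnection in the Code field so that the actual connection is resolved only when the Job runs. The variable and component names in this example are assumptions.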

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client on the machine where the Studio is, and added the MapR client library to the PATH variable of that machine. According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no MapRClient in java.library.path.
• Set the -Djava.library.path argument, for example, in the Job Run VM arguments area of the Run/Debug view in the Preferences dialog box in the Window menu (a sketch of such an argument follows this list). This argument provides the Studio with the path to the native library of that MapR client. This allows the subscription-based users to make full use of the Data viewer to view locally in the Studio the data stored in MapR.
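A minimal sketch of such a VM argument, reusing the placeholder path from the first bullet above (replace MAPR_INSTALL and VERSION with the values of your own MapR client installation):
-Djava.library.path="MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native"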
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenario
This component is used in the similar way as a tHiveLoad component is. For further information, see
Creating a partitioned Hive table on page 1582.


tImpalaOutput
Executes the action defined on the data contained in the table, based on the flow incoming from the
preceding component in the Job.
tImpalaOutput connects to an Impala database (the Impala data warehouse system) and writes data in
an Impala table.

tImpalaOutput Standard properties


These properties are used to configure tImpalaOutput running in the Standard Job framework.
The Standard tImpalaOutput component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster and the Windows Azure Storage service of that cluster in the areas that are displayed. For detailed explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table you need to write data in.

Action Select whether you want to OVERWRITE the old data


already existing in the destination or only APPEND the new
data to the existing one.


Extended insert Select this check box to combine multiple rows of data
into one single INSERT action. This can speed up the insert
operation.
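As an illustration, with this check box selected several incoming rows can be grouped into one multi-row statement such as the following sketch (table, columns and values are hypothetical):
INSERT INTO employee (id, name) VALUES (1, 'Ford'), (2, 'Arthur'), (3, 'Trillian');
instead of one single-row INSERT statement per incoming row.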

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Impala SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client on the machine where the Studio is, and added the MapR client library to the PATH variable of that machine. According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no MapRClient in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides the Studio with the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For a scenario about how an output component is used in a Job, see Inserting a column and altering
data using tMysqlOutput on page 2466.


tImpalaRow
Acts on the actual DB structure or on the data (although without handling data).
The SQLBuilder tool helps you write your Impala SQL statements easily. tImpalaRow is the dedicated
component for this database. It executes the Impala SQL query stated in the specified database. The
Row suffix means the component implements a flow in the Job design although it does not provide
output.

tImpalaRow Standard properties


These properties are used to configure tImpalaRow running in the Standard Job framework.
The Standard tImpalaRow component belongs to the Big Data family.
The component in this framework is available in all Talend products with Big Data and in Talend Data
Fabric.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Distribution Select the cluster you are using from the drop-down list.
The options in the list vary depending on the component
you are using. Among these options, the following ones
require specific configuration:
• If available in this Distribution drop-down list, the
Microsoft HD Insight option allows you to use a
Microsoft HD Insight cluster. For this purpose, you need
to configure the connections to the HD Insight cluster and the Windows Azure Storage service of that cluster in the areas that are displayed. For detailed
explanation about these parameters, search for
configuring the connection manually on Talend Help
Center (https://help.talend.com).
• If you select Amazon EMR, find more details about
Amazon EMR getting started in Talend Help Center
(https://help.talend.com).
• The Custom option allows you to connect to a cluster
different from any of the distributions given in this
list, that is to say, to connect to a cluster not officially
supported by Talend.
1. Select Import from existing version to import an
officially supported distribution as base and then add
other required jar files which the base distribution does
not provide.
2. Select Import from zip to import the configuration zip
for the custom distribution to be used. This zip file
should contain the libraries of the different Hadoop
elements and the index file of these libraries.
In Talend Exchange, members of Talend community
have shared some ready-for-use configuration zip
files which you can download from this Hadoop
configuration list and directly use them in your
connection accordingly. However, because of
the ongoing evolution of the different Hadoop-
related projects, you might not be able to find the
configuration zip corresponding to your distribution
from this list; then it is recommended to use the
Import from existing version option to take an existing
distribution as base to add the jars required by your
distribution.
Note that custom versions are not officially supported
by Talend. Talend and its community provide you
with the opportunity to connect to custom versions
from the Studio but cannot guarantee that the
configuration of whichever version you choose will
be easy, due to the wide range of different Hadoop
distributions and versions that are available. As such,
you should only attempt to set up such a connection if
you have sufficient Hadoop experience to handle any
issues on your own.

Note:
In this dialog box, the active check box must be
kept selected so as to import the jar files pertinent
to the connection to be created between the custom
distribution and this component.

For a step-by-step example about how to connect to a custom distribution and share this connection, see Hortonworks.

Impala version Select the version of the Hadoop distribution you are using.
The available options vary depending on the component
you are using.

Host Database server IP address.

Port Listening port number of DB server.


Database Fill this field with the name of the database.

Username DB user authentication data.

Use kerberos authentication If you are accessing an Impala system running with
Kerberos security, select this check box and then enter the
Kerberos principal of this Impala system.
• If this cluster is a MapR cluster of the version 5.0.0
or later, you can set the MapR ticket authentication
configuration in addition or as an alternative by
following the explanation in Connecting to a security-
enabled MapR on page 1646.
Keep in mind that this configuration generates a new
MapR security ticket for the username defined in the
Job in each execution. If you need to reuse an existing
ticket issued for the same username, leave both the
Force MapR ticket authentication check box and the
Use Kerberos authentication check box clear, and then
MapR should be able to automatically find that ticket
on the fly.
This check box is available depending on the Hadoop
distribution you are connecting to.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Table Name Name of the table to be processed.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder


  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Guess Query Click the Guess Query button to generate the query which
corresponds to your table schema in the Query field.

Query Enter your DB query paying particularly attention to


properly sequence the fields in order to match the schema
definition.
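For example, for a schema made of the two columns id and name, a hypothetical query entered in this field (as a Java string) could be:
"SELECT id, name FROM employee"
where employee is an assumed table name; the selected columns must appear in the same order as the columns of the schema.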

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the benefit of flexible DB queries


and covers all possible Impala SQL queries.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Prerequisites The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
• Ensure that you have installed the MapR client on the machine where the Studio is, and added the MapR client library to the PATH variable of that machine. According to MapR's documentation, the library or libraries of a MapR client corresponding to each OS version can be found under MAPR_INSTALL\hadoop\hadoop-VERSION\lib\native. For example, the library for Windows is \lib\native\MapRClient.dll in the MapR client jar file. For further information, see the following link from MapR: http://www.mapr.com/blog/basic-notes-on-configuring-eclipse-as-a-hadoop-development-environment-for-mapr.
Without adding the specified library or libraries, you may encounter the following error: no MapRClient in java.library.path.
• Set the -Djava.library.path argument, for
example, in the Job Run VM arguments area of the
Run/Debug view in the Preferences dialog box in the
Window menu. This argument provides the Studio with the path to the native library of that MapR client. This
allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data
stored in MapR.
For further information about how to install a Hadoop
distribution, see the manuals corresponding to the Hadoop
distribution you are using.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tInfiniteLoop
Executes a task or a Job automatically, based on a loop.
tInfiniteLoop runs an infinite loop on a task.

tInfiniteLoop Standard properties


These properties are used to configure tInfiniteLoop running in the Standard Job framework.
The Standard tInfiniteLoop component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Wait at each iteration (in milliseconds) Enter the time delay between iterations.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
CURRENT_ITERATION: the sequence number of the current
iteration. This is a Flow variable and it returns an integer.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
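For example, the CURRENT_ITERATION variable can be read in the Java expression of a downstream component through the globalMap, as in the following sketch (the instance name tInfiniteLoop_1 is an assumption; adapt it to the actual name of your component):
((Integer)globalMap.get("tInfiniteLoop_1_CURRENT_ITERATION"))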

Usage

Usage rule tInfiniteLoop is an input component and requires an Iterate link to connect it to the following component.

Connections Outgoing links (from this component to another):


Row: Iterate
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate;
Trigger: On Subjob Ok; On Subjob Error; Run if; On
Component Ok; On Component Error; Synchronize;
Parallelize.

For further information regarding connections, see Talend


Studio User Guide.

Related scenario
For an example of the kind of scenario in which tInfiniteLoop might be used, see Procedure on page
1980, regarding the tLoop component.


tInformixBulkExec
Executes Insert operations in Informix databases.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.

tInformixBulkExec Standard properties


These properties are used to configure tInformixBulkExec running in the Standard Job framework.
The Standard tInformixBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Execution Platform Select the operating system you are using.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port DB server listening port.

Database Name of the database.

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.


  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Informix Directory Informix installation directory, e.g. "C:\Program Files\IBM\IBM Informix Dynamic Server\11.50\".

Data file Name of the file to be loaded.

Action on data On the data of the table defined, you can perform the
following operations:
Insert: Add new data to the table. If duplicates are found,
the job stops.
Update: Update the existing table data.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Delete the entry data which corresponds to the
input flow.

Warning:
You must specify at least one key upon which the Update
and Delete operations are to be based. It is possible to
define the columns which should be used as the key from
the schema, from both the Basic Settings and the Advanced
Settings, to optimise these operations.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

Field terminated by Character, string or regular expression which separates the


fields.
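For example, ";" for a semicolon-delimited bulk file or "\t" for tab-separated fields; these separators are assumptions and must match the delimiter actually used in the data file.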

Set DBMONEY Select this check box to define the decimal separator in the
Decimal separator field.

Set DBDATE Select the date format that you want to apply.

Rows Before Commit Enter the number of rows to be processed before the
commit.

Bad Rows Before Abort Enter the number of rows in error at which point the Job
should stop.

tStatCatcher Statistics Select this check box to collect the log data at component
level.

Output Where the output should go.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers database query flexibility and covers all possible Informix queries which may be required.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned

in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tInformixBulkExec is deployed, so that the component
functions properly.
This component requires installation of its related jar files.

Related scenario
For a scenario in which tInformixBulkExec might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Truncating and inserting file data into an Oracle database on page 2681.


tInformixClose
Closes connection to Informix databases.
tInformixClose closes an active connection to a database.

tInformixClose Standard properties


These properties are used to configure tInformixClose running in the Standard Job framework.
The Standard tInformixClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list If there is more than one connection used in the Job, select
tInformixConnection from the list.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component is generally used as an input component. It


requires an output component.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used
along with tInformixConnection as the latter allows you to open a connection for the transaction
which is underway.
To see a scenario in which tInformixClose might be used, see tMysqlConnection on page 2425.


tInformixCommit
Makes a global commit just once instead of committing every row or batch of rows separately.
This component improves performance and is closely related to tInformixConnection and
tInformixRollback. They are generally used to execute transactions together.
tInformixCommit validates data processed in a job from a connected database.

tInformixCommit Standard properties


These properties are used to configure tInformixCommit running in the Standard Job framework.
The Standard tInformixCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list If there is more than one connection in the Job, select
tInformixConnection from the list.

Close connection This check box is selected by default. It means that the
database connection will be closed once the commit has
been made. Clear the check box to continue using the
connection once the component has completed its task.

Warning:
If you are using a Row > Main type connection to link
tInformixCommit to your Job, your data will be committed
row by row. If this is the case, do not select this check box,
otherwise the connection will be closed before the commit
of your first row is finalized.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component is generally used along with Informix


components, particularly tInformixConnection and
tInformixRollback.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database

connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related Scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used
along with tInformixConnection as the latter allows you to open a connection for the transaction
which is underway.
To see a scenario in which tInformixCommit might be used, see Inserting data in mother/daughter
tables on page 2426.


tInformixConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tInformixConnection is closely related to tInformixCommit and tInformixRollback. They are generally
used along with tInformixConnection, with tInformixConnection opening the connection for the
transaction.

tInformixConnection Standard properties


These properties are used to configure tInformixConnection running in the Standard Job framework.
The Standard tInformixConnection component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Host Database server IP address.

Port DB server listening port.

Database Name of the database.

Schema Name of the schema

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.


Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Use Transaction Clear this check box when the database is configured in
NO_LOG mode. If the check box is selected, you can choose
whether to activate the Auto Commit option.

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed while the commit component does
not commit only until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Usage

Usage rule This component is generally used with other Informix


components, particularly tInformixCommit and
tInformixRollback.

Database Family Databases/Informix

Limitation This component requires installation of its related jar files.

Related scenario
For a scenario in which the tInformixConnection, might be used, see Inserting data in mother/
daughter tables on page 2426.


tInformixInput
Reads a database and extracts fields based on a query.
tInformixInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.

tInformixInput Standard properties


These properties are used to configure tInformixInput running in the Standard Job framework.
The Standard tInformixInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

DB server Name of the database server

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next

component. When you create a Spark Job, avoid the reserved word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query paying particularly attention to
properly sequence the fields in order to match the schema
definition.
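For example, for a schema containing the columns id, name and salary, a hypothetical query (as a Java string) could be:
"SELECT id, name, salary FROM employees ORDER BY id"
The table and column names are assumptions; the selected columns and their order must match the schema definition.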

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for Informix databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned

in your Job. This feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
For related topics, see:
See also scenario for tContextLoad: Reading data from different MySQL databases using dynamically
loaded connection parameters on page 497.


tInformixOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tInformixOutput writes, updates, makes changes or suppresses entries in a database.

tInformixOutput Standard properties


These properties are used to configure tInformixOutput running in the Standard Job framework.
The Standard tInformixOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

DB server Name of the database server

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.
Truncate table: Truncate the table.

Warning:
A commit operation will be carried out after the table is truncated.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries

Insert or update: Insert a new record. If the record with the given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-free
rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Use alternate schema Select this option to use a schema other than the one
specified by the component that establishes the database
connection (that is, the component selected from the
Component list drop-down list in Basic settings view).
After selecting this option, provide the name of the desired
schema in the Schema field.
This option is available when Use an existing connection is
selected in Basic settings view.

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.
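
As an illustration only, the Informix JDBC driver accepts properties such as DELIMIDENT or IFX_LOCK_MODE_WAIT; check the driver documentation for the exact options supported by your version. The field could then contain something like:

  "DELIMIDENT=Y;IFX_LOCK_MODE_WAIT=30"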

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and, above all, better
performance at executions.
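
The following standalone JDBC sketch is not the code the component generates; it only illustrates the effect of Commit every, assuming a hypothetical my_table table and placeholder connection details:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.util.Arrays;
  import java.util.List;

  public class CommitEverySketch {
      public static void main(String[] args) throws Exception {
          int commitEvery = 10000;  // value entered in the Commit every field
          List<String> rows = Arrays.asList("row1", "row2", "row3");
          // Placeholder URL and credentials; adapt them to your own Informix instance.
          try (Connection conn = DriverManager.getConnection(
                  "jdbc:informix-sqli://host:1526/stores_demo:INFORMIXSERVER=ol_informix1210",
                  "informix", "password");
               PreparedStatement ps = conn.prepareStatement(
                  "INSERT INTO my_table (col1) VALUES (?)")) {
              conn.setAutoCommit(false);
              int count = 0;
              for (String value : rows) {
                  ps.setString(1, value);
                  ps.executeUpdate();
                  if (++count % commitEvery == 0) {
                      conn.commit();  // one commit per batch of rows, not one per row
                  }
              }
              conn.commit();  // commit any remaining rows
          }
      }
  }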

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns other than insert,
update, or delete actions, or actions that require particular
preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a reference column that the
component can use to place or replace the new or altered
column.
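
For example (the column name and SQL function below are purely illustrative), the table could be filled in as follows to populate an extra last_update column with the Informix CURRENT function, placed after the customer_id column:

  Name:              last_update
  SQL expression:    "CURRENT"
  Position:          After
  Reference column:  customer_id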

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Batch Size Specify the number of records to be processed in each


batch.


This field appears only when the Use Batch check box
is selected.

Optimize the batch insertion Ensure the check box is selected to optimize the insertion
of batches of data.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
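
For instance, in a tJava component linked to tInformixOutput through an OnSubjobOk trigger, you could print two of these variables; the component name tInformixOutput_1 below is an example and must match the component in your own Job:

  // Read the After variables from the globalMap once the output component has finished.
  System.out.println("Rows inserted: " + ((Integer) globalMap.get("tInformixOutput_1_NB_LINE_INSERTED")));
  System.out.println("Rows updated: " + ((Integer) globalMap.get("tInformixOutput_1_NB_LINE_UPDATED")));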

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Informix database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.
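
As a hypothetical example, with two connection components tInformixConnection_1 and tInformixConnection_2 in the Job, you could define a context variable informixConn and reference it in the Code field, then choose the connection at execution time:

  Code field of the Dynamic settings table: context.informixConn

  // Passed on the command line when launching the exported Job:
  // --context_param informixConn=tInformixConnection_2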

Limitation This component requires installation of its related jar files.

Related scenarios
For tInformixOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tInformixOutputBulk
Prepares the file to be used as a parameter in the INSERT query used to feed Informix databases.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.
Writes a file composed of columns, based on a defined delimiter and on Informix standards.

tInformixOutputBulk Standard properties


These properties are used to configure tInformixOutputBulk running in the Standard Job framework.
The Standard tInformixOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file to be generated.

Append Select this check box to append new rows to the end of the
file.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are


not enclosed within quotation marks. If they are, you must


remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Row separator String (ex: "\n" on Unix) to distinguish rows.

Field separator Character, string or regular expression used to separate


fields

Set DBMONEY Select this box if you want to define the decimal separator
in the corresponding field.

Set DBDATE Select the date format that you want to apply.

Create directory if not exists This check box is selected automatically. The option allows
you to create a folder for the output file if it doesn't already
exist.

Custom the flush buffer size Select this box in order to customize the memory size used
to store the data temporarily. In the Row number field enter
the number of rows at which point the memory should be
freed.

Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to collect log data at the component
level.
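
As an illustration only (the separators, date format, and values are made up and depend on your own settings), with a Field separator of "|", a Row separator of "\n", and a DBDATE format such as Y4MD-, a generated row for a schema (id, name, order_date, amount) might look like:

  101|Smith|2020-02-23|149.90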

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.


A Flow variable functions during the execution of a


component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is generally used along with
tInformixBulkExec. Together, they improve performance levels when
adding data to an Informix database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenario
For scenarios in which tInformixOutputBulk might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tInformixOutputBulkExec
Carries out Insert operations in Informix databases using the data provided.
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the
first step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tInformixOutputBulkExec component.

tInformixOutputBulkExec Standard properties


These properties are used to configure tInformixOutputBulkExec running in the Standard Job
framework.
The Standard tInformixOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-in or Repository.

  Built-in: No properties stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Execution platform Select the operating system you are using.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.


Host Database server IP address.

Port DB server listening port.

Database Name of the database.

Schema Name of the schema.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Table Name of the table to be written. Note that only one table
can be written at a time and the table must already exist for
the insert operation to be authorised.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:


• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Informix Directory Informix installation directory, e.g. "C:\Program Files\IBM\IBM Informix Dynamic Server\11.50\".

Data file Name of the file to be generated and loaded.

Append Select this check box to add rows to the end of the file.

Action on data Select the operation you want to perform:
Bulk insert
Bulk update
The details requested will differ according to the action chosen.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

Note:
You can press Ctrl+Space to access a list of predefined
global variables.

Row separator String (ex: "\n" on Unix) to distinguish rows.

Fields terminated by Character, string or regular expression used to separate the


fields

Set DBMONEY Select this check box to define the decimal separator used
in the corresponding field.

Set DBDATE Select the date format you want to apply.

Rows Before Commit Enter the number of rows to be processed before the
commit.

Bad Rows Before Abort Enter the number of rows in error at which point the Job
should stop.

Create directory if not exists This check box is selected by default. It creates a directory
to hold the output table if required.

Custom the flush buffer size Select this box in order to customize the memory size used
to store the data temporarily. In the Row number field enter
the number of rows at which point the memory should be
freed.


Encoding Select the encoding from the list or select Custom and
define it manually. This field is compulsory for DB data
handling.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Output Where the output should go.

Usage

Usage rule This component is generally used when no particular


transformation is required on the data to be inserted in the
database.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tInformixOutputBulkExec is deployed, so that the
component functions properly.

Related scenario
For scenarios in which tInformixOutputBulkExec might be used, see:
• Inserting transformed data in MySQL database on page 2482.
• Inserting data in bulk in MySQL database on page 2489.


tInformixRollback
Prevents involuntary transaction commits by canceling transactions in connected databases.
tInformixRollback is closely related to tInformixCommit and tInformixConnection. They are generally
used together to execute transactions.

tInformixRollback Standard properties


These properties are used to configure tInformixRollback running in the Standard Job framework.
The Standard tInformixRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInformixConnection component from the list if


you plan to add more than one connection to the Job.

Close Connection Clear this check box if you want to continue to use the
connection once the component has completed its task.

Advanced settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Usage

Usage rule This component must be used with other Informix


components, particularly tInformixConnection and
tInformixCommit.

Component family Databases/Informix

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.


For examples on using dynamic parameters, see Reading


data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related Scenario
For a scenario in which tInformixRollback might be used, see Rollback from inserting data in mother/
daughter tables on page 2429.


tInformixRow
Acts on the actual DB structure or on the data (although without handling data) thanks to the
SQLBuilder, which helps you easily write your SQL statements.
tInformixRow executes the SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it doesn't provide output.

tInformixRow Standard properties


These properties are used to configure tInformixRow running in the Standard Job framework.
The Standard tInformixRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.


Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder.

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
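
For example, assuming a hypothetical customer table and a schema defined with the columns customer_id, first_name, and last_name in that order, the Query field could contain:

  "SELECT customer_id, first_name, last_name FROM customer ORDER BY customer_id"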

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.


Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
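
As a sketch (the table, columns, and values below are hypothetical), the Query field could hold a parameterized statement and the Set PreparedStatement Parameter table the corresponding values:

  Query: "UPDATE invoice SET amount = ? WHERE invoice_id = ?"

  Parameter Index | Parameter Type | Parameter Value
  1               | Double         | 149.90
  2               | Int            | 1001

The exact labels offered in the Parameter Type list may differ slightly depending on your Studio version.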

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tInformixSCD
Tracks and shows changes which have been made to Informix SCD dedicated tables.
tInformixSCD addresses Slowly Changing Dimension transformation needs, by regularly reading a data
source and listing the modifications in an SCD dedicated table.

tInformixSCD Standard properties


These properties are used to configure tInformixSCD running in the Standard Job framework.
The Standard tInformixSCD component belongs to the Business Intelligence and the Databases
families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The following fields are pre-filled in using fetched
data

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port DB server listening port.


Database Name of the database.

Schema Name of the schema.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Table Name of the table to be created

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to improve system performance.

Source keys include Null Select this check box to allow the source key columns to
have Null values.

Warning:
Special attention should be paid to the uniqueness of the
source key(s) values when this option is selected.


Use Transaction Select this check box when the database is configured in
NO_LOG mode.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.

Advanced settings

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step of the process by
which data is written in the database.

tStatCatcher Statistics Select this check box to collect the log data at a
component level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is an output component. Consequently, it


requires an input component and a connection of the Row >
Main type.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenario
For a scenario in which tInformixSCD might be used, see tMysqlSCD on page 2508.


tInformixSP
Centralises and calls multiple and complex queries in a database.
tInformixSP calls procedures stored in a database.

tInformixSP Standard properties


These properties are used to configure tInformixSP running in the Standard Job framework.
The Standard tInformixSP component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No properties stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.

Database Name of the database.


Schema Name of the schema.

Username and Password User authentication information.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Instance Name of the Informix instance to be used. This information


can generally be found in the SQL hosts file.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

SP Name Enter the exact name of the stored procedure (SP).

Is Function / Return result in Select this check box if only one value must be returned.
From the list, select the schema column upon which the
value to be obtained is based.

Parameters Click the Plus button and select the various Schema
Columns that will be required by the procedures. Note
that the SP schema can hold more columns than there are
parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter is to be returned as a value, likely
after modification through the procedure (function).
RECORDSET: Input parameter is to be returned as a set of
values, rather than a single value.


Note:
Check Inserting data in mother/daughter tables on page
2426, if you want to analyze a set of records from a
database table or DB query and return single records.
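
As a hypothetical example, for a stored procedure get_customer_name that takes a customer identifier and returns the corresponding name, the component could be configured as follows:

  SP Name: "get_customer_name"

  Parameters:
    Schema Column  | Type
    customer_id    | IN
    customer_name  | OUT

The OUT value is written to the customer_name column of the output schema.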

Use Transaction Clear this check box if the database is configured in the
NO_LOG mode.

Advanced settings

Additional JDBC parameters Specify additional connection properties for the DB


connection you are creating. This option is not available if
you have selected the Use an existing connection check box
in the Basic settings.

tStatCatcher Statistics Select this check box to collect log data at a component
level.

Usage

Usage rule This is an intermediary component. It can also be used as an


entry component. In this case, only the entry parameters are
authorized.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation The stored procedure syntax must correspond to that of the


database.
This component requires installation of its related jar files.

Related scenarios
For related scenarios, see:


• Retrieving personal information using a stored procedure on page 2404.


• Using tMysqlSP to find a State Label using a stored procedure on page 2528.
• Checking number format using a stored procedure on page 2735.
• Executing a stored procedure using tMDMSP on page 2180.
Also, see Inserting data in mother/daughter tables on page 2426 if you want to analyse a set of
records in a table or SQL query.


tIngresBulkExec
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used
to feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
The advantage of using two components is that data can be transformed before it is loaded in the
database.

tIngresBulkExec Standard properties


These properties are used to configure tIngresBulkExec running in the Standard Job framework.
The Standard tIngresBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Table Name of the table to be filled.

VNode Name of the virtual node.

Database Name of the database.

Action on table Actions that can be taken on the table defined:


None: No operation made to the table.
Truncate: Delete all the rows in the table and release the
file space back to the operating system.

File name Name of the file to be loaded.

Warning:
This file should be located on the same machine as the
database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next


component. When you create a Spark Job, avoid the reserved


word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Delete Working Files After Use Select this check box to delete the files that are created
during the execution.

Advanced settings

Field Separator Character, string or regular expression to separate fields.

Row Separator String (ex: "\n" on Unix) to separate rows.

Null Indicator Value of the null indicator.

Session User User of the defined session (the connection to the database).

Rollback Enable or disable rollback.

On Error Policy of error handling:


Continue: Continue the execution.
Terminate: Terminate the execution.

Reject Row File Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.

Error Count Number of errors to trigger the termination of the execution.


Available when Terminate is selected from the On Error list.

Allocation Number of pages initially allocated to the table or index.

Extend Number of pages by which a table or index grows.

Fill Factor Specify the percentage (from 1 to 100) of each primary data
page that must be filled with rows, under ideal conditions.
For example, if you specify a fillfactor of 40, the DBMS
Server fills 40% of each of the primary data pages in the
restructured table with rows.

Min Pages/Max Pages Specify the minimum/maximum number of primary pages a


hash table must have. The Min. pages and Max. pages must
be at least 1.

Leaf Fill A bulk copy from can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf
page that must be filled with rows during the copy. This
clause can be used only on tables with a B-tree storage
structure.

Non Leaf Fill A bulk copy from can specify a nonleaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree non-
leaf index page that must be filled with rows during the
copy. This clause can be used only on tables with a B-tree
storage structure.

Row Estimate Specify the estimated number of rows to be copied from a


file to a table during a bulk copy operation.

Trailing WhiteSpace Selected by default, this check box is designed to trim the
trailing white spaces and applies only to such data types as
VARCHAR, NVARCHAR and TEXT.

Encoding List of the encoding schemes.

Output Where to output the error message:


to console: Message output to the console.
to global variable: Message output to the global variable.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE_DATA: the number of rows read. This is an After


variable and it returns an integer.
NB_LINE_BAD: the number of rows rejected. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.


To fill up a field or expression with a variable, press Ctrl +


Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule Deployed along with tIngresOutputBulk, tIngresBulkExec
feeds the given data in bulk to the Ingres database for
performance gain.

Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tIngresBulkExec is deployed, so that the component
functions properly.
Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Loading data to a table in the Ingres DBMS on page 1772


tIngresClose
Closes the transaction committed in the connected Ingres database.

tIngresClose Standard properties


These properties are used to configure tIngresClose running in the Standard Job framework.
The Standard tIngresClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tIngresConnection component in the list if more
than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Ingres


components, especially with tIngresConnection and
tIngresCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tIngresCommit
Commits a global transaction in one go, using a unique connection, instead of committing on every row
or every batch, and thus provides a gain in performance.
tIngresCommit validates the data processed through the Job into the connected database.

tIngresCommit Standard properties


These properties are used to configure tIngresCommit running in the Standard Job framework.
The Standard tIngresCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tIngresConnection component in the list if more
than one connection is planned for the current Job.

Close Connection This check box is selected by default. It allows you to close
the database connection once the commit is done. Clear this
check box to continue to use the selected connection once
the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tIngresCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresConnection and
tIngresRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tIngresCommit related scenario, see Inserting data in mother/daughter tables on page 2426.


tIngresConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tIngresConnection opens a connection to the database for a current transaction.

tIngresConnection Standard properties


These properties are used to configure tIngresConnection running in the Standard Job framework.
The Standard tIngresConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Server Database server IP address.

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.

Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresCommit and
tIngresRollback components.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For tIngresConnection related scenario, see Loading data to a table in the Ingres DBMS on page 1772.


tIngresInput
Reads an Ingres database and extracts fields based on a query.
tIngresInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.

tIngresInput Standard properties


These properties are used to configure tIngresInput running in the Standard Job framework.
The Standard tIngresInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Server Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.
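For illustration only, suppose the schema defines the columns id, name and age in that order for a hypothetical employee table. The Query field holds a Java string, so it could contain a statement such as the following, with the SELECT columns listed in the same order as the schema (context.department is a made-up context variable):

   "SELECT id, name, age FROM employee"

   "SELECT id, name, age FROM employee WHERE dept = '" + context.department + "'"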


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
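As an illustration of how these variables are read in practice (the instance name tIngresInput_1 is only an example and depends on your Job), the generated Java code exposes them through the globalMap object, so an expression in a downstream component could look like:

   // number of rows processed by the input component (After variable)
   Integer nbLine = (Integer) globalMap.get("tIngresInput_1_NB_LINE");
   // query statement being processed (Flow variable)
   String query = (String) globalMap.get("tIngresInput_1_QUERY");

Pressing Ctrl + Space in an expression field inserts this kind of globalMap expression for you.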

Usage

Usage rule This component covers all possible SQL queries for Ingres
databases.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:


See also the scenario for tContextLoad: Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497.


tIngresOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tIngresOutput writes, updates, makes changes or suppresses entries in a database.

tIngresOutput Standard properties


These properties are used to configure tIngresOutput running in the Standard Job framework.
The Standard tIngresOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Port Listening port number of DB server.

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop a table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.t
alend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.

This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not insert,
update, or delete actions, or actions that require particular
preprocessing.

  Name: Type in the name of the schema column to be
altered or inserted as a new column.

  SQL expression: Type in the SQL statement to be executed
in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After, according to the
action to be performed on the reference column.

  Reference column: Type in the reference column that the
component can use to place or replace the new or altered
column.
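As a purely hypothetical sketch of one Additional Columns row (the column names and the SQL expression are made up, and the functions available depend on your Ingres version), you could derive an extra column from an existing one as follows:

   Name:             name_upper
   SQL expression:   "uppercase(name)"
   Position:         After
   Reference column: name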

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each


batch.
This field appears only when the Use batch mode check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.


NB_LINE_DELETED: the number of rows deleted. This is an


After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
QUERY: the query statement processed. This is an After
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in an Ingres database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tIngresOutputBulk
Prepares the file whose data is inserted in bulk to the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to
feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
tIngresOutputBulk prepares a file with the schema defined and the data coming from the preceding
component.

tIngresOutputBulk Standard properties


These properties are used to configure tIngresOutputBulk running in the Standard Job framework.
The Standard tIngresOutputBulk component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

File Name Name of the file to be generated.

Warning:
This file is generated on the local machine or a shared
folder on the LAN.

Append the File Select this check box to add the new rows at the end of the
file.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.


When the schema to be reused has default values that are


integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Advanced settings

Field Separator Character, string or regular expression to separate fields.

Row Separator String (for example, "\n" on Unix) to separate rows.

Include Header Select this check box to include the column header in the
file.

Encoding List of encoding schemes.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule Deployed along with tIngresBulkExec, tIngresOutputBulk


is intended to save the incoming data to a file, whose
data is then inserted in bulk to an Ingres database by
tIngresBulkExec for performance gain.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Loading data to a table in the Ingres DBMS on page 1772,


tIngresOutputBulkExec
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to
feed a database. These two steps are fused together in the tIngresOutputBulkExec component.
tIngresOutputBulkExec prepares an output file and uses it to feed a table in the Ingres DBMS.

tIngresOutputBulkExec Standard properties


These properties are used to configure tIngresOutputBulkExec running in the Standard Job framework.
The Standard tIngresOutputBulkExec component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Table Name of the table to be filled.

VNode Name of the virtual node.


The database server must be installed on the same machine
where the Studio is installed or where the Job using
tIngresOutputBulkExec is deployed.

Database Name of the database.

Action on table Actions that can be taken on the table defined:


None: No operation made to the table.
Truncate: Delete all the rows in the table and release the
file space back to the operating system.

File name Name of the file to be generated and loaded.


Warning:
This file is generated on the machine specified by the
VNode field so it should be on the same machine as the
database server.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Delete Working Files After Use Select this check box to delete the files that are created
during the execution.

Advanced settings

Field Separator Character, string or regular expression to separate fields.

Row Separator String (for example, "\n" on Unix) to separate rows.

On Error Policy of error handling:


Continue: Continue the execution.
Terminate: Terminate the execution.

Reject Row File Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.


Error Count Number of errors to trigger the termination of the
execution.
Available when Terminate is selected from the On Error list.

Rollback Enable or disable rollback.

Null Indicator Value of the null indicator.

Session User User of the defined session (the connection to the
database).

Allocation Number of pages initially allocated to the table or index.

Extend Number of pages by which a table or index grows.

Fill Factor Specify the percentage (from 1 to 100) of each primary data
page that must be filled with rows, under ideal conditions.
For example, if you specify a fillfactor of 40, the DBMS
Server fills 40% of each of the primary data pages in the
restructured table with rows.

Min Pages/Max Pages Specify the minimum/maximum number of primary pages a


hash table must have. The Min. pages and Max. pages must
be at least 1.

Leaf Fill A bulk copy from operation can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf
page that must be filled with rows during the copy. This
clause can be used only on tables with a B-tree storage
structure.

Non Leaf Fill A bulk copy from operation can specify a nonleaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree non-
leaf index page that must be filled with rows during the
copy. This clause can be used only on tables with a B-tree
storage structure.

Row Estimate Specify the estimated number of rows to be copied from a


file to a table during a bulk copy operation.

Trailing WhiteSpace Selected by default, this check box is designed to trim the
trailing white spaces and applies only to such data types as
VARCHAR, NVARCHAR and TEXT.

Output Where to output the error message:


to console: Message output to the console.
to global variable: Message output to the global variable.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule Usually deployed along with tIngresConnection or


tIngresRow, tIngresOutputBulkExec prepares an output
file and feeds its data in bulk to the Ingres DBMS for
performance gain.


Limitation The database server/client must be installed on the same


machine where the Studio is installed or where the Job
using tIngresOutputBulkExec is deployed, so that the
component functions properly.
Due to license incompatibility, one or more JARs required
to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Loading data to a table in the Ingres DBMS


In this scenario, a tIngresOutputBulkExec component is deployed to prepare an output file with the
employee data from a .csv file and then use that output file to feed a table in an Ingres database.

Dragging and dropping components


Procedure
1. Drop tIngresConnection, tFileInputDelimited and tIngresOutputBulkExec from the Palette onto
the workspace.
2. Rename tIngresOutputBulkExec as save_a_copy_and_load_to_DB.
3. Link tIngresConnection to tFileInputDelimited using an OnSubjobOk trigger.
4. Link tFileInputDelimited to tIngresOutputBulkExec using a Row > Main connection.

Configuring the components


Procedure
1. Double-click tIngresConnection to open its Basic settings view in the Component tab.

2. In the Server field, enter the address of the server where the Ingres DBMS resides, for example
"localhost".


Keep the default settings of the Port field.


3. In the Database field, enter the name of the Ingres database, for example "research".
4. In the Username and Password fields, enter the authentication credentials.
A context variable is used for the password here. For more information on context variables, see
Talend Studio User Guide.
5. Double-click tFileInputDelimited to open its Basic settings view in the Component tab.

6. Select the source file by clicking the [...] button next to the File name/Stream field.
7. Click the [...] button next to the Edit schema field to open the schema editor.

8. Click the [+] button to add four columns, for example name, age, job and dept, with the data type
as string, Integer, string and string respectively.
Click OK to close the schema editor.
Click Yes on the pop-up window that asks whether to propagate the changes to the subsequent
component.
Leave other default settings unchanged.
9. Double-click tIngresOutputBulkExec to open its Basic settings view in the Component tab.


10. In the Table field, enter the name of the table for data insertion.
11. In the VNode and Database fields, enter the names of the VNode and the database.
12. In the File Name field, enter the full path of the file that will hold the data of the source file.
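For instance, to match the execution result described below, the fields could be filled in as follows (adapt these values to your own environment):

   Table:     "employee"
   VNode:     "talendbj"
   Database:  "research"
   File Name: "C:/Users/talend/Desktop/employee_research.csv"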

Executing the Job


Procedure
1. Press Ctrl+S to save the Job.
2. Press F6 to run the Job.

As shown above, the employee data is written to the table employee in the database research on
the node talendbj. Meanwhile, the output file employee_research.csv has been generated at
C:/Users/talend/Desktop.

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tIngresRollback
Avoids committing part of a transaction involuntarily by canceling the transaction committed in the
connected database.

tIngresRollback Standard properties


These properties are used to configure tIngresRollback running in the Standard Job framework.
The Standard tIngresRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tIngresConnection component in the list if more
than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other tIngres*
components, especially with the tIngresConnection and
tIngresCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tIngresRollback related scenario, see Rollback from inserting data in mother/daughter tables on
page 2429.


tIngresRow
Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder
tool to write your SQL statements easily.
tIngresRow executes the SQL query stated onto the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide output.

tIngresRow Standard properties


These properties are used to configure tIngresRow running in the Standard Job framework.
The Standard tIngresRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address.

Port Listening port number of DB server.


Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
sequencing the fields properly so that they match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced Settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.


Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.
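As an illustration only (the table, column and value names below are invented), the Query field could contain a parameterized statement, and the Set PreparedStatement Parameter table would then provide one row per "?" placeholder:

   "UPDATE employee SET dept = ? WHERE name = ?"

   Parameter Index: 1   Parameter Type: String   Parameter Value: "R&D"
   Parameter Index: 2   Parameter Type: String   Parameter Value: row1.name

Here row1 stands for the incoming flow connected to tIngresRow in this hypothetical Job.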

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also


find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tIngresSCD
Reflects and tracks changes in a dedicated Ingres SCD table.
tIngresSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and
logging the changes into a dedicated SCD table.

tIngresSCD Standard properties


These properties are used to configure tIngresSCD running in the Standard Job framework.
The Standard tIngresSCD component belongs to the Business Intelligence and the Databases families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the Repository file where properties are


stored. The fields to follow are pre-filled in using fetched
data.

Server Database server IP address.

Port Listening port number of DB server.


Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

SCD Editor The SCD editor helps to build and configure the data flow
for slowly changing dimension outputs.
For more information, see SCD management methodology
on page 2511.

Use memory saving Mode Select this check box to maximize system performance.

Source keys include Null Select this check box to allow the source key columns to
have Null values.

Warning:
Special attention should be paid to the uniqueness of the
source key(s) values when this option is selected.

Die on error This check box is cleared by default, meaning to skip the
row on error and to complete the process for error-free
rows.


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

End date time details Specify the time value of the SCD end date time setting in
the format of HH:mm:ss. The default value for this field is
12:00:00.
This field appears only when SCD Type 2 is used and Fixed
year value is selected for creating the SCD end date.

Debug mode Select this check box to display each step during
processing entries in a database.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE_UPDATED: the number of rows updated. This is an


After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is used as an output component. It requires an
input component and a Row > Main link as input.

Limitation This component does not support using SCD type 0 together
with other SCD types.

Related scenario
For related scenarios, see tMysqlSCD on page 2508.


tInterbaseClose
Closes the transaction committed in the connected Interbase database.

tInterbaseClose Standard properties


These properties are used to configure tInterbaseClose running in the Standard Job framework.
The Standard tInterbaseClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInterbaseConnection component in the list if
more than one connection is planned for the current Job.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is to be used along with Interbase


components, especially with tInterbaseConnection and
tInterbaseCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.


Related scenarios
No scenario is available for the Standard version of this component yet.


tInterbaseCommit
Commits a global transaction in one go instead of committing on every row or every batch, and thus
provides a gain in performance.
tInterbaseCommit validates the data processed through the Job into the connected DB.

tInterbaseCommit Standard properties


These properties are used to configure tInterbaseCommit running in the Standard Job framework.
The Standard tInterbaseCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInterbaseConnection component in the list if
more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Warning:
If you want to use a Row > Main connection to link
tInterbaseCommit to your Job, your data will be committed
row by row. In this case, do not select the Close connection
check box or your connection will be closed before the end
of your first row commit.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other
tInterbase* components, especially with the
tInterbaseConnection and tInterbaseRollback components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tInterbaseCommit related scenario, see Inserting data in mother/daughter tables on page 2426.


tInterbaseConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.
tInterbaseConnection opens a connection to the database for a current transaction.

tInterbaseConnection Standard properties


These properties are used to configure tInterbaseConnection running in the Standard Job framework.
The Standard tInterbaseConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved. 

Host name Database server IP address.

Database Name of the database.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.


Advanced settings

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.
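To make the difference concrete, the following plain JDBC sketch (this is not code generated by
Talend Studio; the connection URL, table name and credentials are placeholders) contrasts auto-
commit with an explicit transaction that is committed or rolled back as a whole, which is the role
played by tInterbaseCommit and tInterbaseRollback:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CommitSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection parameters; adapt them to your Interbase server and JDBC driver.
        Connection conn = DriverManager.getConnection(
                "jdbc:interbase://localhost:3050/C:/data/sales.gdb", "SYSDBA", "masterkey");
        // conn.setAutoCommit(true) would commit every statement immediately (Auto Commit selected).
        conn.setAutoCommit(false); // Auto Commit cleared: statements are grouped into one transaction.
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("INSERT INTO ORDERS (ID, LABEL) VALUES (1, 'first')");
            stmt.executeUpdate("INSERT INTO ORDERS (ID, LABEL) VALUES (2, 'second')");
            conn.commit();   // what tInterbaseCommit does once all statements have run
        } catch (Exception e) {
            conn.rollback(); // what tInterbaseRollback does when something goes wrong
            throw e;
        } finally {
            conn.close();
        }
    }
}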

tStatCatcher Statistics Select this check box to gather the job processing metadata
at a Job level as well as at each component level.

Usage

Usage rule This component is more commonly used with other
tInterbase* components, especially with the
tInterbaseCommit and tInterbaseRollback components.

Limitation This component requires installation of its related jar files.

Related scenarios
For tInterbaseConnection related scenarios, see tMysqlConnection on page 2425.


tInterbaseInput
Reads an Interbase database and extracts fields based on a query.
tInterbaseInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.

tInterbaseInput Standard properties


These properties are used to configure tInterbaseInput running in the Standard Job framework.
The Standard tInterbaseInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.
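For example, if the schema defines three columns id, name and email in that order (the table name
employees below is hypothetical), the query should select the fields in the same order:

SELECT id, name, email FROM employees

Selecting the columns in a different order, or selecting a different number of columns, would no
longer match the schema definition.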


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component covers all possible SQL queries for
Interbase databases.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
See the related topic in tContextLoad: Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497.


tInterbaseOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tInterbaseOutput writes, updates, modifies, or deletes entries in a database.

tInterbaseOutput Standard properties


These properties are used to configure tInterbaseOutput running in the Standard Job framework.
The Standard tInterbaseOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.


Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Database Name of the database

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.
Drop table if exists and create: The table is removed if it
already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.


Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Clear data in table Wipes out data from the selected table before action.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.


Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are not
insert, update, or delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be
altered or inserted as a new column.

  SQL expression: Type in the SQL statement to be executed
in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After, according to the
action to be performed on the reference column.

  Reference column: Type in a reference column that the
tDBOutput can use to place or replace the new or altered
column.
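As a purely illustrative example (the column names are hypothetical), to add an upper-case copy of
a NAME column right after it, the table could be filled as follows:

Name: NAME_UPPER
SQL expression: "UPPER(NAME)"
Position: After
Reference column: NAME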

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing.

Note:
This check box is available only when you have selected
the Insert, Update, or Delete option in the Action on data
option.

Batch Size Specify the number of records to be processed in each
batch.
This field appears only when the Use Batch check box
is selected.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After
variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.
NB_LINE_REJECTED: the number of rows rejected. This is an
After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in an Interbase database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.


Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tInterbaseRollback
Prevents part of a transaction from being committed involuntarily, by canceling the transaction in the
connected Interbase database.

tInterbaseRollback Standard properties


These properties are used to configure tInterbaseRollback running in the Standard Job framework.
The Standard tInterbaseRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Component list Select the tInterbaseConnection component in the list if
more than one connection is planned for the current Job.

Close Connection Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Usage

Usage rule This component is more commonly used with other
tInterbase* components, especially with the tInterbaseConnection
and tInterbaseCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tInterbaseRollback related scenario, see Rollback from inserting data in mother/daughter tables
on page 2429.


tInterbaseRow
Acts on the actual database structure or on the data (although without handling data), using the
SQLBuilder tool to easily write your SQL statements.
tInterbaseRow executes the SQL query stated on the specified database. The Row suffix means the
component implements a flow in the Job design although it does not provide output.

tInterbaseRow Standard properties


These properties are used to configure tInterbaseRow running in the Standard Job framework.
The Standard tInterbaseRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property type Either Built-in or Repository .

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

Note: When a Job contains the parent Job and the child
Job, if you need to share an existing connection between
the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very
database connection.
2. In the child level, use a dedicated connection
component to read that registered database
connection.
For an example about how to share a database
connection across Job levels, see Talend Studio User
Guide.

Host Database server IP address

Database Name of the database


Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository .

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequence the fields in order to match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Additional JDBC Parameters Specify additional JDBC parameters for the database
connection created.
This property is not available when the Use an existing
connection check box in the Basic settings view is selected.


Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Note:
This option allows the component to have a different
schema from that of the preceding component.
Moreover, the column that holds the QUERY's recordset
should be set to the type of Object and this component
is usually followed by tParseRecordSet.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times, as performance levels are
increased.
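For instance (the table and column names are hypothetical), to run the same delete statement with
a changing identifier, the Query field could contain "DELETE FROM orders WHERE order_id = ?" and
the Set PreparedStatement Parameter table could be filled as follows:

Parameter Index: 1
Parameter Type: Int
Parameter Value: 1001

The parameter value could equally come from a context variable, so the statement is prepared once
and executed with different values.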

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.


Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use
an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the
Component List box in the Basic settings view becomes
unusable.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Limitation This component requires installation of its related jar files.

Related scenarios
For related scenarios, see:
• Combining two flows for selective output on page 2503
• For tDBSQLRow related scenario: see Procedure on page 622
• For tMySQLRow related scenario: see Removing and regenerating a MySQL table index on page
2497.


tIntervalMatch
Returns a value based on a Join relation.
tIntervalMatch receives a main flow and aggregates it based on a join with a lookup flow. Then it matches
a specified value to a range of values and returns related information.

tIntervalMatch Standard properties


These properties are used to configure tIntervalMatch running in the Standard Job framework.
The Standard tIntervalMatch component belongs to the Data Quality family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and job
flowcharts. Related topic: see Talend Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Search Column Select the main flow column containing the values to be
matched with a range of values.

Column (LOOKUP) Select the lookup flow column containing the values to be
returned when the Join is ok.

Lookup Column (min) / Include the bound (min) Select the column containing the minimum value of the
range. Select the check box to include the minimum value
of the range in the match.


Lookup Column (max) / Include the bound (max) Select the column containing the maximum value of the
range. Select the check box to include the maximum value
of the range in the match.

Advanced settings

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component handles a flow of data and therefore requires
input and output components, hence it is defined as an intermediary step.

Identifying server locations based on their IP addresses


This scenario describes a four-component Job that checks the server IP addresses listed in the main
input file against a list of IP ranges given in a lookup file to identify the hosting country for each
server.


Setting up the Job


About this task
The Job requires two tFileInputDelimited components, a tIntervalMatch component and a tLogRow
component.

Procedure
1. Drop the components onto the design workspace.
2. Connect the components using Row > Main connection.
Note that the connection from the second tFileInputDelimited component to the tIntervalMatch
component will appear as a Lookup connection.

Configuring the components


Procedure
1. Double-click the first tFileInputDelimited component to open its Basic settings view.

2. Browse to the file to be used as the main input, which provides a list of servers and their IP
addresses:

Server;IP
Server1;057.010.010.010
Server2;001.010.010.100
Server3;057.030.030.030
Server4;053.010.010.100

3. Click the [...] button next to Edit schema to open the Schema dialog box and define the input
schema. According to the input file structure, the schema is made of two columns, respectively
Server and IP, both of type String. Then click OK to close the dialog box.


4. Define the number of header rows to be skipped, and keep the other settings as they are.
5. Define the properties of the second tFileInputDelimited component similarly.

The file to be used as the input to the lookup flow in this example lists some IP address ranges
and the corresponding countries:

StartIP;EndIP;Country
001.000.000.000;001.255.255.255;USA
002.006.190.056;002.006.190.063;UK
011.000.000.000;011.255.255.255;USA
057.000.000.000;057.255.255.255;France
012.063.178.060;012.063.178.063;Canada
053.000.000.000;053.255.255.255;Germany

Accordingly, the schema of the lookup flow should have the following structure:


6. Double-click the tIntervalMatch component to open its Basic settings view.

7. From the Search Column list, select the main flow column containing the values to be matched
with the range values. In this example, we want to match the servers' IP addresses with the range
values from the lookup flow.
8. From the Column (LOOKUP) list, select the lookup column that holds the values to be returned. In
this example, we want to get the names of countries where the servers are hosted.
9. Set the min and max lookup columns corresponding to the range bounds defined in the lookup
schema, StartIP and EndIP respectively in this example.

Executing the Job


Procedure
Press Ctrl+S to save your Job and press F6 to run it.
The name of the country where each server is hosted is displayed next to the IP address.
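Based on the sample input and lookup files shown above, the console output should resemble the
following (the exact layout depends on the tLogRow mode selected):

Server1|057.010.010.010|France
Server2|001.010.010.100|USA
Server3|057.030.030.030|France
Server4|053.010.010.100|Germany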


tIterateToFlow
Transforms non-processable data into a processable flow. tIterateToFlow transforms a list into a data
flow that can be processed.

tIterateToFlow Standard properties


These properties are used to configure tIterateToFlow running in the Standard Job framework.
The Standard tIterateToFlow component belongs to the Orchestration family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description; it defines the number of
fields that will be processed and passed on to the next
component. The schema is either Built-in or remote in the
Repository.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema will be created and stored locally for


this component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused in various projects and Job
designs. Related topic: see Talend Studio User Guide.

Mapping Column: Enter a name for the column to be created.
Value: Press Ctrl+Space to access all of the available
variables, be they global or user-defined.

Advanced Settings

tStatCatcher Statistics Select this check box to collect the log data at a component
level.


Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is not startable (green background) and it


requires an output component.

Connections Outgoing links (from this component to another):


Row: Main.
Trigger: Run if; On Component Ok; On Component Error.

Incoming links (from one component to this one):


Row: Iterate;

For further information regarding connections, see Talend


Studio User Guide.

Transforming a list of files as data flow


The following scenario describes a Job that iterates on a list of files, picks up the filename and current
date, and transforms this into a flow that gets displayed on the console.

• Drop the following components: tFileList, tIterateToFlow and tLogRow from the Palette to the
design workspace.
• Connect the tFileList to the tIterateToFlow using an Iterate link and connect the tIterateToFlow to the
tLogRow using a Row > Main connection.
• In the tFileList Component view, set the directory where the list of files is stored.


• In this example, the files are three simple .txt files held in one directory: Countries.
• There is no need to take case into account here, hence clear the Case sensitive check box.
• Leave the Include Subdirectories check box unchecked.
• Then select the tIterateToFlow component and click Edit Schema to set the new schema.

• Add two new columns: Filename of String type and Date of Date type. Make sure you define the
correct pattern in Java.
• Click OK to validate.
• Notice that the newly created schema shows on the Mapping table.

• In each cell of the Value field, press Ctrl+Space bar to access the list of global and user-specific
variables.
• For the Filename column, use the global variable tFileList_1_CURRENT_FILEPATH. It
retrieves the current filepath in order to catch the name of each file the Job iterates on.
• For the Date column, use the Talend routine TalendDate.getCurrentDate() (in Java). The
exact expressions to type in the Value cells are shown after this list.
• Then on the tLogRow component view, select the Print values in cells of a table check box.
• Save your Job and press F6 to execute it.
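The expressions typed in the Value cells would look like the following, using the standard Talend
syntax for global variables and routines (tFileList_1 is the name of the tFileList component in this
Job):

Filename: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))
Date: TalendDate.getCurrentDate()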


The filepath displays on the Filename column and the current date displays on the Date column.


tJasperOutput
Creates a report in rich formats using Jaspersoft's iReport.
This component is closely related to Jaspersoft's report designer -- iReport. It reads and processes
data from an input flow to create a report against a .jrxml report template defined via iReport.
tJasperOutput reads and processes data from an input flow to create a report against a .jrxml report
template defined via iReport.

tJasperOutput Standard properties


These properties are used to configure tJasperOutput running in the Standard Job framework.
The Standard tJasperOutput component belongs to the Business Intelligence family.
The component in this framework is available in all Talend products.

Basic settings

Jrxml file Report template file created via iReport.

Temp path Path of temporary files.

Destination path Path of the final report file.

File name/Stream Name of the final report.

Report type File type of the final report.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Edit schema to make changes to the schema. If the
current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see the Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Sync columns Click to synchronize the output file schema with the input
file schema. The Sync function only displays once the Row
connection is linked with the output component.

iReport Edit the command to provide the path of iReport's
execution file, e.g. replacing __IREPORT_PATH__\ with
E:\Program Files\Jaspersoft\iReport-4.1.1\bin\, or giving the
full path of the execution file such as "E:\Program Files\Jaspersoft\iReport-4.1.1\bin\iReport.exe".

Launch Click to run iReport.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Specify Locale Select this check box to choose a locale from the Report
Locale list.

Note:
The first line of the Report Locale list is empty. You can
click it to customize a locale.

Encoding Select an encoding mode from this list. You can select
Custom from the list to enter an encoding method in the
field that appears.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is closely related to Jaspersoft's report


designer -- iReport. It reads and processes data from
an input flow to create a report against a .jrxml report
template defined via iReport.


Generating a report against a .jrxml template


The following Job reads data from a .csv file and creates a .pdf report based on an existing .jrxml
report template. Note that the template file should be created via Jaspersoft's iReport based on a file
that shares the same schema as the source .csv file of this Job.

Setting up the Job


Procedure
1. Drag and drop the following components from the Palette to the workspace: tFileInputDelimited
and tJasperOutput.
2. Connect tFileInputDelimited and tJasperOutput using a Row link.

Configuring the input component


Procedure
1. Double-click the tFileInputDelimited component to display its Basic settings view.

2. Select Built-In from the Property Type drop-down list.

Note:
You can select Repository from the Property Type drop-down list to fill in the relevant fields
automatically if the relevant metadata has been stored locally in the Repository. For more
information about Metadata, see the Talend Studio User Guide.

3. Fill in the File name/Stream field to give the path and name of the source file, e.g. "C:/Documents
and Settings/Andy ZHANG/nom.csv".
4. Keep the default settings for the Row Separator and Field Separator fields. You can also change
them as needed.


5. Set 1 in the Header field and 0 in the Footer field. Leave the Limit field empty. You can also
change them as needed.
6. Select Built-In from the Schema drop-down list and click Edit schema to define the data structure
of the input file. In this case, the input file has 2 columns: Nom and Prenom.

Configuring the output component


Procedure
1. Double-click tJasperOutput to display its Basic settings view.

2. Enter the full path of the report template file created via Jaspersoft's iReport in the Jrxml file
field. You can click the three-dot button to browse.

Note:
The schema of the file, which is used to create a .jrxml template file via iReport, should be the
same as that of the source file that is used to create the report.

3. Enter the path for the temporary files generated during the job execution in the Temp path field.
You can click the three-dot button to browse.
4. Enter the path for the final report file generated during the job execution in the Destination path
field. You can click the three-dot button to browse.
5. Enter the name for the final report file generated during the job execution in the File name/
Stream field.
6. Select the format for the final report file generated during the job execution in the Report type
field.
7. Click Sync columns to retrieve the schema from the previous component.


8. Enter the path of the execution file of Jaspersoft's iReport in the iReport field, e.g. replacing
__IREPORT_PATH__\ with E:\Program Files\Jaspersoft\iReport-4.1.1\bin\. You can click the Launch
button to run iReport.

Note:
This step is not mandatory. Yet, this helps you conveniently access the iReport software for
relevant operations, e.g. creating a report template, etc.

Job execution
Procedure
1. Press CTRL+S to save your Job.
2. Press F6 to execute it.
You can find the file out.pdf in the folder specified in the Destination path field.


tJasperOutputExec
Creates a report in rich formats using Jaspersoft's iReport and offers a performance gain as it functions
as a combination of an input component and a tJasperOutput component.
This component is closely related to Jaspersoft's report designer -- iReport. It reads and processes
data from a source file to create a report against a .jrxml report template defined via iReport.
tJasperOutputExec is used as a combination of an input component and a tJasperOutput component.
The advantage of using two separate components is that data can be transformed before being used
to generate a report and the input sources can be various and rich.
Reads and processes data from a source file to create a report against a .jrxml report template
defined via iReport.

tJasperOutputExec Standard properties


These properties are used to configure tJasperOutputExec running in the Standard Job framework.
The Standard tJasperOutputExec component belongs to the Business Intelligence family.
The component in this framework is available in all Talend products.

Basic settings

Jrxml file Report template file created via iReport.

Source file Name of the source file.

Record delimiter Delimiter of the records.

Destination path Path of the final report file.

Use Default Output Name Select this check box to use the default name for the report
generated, which takes the source file's name.

Output Name Name of the final report.

Note:
This field does not appear if the Use Default Output
Name box has been selected.

Report type File type of the final report.

iReport Edit the command to provide the path of iReport's
execution file, e.g. replacing __IREPORT_PATH__\ with
E:\Program Files\Jaspersoft\iReport-4.1.1\bin\, or giving the
full path of the execution file such as "E:\Program Files\Jaspersoft\iReport-4.1.1\bin\iReport.exe".

Launch Click to run iReport.


Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Specify Locale Select this check box to choose a locale from the Report
Locale list.

Note:
The first line of the Report Locale list is empty. You can
click it to customize a locale.

Encoding Select an encoding mode from this list. You can select
Custom from the list to enter an encoding method in the
field that appears.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component is closely related to Jaspersoft's report


designer -- iReport. It reads and processes data from
a source file to create a report against a .jrxml report
template defined via iReport.

Related Scenario
For related scenarios, see Generating a report against a .jrxml template on page 1817.


tJava
Extends the functionalities of a Talend Job using custom Java commands.
tJava enables you to enter personalized code in order to integrate it into a Talend program. This code
is executed only once.

tJava Standard properties


These properties are used to configure tJava running in the Standard Job framework.
The Standard tJava component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Code Type in the Java code you want to execute according to the
task you need to perform. For further information about Java
functions syntax specific to Talend , see Talend Studio
Help Contents (Help > Developer Guide > API Reference).
For a complete Java reference, check http://docs.oracle.com/
javaee/6/api/

Note: If your custom Java code references
org.talend.transform.runtime.api.ExecutionStatus, change it to
org.talend.transform.runtime.common.MapExecutionStatus.
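As a minimal illustration of what can be typed into the Code field (the context variable name
message is hypothetical and must be defined in the Job for the second line to compile):

// Print a fixed message and the value of a context variable to the Run console.
System.out.println("tJava is running");
System.out.println("message = " + context.message);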

Advanced settings

Import Enter the Java code to import, if necessary, external libraries


used in the Code field of the Basic settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a Job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.


For further information about variables, see Talend Studio


User Guide.

Usage

Usage rule This component is generally used as a one-component


subJob.

Limitation You should know the Java language.

Printing out a variable content


The following scenario is a simple demo of the extended application of the tJava component. The
Job aims at printing out the number of lines being processed, using a Java command and the global
variable provided in Talend Studio.

Setting up the Job


Procedure
1. Select and drop the following components from the Palette onto the design workspace:
tFileInputDelimited, tFileOutputExcel, tJava.
2. Connect the tFileInputDelimited to the tFileOutputExcel using a Row > Main connection. The
content from a delimited .txt file will be passed on through the connection to an .xls file
without further transformation.
3. Then connect the tFileInputDelimited component to the tJava component using a Trigger > On
Subjob Ok link. This link sets a sequence ordering tJava to be executed at the end of the main
process.

Configuring the input component


Procedure
1. Set the Basic settings of the tFileInputDelimited component.


2. Define the path to the input file in the File name field.
The input file used in this example is a simple text file made of two columns: Names and their
respective Emails.
3. Click the Edit Schema button, and set the two-column schema. Then click OK to close the dialog
box.

4. When prompted, click OK to accept the propagation, so that the tFileOutputExcel component gets
automatically set with the input schema.

Configuring the output component


Set the output file to receive the input content without changes. If the file does not exist already, it
will get created.


In this example, the Sheet name is Email and the Include Header box is selected.

Configuring the tJava component


Procedure
1. Then select the tJava component to set the Java command to execute.

2. In the Code area, type in the following command:

// Build a message with the number of lines read by tFileInputDelimited_1,
// then print it to the Run console.
String var = "Nb of line processed: ";
var = var + globalMap.get("tFileInputDelimited_1_NB_LINE");
System.out.println(var);

In this use case, we use the NB_LINE variable. To access the global variable list, press Ctrl + Space
on your keyboard and select the relevant global parameter.

Executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 to execute it.


Results

The content gets passed on to the Excel file defined, and the number of lines processed is displayed
on the Run console.


tJavaDBInput
Reads a database and extracts fields based on a query.
tJavaDBInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.

tJavaDBInput Standard properties


These properties are used to configure tJavaDBInput running in the Standard Job framework.
The Standard tJavaDBInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Framework Select your Java database framework from the list.

Database Name of the database

DB root path Browse to your database root.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.


Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type and Query Enter your DB query, paying particular attention to
properly sequencing the fields so that they match the schema
definition.

Advanced settings

Trim all the String/Char columns Select this check box to remove leading and trailing
whitespace from all the String/Char columns.

Trim column Remove leading and trailing whitespace from defined


columns.

tStatCatcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
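
As an illustration only, the sketch below shows how these variables could be read in a tJava component executed after the subJob (through a Trigger > On Subjob Ok link); the component name tJavaDBInput_1 is an assumption that depends on your own Job:

// NB_LINE is an After variable (Integer); QUERY is a Flow variable (String).
Integer nbLine = (Integer) globalMap.get("tJavaDBInput_1_NB_LINE");
String query = (String) globalMap.get("tJavaDBInput_1_QUERY");
System.out.println("Rows read: " + nbLine + " using query: " + query);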

Usage

Usage rule This component covers all possible SQL database queries.


Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
See also the related topic in tContextLoad: Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497.


tJavaDBOutput
Executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
tJavaDBOutput writes, updates, makes changes or suppresses entries in a database.

tJavaDBOutput Standard properties


These properties are used to configure tJavaDBOutput running in the Standard Job framework.
The Standard tJavaDBOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Framework Select your Java database framework from the list.

Database Name of the database

DB root path Browse to your database root.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Table Name of the table to be written. Note that only one table
can be written at a time.

Action on table On the table defined, you can perform one of the following
operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created
again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not
exist.


Drop table if exists and create: The table is removed if it


already exists and created again.
Clear a table: The table content is deleted.

Action on data On the data of the table defined, you can perform:
Insert: Add new entries to the table. If duplicates are found,
the Job stops.
Update: Make changes to existing entries
Insert or update: Insert a new record. If the record with the
given reference already exists, an update would be made.
Update or insert: Update the record with the given
reference. If the record does not exist, a new record would
be inserted.
Delete: Remove entries corresponding to the input flow.

Warning:
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting the
check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.


• Update repository connection: choose this option


to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Commit every Enter the number of rows to be completed before


committing batches of rows together into the DB. This
option ensures transaction quality (but not rollback) and,
above all, better performance at execution.

Additional Columns This option is not offered if you create (with or without
drop) the DB table. This option allows you to call SQL
functions to perform actions on columns that are neither
insert, update, nor delete actions, or actions that require
particular preprocessing.

  Name: Type in the name of the schema column to be


altered or inserted as new column

  SQL expression: Type in the SQL statement to be executed


in order to alter or insert the relevant column data.

  Position: Select Before, Replace or After following the


action to be performed on the reference column.

  Reference column: Type in a column of reference that the


tDBOutput can use to place or replace the new or altered
column.

Use field options Select this check box to customize a request, especially
when there is double action on data.

Debug query mode Select this check box to display each step during processing
entries in a database.

tStat Catcher Statistics Select this check box to collect log data at the component
level.

Global Variables

Global Variables  NB_LINE: the number of rows processed. This is an After


variable and it returns an integer.
NB_LINE_UPDATED: the number of rows updated. This is an
After variable and it returns an integer.
NB_LINE_INSERTED: the number of rows inserted. This is an
After variable and it returns an integer.
NB_LINE_DELETED: the number of rows deleted. This is an
After variable and it returns an integer.


NB_LINE_REJECTED: the number of rows rejected. This is an


After variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
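
For illustration only, a tJava component placed after the subJob could report these counters as in the sketch below; the component name tJavaDBOutput_1 is an assumption, and the ERROR_MESSAGE value is only meaningful when the Die on error check box is cleared:

// All counters below are After variables returning Integer values.
Integer inserted = (Integer) globalMap.get("tJavaDBOutput_1_NB_LINE_INSERTED");
Integer rejected = (Integer) globalMap.get("tJavaDBOutput_1_NB_LINE_REJECTED");
String error = (String) globalMap.get("tJavaDBOutput_1_ERROR_MESSAGE");
System.out.println(inserted + " row(s) inserted, " + rejected + " row(s) rejected");
if (error != null) System.out.println("Last error: " + error);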

Usage

Usage rule This component offers the flexibility benefit of the DB query
and covers all of the SQL queries possible.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of a
table in a Java database. It also allows you to create a reject
flow using a Row > Rejects link to filter data in error. For an
example of tMysqlOutput in use, see Retrieving data in error
with a Reject link on page 2474.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.


tJavaDBRow
Acts on the actual database structure or on the data (although without handling data) using the
SQLBuilder tool to easily write your SQL statements.
tJavaDBRow executes the SQL query stated onto the specified database. The Row suffix means the
component implements a flow in the job design although it doesn't provide output.

tJavaDBRow Standard properties


These properties are used to configure tJavaDBRow running in the Standard Job framework.
The Standard tJavaDBRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Property type Either Built-in or Repository.

  Built-in: No property data stored centrally.

  Repository: Select the repository file in which the


properties are stored. The fields that follow are completed
automatically using the data retrieved.

Framework Select your Java database framework from the list.

Database Name of the database

DB root path Browse to your database root.

Username and Password DB user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-in: The schema is created and stored locally for this


component only. Related topic: see Talend Studio User
Guide.

  Repository: The schema already exists and is stored in the


Repository, hence can be reused. Related topic: see Talend
Studio User Guide.

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.


• Change to built-in property: choose this option to


change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Query type Either Built-in or Repository.

  Built-in: Fill in manually the query statement or build it


graphically using SQLBuilder

  Repository: Select the relevant query stored in the


Repository. The Query field gets accordingly filled in.

Query Enter your DB query, paying particular attention to
properly sequencing the fields so that they match the schema
definition.

Die on error This check box is selected by default. Clear the check box
to skip the row on error and complete the process for error-
free rows. If needed, you can retrieve the rows on error via a
Row > Rejects link.

Advanced settings

Propagate QUERY's recordset Select this check box to insert the result of the query into
a COLUMN of the current flow. Select this column from the
use column list.

Use PreparedStatement Select this check box if you want to query the database
using a PreparedStatement. In the Set PreparedStatement
Parameter table, define the parameters represented by "?" in
the SQL instruction of the Query field in the Basic Settings
tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.

Note:
This option is very useful if you need to execute the
same query several times. Performance levels are
increased.

Commit every Number of rows to be completed before committing


batches of rows together into the DB. This option ensures
transaction quality (but not rollback) and above all better
performance on executions.

tStat Catcher Statistics Select this check box to collect log data at the component
level.


Global Variables

Global Variables QUERY: the query statement being processed. This is a Flow
variable and it returns a string.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule This component offers the flexibility of the DB query and
covers all possible SQL queries.

Limitation Due to license incompatibility, one or more JARs required


to use this component are not provided. You can install the
missing JARs for this particular component by clicking the
Install button on the Component tab view. You can also
find out and add all missing JARs easily on the Modules
tab in the Integration perspective of your studio. You can
find more details about how to install external modules in
Talend Help Center (https://help.talend.com).

Related scenarios
For related topics, see:
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.


tJavaFlex
Provides a Java code editor that lets you enter personalized code in order to integrate it in a Talend
program.
tJavaFlex enables you to add Java code to the Start/Main/End code sections of this component itself.
With tJavaFlex, you can enter the three Java code parts (start, main and end) that constitute a kind of
component dedicated to performing a desired operation.

tJavaFlex Standard properties


These properties are used to configure tJavaFlex running in the Standard Job framework.
The Standard tJavaFlex component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
Click Sync columns to retrieve the schema from the previous
component in the Job.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.


Data Auto Propagate Select this check box to automatically propagate the data to
the component that follows.

Start code Enter the Java code that will be called during the
initialization phase.

Main code Enter the Java code to be applied for each line in the data
flow.

End code Enter the Java code that will be called during the closing
phase.

Advanced settings

Import Enter the Java code that helps to import, if necessary,


external libraries used in the Main code box of the Basic
settings view.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at a job level as well as at each component level.

Global Variables

Global Variables ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.

Usage

Usage rule You can use this component as a start, intermediate


or output component. You can as well use it as a one-
component subJob.

Limitation You should know the Java language.

Generating data flow


This scenario describes a two-component Job that generates a three-line data flow describing
different personal titles (Miss, Mrs, and Mr) and displays them on the console.


Setting up the Job


Procedure
1. Drop tJavaFlex and tLogRow from the Palette onto the design workspace.
2. Connect the components together using a Row > Main link.

Configuring the tJavaFlex component


Procedure
1. Double-click tJavaFlex to display its Basic settings view and define its properties.

2. Click the three-dot button next to Edit schema to open the corresponding dialog box where you
can define the data structure to pass to the component that follows.

3. Click the [+] button to add two columns: key and value and then set their types to Integer and
String respectively.
4. Click OK to validate your changes and close the dialog box.


5. In the Basic settings view of tJavaFlex, select the Data Auto Propagate check box to automatically
propagate data to the component that follows.
In this example, we do not want to do any transformation on the retrieved data.
6. In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of tJavaFlex by displaying the START
message and sets up the loop and the variables to be used afterwards in the Java code:

System.out.println("## START\n#");
String [] valueArray = {"Miss", "Mrs", "Mr"};

for (int i=0;i<valueArray.length;i++) {

7. In the Main code field, enter the code you want to apply on each of the data rows.
In this example, we want to display each key with its value:

row1.key = i;
row1.value = valueArray[i];

Warning:
In the Main code field, "row1" corresponds to the name of the link that comes out of tJavaFlex. If you
rename this link, you have to modify the code of this field accordingly.

8. In the End code field, enter the code that will be executed in the closing phase.
In this example, the brace (curly bracket) closes the loop and the code indicates the end of the
execution of tJavaFlex by displaying the END message:

}
System.out.println("#\n## END");

9. If needed, double-click tLogRow and in its Basic settings view, click the [...] button next to Edit
schema to make sure that the schema has been correctly propagated.


Saving and executing the Job


Procedure
1. Save your Job by pressing Ctrl+S.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.

The three personal titles are displayed on the console along with their corresponding keys.

Processing rows of data with tJavaFlex


This scenario describes a two-component Job that generates random data and then collects that data
and does some transformation on it line by line using Java code through the tJavaFlex component.

Setting up the Job


Procedure
1. Drop tRowGenerator and tJavaFlex from the Palette onto the design workspace.
2. Connect the components together using a Row Main link.

Configuring the input component


Procedure
1. Double-click tRowGenerator to display its Basic settings view and the RowGenerator Editor dialog
box where you can define the component properties.


2. Click the plus button to add four columns: number, txt, date and flag.
3. Define the schema and set the parameters to the four columns according to the above capture.
4. In the Functions column, select the three-dot function [...] for each of the defined columns.
5. In the Parameters column, enter 10 different parameters for each of the defined columns.
These 10 parameters correspond to the data that will be randomly generated when executing
tRowGenerator.
6. Click OK to validate your changes and close the editor.

Configuring the tJavaFlex component


Procedure
1. Double-click tJavaFlex to display its Basic settings view and define the component's properties.

2. Click Sync columns to retrieve the schema from the preceding component.
3. In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of the tJavaFlex component by displaying the
START message and defining the variable to be used afterwards in the Java code:

System.out.println("## START\n#");
int i = 0;

4. In the Main code field, enter the code to be applied on each line of data.


In this example, we want to show the index of each line starting from 0, then the random number
and the random text transformed to upper case, and finally the random date set in the editor of
tRowGenerator. Then, we create a condition to show whether the flag is true or false, and we increment
the line index:

System.out.print(" row" + i + ":");


System.out.print("# number:" + row1.number);
System.out.print (" | txt:" + row1.txt.toUpperCase());
System.out.print(" | date:" + row1.date);
if(row1.flag) System.out.println(" | flag: true");
else System.out.println(" | flag: false");

i++;

Warning:
In the Main code field, "row1" corresponds to the name of the link that connects to tJavaFlex. If you
rename this link, you have to modify the code.

5. In the End code field, enter the code that will be executed in the closing phase.
In this example, the code indicates the end of the execution of tJavaFlex by displaying the END
message:

System.out.println("#\n## END");

Saving and executing the Job


Procedure
1. Save your Job by pressing Ctrl+S.
2. Execute the Job by pressing F6 or clicking Run on the Run tab.


The console displays the randomly generated data that was modified by the java command set
through tJavaFlex.


tJavaRow
Provides a code editor that lets you enter the Java code to be applied to each row of the flow.
tJavaRow allows you to enter customized code which you can integrate in a Talend program.

tJavaRow Standard properties


These properties are used to configure tJavaRow running in the Standard Job framework.
The Standard tJavaRow component belongs to the Custom Code family.
The component in this framework is available in all Talend products.

Basic settings

Schema and Edit Schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

  Built-In: You create and store the schema locally for this
component only.

  Repository: You have already created the schema and stored


it in the Repository. You can reuse it in various projects and
Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).

Click Edit schema to make changes to the schema. If the


current schema is of the Repository type, three options are
available:
• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.

Generate code Click this button to automatically generate the code in


the Code field to map the columns of the input schema
with those of the output schema. This generation does not
change anything in your schema.


The principle of this mapping is to relate the columns


that have the same column name. Then you can adapt the
generated code depending on the actual mapping you need.
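
For example, with input and output schemas that both contain the columns City and Population, the generated code typically looks like the following sketch (the column names are illustrative only):

// Each output column is fed from the input column bearing the same name.
output_row.City = input_row.City;
output_row.Population = input_row.Population;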

Code Enter the Java code to be applied to each line of the data
flow.

Advanced settings

Import Enter the Java code to import, if necessary, external libraries


used in the Code field of the Basic settings view.

tStatCatcher Statistics Select this check box to collect the log data at a component
level.

Global Variables

Global Variables NB_LINE: the number of rows read by an input component


or transferred to an output component. This is an After
variable and it returns an integer.
ERROR_MESSAGE: the error message generated by the
component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
To enter a global variable (for example COUNT of
tFileRowCount) in the Code box, you need to type in the
entire piece of code manually, that is to say ((Integer)globalMap.get("tFileRowCount_COUNT")).

Usage

Usage rule This component is used as an intermediary between two


other components. It must be linked to both an input and an
output component.

Function tJavaRow allows you to enter customized code which you


can integrate in a Talend program. With tJavaRow, you
can enter the Java code to be applied to each row of the
flow.

Purpose tJavaRow allows you to broaden the functionality of Talend


Jobs, using the Java language.

Limitation Knowledge of Java language is necessary.


Transforming data line by line using tJavaRow


In this scenario, the information of a few cities read from an input delimited file is transformed using
Java code through the tJavaRow component and printed on the console.

Setting up the Job


Procedure
1. Drop a tFileInputDelimited component and a tJavaRow component from the Palette onto the
design workspace, and label them to better identify their roles in the Job.
2. Connect the two components using a Row > Main connection.

Configuring the components


Procedure
1. Double-click the tFileInputDelimited component to display its Basic settings view in the
Component tab.

2. In the File name/Stream field, type in the path to the input file in double quotation marks, or
browse to the path by clicking the [...] button, and define the first line of the file as the header.
In this example, the input file has the following content:

City;Population;LandArea;PopDensity
Beijing;10233000;1418;7620
Moscow;10452000;1081;9644
Seoul;10422000;605;17215
Tokyo;8731000;617;14151
New York;8310000;789;10452

3. Click the [...] button next to Edit schema to open the Schema dialog box, and define the data
structure of the input file. Then, click OK to validate the schema setting and close the dialog box.


4. Double-click the tJavaRow component to display its Basic settings view in the Component tab.

5. Click Sync columns to make sure that the schema is correctly retrieved from the preceding
component.
6. In the Code field, enter the code to be applied on each line of data based on the defined schema
columns.
In this example, we want to transform the city names to upper case, group digits of numbers
larger than 1000 using the thousands separator for ease of reading, and print the data on the
console:

System.out.print("\n" + input_row.City.toUpperCase() + ":");


System.out.print("\n - Population: "
+ FormatterUtils.format_Number(String.valueOf(input_row.Population), ',', '.') + "
people");
System.out.print("\n - Land area: "
+ FormatterUtils.format_Number(String.valueOf(input_row.LandArea), ',', '.')
+ " km2");
System.out.print("\n - Population density: "
+ FormatterUtils.format_Number(String.valueOf(input_row.PopDensity), ',', '.') + "
people/km2\n");

Note:
In the Code field, input_row refers to the link that connects to tJavaRow.


Saving and executing the Job


Procedure
1. Press Ctrl+S to save your Job.
2. Press F6 or click Run on the Run tab to execute the Job.
The city information is transformed by the Java code set through tJavaRow and displayed on the
console.


tJDBCClose
Closes an active JDBC connection to release the occupied resources.

tJDBCClose Standard properties


These properties are used to configure tJDBCClose running in the Standard Job framework.
The Standard tJDBCClose component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Connection Component Select the component that opens the connection you need
to close from the drop-down list.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is to be used along with JDBC components,


especially with tJDBCConnection and tJDBCCommit.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic


settings and context variables, see Talend Studio User


Guide.

Related scenarios
No scenario is available for the Standard version of this component yet.


tJDBCColumnList
Lists all column names of a given JDBC table.
tJDBCColumnList iterates on all columns of a given table through a defined JDBC connection.

tJDBCColumnList Standard properties


These properties are used to configure tJDBCColumnList running in the Standard Job framework.
The Standard tJDBCColumnList component belongs to the Databases family.
The component in this framework is available in all Talend products.

Basic settings

Database Type Select the type of the database to be accessed.

Component list Select the tJDBCConnection component in the list if more
than one connection is planned for the current Job.

Table name Enter the name of the table.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows. When errors are skipped, you
can collect the rows on error using a Row > Reject link.

Advanced settings

tStatCatcher Statistics Select this check box to collect log data at the component level.

Global Variables

Global Variables CURRENT_COLUMN: the name of the column currently


iterated upon. This is a Flow variable and it returns a string.
CURRENT_COLUMN_TYPE: the ID of the type of the column
currently iterated upon. This is a Flow variable and it returns
an integer.
CURRENT_COLUMN_TYPE_NAME: the name of the type of
the column currently iterated upon. This is a Flow variable
and it returns a string.
CURRENT_COLUMN_PRECISION: the precision of the column
currently iterated upon. This is a Flow variable, and it
returns an integer.
CURRENT_COLUMN_SCALE: the scale of the column
currently iterated upon. This is a Flow variable, and it
returns an integer.
NB_COLUMN: the number of columns iterated upon so far.
This is an After variable and it returns an integer.


ERROR_MESSAGE: the error message generated by the


component when an error occurs. This is an After variable
and it returns a string. This variable functions only if the
Die on error check box is cleared, if the component has this
check box.
A Flow variable functions during the execution of a
component while an After variable functions after the
execution of the component.
To fill up a field or expression with a variable, press Ctrl +
Space to access the variable list and choose the variable to
use from it.
For further information about variables, see Talend Studio
User Guide.
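
As an illustration only, a tJava component connected to tJDBCColumnList through a Row > Iterate link could print the Flow variables on each iteration, as in the sketch below; the component name tJDBCColumnList_1 is an assumption that depends on your Job:

// CURRENT_COLUMN and CURRENT_COLUMN_TYPE_NAME are Flow variables refreshed on each iteration.
String columnName = (String) globalMap.get("tJDBCColumnList_1_CURRENT_COLUMN");
String typeName = (String) globalMap.get("tJDBCColumnList_1_CURRENT_COLUMN_TYPE_NAME");
System.out.println(columnName + " (" + typeName + ")");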

Usage

Usage rule This component is to be used along with JDBC components,


especially with tJDBCConnection.

Related scenario
For tJDBCColumnList related scenario, see Iterating on a DB table and listing its column names on
page 2419.


tJDBCCommit
Commits a global transaction in one go, instead of committing on every row or every batch, and thus
provides a gain in performance.
tJDBCCommit validates the data processed through the Job into the connected DB.

tJDBCCommit Standard properties


These properties are used to configure tJDBCCommit running in the Standard Job framework.
The Standard tJDBCCommit component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Connection Component Select the component that opens the database connection
to be reused by this component.

Close Connection Select this check box to close the database connection once
the component has performed its task.
Clear this check box to continue to use the selected
connection once the component has performed its task.
If this component is linked to your Job via a Row > Main
connection, your data will be committed row by row. In this
case, do not select the Close connection check box or your
connection will be closed before the end of the first row
commit.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is more commonly used with other tJDBC*
components, especially with the tJDBCConnection and
tJDBCRollback components.


Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenario
For tJDBCCommit related scenario, see Inserting data in mother/daughter tables on page 2426.


tJDBCConnection
Opens a connection to the specified database that can then be reused in the subsequent subjob or
subjobs.

tJDBCConnection Standard properties


These properties are used to configure tJDBCConnection running in the Standard Job framework.
The Standard tJDBCConnection component belongs to the Databases and the ELT families.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do


this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.


To enter the password, click the [...] button next to the


password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Use or register a shared DB Connection Select this check box to share your database connection
or fetch a database connection shared by a parent or child
Job, and in the Shared DB Connection Name field displayed,
enter the name for the shared database connection. This
allows you to share one single database connection (except
the database schema setting) among several database
connection components from different Job levels that can
be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared connection together
with a tRunJob component with either of these two options
enabled will cause your Job to fail.
This check box is not available when the Specify a data
source alias check box is selected.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
This check box is not available when the Use or register a
shared DB Connection check box is selected.

Advanced settings

Use Auto-Commit Select this check box to activate the auto commit mode.

Auto Commit Select this check box to commit any changes to the
database automatically upon the transaction.
With this check box selected, you cannot use the
corresponding commit component to commit changes
to the database; likewise, when using the corresponding
commit component, this check box has to be cleared. By
default, the auto commit function is disabled and changes
must be committed explicitly using the corresponding
commit component.
Note that the auto commit function commits each SQL
statement as a single transaction immediately after the
statement is executed, while the commit component does
not commit until all of the statements are executed.
For this reason, if you need more room to manage your
transactions in a Job, it is recommended to use the commit
component.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.


Usage

Usage rule This component is more commonly used with other


tJDBC* components, especially with the tJDBCCommit and
tJDBCRollback components.

Importing a database driver


To enable a JDBC component work with a specific database, you need to import the corresponding
data driver into the component.

Procedure
1. If the library to be imported isn't available on your machine, either download and install it using
the Modules view or download and store it in a local directory.
2. In the Drivers table, add one row to the table by clicking the [+] button.

3. Click the newly added row and click the [...] button to open the Module dialog box where you can
import the external library.


4. If you have installed the library using the Modules view:


• Select the Platform option and then select the library from the list.
• Select the Artifact repository (local m2/nexus) > Find by name or Artifact repository (local
m2/nexus) > Find by Maven URI option, then specify the full name or Maven URI of the library
module, and click the Detect the module install status button to validate its installation
status.
5. If you have stored the library file in a local directory:
a) Select the Artifact repository (local m2/nexus) option.
b) Select the Install a new module option, and click the [...] button to browse to library file.
c) If you need to customize the Maven URI of the library, select the Custom MVN URI check box,
specify the new URI, and then click the Detect the module install status button to validate its
installation status.

Note:
Changing the Maven URI for an external module will affect all the components and
metadata connections that use that module within the project.
When working on a remote project, your custom Maven URI settings will be automatically
synchronized to the Talend Artifact Repository and will be used when other users working
on the same project install the external module.

6. Click OK to confirm your changes.


The imported library file is listed in the Drivers table.


Note: You can replace or delete the imported library, or import new libraries if needed.

Related scenario
For tJDBCConnection related scenario, see tMysqlConnection on page 2425


tJDBCInput
Reads any database using a JDBC API connection and extracts fields based on a query.
tJDBCInput executes a database query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.

tJDBCInput Standard properties


These properties are used to configure tJDBCInput running in the Standard Job framework.
The Standard tJDBCInput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Click this icon to open a database connection wizard and


store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Connection Component Select the component that opens the database connection
to be reused by this component.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do


this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the


cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.


To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name The name of the table from which data will be retrieved.

Query Type and Query Specify the database query statement, paying particular
attention to the proper sequence of the fields, which must
correspond to the schema definition.
• Built-In: Fill in the query statement in the Query field
manually or click the [...] button next to the Query
field to build the statement graphically using the
SQLBuilder.
• Repository: Select the relevant query stored in the
Repository by clicking the [...] button next to it and
in the pop-up Repository Content dialog box, select
the query to be used, and the Query field will be
automatically filled in.


Guess Query Click this button to generate query in the Query field based
on the defined table and schema.

Guess Schema Click this button to generate schema columns based on the
query defined in the Query field.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Advanced settings

Use cursor Select this check box to specify the number of rows you
want to work with at any given time. This option optimises
performance.

Trim all the String/Char columns Select this check box to remove leading whitespace and
trailing whitespace from all String/Char columns.

Check column to trim Select the check box for the corresponding column to remove
leading whitespace and trailing whitespace from it.
This property is not available when the Trim all the String/
Char columns check box is selected.

Enable Mapping File for Dynamic Select this check box to use the specified metadata
mapping file when reading data from a dynamic type
column. This check box is cleared by default.
With this check box selected, you can specify the metadata
mapping file to use by selecting a type of database from the
Mapping File drop-down list.
For more information about metadata mapping files, see the
section on type conversion of Talend Studio User Guide.

Use PreparedStatement Select this check box if you want to query the database
using a prepared statement. In the Set PreparedStatement
Parameters table displayed, specify the value for each
parameter represented by a question mark ? in the SQL
statement defined in the Query field.
• Parameter Index: the position of the parameter in the
SQL statement.
• Parameter Type: the data type of the parameter.
• Parameter Value: the value of the parameter.
For a related use case of this property, see Using
PreparedStatement objects to query data on page 2498.
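
As an illustration only (the table, column, parameter type and values below are assumptions), the Query field could contain a statement with a question-mark placeholder, completed by one row in the Set PreparedStatement Parameters table:

// Query field (entered as a Java string):
"SELECT id, name FROM employees WHERE salary > ?"
// Set PreparedStatement Parameters table, one row:
//   Parameter Index: 1 | Parameter Type: Int | Parameter Value: 50000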

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.


Global Variables

ERROR_MESSAGE The error message generated by the component when an


error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows processed. This is an After variable and


it returns an integer.

QUERY The query statement being processed. This is a Flow


variable and it returns a string.

Usage

Usage rule This component covers all possible SQL queries for any
database using a JDBC connection.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independently of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see:
Related topic in tContextLoad: see Reading data from different MySQL databases using dynamically
loaded connection parameters on page 497.

tJDBCOutput
Executes the action defined on the data contained in the table, based on the flow incoming from the
preceding component in the Job.
tJDBCOutput writes, updates, modifies or deletes entries in any database that is accessible through a
JDBC API.

tJDBCOutput Standard properties


These properties are used to configure tJDBCOutput running in the Standard Job framework.
The Standard tJDBCOutput component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Click this icon to open a database connection wizard and
store the database connection parameters you set in the
component Basic settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.

Connection Component Select the component that opens the database connection
to be reused by this component.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.
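To show how the JDBC URL, Driver Class and credentials fit together outside the Studio, here is a
minimal, hedged Java sketch reusing the Redshift values quoted above. The endpoint, port, database,
user and password are placeholders, not values read from the component.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class JdbcConnectExample {
        public static void main(String[] args) throws Exception {
            // Load the class named in Driver Class (often optional with JDBC 4+ drivers).
            Class.forName("com.amazon.redshift.jdbc41.Driver");
            // Placeholder URL: substitute your real endpoint, port and database.
            String url = "jdbc:redshift://endpoint:port/database";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }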

Table Name The name of the table into which data will be written.

Data Action Select an action to be performed on the data of the defined
table.
• Insert: Add new entries to the table. If duplicates are
found, the Job stops.
• Update: Make changes to existing entries.
• Insert or update: Insert a new record in the index pool.
If the record with the given reference already exists, an
update is made instead (see the sketch below).
• Update or insert: Update the record with the given
reference. If the record does not exist in the index pool,
a new record is inserted.
• Delete: Remove entries corresponding to the input
flow.

Warning:
It is necessary to specify at least one column as a
primary key on which the Update and Delete operations
are based. You can do that by clicking Edit Schema
and selecting the check box(es) next to the column(s)
you want to set as primary key(s). For an advanced
use, click the Advanced settings view where you can
simultaneously define primary keys for the Update
and Delete operations. To do that: Select the Use field
options check box and then in the Key in update column,
select the check boxes next to the column names you
want to use as a base for the Update operation. Do
the same in the Key in delete column for the Delete
operation.
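The following hedged sketch only illustrates the idea behind Insert or update (try the insert first,
fall back to an update keyed on the primary key). The person table and id key are hypothetical, and
the SQL actually generated by tJDBCOutput may differ.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class InsertOrUpdateSketch {
        // Assumes an already open java.sql.Connection; table and columns are hypothetical.
        static void insertOrUpdate(Connection conn, int id, String name) throws SQLException {
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO person (id, name) VALUES (?, ?)")) {
                insert.setInt(1, id);
                insert.setString(2, name);
                insert.executeUpdate();
            } catch (SQLException duplicateKey) {
                // A row with this key already exists, so update it instead.
                try (PreparedStatement update = conn.prepareStatement(
                        "UPDATE person SET name = ? WHERE id = ?")) {
                    update.setString(1, name);
                    update.setInt(2, id);
                    update.executeUpdate();
                }
            }
        }
    }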

Clear data in table Select this check box to clear data in the table before
performing the action defined.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.

• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
When the schema to be reused has default values that are
integers or functions, ensure that these default values are
not enclosed within quotation marks. If they are, you must
remove the quotation marks manually.
You can find more details about how to verify default values
in retrieved schema in Talend Help Center (https://help.talend.com).
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically
becomes built-in.

• View schema: choose this option to view the schema
only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Guess Schema Click this button to generate schema columns based on the
settings of database table columns.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
When errors are skipped, you can collect the rows on error
using a Row > Reject connection.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Advanced settings

Commit every Specify the number of rows to be processed before
committing batches of rows together into the database.
This option ensures transaction quality (but not rollback)
and, above all, better performance at executions.

Additional Columns This option allows you to call SQL functions to perform
actions on columns, which are not insert, update or delete
actions, or actions that require particular preprocessing. It is
not offered if you create (with or without drop) the database
table.
• Name: The name of the schema column to be inserted,
or the name of the schema column used to replace an
existing column.
• SQL expression: The SQL statement to be executed in
order to insert or replace relevant column.
• Position: Select Before, After, or Replace
according to the action to be performed on the
reference column.
• Reference column: The name of the reference column
that can be used to locate the new column to be
inserted or that will be replaced.

Use field options Select this check box and in the Fields options table
displayed, select the check box for the corresponding
column to customize a request, particularly if multiple
actions are being carried out on the data.
• Key in update: Select the check box for the
corresponding column based on which data is updated.
• Key in delete: Select the check box for the
corresponding column based on which data is deleted.
• Updatable: Select the check box if data in the
corresponding column can be updated.
• Insertable: Select the check box if data in the
corresponding column can be inserted.

Debug query mode Select this check box to display each step during processing
entries in a database.

Use Batch Select this check box to activate the batch mode for data
processing, and in the Batch Size field displayed, specify the
number of records to be processed in each batch.
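As a hedged illustration of what Batch Size and Commit every roughly correspond to at the JDBC
level, the sketch below adds rows to a batch, executes the batch every batchSize rows and commits
every commitEvery rows. The person table and the method names are hypothetical, and this is not
the component's generated code.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchWriteSketch {
        // Assumes an already open Connection; the table and the input list are hypothetical.
        static void write(Connection conn, List<String> names, int batchSize, int commitEvery)
                throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement stmt = conn.prepareStatement(
                    "INSERT INTO person (name) VALUES (?)")) {
                int count = 0;
                for (String name : names) {
                    stmt.setString(1, name);
                    stmt.addBatch();
                    count++;
                    if (count % batchSize == 0) {
                        stmt.executeBatch();   // roughly what Batch Size controls
                    }
                    if (count % commitEvery == 0) {
                        stmt.executeBatch();   // flush pending rows before committing
                        conn.commit();         // roughly what Commit every controls
                    }
                }
                stmt.executeBatch();           // flush the remaining rows
                conn.commit();
            }
        }
    }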

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Enable parallel execution Select this check box to perform high-speed data processing
by treating multiple data flows simultaneously. This feature
depends on the database or the application ability to handle
multiple inserts in parallel as well as the number of CPU
affected. With this check box selected, you need to specify
the number of parallel executions desired in the Number of
parallel executions field displayed.

Note: When parallel execution is enabled, it is not
possible to use global variables to retrieve return values.

Global Variables

ERROR_MESSAGE The error message generated by the component when an
error occurs. This is an After variable and it returns a string.

NB_LINE The number of rows processed. This is an After variable and
it returns an integer.

NB_LINE_INSERTED The number of rows inserted. This is an After variable and it
returns an integer.

NB_LINE_UPDATED The number of rows updated. This is an After variable and it
returns an integer.

NB_LINE_DELETED The number of rows deleted. This is an After variable and it
returns an integer.

NB_LINE_REJECTED The number of rows rejected. This is an After variable and it
returns an integer.

QUERY The query statement being processed. This is a Flow
variable and it returns a string.

Usage

Usage rule This component offers the flexibility of the database
query and covers all possible SQL queries.
This component must be used as an output component. It
allows you to carry out actions on a table or on the data of
a table in a JDBC database. It also allows you to create a
reject flow using a Row > Rejects link to filter data in error.
For an example of tMysqlOutput in use, see Retrieving data
in error with a Reject link on page 2474.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For tJDBCOutput related topics, see:
• Inserting a column and altering data using tMysqlOutput on page 2466.

tJDBCRollback
Avoids accidentally committing part of a transaction by canceling the transaction committed in the
connected database.
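As a hedged plain-JDBC sketch of the same idea, the example below groups several statements in one
transaction and rolls the transaction back if any statement fails, so nothing is partially committed.
The table, statements and connection details are hypothetical, and this is not the component's
generated code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class RollbackSketch {
        public static void main(String[] args) throws SQLException {
            // Placeholder URL and credentials.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:redshift://endpoint:port/database", "user", "password")) {
                conn.setAutoCommit(false);     // open a transaction, as tJDBCConnection can do
                try (Statement stmt = conn.createStatement()) {
                    stmt.executeUpdate("INSERT INTO person (id, name) VALUES (1, 'a')");
                    stmt.executeUpdate("INSERT INTO person (id, name) VALUES (2, 'b')");
                    conn.commit();             // the role of tJDBCCommit
                } catch (SQLException e) {
                    conn.rollback();           // the role of tJDBCRollback
                    throw e;
                }
            }
        }
    }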

tJDBCRollback Standard properties


These properties are used to configure tJDBCRollback running in the Standard Job framework.
The Standard tJDBCRollback component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Connection Component Select the component that opens the database connection
to be reused by this component.

Close Connection Select this check box to close the database connection once
the component has performed its task.
Clear this check box to continue to use the selected
connection once the component has performed its task.

Advanced settings

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an
error occurs. This is an After variable and it returns a string.

Usage

Usage rule This component is more commonly used with other tJDBC* components, especially with the
tJDBCConnection and tJDBCCommit components.

Dynamic settings Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature
is useful when you need to access database tables having the same data structure but in different
databases, especially when you are working in an environment where you cannot change your Job
settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
For examples on using dynamic parameters, see Reading data from databases through context-
based dynamic connections on page 2446 and Reading data from different MySQL databases using
dynamically loaded connection parameters on page 497. For more information on Dynamic settings
and context variables, see Talend Studio User Guide.

Related scenario
For a tJDBCRollback related scenario, see tMysqlRollback on page 2491.

tJDBCRow
Acts on the actual DB structure or on the data (although without handling data), using the SQLBuilder
tool to easily write your SQL statements.
tJDBCRow is the component for any type of database using a JDBC API. It executes the SQL query stated
on the specified database. The row suffix means the component implements a flow in the Job
design although it doesn't provide output.
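Functionally, this is comparable to executing a free-form SQL statement over a JDBC connection, as
in the hedged Java sketch below. The DDL statement, table and connection details are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ExecuteSqlSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder URL and credentials.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:redshift://endpoint:port/database", "user", "password");
                 Statement stmt = conn.createStatement()) {
                // Any query or DDL statement can be executed; no output flow is produced.
                stmt.execute("CREATE INDEX idx_person_name ON person (name)");
            }
        }
    }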

tJDBCRow Standard properties


These properties are used to configure tJDBCRow running in the Standard Job framework.
The Standard tJDBCRow component belongs to the Databases family.
The component in this framework is available in all Talend products.

Note: This component is a specific version of a dynamic database connector. The properties related
to database settings vary depending on your database type selection. For more information about
dynamic database connectors, see Dynamic database components on page 595.

Basic settings

Database Select a type of database from the list and click Apply.

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Connection Component Select the component that opens the database connection
to be reused by this component.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Drivers Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
RedshiftJDBC41-1.1.13.1013.jar for the Redshift
database.
For more information, see Importing a database driver.

Driver Class Enter the class name for the specified driver between
double quotation marks. For example, for the
RedshiftJDBC41-1.1.13.1013.jar driver, the name
to be entered is com.amazon.redshift.jdbc41.Driver.

Use Id and Password The database user authentication data.
To enter the password, click the [...] button next to the
password field, and then in the pop-up dialog box enter the
password between double quotes and click OK to save the
settings.

Schema and Edit schema A schema is a row description. It defines the number of
fields (columns) to be processed and passed on to the next
component. When you create a Spark Job, avoid the reserved
word line when naming the fields.
• Built-In: You create and store the schema locally for
this component only.
• Repository: You have already created the schema and
stored it in the Repository. You can reuse it in various
projects and Job designs.
Click Edit schema to make changes to the schema.

Note: If you make changes, the schema automatically


becomes built-in.

• View schema: choose this option to view the schema


only.
• Change to built-in property: choose this option to
change the schema to Built-in for local changes.
• Update repository connection: choose this option
to change the schema stored in the repository and
decide whether to propagate the changes to all the
Jobs upon completion. If you just want to propagate
the changes to the current Job, you can select No upon
completion and choose this schema metadata again in
the Repository Content window.

Table Name The name of the table to be processed.

Query Type and Query Specify the database query statement, paying particular
attention to the proper sequencing of the fields, which must
correspond to the schema definition.
• Built-In: Fill in the query statement in the Query field
manually or click the [...] button next to the Query
field to build the statement graphically using the
SQLBuilder.
• Repository: Select the relevant query stored in the
Repository by clicking the [...] button next to it and
in the pop-up Repository Content dialog box, select
the query to be used, and the Query field will be
automatically filled in.

Guess Query Click this button to generate a query in the Query field based
on the defined table and schema.

Specify a data source alias Select this check box and in the Data source alias field
displayed, specify the alias of a data source created on
Talend Runtime side to use the shared connection pool
defined in the data source configuration. This option works
only when you deploy and run your Job in Talend Runtime.
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This property is not available when another connection
component is selected from the Connection Component
drop-down list.

Die on error Select the check box to stop the execution of the Job when
an error occurs.
Clear the check box to skip any rows on error and complete
the process for error-free rows.
When errors are skipped, you can collect the rows on error
using a Row > Reject connection.

Advanced settings

Propagate QUERY's recordset Select this check box to propagate the result of the query
to the output flow. From the use column list displayed, you
need to select a column into which the query result will be
inserted.
This option allows the component to have a different
schema from that of the preceding component. Moreover,
the column that holds the query's recordset should be set to
the Object type and this component is usually followed by a
tParseRecordSet component.

Use PreparedStatement Select this check box if you want to query the database
using a prepared statement. In the Set PreparedStatement
Parameters table displayed, specify the value for each
parameter represented by a question mark ? in the SQL
statement defined in the Query field.
• Parameter Index: the position of the parameter in the
SQL statement.
• Parameter Type: the data type of the parameter.
• Parameter Value: the value of the parameter.
For a related use case of this property, see Using
PreparedStatement objects to query data on page 2498.

Commit every Specify the number of rows to be processed before
committing batches of rows together into the database.
This option ensures transaction quality (but not rollback)
and, above all, better performance at executions.

tStatCatcher Statistics Select this check box to gather the Job processing metadata
at the Job level as well as at each component level.

Global Variables

ERROR_MESSAGE The error message generated by the component when an
error occurs. This is an After variable and it returns a string.

QUERY The query statement being processed. This is a Flow
variable and it returns a string.

Usage

Usage rule This component offers the flexibility of the DB query for any
database using a JDBC connection and covers all possible
SQL queries.

Dynamic settings Click the [+] button to add a row in the table and fill the
Code field with a context variable to choose your database
connection dynamically from multiple connections planned
in your Job. This feature is useful when you need to access
database tables having the same data structure but in
different databases, especially when you are working in an
environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed
independent of Talend Studio.
For examples on using dynamic parameters, see Reading
data from databases through context-based dynamic
connections on page 2446 and Reading data from different
MySQL databases using dynamically loaded connection
parameters on page 497. For more information on Dynamic
settings and context variables, see Talend Studio User
Guide.

Related scenarios
For related topics, see:
• Combining two flows for selective output on page 2503.
• Procedure on page 622.
• Removing and regenerating a MySQL table index on page 2497.

tJDBCSCDELT
Tracks data changes in a source database table using SCD (Slowly Changing Dimensions) Type 1
method and/or Type 2 method and writes both the current and historical data into a specified SCD
dimension table.
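As a hedged illustration of the Type 2 mechanism (close the current dimension row, then insert a new
version so that history is preserved), the Java sketch below uses a hypothetical customer_dim table
with scd_start, scd_end and scd_active columns. The real table and column names are whatever you
configure in the component, and the SQL it generates may differ.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;

    public class ScdType2Sketch {
        // Assumes an already open Connection; table and columns are hypothetical.
        static void applyType2Change(Connection conn, int customerId, String newCity)
                throws SQLException {
            Timestamp now = new Timestamp(System.currentTimeMillis());
            // Close the currently active version of the record.
            try (PreparedStatement close = conn.prepareStatement(
                    "UPDATE customer_dim SET scd_end = ?, scd_active = 0 "
                    + "WHERE customer_id = ? AND scd_active = 1")) {
                close.setTimestamp(1, now);
                close.setInt(2, customerId);
                close.executeUpdate();
            }
            // Insert the new version as the active record, preserving the old row as history.
            try (PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO customer_dim (customer_id, city, scd_start, scd_end, scd_active) "
                    + "VALUES (?, ?, ?, NULL, 1)")) {
                insert.setInt(1, customerId);
                insert.setString(2, newCity);
                insert.setTimestamp(3, now);
                insert.executeUpdate();
            }
        }
    }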

tJDBCSCDELT Standard properties


These properties are used to configure tJDBCSCDELT running in the Standard Job framework.
The Standard tJDBCSCDELT component belongs to two families: Business Intelligence and Databases.
The component in this framework is available in all Talend products.

Basic settings

Property Type Select the way the connection details will be set.
• Built-In: The connection details will be set locally for
this component. You need to specify the values for all
related connection properties manually.
• Repository: The connection details stored centrally
in Repository > Metadata will be reused by this
component. You need to click the [...] button next to
it and in the pop-up Repository Content dialog box,
select the connection details to be reused, and all
related connection properties will be automatically
filled in.

Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection
details you already defined.

JDBC URL The JDBC URL of the database to be used. For example,
the JDBC URL for the Amazon Redshift database is
jdbc:redshift://endpoint:port/database.

Driver JAR Complete this table to load the driver JARs needed. To do
this, click the [+] button under the table to add as many
rows as needed, each row for a driver JAR, then select
the cell and click the [...] button at the right side of the
cell to open the Module dialog box from which you can
select the driver JAR to be used. For example, the driver jar
