Ascential Datastage: Nls Guide
NLS Guide
Version 6.0
September 2002
Part No. 00D-0007DS60
Published by Ascential Software
Ascential, DataStage and MetaStage are trademarks of Ascential Software Corporation or its affiliates and may
be registered in other jurisdictions.
Software and documentation acquired by or for the US Government are provided with rights as follows:
(1) if for civilian agency use, with rights as restricted by vendor’s standard license, as prescribed in FAR 12.212;
(2) if for Dept. of Defense use, with rights as restricted by vendor’s standard license, unless superseded by a
negotiated vendor license, as prescribed in DFARS 227.7202. Any whole or partial reproduction of software or
documentation marked with this legend must reproduce this legend.
Table of Contents
Preface
    Organization of This Manual
    Documentation Conventions
Chapter 3. Maps
    How Maps Work
    Map Naming Conventions
    Creating New Maps
    Building and Installing Maps
    Multibyte NLS Maps and System Delimiters
    Handling Extra Characters
    Maps and Files
Appendix C. NLS Quick Reference
    DataStage Commands
    BASIC Statements and Functions
    Map Tables
    DataStage Locales
    Unicode Blocks
Preface
This guide is for users, programmers, and administrators who are familiar with
DataStage and want to use and manage its National Language Support (NLS)
facilities.
Documentation Conventions
This manual uses the following conventions:
Convention Usage
Bold In syntax, bold indicates commands, function names, and
options. In text, bold indicates keys to press, function names,
menu selections, and MS-DOS commands.
UPPERCASE In syntax, uppercase indicates DataStage commands,
keywords, and options; BASIC statements and functions;
and SQL statements and keywords. In text, uppercase also
indicates DataStage identifiers such as filenames, account
names, schema names, and Windows NT filenames and
pathnames.
Italic In syntax, italic indicates information that you supply. In text,
italic also indicates UNIX commands and options, filenames,
and pathnames.
Courier Courier indicates examples of source code and system
output.
Courier Bold In examples, courier bold indicates characters that the user
types or keys the user presses (for example, <Return>).
[] Brackets enclose optional items. Do not type the brackets
unless indicated.
{} Braces enclose nonoptional items from which you must
select at least one. Do not type the braces.
itemA | itemB A vertical bar separating items indicates that you can choose
only one item. Do not type the vertical bar.
... Three periods indicate that more of the same type of item can
optionally follow.
➤ A right arrow between menu options indicates you should
choose each option in sequence. For example, “Choose
File ➤ Exit” means you should choose File from the menu
bar, then choose Exit from the File pull-down menu.
I Item mark. For example, the item mark ( I ) in the following
string delimits elements 1 and 2, and elements 3 and 4:
1I2F3I4V5
F Field mark. For example, the field mark ( F ) in the following
string delimits elements FLD1 and VAL1:
FLD1FVAL1VSUBV1SSUBV2
1
What Is NLS?
This chapter gives an overview of what NLS (National Language Support) is, why
you need it, how it works, and what you will find when you install NLS.
Note: This manual uses some terms that may be new to you. When a new term is
introduced, it is printed in italic. This means you can find an entry for the
term in the Glossary at the back of the book.
NLS Mode
DataStage has a special mode that offers National Language Support (NLS). With
NLS mode enabled, you can use DataStage in various languages and countries.
You can do the following:
• Input data in many character sets (dependent on your local keyboard)
• Retrieve data and format it using your own conventions or those of another
country
• Output data to a screen or printer using the character sets and display
conventions of different countries
• Write programs that run in different languages and countries without
source changes or recompilation
About Unicode
The NLS internal character set conforms to the Unicode standard. Unicode
defines characters using 16-bit codes in 4-digit hexadecimal format. The Unicode
standard gives unique character definitions for many languages, as well as many
symbols and special characters.
The Unicode standard forms part of ISO 10646. NLS complies with:
• ISO/IEC 10646-1:1993 Basic Multilingual Plane
• Unicode Version 2.0 (with the exception of Tibetan)
For more information about Unicode, see The Unicode Standard, Version 2.0,
Addison Wesley, ISBN 0-201-48345-9, or the Unicode Consortium’s World Wide
Web page at http://www.unicode.org.
Mapping
When you need to enter, list, print, or transfer data, NLS maps the data to or from
the external character set you want to use. NLS includes map tables for many of
the character sets used in the world (see the list in Appendix C). You can specify
mapping for:
• DataStage files
• Operating system files
• Terminals
• Keyboards and other input devices
• Printers (including auxiliary printers)
• Storage media
• Communications devices
Note: If your files contain only ASCII 7-bit characters, they need not be mapped.
Maps
Maps define how DataStage converts characters in the external character set to
the internal character set, and vice versa. The external character set is what the
user sees and uses to input data on a keyboard, to print reports, and so on.
Appendix C shows the map tables that are supplied with DataStage. For more
information about specifying the correct map for your system, see “Setting
Default Maps and Locales” on page 2-4.
Locales
Strictly speaking, a DataStage NLS locale is a set of national conventions. A locale is
viewed as a separate entity from a character set. You need to consider the
language, character set, and conventions for data formatting that one or more
groups of people use. You define the character set independently, although for
national conventions to work correctly, you must also use the appropriate
character sets. For example, Venezuela and Ecuador both use Spanish as their
language, but have different data formatting conventions.
Locales do not respect national boundaries. One country may use several locales;
for example, Canada uses two and Belgium uses three. Several countries may use
one locale; for example, a multinational business could define a worldwide locale
to use in all its offices. Appendix C lists all the locales that are supplied with
DataStage and the territories and languages associated with them.
National Conventions
A national convention is a standard set of rules that defines the data formatting
a particular territory uses. NLS supports the following national conventions:
• The format for times and dates
• The format for displaying numbers
• How to display monetary values
• Whether a character is alphabetic, numeric, nonprinting, and so on
• The order in which characters should be sorted (collation)
Time and Date. Most territories have a preferred style for presenting times and
dates. For times, this is usually a choice between a 12-hour or 24-hour clock. For
dates, there are more variations. Here are some examples of formats used by
different locales to express 9.30 at night on the first day of April in 1990:
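As a rough illustration of this kind of variation, the following Python sketch formats that moment in three ways. The style names and format strings are assumptions chosen for illustration; they are not the conventions stored in any DataStage locale.

```python
from datetime import datetime

# 9.30 at night on the first day of April in 1990.
MOMENT = datetime(1990, 4, 1, 21, 30)

# Hypothetical styles for illustration only; real locales define their own.
STYLES = {
    "month-first, 12-hour": "%m/%d/%Y %I:%M",
    "day-first, 24-hour": "%d.%m.%Y %H:%M",
    "year-first, 24-hour": "%Y-%m-%d %H:%M",
}

def render(moment, style):
    """Format a datetime using one of the illustrative styles above."""
    text = moment.strftime(STYLES[style])
    if "12-hour" in style:
        # Append the marker by hand so the result does not depend on the
        # host system's own locale settings.
        text += " PM" if moment.hour >= 12 else " AM"
    return text

for name in STYLES:
    print(f"{name:22} {render(MOMENT, name)}")
```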
Collation. This convention defines the order in which characters are collated, that
is, sorted. There can be many variations in collation order within a single
character set. For example, the character Ä follows A in Germany, but follows Z in
Sweden. For an explanation of how NLS determines the sort order for an external
character set, see “How DataStage Collates” on page 4-23.
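The Ä example can be sketched in Python as follows. This is a simplified illustration, not how DataStage collates: the two alphabet strings and the helper name collation_key are assumptions, and real collation uses weight tables rather than a single ordered string.

```python
# Simplified, assumed alphabets: in the first, Ä sorts directly after A;
# in the second, Ä sorts after Z.
GERMAN_ORDER = "AÄBCDEFGHIJKLMNOÖPQRSTUÜVWXYZ"
SWEDISH_ORDER = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"

def collation_key(alphabet):
    """Build a sort key that ranks characters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Characters outside the alphabet sort after everything else.
        return [rank.get(ch, len(alphabet)) for ch in word.upper()]

    return key

words = ["Zebra", "Ärger", "Apfel"]
german_sorted = sorted(words, key=collation_key(GERMAN_ORDER))
swedish_sorted = sorted(words, key=collation_key(SWEDISH_ORDER))
print(german_sorted)
print(swedish_sorted)
```

The same three strings come out in a different order depending only on the collation rule, which is why the collation convention is kept separate from the character set.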
This chapter tells you how to configure NLS using the maps and locales supplied
with DataStage. Topics include the following:
• Setting the configurable parameters used by NLS
• How to set up maps for devices and files
• How to set up locales
• How to set the maps for client/server programs
Parameter Description
NLSDEFDEVMAP Specifies the name of the default map to use for device
input or output. This map is used for all devices, other
than printers, that do not have a map specified in the
&DEVICE& file. The ASSIGN MAP command overrides
this setting. The default value is ISO8859-1+MARKS.
NLSDEFDIRMAP Specifies the name of the default map to use for type 1
and type 19 files without assigned maps. This occurs if
a type 1 or type 19 file was not created on an NLS
system and has not had a map defined for it by the
SET.FILE.MAP command. This map applies only to the
data in records, not to record IDs. The default value is
ISO8859-1+MARKS.
NLSDEFFILEMAP Specifies the name of the default map to use for hashed
files without assigned maps. This occurs if a hashed file
was not created on an NLS system and has not had a
map defined for it by the SET.FILE.MAP command.
The default value is ISO8859-1+MARKS.
NLSDEFGCIMAP Specifies the name of the default map to use for string
arguments passed to and from GCI subroutines. This
map is used if the GCI subroutine does not explicitly
define a map. The default value is ISO8859-1+MARKS.
NLSDEFPTRMAP Specifies the name of the default map to use for printer
output. This map is used if a printer does not have a
map defined for it in the &DEVICE& file. The default
value is ISO8859-1+MARKS.
NLSDEFSEQMAP Specifies the name of the default map to use for
sequential input or output for files or devices without
assigned maps. The SET.SEQ.MAP command overrides
this setting. The default value is ISO8859-1+MARKS.
NLSDEFSRVLC Specifies the name of the default locale to use for
passing data to and from client programs. This locale is
used if the client program does not specify a server
locale. The default value is ISO8859-1+MARKS.
NLSDEFSRVMAP Specifies the name of the default map to use for passing
data to and from client programs. This map is used if
the client program does not specify a server map. The
default value is ISO8859-1+MARKS.
NLSDEFTERMMAP Specifies the name of the default map to use for
terminal input or output. This map is used if a terminal
does not have a map defined for it in its terminfo
definition. The SET.TERM.TYPE MAP command overrides
this setting. The default value is ISO8859-1+MARKS.
NLSDEFUSRLC Specifies the default locale. The default value is OFF.
NLSLCMODE Specifies whether locales are enabled. A value of 1
indicates that locales are enabled; a value of 0 indicates that
locales are disabled. The default setting is 0. This
parameter has no effect unless NLSMODE is set to 1.
NLSMODE Turns NLS mode on or off. A value of 1 indicates NLS
is on, a value of 0 indicates NLS is off. If NLS mode is
off, DataStage does not check any other NLS
parameters.
NLSNEWDIRMAP Specifies the name of the map to use for new type 1 and
type 19 files created when NLS mode is on. This map
applies only to the data in records, not to record IDs.
The default value is ISO8859-1+MARKS.
NLSNEWFILEMAP Specifies the name of the map to use for new hashed
files created when NLS mode is on. A value of NONE
(the default value) indicates that data is to be held in
the internal DataStage character set.
NLSOSMAP Specifies the name of the map to use for filenames or
record IDs visible to the operating system. This chiefly
affects CREATE.FILE and record IDs written to type 1
or type 19 files. The default value is ISO8859-1.
NLSREADELSE Specifies the action to take if characters cannot be
mapped when a record is read by a READ statement. A
value of 1 indicates that the READ statement takes the
ELSE clause. A value of 0 indicates that unmappable
characters are returned as the Unicode replacement
character 0xFFFD. The default value is 1.
NLSWRITEELSE Specifies the action to take if characters cannot be
mapped when data is written to a record. A value of 1
indicates that the write aborts or takes the ON ERROR
clause (if there is one). A value of 0 indicates that
unmappable characters are converted to the file map’s
unknown character (for example, ?) before writing the
record. When this happens, some data may be lost.
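The difference between the two settings of NLSREADELSE and NLSWRITEELSE can be sketched in Python. This is only an analogy: Python codecs stand in for NLS maps, and the functions read_record and write_record are hypothetical names, not DataStage APIs.

```python
def read_record(raw, encoding, read_else=1):
    """Decode external bytes, mimicking the two NLSREADELSE settings."""
    if read_else == 1:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            return None  # stands in for "the READ takes the ELSE clause"
    # Setting 0: unmappable bytes become the replacement character U+FFFD.
    return raw.decode(encoding, errors="replace")

def write_record(text, encoding, write_else):
    """Encode internal text, mimicking the two NLSWRITEELSE settings."""
    if write_else == 1:
        # Raises UnicodeEncodeError; stands in for the aborted write.
        return text.encode(encoding)
    # Setting 0: unmappable characters become "?", so data may be lost.
    return text.encode(encoding, errors="replace")

print(read_record(b"caf\xe9", "ascii", read_else=0))
print(write_record("\u20ac5", "latin-1", write_else=0))
```

The lossy settings keep the operation going at the cost of data integrity, which is the trade-off the parameter descriptions above point out.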
Setting Locales
UVLANG Environment Variable. To set your initial DataStage locale, use the
UVLANG environment variable. When you start a DataStage session, DataStage
retrieves the value of the UVLANG variable and checks to see if a locale of the
specified name is loaded. If it is, it becomes your current locale.
Direct DataStage connections (uvsh), telnet connections, and BCI connections are
all affected by the UVLANG variable.
System Locale. You can set a locale for your whole system with the NLSDEFUSRLC
parameter in the uvconfig file. The procedure is described in “Setting
Default Maps and Locales” on page 2-4.
Users can set locales from the DataStage prompt using the SET.LOCALE
command. You can set locales from BASIC programs using the SETLOCALE
function.
For more information about the locale database and how to customize locales, see
Chapter 4, “Locales.”
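The way an initial locale is chosen can be sketched as follows. This is a minimal sketch under stated assumptions: the function initial_locale is hypothetical, and the fallback to the NLSDEFUSRLC value when UVLANG is unset or names a locale that is not loaded is an assumption, as is treating OFF as "no locale".

```python
import os

def initial_locale(loaded_locales, config, environ=None):
    """Pick the starting locale for a session (illustrative only)."""
    environ = os.environ if environ is None else environ
    # NLSLCMODE has no effect unless NLSMODE is set to 1.
    if config.get("NLSMODE") != 1 or config.get("NLSLCMODE") != 1:
        return None
    requested = environ.get("UVLANG")
    if requested in loaded_locales:
        return requested
    # Assumed fallback: the system-wide default locale parameter.
    default = config.get("NLSDEFUSRLC", "OFF")
    return None if default == "OFF" else default

config = {"NLSMODE": 1, "NLSLCMODE": 1, "NLSDEFUSRLC": "US-ENGLISH"}
loaded = {"US-ENGLISH", "FR-FRENCH"}
print(initial_locale(loaded, config, {"UVLANG": "FR-FRENCH"}))
```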
Updating Accounts
Once NLS mode is enabled, all users who enter DataStage have NLS mode on by
default. All accounts created after NLS mode is enabled can use NLS commands
and functionality.
If you are installing NLS on a system that has previously been running DataStage
without NLS, you must use the NLS.UPDATE.ACCOUNT command to update
all existing accounts. This command ensures that an account contains all of the
correct VOC entries and converts relevant system files for NLS use, e.g.,
char.set.ID is a text string that identifies the character set used by the client. On
Windows systems, the identifier is normally an integer, for example, 1252. On
UNIX systems, the identifier can be any text. An example of a complete client type
and character set identifier is WIN:1252.
Each development environment differs in how you determine which char.set.ID to
use. For example, you can call something like the COleControl::AmbientLocaleID
in an OLE application.
If UniVerse cannot find the client type and character set identifier, it uses a
default. The default is either WIN:DEFAULT or UNX:DEFAULT. If these defaults
• Delete the WIN:1252 entry and set the WIN:DEFAULT entry to point to the
correct NLS map.
• Delete both WIN:1252 and WIN:DEFAULT entries and set the NLSDEFSRVMAP
configurable parameter to the correct NLS map.
The first of these options is preferable.
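The lookup order implied here can be sketched in Python. This is a sketch, not the server's implementation: the function server_map_for, the dictionary of entries, and the map name MS1252 are hypothetical, and the final fallback to the NLSDEFSRVMAP value is inferred from the second option above.

```python
def server_map_for(client_type, char_set_id, entries, nlsdefsrvmap):
    """Resolve the NLS map for a client (illustrative lookup order)."""
    # Try the exact identifier first, then the per-client-type default.
    for key in (f"{client_type}:{char_set_id}", f"{client_type}:DEFAULT"):
        if key in entries:
            return entries[key]
    # Assumed last resort: the NLSDEFSRVMAP configurable parameter.
    return nlsdefsrvmap

# Hypothetical entries for illustration.
entries = {"WIN:1252": "MS1252", "WIN:DEFAULT": "ISO8859-1+MARKS"}
print(server_map_for("WIN", "1252", entries, "ISO8859-1+MARKS"))
print(server_map_for("WIN", "932", entries, "ISO8859-1+MARKS"))
```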
locale.ID is a text string that identifies the locale used by the client. On Windows
systems the identifier is a hexadecimal number, for example, 0409. An example of
a complete client type and locale identifier is WIN:0409. On UNIX systems the
identifier can be any text string.
This chapter provides more detailed information about the maps supplied with
DataStage. The topics covered include:
• How DataStage maps work
• Map types
• How to create, build, and install maps
• Extending a character set to cover extra characters
the same external byte sequences on output. For a list of the map tables supplied
with DataStage, see “Map Tables” on page C-4.
Input map tables, also known as deadkey tables, are one-way. They define byte
sequences that map from external to internal values only. You use them to enter
characters that a system can display on the screen but that are not on the
keyboard.
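A one-way input mapping of this kind can be sketched in Python. The <Ye> sequence for the Yen symbol is the example this guide gives for the MNEMONICS map; the table DEADKEYS, the function expand_deadkeys, and the two-character pattern are assumptions made for illustration.

```python
import re

# One-way table: typed sequence -> internal character.
# Only <Ye> is taken from this guide; a real table holds many more entries.
DEADKEYS = {"Ye": "\u00a5"}  # YEN SIGN

def expand_deadkeys(typed):
    """Replace known <xx> sequences; leave unknown ones untouched."""
    return re.sub(
        r"<([^<>]{2})>",
        lambda m: DEADKEYS.get(m.group(1), m.group(0)),
        typed,
    )

print(expand_deadkeys("<Ye>100"))
```

Because the table is one-way, nothing here converts the Yen symbol back into <Ye> on output; that direction is handled by the ordinary map table.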
Base Maps
A map can be based on another map. When it is, the record in the
NLS.MAP.DESCS file also contains a pointer to the base map. This map can be
based on yet another map. To understand the complete map you must follow the
chain of base maps. For more information about the construction of a map, choose
Mappings ➤ Descriptions ➤ Xref and Mappings ➤ Tables ➤ Xref from the NLS
Administration menu.
For example, the map C0-CONTROLS is a single-byte character set map using the
C0-CONTROLS table. It maps the set of 7-bit control characters. The italic
comments are not part of the record but are added here for clarity.
NLS.MAP.DESCS C0-CONTROLS
0001 Standard ISO2022 C0 control set, chars 00-1F+7F
0002 - Name of base map
0003 SBCS
0004 C0-CONTROLS - Name of map table
NLS.MAP.TABLES C0-CONTROLS
0001 * FIRST 32 CONTROL CHARACTERS (IDENTITY MAP) + DEL
0002 00-1F 0000
0003 7F 007F
In general you can construct larger maps from existing maps by adding another
table. For example, the map ASCII, which maps all of the 7-bit characters, is
constructed by adding the table ASCII to the map C0-CONTROLS:
NLS.MAP.DESCS ASCII
0001 #Standard ASCII 7-bit set
0002 C0-CONTROLS - Name of base map
0003 SBCS
0004 ASCII - Name of map table
NLS.MAP.TABLES ASCII
0001 * 7-BIT ASCII, identity mapping to 1st 127 chars
0002 * (not including control characters - see C0-CONTROLS)
0003 20-7F 0020
You can further modify this map as required. The map ASCII+C1 is constructed
by adding the table C1-CONTROLS to the map ASCII, and the map ISO8859-1 by
adding the table ISO8859-1 to the map ASCII+C1.
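Following a chain of base maps can be sketched in Python using the two records shown above. This is a sketch under assumptions: the dictionaries DESCS and TABLES are stand-ins for the NLS.MAP.DESCS and NLS.MAP.TABLES files, and a table line such as 00-1F 0000 is assumed to map a range of external bytes to consecutive Unicode values starting at the second number.

```python
# Stand-ins for the two files, holding only the records shown above.
DESCS = {
    "C0-CONTROLS": {"base": None, "table": "C0-CONTROLS"},
    "ASCII": {"base": "C0-CONTROLS", "table": "ASCII"},
}
TABLES = {
    "C0-CONTROLS": ["* FIRST 32 CONTROL CHARACTERS (IDENTITY MAP) + DEL",
                    "00-1F 0000", "7F 007F"],
    "ASCII": ["* 7-BIT ASCII, identity mapping to 1st 127 chars",
              "20-7F 0020"],
}

def table_entries(lines):
    """Expand table lines into {external byte: Unicode code point}."""
    mapping = {}
    for line in lines:
        if line.startswith("*"):  # comment line
            continue
        external, unicode_start = line.split()
        first, _, last = external.partition("-")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for offset, byte in enumerate(range(start, end + 1)):
            mapping[byte] = int(unicode_start, 16) + offset
    return mapping

def resolve_map(name):
    """Follow the chain of base maps, then layer tables from the base up."""
    chain = []
    while name:
        chain.append(name)
        name = DESCS[name]["base"]
    mapping = {}
    for map_name in reversed(chain):
        mapping.update(table_entries(TABLES[DESCS[map_name]["table"]]))
    return mapping

ascii_map = resolve_map("ASCII")
print(len(ascii_map), hex(ascii_map[0x41]))
```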
Map Naming Conventions
Map names must contain only characters in the ASCII-7 character set. The
following map names are reserved and have special meanings:
Avoid defining a map that uses any of the following prefixes or suffixes that are
associated with existing groups of maps:
ASCII… Underlies most other code pages and defines the characters 0000
through 007F.
BIG5… The de facto standard Chinese double-byte character set.
EBCDIC… IBM EBCDIC encodings.
GB… Chinese GB standards (for example, GB2312-80).
ISO8859-nn ISO 8859 series of single-byte character set standards.
KSC… Korean DBCS national standards (for example, KSC5601).
…JIS and JIS… Japanese DBCS national standards (for example, SHIFT-JIS and
JIS-EUC).
MNEMONICS A large set of deadkey sequences for entering Unicode characters
using the form <xx>. For example, <Ye> enters the Yen symbol.
MAC… Apple Macintosh code pages (single-byte character set).
MSnnnn Microsoft Windows code pages. nnnn is four decimal digits.
PCnnn IBM PC code pages. nnn is usually three decimal digits.
Option Description
List Lists all the tables or descriptions.
Create Creates a new record in the NLS database.
Edit Edits a record in the NLS database.
Delete Deletes a record in the NLS database.
Xref Prints cross-reference information on a record.
Field Name Description
5 Display length The display length of all characters in the mapping
table specified in field 4. Most double-byte character
sets have some characters that print as two
display positions on a screen (for example, Hangul
characters or CJK ideographs). However, the same
map will usually require that ASCII characters are
printed as one display position. This field does not
pick up a value from any base map description.
The default value is 1.
6 Unknown char seq. This field specifies the character sequence to
substitute for unknown characters that do not form part
of the character set. The value, which is a byte
sequence in the external character set, should be a
hexadecimal number from one to four bytes. The
default value is 3F, the ASCII question mark character.
The default is used if neither this map nor
any underlying base map has a value in this field.
7 Compose seq. This field contains the character sequence to
compose hexadecimal Unicode values from one to
four bytes. If DataStage detects the sequence on
input, the next four bytes entered are checked to
see if they are hexadecimal values. If so, the
Unicode character with that value is entered
directly. If neither this map nor any base map has a
value in this field, you cannot input Unicode characters
by this means. A value of NONE overrides a
compose sequence set by an underlying map.
8 Input Table ID The name of a map table in NLS.MAP.TABLES to
be used for inputting deadkey sequences.
9 Prefix string A string in hexadecimal numbers to be prefixed to
all external character mappings in the table referenced
by field 4. Used mainly for mapping
Japanese character sets.
10 Offset value A value in hexadecimal numbers to be added to
each external mapping in the table referenced by
field 4. If prefixed by a minus sign, the value is
subtracted. Used mainly for mapping Japanese
character sets.
• The second value can be one of the following special strings:
Note: The report can be thousands of lines long for large double-byte character
set maps.
The DataStage system delimiters are mapped into the following values for each
character set:
CAUTION: Take care when transferring data between sites. Both sites must
agree on the use of positions E000 upward in the Private Use Area,
otherwise you lose data integrity.
Use the UNICODE.FILE command to convert a mapped file to an unmapped file,
or vice versa, without making a copy of the file. The conversion process first
checks that all record IDs and data can be read from the file using the correct map.
If record IDs and data cannot be retrieved using the input map, the command
fails. If some characters cannot be converted using the output map, the records
are not written.
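The two-phase behavior described here can be sketched in Python. This is a loose analogy rather than the UNICODE.FILE implementation: Python codecs stand in for the input and output maps, and the function convert_file and its return shape are hypothetical.

```python
def convert_file(records, input_encoding, output_encoding):
    """Convert {record ID: data} in two phases (illustrative only)."""
    # Phase 1: every record ID and every record must be readable with the
    # input map. Any UnicodeDecodeError here fails the whole conversion.
    decoded = {}
    for raw_id, raw_data in records.items():
        decoded[raw_id.decode(input_encoding)] = raw_data.decode(input_encoding)

    # Phase 2: records that cannot be converted with the output map
    # are not written.
    written, skipped = {}, []
    for record_id, data in decoded.items():
        try:
            written[record_id.encode(output_encoding)] = data.encode(output_encoding)
        except UnicodeEncodeError:
            skipped.append(record_id)
    return written, skipped

records = {b"A": b"caf\xe9", b"B": b"plain"}
print(convert_file(records, "latin-1", "ascii"))
```

Checking readability before writing anything keeps a conversion that is going to fail from leaving the file half converted.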
This chapter provides more information about how locales work, and how to
modify the locales and conventions supplied with DataStage. The topics covered
include:
• Creating locales and conventions
• The format of convention records
• How DataStage collates
The following example shows the record for the US-ENGLISH locale:
Locale name..... US-ENGLISH
Description..... Country=USA, Language=English
Time/Date....... US-ENGLISH
Numeric......... DEFAULT
Monetary........ USA
Ctype........... DEFAULT
Collate......... DEFAULT
.
.
.
Each of the five categories has its own DataStage file that stores the definitions for
these categories. The conventions are grouped together and identified by a name
which is the record ID of an item in the appropriate category file.
For example, the US-ENGLISH conventions for Time/Date are defined by a
record ID of that name in the NLS.LC.TIME file.
The NLS.LC.ALL file acts as an index for the locales. It contains a record for each
locale, such as US-ENGLISH, with fields for each category.
Each field contains a pointer to a record in another file, which is the relevant
category file. The Time field has a pointer to a record in the NLS.LC.TIME file, the
Numeric field has a pointer to a record in the NLS.LC.NUMERIC file, and so on.
Each category field…   Points to a record in the    The US-ENGLISH locale record
                       corresponding file…          contains these corresponding values…
Time                   NLS.LC.TIME                  US-ENGLISH
Numeric                NLS.LC.NUMERIC               DEFAULT
Monetary               NLS.LC.MONETARY              USA
Ctype                  NLS.LC.CTYPE                 DEFAULT
Collate                NLS.LC.COLLATE               DEFAULT
This means that a locale can be built from existing conventions without
duplication. Different locales can share conventions, and one convention can be
based on another.
For example, Canada uses the locales CA-FRENCH and CA-ENGLISH. The two
locales are not completely different; they share the same Monetary convention.
Notice that for both locales the Monetary field points to a record in the
NLS.LC.MONETARY file called CANADA. The other fields contain the
appropriate value for the language concerned.
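The index-and-pointer arrangement can be sketched in Python. The dictionary NLS_LC_ALL is a stand-in for the NLS.LC.ALL file and the function convention_pointer is hypothetical. The US-ENGLISH values are those shown in the record above; only the Monetary pointers of the Canadian locales are given in this guide, so their other fields are left out rather than guessed.

```python
# Category field -> category file that holds the convention records.
CATEGORY_FILES = {
    "Time/Date": "NLS.LC.TIME",
    "Numeric": "NLS.LC.NUMERIC",
    "Monetary": "NLS.LC.MONETARY",
    "Ctype": "NLS.LC.CTYPE",
    "Collate": "NLS.LC.COLLATE",
}

# Stand-in for NLS.LC.ALL: one record per locale, one pointer per category.
NLS_LC_ALL = {
    "US-ENGLISH": {"Time/Date": "US-ENGLISH", "Numeric": "DEFAULT",
                   "Monetary": "USA", "Ctype": "DEFAULT",
                   "Collate": "DEFAULT"},
    "CA-FRENCH": {"Monetary": "CANADA"},   # other fields not shown here
    "CA-ENGLISH": {"Monetary": "CANADA"},  # other fields not shown here
}

def convention_pointer(locale, category):
    """Return (category file, record ID) for one category of a locale."""
    return CATEGORY_FILES[category], NLS_LC_ALL[locale][category]

print(convention_pointer("US-ENGLISH", "Monetary"))
print(convention_pointer("CA-FRENCH", "Monetary")
      == convention_pointer("CA-ENGLISH", "Monetary"))
```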
You can examine the conventions defined for a locale using the NLS Administration
menu. Enter the command NLS.ADMIN in the DataStage server engine home
account (UV), then choose Locales ➤ View. When prompted for a locale ID, enter one
of the IDs shown in Appendix C.
Creating Conventions
The conventions supplied with DataStage conform to international standards. For
major languages you should not need to create completely new conventions. To
modify a convention, you create a new convention based on an existing
convention. An outline of the procedure is as follows:
1. Plan your new convention. Study the format of the convention records in each
category and decide which fields you need to modify. See “Format of
Convention Records” on page 4-5.
2. From the NLS Administration menu, choose Categories. Then choose Time,
Numeric, Monetary, Ctype, or Collate.
3. Using the View option, find a convention that looks like what you need. If
you want to create a Collate convention, you may also need to choose a
suitable weight table. This is explained in “Collating” on page 4-22.
4. Choose the Create option to create the new convention.
5. Choose Edit to change the convention to suit your needs. You are prompted
to edit and save the record using ReVise.
Naming Locales
Locale names can be any string that is a valid DataStage record ID. You must not
use any string that is the same as a VOC record ID. The locales shipped with
DataStage have names that use only ASCII-7 characters, but you can rename them
using different character sets, as appropriate.
Time Records
Convention records in the Time category are stored in the NLS.LC.TIME file. The
following table shows each field number, its display name, and a description for
time and date information:
Field Name Description
6 Date ‘DI’ format The default date format for the DI conversion
code. The value should be a D
conversion code. The order is specified by
the DMY order (field 23). The separator is
specified by the date separator (field 24).
7 Time ‘MT’ format The default time format for the MT conversion
code. The value should be an MT
conversion code. In most cases, use the value
TI.
8 Time ‘TI’ format The format for the TI conversion code. The
value should be an MT conversion code that
specifies separators. The default separator is
a colon (:) as specified by the time separator
(field 25).
9 Days of the week A multivalued list of the full names of the
days of the week. For example, Monday,
Tuesday. Fields 9 and 10 are associated
multivalued fields; the same number of values
must exist in each field.
10 Abbreviated A multivalued list of abbreviated names of
the days of the week. For example, Mon, Tue.
See field 9.
11 Month names A multivalued list of the full names of the
months of the year. For example, January,
February. Fields 11 and 12 are associated
multivalued fields; the same number of
values must exist in each field.
12 Abbreviated A multivalued list of abbreviated names of
the months of the year. For example, Jan, Feb.
See field 11.
13 Chinese years A multivalued list of Chinese year names
(Monkey to Sheep).
14 AM string A string used to denote times before noon in
12-hour formats.
15 PM string A string used to denote times after noon in
12-hour formats.
Name [ %n ] [ string ]
Name is the era name.
%n is a digit from 1 through 9, or the characters +, –, or Y.
string is any text string.
The %n syntax allows era year numbers to be included in the era name and
indicates how the era year numbers are to be calculated. If %n is omitted, %1 is
assumed.
The rules for the %n syntax are as follows:
%1 – %9: The number following the % is the number to be used for the first year n
of this era. This is effectively an offset which is added to the era year number. This
will usually be 1 or 2.
%+: The era year numbers count backward relative to year numbers; that is, if era
year number 1 corresponds to Julian year Y, year 2 corresponds to Y–1, year 3 to
Y–2, etc.
%– : The same as for %+, but uses negative era year numbers; that is, first year Y is
–1, Y–1 is –2, Y–2 is –3, and so forth.
%Y: Uses the Julian year numbers for the era year numbers. The year number will
be displayed as a 4-digit year number.
The %+, %–, and %Y syntax should only be used in the last era name in the list of
era names, that is, the first era, since the list of era names must be in descending
date order.
string allows any text string to be appended to the era name. It is frequently the
case that the first year or part-year of an era is followed by some qualifying
characters. Therefore, the actual era is divided into two values, each with the same era
name, but one terminated by %1string and the other by %2. You must define the
era names accordingly.
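The %n rules can be sketched in Python. This is one reading of the rules above, not DataStage code: the function era_year_number is hypothetical, and it works on whole years only, whereas real era records use start dates.

```python
def era_year_number(year, era_start_year, spec="%1"):
    """Compute an era year number from the %n part of an era name."""
    code = spec[1:]
    if code.isdigit():
        # %1 to %9: the digit is the number given to the era's first year.
        return year - era_start_year + int(code)
    if code == "+":
        # Counts backward: start year is 1, the year before it is 2.
        return era_start_year - year + 1
    if code in ("-", "\u2013"):
        # As %+, but negative: start year is -1, the year before it is -2.
        return -(era_start_year - year + 1)
    if code == "Y":
        # Uses the year number itself.
        return year
    raise ValueError(f"unsupported era specification: {spec!r}")

# Second calendar year of an era assumed to start in 1989.
print(era_year_number(1990, 1989))
```

The split described above, where one value ends in %1string and the next in %2, then corresponds to calling this with "%1" for the first part-year and "%2" for the value that starts the following year.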
Example
This example shows the contents of the records named DEFAULT and
US-ENGLISH in the NLS.LC.TIME file. The US-ENGLISH record is based on the
ENGLISH.NAMES record. An empty field specifies that its definition is derived
from any category on which it is based. If there is no base category, the default
category is used.
Time/Date Conventions for Locale DEFAULT
Heisei 08 JAN 1989
Showa 25 DEC 1926
Taisho 30 JUL 1912
Meiji 08 SEP 1868
HEADING/FOOTING D format. D2-
HEADING/FOOTING T format. MTS
. D2-
Gregorian calendar day 1. 11 JAN 1583
Number of days skipped... 10
Default DMY order........
Default date separator...
Default time separator...
Time/Date Conventions for US-ENGLISH
Chinese years............
AM string................
PM string................
BC string................
Era name................................ Start date....
HEADING/FOOTING D format.
HEADING/FOOTING T format.
Gregorian calendar day 1.
Number of days skipped...
Default DMY order........ MDY
Default date separator...
Default time separator...
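How an empty field picks up its value can be sketched in Python. This is a sketch under assumptions: the dictionary NLS_LC_TIME stands in for the NLS.LC.TIME file, the field keys are hypothetical names, and the contents given for ENGLISH.NAMES are placeholders, since that record is not shown here. The D2-, MTS, and MDY values are taken from the listings above.

```python
# Stand-in for NLS.LC.TIME. A missing key plays the role of an empty field.
NLS_LC_TIME = {
    "DEFAULT": {"base": None,
                "heading_d_format": "D2-",
                "heading_t_format": "MTS"},
    # Placeholder contents; the real ENGLISH.NAMES record is not shown here.
    "ENGLISH.NAMES": {"base": None, "days": ["Monday", "Tuesday"]},
    "US-ENGLISH": {"base": "ENGLISH.NAMES", "dmy_order": "MDY"},
}

def resolve_field(record_id, field, table=NLS_LC_TIME):
    """Look in the record, then its chain of bases, then DEFAULT."""
    seen = set()
    current = record_id
    while current is not None and current not in seen:
        seen.add(current)  # guards against a circular chain of bases
        value = table[current].get(field)
        if value:
            return value
        current = table[current].get("base")
    return table["DEFAULT"].get(field)

print(resolve_field("US-ENGLISH", "dmy_order"))
print(resolve_field("US-ENGLISH", "heading_d_format"))
```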
This example shows the contents of the records named DEFAULT and
DEC.COMMA+DOT (used by the DE-GERMAN locale) in the NLS.LC.NUMERIC
file. The DEC.COMMA+DOT conventions are based on DEFAULT.
Numeric Conventions for DEFAULT
Suppress leading zero. 0
Alternative digits (0 first).
Monetary Records
Convention records in the Monetary category are stored in the NLS.LC.MONETARY
file. The following table shows each field number, its display name, and a
description:
Field Name Description
12 Negative currency format The format for negative monetary amounts.
This is expressed using a combination of the
characters $ S – 1 and a space. The $ or S
represents the local currency symbol. 1
represents the monetary amount. – represents the
negative sign. If the negative sign (field 10)
contains two characters the – sign is ignored.
For example, the value –$1 in a PORTUGUESE
locale results in the format –1,234$56.
The value $ –1 in a DUTCH locale results in
the format F1 –1.234,56.
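The substitution rules for the negative currency format can be sketched in Python. This is a sketch, not the DataStage formatter: the function render_negative is hypothetical, it expects an amount that has already been formatted, and when the negative sign has two characters it simply drops the – placeholder, because where those two characters are placed is not described here.

```python
def render_negative(fmt, amount, symbol, negative_sign="-"):
    """Expand a negative currency format made of $ S - 1 and spaces."""
    out = []
    for ch in fmt:
        if ch in "$S":
            out.append(symbol)         # local currency symbol
        elif ch == "1":
            out.append(amount)         # already-formatted monetary amount
        elif ch in "-\u2013":          # hyphen-minus or en dash
            if len(negative_sign) != 2:
                out.append(negative_sign)
            # A two-character negative sign means the placeholder is ignored.
        else:
            out.append(ch)             # spaces pass through unchanged
    return "".join(out)

# The DUTCH example from the table, using the symbol as printed there.
print(render_negative("$ -1", "1.234,56", "F1"))
```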
Monetary thousands separator.. . - FULL STOP
Local currency symbol......... NONE
International currency symbol. PTE<SP>
Decimal places................ 2
International decimal places.. 2
Positive sign................. NONE
Negative sign................. - - HYPHEN-MINUS
Positive currency format...... 1 S
Negative currency format...... -1 S
The following table shows how the data in the previous records affect monetary
formats:
Note: Italian lire are usually quoted in whole numbers only. Your programs must
detect that the DEC_PLACES and INTL_DEC_PLACES fields contain zero
in this case and not hard code an MD2 conversion. An MM conversion
handles the scaling automatically.
Ctype Records
Convention records in the Ctype category are stored in the NLS.LC.CTYPE file.
The following table shows each field number, its display name, and a description.
Note: For fields 3 onward, you can enter the values as characters or as Unicode
values. You can specify a range of values separated by a dash (–).
Field Name Description
13 Trimmables A multivalued list of characters that are to be
removed by TRIM functions in addition to
spaces and tab characters.
In Spanish, accented characters other than ñ drop their accents when converted to
uppercase. In French, all accented characters drop their accents in uppercase.
This example shows a convention called NOACCENT.UPCASE, which the locale
FR-FRENCH uses, and a convention called SPANISH, that is based on it.
Note: In this example, the only characters affected are those in general use in
French and Spanish. There are many other accented characters in Unicode.
This example displays <N?> that comes from the MNEMONICS map.
This lets you easily enter non-ASCII characters rather than their Unicode
values.
Alphabetics.....
Non-Alphabetics.
Numerics........
Non-Numerics....
Printables......
Non-Printables..
Trimmables......
Alphabetics.....
Non-Alphabetics.
Numerics........
Non-Numerics....
Printables......
Non-Printables..
Trimmables......
Collate Records
Convention records in the Collate category are stored in the NLS.LC.COLLATE
file. The following table shows each field number, its display name, and a description.
Many of the fields are Boolean. An empty field or a value of 0 or N indicates
false; any other value indicates true.
Field Name Description
3 Accented Sort? This field determines how accents on characters
affect the collate order. A false value indicates that
accents are not collated separately. A true value indicates
that accents are used as tie breakers in the sort.
See “Collating” on page 4-22.
4 In reverse? If field 3 indicates an accented collation, this field
determines the direction of that collation. A false
value indicates forward collation. A true value indicates
reverse collation.
5 Cased Sort? This field determines whether the case of a character
is considered during collation. A false value indicates
that case is not considered. A true value indicates
that case is used as a tie breaker in the collation.
6 Lowercase first? If field 5 indicates a cased collation, this field determines
which case is collated first. A false value
indicates that uppercase is collated first. A true value
indicates that lowercase is collated first.
7 Expand A multivalued field containing Unicode values of
characters that are expanded before collation. See
“Contractions and Expansions” on page 4-24.
8 Expanded A multivalued field associated with field 7 that
supplies the values the characters expand to. Each
value may be one or more Unicode values separated
by tab characters or spaces. To override an expansion
inherited from a based convention named in field 2,
enter the same multivalue in fields 7 and 8. (For
another method, see the description of field 10.)
9 Before? A multivalued field associated with fields 7 and 8
that determines how expanded characters collate. A
false value indicates that a character is collated after
expansion; a true value indicates that a character is
collated before expansion.
10 Contract A multivalued field containing a list of pairs of
Unicode values of characters after contraction. The
values should be separated by tab characters or
spaces. To override an expansion inherited from a
based convention named in field 2, enter a value in
this field and a corresponding empty value in field
11. See “Contractions and Expansions” on page 4-24.
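The Boolean rule stated for Collate fields (an empty field, 0, or N is false; anything else is true) can be sketched in C as follows. The helper name collate_flag_is_true is hypothetical, and treating N as case-sensitive is an assumption made only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of DataStage: interprets the text of a
 * Boolean Collate field. The exact, case-sensitive match on "N" is an
 * assumption for this sketch. */
bool collate_flag_is_true(const char *field)
{
    if (field == NULL || field[0] == '\0')
        return false;                 /* an empty field means false */
    if (strcmp(field, "0") == 0 || strcmp(field, "N") == 0)
        return false;                 /* the two explicit false values */
    return true;                      /* any other value means true */
}
```

With this rule, values such as Y or 1 in the Accented Sort? or Cased Sort? fields both read as true.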
Weight Tables....
Collating
Collating is a complex issue for many languages. It is not sufficient to collate a
character set in numerical order of its Unicode values. Locales that share a character
set often have different collating rules. For example, these are the main
issues that affect collating in Western European languages:
• Accented characters. Should accented characters come before or after their
unaccented equivalents? Or should accents only be examined if two strings
being compared would otherwise be identical (that is, as a tie breaker)?
• Expanding characters. Some languages treat certain single characters as
two separate characters for collating purposes.
• Contracting characters. Some languages have pairs of characters that
collate as though they were a single character.
• Should case be considered? Should case be used as a tie breaker for otherwise
identical strings? If so, which comes first, uppercase or lowercase?
• Should hyphens or other punctuation be considered as tie breakers?
Shared weight All characters that are essentially the same have the same shared
weight, even though they may differ in accent or case.
Accent weight This weight shows the order of precedence for accented characters.
The Collate convention determines the direction of the
collation.
Case weight This weight differentiates between uppercase and lowercase
characters. The Collate convention determines which case has
precedence.
In the accented collation, the words are in the order they would be found in a
French dictionary. (It is actually a reverse accented collation.) Each accented character
has the same shared weight as it would have without the accent. The order is
decided by referring to the accent weight.
In the unaccented collation, each accented character has a different shared weight
unrelated to its unaccented equivalent. The order is decided by the shared weight
alone.
In the cased collation, Aaron follows aardvark because the characters ‘A’ and ‘a’
have the same shared weight. The case weight is only considered for the two
strings that are otherwise identical, that is, Aardvark and aardvark.
In the uncased collation, Aaron precedes aardvark because the characters ‘A’ and
‘a’ have different shared weights.
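The Aaron/aardvark behavior described above can be reproduced with a small C sketch. It is only an illustration: the toy weights cover unaccented ASCII letters, and the function names and the lowercase_first parameter are hypothetical rather than part of DataStage.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy weights for unaccented ASCII letters only (an assumption of this sketch). */
static int shared_weight(unsigned char c) { return tolower(c); }
static int case_weight(unsigned char c)   { return isupper(c) ? 1 : 0; }

/* Cased collation: compare shared weights first and use the case weight only
 * as a tie breaker. Returns <0, 0, or >0. */
int cased_compare(const char *a, const char *b, int lowercase_first)
{
    size_t i;
    for (i = 0; a[i] != '\0' && b[i] != '\0'; i++) {
        int d = shared_weight((unsigned char)a[i]) - shared_weight((unsigned char)b[i]);
        if (d != 0)
            return d;                     /* decided by shared weight alone */
    }
    if (a[i] != b[i])                     /* one string is a prefix of the other */
        return a[i] == '\0' ? -1 : 1;
    for (i = 0; a[i] != '\0'; i++) {      /* shared weights identical: use case */
        int d = case_weight((unsigned char)a[i]) - case_weight((unsigned char)b[i]);
        if (d != 0)
            return lowercase_first ? d : -d;
    }
    return 0;
}

/* Uncased collation in the sense used above: 'A' and 'a' carry different
 * shared weights, so the raw character codes decide the order. */
int uncased_compare(const char *a, const char *b)
{
    return strcmp(a, b);
}
```

Under these toy weights, cased_compare places Aaron after aardvark and consults case only for Aardvark versus aardvark, while uncased_compare places Aaron first.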
value. A list of conventional values to assign to this field can be found by listing
records starting with “CW…” in the NLS.WT.LOOKUP file.
comments can contain any characters.
( BW x 224 ) + ( SW x 29 )
BW is the character’s Unicode block number. SW depends on its position within
the block: the first character has a SW of 1, the second a SW of 2, and so on.
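As a worked illustration of the expression above, the following C sketch evaluates (BW x 224) + (SW x 29) for a given block number and position. The function name block_position_weight is hypothetical; only the arithmetic comes from the text.

```c
/* Hypothetical helper: evaluates (BW x 224) + (SW x 29).
 * bw is the Unicode block number; sw is the 1-based position of the
 * character within that block, as described in the text. */
long block_position_weight(long bw, long sw)
{
    return (bw * 224L) + (sw * 29L);
}
```

For instance, the first character of block 1 gives 224 + 29 = 253, and the third character of block 2 gives 448 + 87 = 535.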
Using Locales
From within a BASIC program you can do the following:
• Retrieve the current locale name of a specified category
• Save the current locale settings
• Restore the saved locale settings
• List the current locale settings
• Change the current locale settings
For information about using functions to do these tasks from within BASIC
programs, see Chapter 5.
• From a BASIC program using the SETLOCALE function. This is described
in detail in “Changing the Current Locale” on page 5-19.
A locale is always set up and saved when you enter DataStage. You can restore
this initial locale using RESTORE.LOCALE if you have not issued a
SAVE.LOCALE command during your DataStage session. SAVE.LOCALE and
RESTORE.LOCALE return errors if they are issued when locales are turned off,
that is, if either the NLSLCMODE or the NLSMODE configurable parameter in the
uvconfig file is set to 0.
Note: When you want to specify numeric and monetary formatting for a locale,
you must set both the Numeric and Monetary categories to something
other than OFF, for example, DEFAULT. If not, DataStage treats BASIC
conversions, such as MD, ML, and MR, as if locales are turned off.
This chapter describes how DataStage BASIC programs use NLS. The topics
covered include:
• How BASIC is affected by NLS.
• Display length in BASIC. This describes how to accommodate the difference
between a character’s display length and its string length.
• Maps in DataStage BASIC. This covers how maps are used by files and
devices, how to set and modify maps, and how BASIC handles unmappable
characters.
• Multinational characters in BASIC. This describes how you can include
multinational characters in source code, specify them for printing, or edit
them using ED.
• Using locales in BASIC. This topic describes how to set or query a locale
from within a program.
The UVNLS.H include file also gives the internal character set values of the
DataStage system delimiters.
Here is a program example that examines the current NLS settings:
* Report the terminal map when NLS mode is on
$INCLUDE UNIVERSE.INCLUDE UVNLS.H
IF SYSTEM(NLS$ON)
   THEN PRINT "Terminal map set to: ":SYSTEM(NLS$TERMMAP)
   ELSE PRINT "NLS is not enabled"
FILEINFO Function
To use the FILEINFO function to determine a file’s map name, use the
FINFO$NLSMAP value. A token is defined in the FILEINFO.H include file as
follows:
The following example returns the map currently used by the VOC file:
$INCLUDE UNIVERSE.INCLUDE FILEINFO.H
OPEN "VOC" TO filevar
   ELSE STOP "Cannot open the VOC file"
mapname = FILEINFO(filevar, FINFO$NLSMAP)
PRINT "Map in use for the VOC is: ":FIELD(mapname, '(', 1)
If these map entries are not set in the terminfo file, the default specified in the
NLSDEFTERMMAP parameter of the uvconfig file is used. If the terminfo record
specifies maps that are not installed, the defaults are used and you may see a
warning.
CAUTION: The maps named in terminfo may not be the current terminal map.
For example, the value can be overridden by a SET.TERM.TYPE
command. Do not use the TERMINFO function or the @ function to
read the terminfo values. Use the GETPU subroutine, the
GET.TERM.TYPE command, or the SYSTEM function instead.
If this token is used to call !GETPU when NLS is disabled, the following run-time
warning message is issued:
Program "!GETPU": pc = nnnn, Unsupported option "PU$NLSMAP".
Ignored.
This code example finds the name of the map associated with print channel 0:
$INCLUDE UNIVERSE.INCLUDE GETPU.H
CALL !GETPU(PU$NLSMAP, 0, mapname, code)
PRINT "Map in use for print unit 0 is: ":mapname
Note: This is different from the case when a record does not exist, where
STATUS returns 0.
• If the unmappable character is in the record’s data, the record is read, and
the unmappable characters are replaced with the Unicode replacement
character (value xFFFD). No message is displayed, and data is lost.
Note: If your program source uses a CHAR (nnn) function, it must be recom-
piled for use in NLS mode.
Value Description
0 The conversion succeeds.
1 The map name supplied is invalid; an empty string is returned.
2 The conversion is invalid or NLS is not enabled.
3 Some characters of the converted string could not be mapped, and the
returned string contains replacement characters.
Use UPRINT instead of PRINT (which treats the string as being in internal format) to
print the external format string returned by OCONV NLSmapname.
For example:
UPRINT OCONV(VAR, "NLSSHIFT-JIS")
Note: The MU0C conversion code uses four hexadecimal digits. The MX0C
conversion code treats strings as two hexadecimal digits per byte, and
does not know about internal Unicode format.
Value Description
0 The conversion succeeds.
2 The conversion is invalid or NLS is not enabled.
The following example shows internal to external byte sequences for several
characters:
X = UNICHAR(222):UNICHAR(240):@FM
PRINT "Internal form in hex bytes is: ":OCONV(X, 'MX0C')
Y = OCONV(X, 'NLSISO8859-1')
PRINT "External form in hex bytes is: ":OCONV(Y, 'MX0C')
PRINT "Internal form in Unicode is: ":OCONV(X, 'MU0C')
The characters in the output are separated by spaces in order to display the differences
more easily. For example, C39E represents 222 in the internal form in
DataStage, DE represents 222 in the external byte sequence as it is displayed on
the terminal, and 00DE represents 222 in the Unicode byte sequence.
Likewise, C3B0 represents 240 in the internal form in DataStage, F0 represents 240
in the external byte sequence for the terminal, and 00F0 represents 240 in the
Unicode byte sequence.
In the final column, FE is the internal representation of @FM, 3F (the Unicode
character ?) represents the external byte sequence for the terminal, and F8FE
represents the Unicode byte sequence.
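The two-byte internal values quoted above (C39E for 222, C3B0 for 240) coincide with the UTF-8 encoding of those code points. The following C sketch reproduces them under that assumption; the function name encode_two_byte is hypothetical, and the sketch covers only code points 0x80 through 0x7FF, not system delimiters such as @FM.

```c
#include <stddef.h>

/* Hypothetical helper: encodes a code point in the range 0x80..0x7FF as two
 * bytes using the UTF-8 bit layout 110xxxxx 10xxxxxx.
 * Returns 2 on success, or 0 if the code point is outside that range. */
size_t encode_two_byte(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80u || cp > 0x7FFu)
        return 0;
    out[0] = (unsigned char)(0xC0u | (cp >> 6));    /* upper five bits */
    out[1] = (unsigned char)(0x80u | (cp & 0x3Fu)); /* lower six bits */
    return 2;
}
```

Calling it with 222 fills the buffer with C3 9E, and calling it with 240 fills it with C3 B0, matching the hexadecimal values discussed above.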
The COPY, CP, and CT commands have a HEX option to display the contents of a
record in hexadecimal digits, and a UNICODE option to display the Unicode
values of the characters. For the Pick version of the COPY verb, you specify (U
instead of UNICODE, and (H instead of HEX.
For example, if a record contains the string ABC in field 1 and ÄßÇ in field 2,
using the HEX option, you see the following with NLS mode off. In field 1, 41
is the ASCII code for A; in field 2, C4 is the single-byte (8-bit) code for Ä.
>COPY FROM VOC 'EXAMPLE' CRT HEX
EXAMPLE
0001 414243
0002 C4DFC7
With NLS mode on, you see the following:
EXAMPLE
0001 414243
0002 C384C39FC387
ABC uses one byte per character in internal format (line 0001) whereas ÄßÇ uses
two bytes per character (line 0002). Field 1 contains 41, the (single byte) internal
code for A, and field 2 contains C384, the (double byte) internal code for Ä.
Using the UNICODE option you see the following:
>COPY FROM VOC 'EXAMPLE' CRT UNICODE
EXAMPLE
0001 004100420043
0002 00C400DF00C7
This chapter describes the structure and content of the NLS Administration
menus.
You must be a DataStage Administrator in the DataStage server engine account
(UV) to use the menus. To display the main NLS Administration menu, use the
NLS.ADMIN command. The NLS Administration menu has the following
options:
• Unicode. This option lets you examine the Unicode character set using
various search criteria.
• Mappings. This option lets you view, create, or modify map descriptions or
map tables.
• Locales. This option lets you view, create, or modify locale definitions.
• Categories. This option lets you view, create, or modify category files and
weight tables.
• Installation. This option lets you install maps into shared memory or edit
the uvconfig file.
The options lead to further menus that are described in the following sections.
Unicode Menu
Use the Unicode menu to examine the Unicode character set. The following
options are available:
• Characters. This option leads to a further menu containing the following
options:
– List All descriptions. Provides a very long listing of all the Unicode
characters.
Locales Menu
Use the Locales menu to examine, create, and edit locale definitions. The
following options are available:
• List All. Lists all the locales that are available in DataStage, that is, all the
records in the NLS.LC.ALL file. You may need to build the locales in order
to install them into shared memory.
• View. Prompts you for the name of a locale, then lists the record for that
locale.
• Create. Creates a new locale record.
• Edit. Edits an existing locale record.
• Delete. Deletes a locale record.
• Xref. Cross-references a locale. This lets you see the relationship between
various locale definitions.
• Clients. Administers the NLS.CLIENT.LCS file, which provides synonyms
between locale names on a client, and the DataStage NLS locales on the
server. You can list, create, edit, and delete records using this option.
Categories Menu
From the Categories menu you can administer the NLS category files for different
types of convention. The following options are available:
• Time/date
• Numeric
• Monetary
• Ctype
• Collate
• Weight tables
• Language info
The first five options call submenus that let you list, view, create, edit, delete, and
cross-reference records in the specific category. The final two options differ
as described below.
• Weight tables. This option has two additional suboptions as follows:
– Accent weights. This option lists all the records in the
NLS.WT.LOOKUP file that refer to accents.
– Case weights. This option lists all the records in the NLS.WT.LOOKUP
file that refer to casing.
• Language info. This option administers the NLS.LANG.INFO file and lets
you list, view, create, edit, delete, and cross-reference records in the file.
Installation Menu
Use the Installation menu to edit the system configuration file or to install maps in
shared memory. The following options are available:
• Edit uvconfig. This option lets you edit the configurable parameters in the
uvconfig file. You can edit all the parameters, or just those referring to NLS,
maps, locales, or clients.
• Maps. This option leads to a further menu with the following options:
– Configure. Runs the NLS map configuration program.
This appendix describes the files in the NLS database. The NLS database is in the
nls subdirectory of the server engine directory. The nls directory contains the
subdirectories charset, locales, and maps.
Each subdirectory of the NLS directory contains further subdirectories, such as
the listing and install subdirectories. listing contains listing information generated
when building maps and locales (if the user selects this option). install contains
the binary files that are loaded into memory.
You should use the NLS.ADMIN command to perform all NLS administration.
The VOC names for NLS files start with the prefix NLS (this prefix is absent if you
view the files from the operating system). The second part of the filename indicates
the logical group that the file belongs to. The logical groups are as follows:
The third part of the filename indicates the contents of the file. For example, the
file called NLS.LC.COLLATE is an NLS file belonging to the locales group that
contains information about collating sequences.
File Description
NLS.CLIENT.LCS Defines the locales to be used by client programs
connecting to DataStage. For a description of the record
format for this file, see “Locales for Client Programs”
on page 2-10.
NLS.CLIENT.MAPS Defines the character set used by client programs. For a
description of the record format for this file, see “Maps
for Client Programs” on page 2-9.
NLS.CS.ALPHAS Defines which characters are defined as alphabetic in
the Unicode standard. Each record ID is a hexadecimal
code point value that indicates the start of a range of
characters. The record itself specifies the last character
in the range. These default values can be overridden by
a national convention. You should not modify this file;
it is for information only.
NLS.CS.BLOCKS Defines the blocks of consecutive code point values for
characters that are normally used together as a set for
one or more languages. The record IDs are block
numbers. This file is cross-referenced by the
NLS.CS.DESCS file. You should not modify this file; it
is for information only.
NLS.CS.CASES Defines those characters that have an uppercase and
lowercase version, and how they map between the two,
according to the Unicode standard. These default
values can be overridden by a national convention.
Each record ID is the hexadecimal code point value for
a character. You should not modify this file; it is for
information only.
NLS.CS.DESCS Contains descriptions of every character supported by
DataStage NLS. Each character has its own record,
using its hexadecimal code point value as the record
ID. The descriptions are based on those used by the
Unicode standard. You should not modify this file; it is
for information only.
NLS.CS.TYPES Defines which characters are numbers, nonprintable
characters, and so on, according to the Unicode standard.
These default values can be overridden by a
national convention. Each record ID is the hexadecimal
code point value for a character. You should not modify
this file; it is for information only.
NLS.LANG.INFO Contains information about languages. Provides
possible mappings between language, locale and character
set map. It is used for installing NLS and
reporting on locales, and should not be modified.
NLS.LC.ALL Holds records for all the locales known to DataStage.
The record IDs are the locale names. The fields of each
record are the IDs of records in other locale files. These
files contain data about the categories that make up a
locale (Time, Numeric, and so on). For a description of
the record format for this file, see “Creating New
Locales” on page 4-4.
NLS.LC.COLLATE Each record in this file defines a collating sequence
used by a locale. The collating sequences are defined
according to how they differ from the default collating
sequence. For a description of the record format for this
file, see “Format of Convention Records” on page 4-5.
NLS.LC.CTYPE Each record in this file holds character typing information
used in a locale, that is, which characters are
alphabetic, numeric, lowercase, uppercase,
nonprinting, and so on. The character types are defined
according to how they differ from the default character
typing. For a description of the record format for this
file, see “Format of Convention Records” on page 4-5.
NLS.LC.MONETARY Each record in this file holds the monetary formatting
convention used in a locale. For a description of the
record format for this file, see “Format of Convention
Records” on page 4-5.
NLS.LC.NUMERIC Each record in this file holds the numeric formatting
convention used in a locale. For a description of the
record format for this file, see “Format of Convention
Records” on page 4-5.
NLS.LC.TIME Each record in this file holds the time and date formatting
convention for a locale. For a description of the
record format for this file, see “Format of Convention
Records” on page 4-5.
NLS.MAP.DESCS Contains descriptions of every map known to
DataStage. The record ID of each map is the map name
used in DataStage commands or BASIC programs. The
record IDs must comprise ASCII-7 characters only. For
a description of the record format for this file, see
“Creating a Map Description” on page 3-5.
NLS.MAP.TABLES A type 19 file that contains the map tables for mapping
an external character set to the DataStage internal character
set. For more information about the structure of
this file, see “Creating a Map Table” on page 3-7.
NLS.WT.LOOKUP Contains weightings given to characters during a sort,
based on the Unicode standard. This file should not be
modified.
NLS.WT.TABLES Contains specific weight information about characters
used in a locale. For more information about the structure
of this file, see “Editing Weight Tables” on
page 4-25.
The national conventions support described in DataStage NLS Guide does not
cover all needs. It is designed to be as table-driven as possible, with all tables
visible to and changeable by a knowledgeable user. For maximum flexibility, we
also support user-written code hooks. These are routines you write to implement
specific NLS functions and then hook them into DataStage on request.
Hooks are points in DataStage code where an NLS convention is in force; at such
points, user-written code can be plugged in to intercept an action that NLS would
otherwise do. Hook routines must be written in C. Each routine has a fixed name
and interface, as described later.
All string data is passed in and out of hooks in external format (i.e., as multibyte
8-bit strings). That is, a map name (other than UNICODE) associated with a hook
is used to map string data from DataStage internal format to external format
before calling the hook. All hooks for a particular locale specify the same map
name. To accommodate CHAR(0) bytes, STRING data types are used (a variable-
length character string) rather than null-terminated C strings.
This hook mechanism is available only if both NLS mode and locale support are
enabled. The hooks also introduce some areas of potential internationalization that
are not otherwise supported by NLS, notably:
• Specialized FMT format codes
• Soundex ‘sounds-like’ replacement
The ih_xxx_HID routines in the sample/NLSHKtmplt.c file ignore all input arguments
and simply return NLSHK_HKE_NO_CONV.
Memory Management
Hook routines are responsible only for the memory they allocate to perform their
allotted function, i.e., memory for return parameters and temporary variables.
They do not need to worry about memory occupied by input parameters;
DataStage deals with this.
Memory must be allocated and freed using the standard system memory allocator
interfaces: malloc (and realloc) to allocate memory and free to deallocate it.
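The allocation rule above can be sketched in C as follows. This is an illustration under stated assumptions: DEMO_STRING is a stand-in for the real STRING type (only its text field is mentioned in this guide; the len field is assumed), and demo_set_output is a hypothetical helper rather than a DataStage routine.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the STRING type. The real definition comes from the DataStage
 * headers; the len field here is an assumption of this sketch. */
typedef struct {
    size_t len;
    char  *text;
} DEMO_STRING;

/* Hypothetical helper: copies len bytes of src into storage obtained from
 * malloc and attaches it to out. Returns 0 on success, -1 on failure. */
int demo_set_output(DEMO_STRING *out, const char *src, size_t len)
{
    char *buf = malloc(len > 0 ? len : 1);   /* avoid a zero-size request */
    if (buf == NULL)
        return -1;
    memcpy(buf, src, len);   /* memcpy, not strcpy: data may hold CHAR(0) bytes */
    out->text = buf;
    out->len  = len;
    return 0;
}
```

Because the buffer comes from malloc, it can later be released with free, which is the pairing the text requires.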
HID is the hook library identifier, e.g., HEBREW. This identifies the initialization
routine to DataStage and lets it be called.
Hook Functions
The initialization function ih_init_HID initializes each element of the Hook table
to a corresponding hook function or sets it to null as shown below. You should
replace HID with your hook library ID. In the example only the CASE hook is
supplied.
void ih_init_HID()
{
    NLSHKHookTable[NLSHK_TABLE_CASE] = ih_case_HID;
    NLSHKHookTable[NLSHK_TABLE_COMPARE] = 0;
    NLSHKHookTable[NLSHK_TABLE_CTYPE] = 0;
    NLSHKHookTable[NLSHK_TABLE_FMT] = 0;
    NLSHKHookTable[NLSHK_TABLE_ICONV] = 0;
    NLSHKHookTable[NLSHK_TABLE_LENDP] = 0;
    NLSHKHookTable[NLSHK_TABLE_MATCH] = 0;
    NLSHKHookTable[NLSHK_TABLE_OCONV] = 0;
    NLSHKHookTable[NLSHK_TABLE_SOUNDEX] = 0;
    NLSHKHookTable[NLSHK_TABLE_TRIM] = 0;
}
Argument Description
in_str The input string.
replaced_char Set to 1 if a character was replaced in in_str.
out_str Output STRING variable whose text field is malloc’d by the
hook function if the hook function returns NLSHK_HKE_OK
or NLSHK_HKE_SOME_CONV.
conv_type Input argument to contain NLSHK_CT_DOWNCASE or
NLSHK_CT_UPCASE.
Argument Description
in_str1 The first input string.
rep_char1 Set to 1 if a character was replaced in in_str1.
in_str2 The second input string.
rep_char2 Set to 1 if a character was replaced in in_str2.
type, justprec Input arguments that contain the following values (pretval is an
output argument):
• COMPARE
  type is NLSHK_CO_COMPARE.
  justprec contains 0 for left justification (default), 1 for right
  justification.
• Simple comparisons of the type <, =, >, LE, GE, NE
  type is one of NLSHK_CO_GREATER,
  NLSHK_CO_GTEQUAL, NLSHK_CO_EQUAL,
  NLSHK_CO_NEQUAL, NLSHK_CO_LTEQUAL, or
  NLSHK_CO_LESSTHAN depending on the type of
  comparison being done.
  justprec is the current precision (if required) or 0.
• Vector comparisons like LES, LTS, GTS, GES, EQS, NES.
  type is one of NLSHK_CO_GREATER,
  NLSHK_CO_GTEQUAL, NLSHK_CO_EQUAL,
  NLSHK_CO_NEQUAL, NLSHK_CO_LTEQUAL, or
  NLSHK_CO_LESSTHAN depending on the type of
  comparison being done.
  justprec is the current precision (if required) or 0.
pretval Must be set to one of the following if the return value is
NLSHK_HKE_OK:
<0 If in_str1 is less than in_str2
0 If in_str1 and in_str2 are equal
>0 If in_str1 is greater than in_str2
The ctype hook function is called in response to a call to the BASIC function
ALPHA, which checks whether a string is alphabetic. The hook function must be
defined as follows:
Argument Description
in_str The input string.
replaced_char Set to 1 if a character was replaced in in_str.
pretval Must be set to one of the following if the return value is
NLSHK_HKE_OK:
1 If in_str is alphabetic
0 If in_str is not alphabetic
The match hook function is called in response to a call to the BASIC function
MATCH or MATCHFIELD, which check for the presence of a pattern in a string.
The hook function must be defined as follows:
Argument Description
in_str1 The input string.
rep_char1 Set to 1 if a character was replaced in in_str1.
mask_str The mask to use.
rep_char2 Set to 1 if a character was replaced in mask_str.
out_str Output STRING variable whose text field is malloc’d by the
hook function in certain cases.
fieldnum 0 if the hook function is for MATCH, otherwise the field
number specified to the MATCHFIELD function.
pmatched Output parameter that indicates whether a match was found
(see below).
The format hook function is called in response to a call to the BASIC functions
FMT and FMTS. The hook function must be defined as follows:
Argument Description
in_str The input string.
replaced_char Set to 1 if a character was replaced in in_str.
out_str Output STRING variable whose text field is malloc’d by the
hook function if the hook function’s return value is
NLSHK_HKE_OK or NLSHK_HKE_SOME_CONV.
fmt_code Input argument to contain the format code supplied to FMT
or FMTS.
options_flag Input argument to contain one of the following:
IDEAL_FLAVOR, PICK_FLAVOR, INFO_FLAVOR,
REAL_FLAVOR, IN2_FLAVOR, PIOPEN_FLAVOR
Also, if fmt_code is in display positions, the options_flag is
ORed with DP_FLAVOR.
See the file gcidir/include/flavor.h for these tokens.
precision The current DataStage precision.
The iconv and oconv hook functions are called in response to a call to the BASIC
functions ICONV, OCONV, ICONVS, or OCONVS. The hook function must be
defined as follows:
Argument Description
in_str The input string.
replaced_char Set to 1 if a character was replaced in in_str.
out_str Output STRING variable whose text field is malloc’d by the
hook function if the hook function’s return value is
NLSHK_HKE_OK or NLSHK_HKE_SOME_CONV.
conv_code Input argument to contain the conversion code to apply.
options_flag Input argument to contain one of the following:
IDEAL_FLAVOR, PICK_FLAVOR, INFO_FLAVOR,
REAL_FLAVOR, IN2_FLAVOR, PIOPEN_FLAVOR
See the file gcidir/include/flavor.h for these tokens.
The lendp hook function is called in response to a call to the BASIC functions
LENDP and LENSDP. The hook function must be defined as follows:
Argument Description
in_str The input string.
replaced_char Set to 1 if a character was replaced in in_str.
pretval Must be set to the length in display positions of the input
string when the return value is NLSHK_HKE_OK.
The soundex hook function is called in response to a call to the BASIC function
SOUNDEX. The hook function must be defined as follows:
Argument Description
in_str The input string.
replaced_char Set to 1 if a character was replaced in in_str.
out_str Output STRING variable whose text field is malloc’d by the
hook function if the hook function returns NLSHK_HKE_OK
or NLSHK_HKE_SOME_CONV.
The trim hook function is called in response to a call to the BASIC functions
TRIM, TRIMB, TRIMF, TRIMS, TRIMBS, and TRIMFS. For TRIM, the hook function
is called only if expression is the sole argument specified in the TRIM function
call (see DataStage BASIC for more details). The hook function must be defined as
follows:
Argument Description
in_str The input string.
replaced_char Set to 1 if a character was replaced in in_str.
out_str Output STRING variable whose text field is malloc’d by the
hook function if the hook function returns NLSHK_HKE_OK
or NLSHK_HKE_SOME_CONV. The length of the output
must not be greater than the length of in_str.
trim_type Input argument, which is one of the following:
NLSHK_TT_TRIM, NLSHK_TT_TRIMB, NLSHK_TT_TRIMF
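The constraint that the trimmed output is never longer than in_str can be illustrated with a C sketch that computes only the surviving span of the input. The DEMO_TRIMB and DEMO_TRIMF constants are hypothetical stand-ins for NLSHK_TT_TRIMB and NLSHK_TT_TRIMF (whose real values are not given here), and the sketch assumes that the former removes trailing spaces and the latter leading spaces, as the BASIC functions TRIMB and TRIMF do.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the NLSHK_TT_TRIMB and NLSHK_TT_TRIMF tokens. */
enum { DEMO_TRIMB = 1, DEMO_TRIMF = 2 };

/* Hypothetical helper: finds the part of s[0..len) that survives trimming.
 * On return, *start is the offset of the first kept byte and *out_len the
 * number of kept bytes, so *out_len can never exceed len. */
void demo_trim_span(const char *s, size_t len, int trim_type,
                    size_t *start, size_t *out_len)
{
    size_t first = 0;
    size_t last  = len;

    if (trim_type == DEMO_TRIMF)              /* drop leading spaces */
        while (first < last && s[first] == ' ')
            first++;
    if (trim_type == DEMO_TRIMB)              /* drop trailing spaces */
        while (last > first && s[last - 1] == ' ')
            last--;

    *start   = first;
    *out_len = last - first;
}
```

A hook built on this idea would copy the reported span into a buffer obtained from malloc before returning it in out_str.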
This appendix contains reference tables for NLS, including the following:
• DataStage commands that are available only in NLS mode
• DataStage commands that support NLS features
• Useful NLS commands
• BASIC functionality that is available only in NLS mode
• Map tables that are supplied with NLS
• DataStage locales
• Unicode blocks
DataStage Commands
Table C-1 lists DataStage commands that are available only in NLS mode.
Command Description
GET.FILE.MAP Displays the map name associated with the speci-
fied file.
GET.LOCALE Retrieves the current locale settings.
LIST.LOCALES Lists the current locales.
LIST.MAPS Lists maps that are built and installed in shared
memory.
NLS.UPDATE.ACCOUNT Updates an account to NLS mode.
RESTORE.LOCALE Restores a locale.
SAVE.LOCALE Saves a locale.
SET.FILE.MAP Associates a map name with a file.
SET.GCI.MAP Sets a map for passing character string parameters
to and from GCI subroutines.
SET.LOCALE Sets or restores a locale.
SET.SEQ.MAP Associates a map with sequential I/O.
UNICODE.FILE Converts a mapped file to the DataStage internal
character set, or vice versa, without copying the
file.
Table C-2 lists DataStage commands that behave differently in NLS mode.
Table C-2. Commands That Change in NLS Mode
Command Description
EDIT.CONFIG Edits the uvconfig file. This command is also available by
choosing Installation ➤ Edit uvconfig from the NLS
Administration menu.
NLS.ADMIN Enters the NLS Administration menu system.
Statement/Function Description
AUXMAP Switches to a terminal’s auxiliary map.
FILEINFO Returns a file’s map name.
FMTDP Formats a string in display positions rather than character
positions. If NLS mode is off, FMTDP acts like FMT.
FMTSDP Formats a dynamic array in display positions rather than
character positions. If NLS mode is off, FMTSDP acts like
FMTS.
FOLDDP Determines where to fold a string using display posi-
tions. If NLS mode is off, FOLDDP acts like FOLD.
FOOTING Calculates gaps in footings using display positions.
GETLOCALE Retrieves the names of specified categories of the current
locale.
HEADING Calculates gaps in headings using display positions.
ICONV Uses the NLS, MU0C, and other new conversion codes.
INPUTDP Defines input formats using display positions.
LENDP Returns the length of a string in display positions. If NLS
mode is off, LENDP acts like LEN.
LENSDP Returns the length of a dynamic array in display positions.
If NLS mode is off, LENSDP acts like LENS.
LOCALEINFO Retrieves the settings of the current locale.
OCONV Uses the NLS, MU0C, and other new conversion codes.
SETLOCALE Changes the setting of one or all categories for the
current locale.
STATUS Returns additional values for READ and WRITE state-
ments that encounter unmappable characters.
SYSTEM Returns a value to indicate the current NLS mode and
other NLS parameters.
UNICHAR Generates a single character in external format.
UNICHARS Generates a dynamic array in external format.
UNISEQ Returns the Unicode value of a single character in
internal format.
UNISEQS Returns a dynamic array of Unicode values in internal
format.
UPRINT Sends data to a printer without using the printer’s map.
!GETPU Determines the map name associated with a print
channel.
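The display-position functions in the table above (FMTDP, FOLDDP, LENDP, and so on) measure a string by the columns it occupies on a screen or printer rather than by its character count. The following Python sketch illustrates the distinction only. It is not DataStage BASIC, and it assumes that width can be approximated from the Unicode East Asian Width property; DataStage itself may derive display widths differently. The names `display_len` and `fmt_left_dp` are hypothetical.

```python
import unicodedata


def display_len(text: str) -> int:
    """Approximate the number of display positions a string occupies.

    Assumption: wide and fullwidth characters take two columns,
    combining marks take none, and every other character takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # a combining mark is drawn over the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def fmt_left_dp(text: str, columns: int) -> str:
    """Left-justify text in a field measured in display positions."""
    padding = max(columns - display_len(text), 0)
    return text + " " * padding


sample = "NLS\u65e5\u672c"  # three Latin letters followed by two kanji
print(len(sample), display_len(sample))
```

Padding by character count would leave a field containing this sample two columns too wide, which is the kind of misalignment the display-position functions are meant to avoid.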
Map Tables
The following list shows all the map tables for major character sets used world-
wide that are supplied with DataStage. The left column contains the name of the
map, the middle column contains the name of the map table used by the map (in
NLS.MAP.TABLES), and the right column contains a description of the map.
MAP.DESCS...... Table ID....... Map description..................................
DataStage Locales
The following list shows the locales supplied with DataStage, the territory that
uses each locale, and the relevant language:
NLS.LC.ALL..... Description............................................
Unicode Blocks
Unicode is divided into blocks of related characters. These correspond approxi-
mately to the scripts used for different families of languages. Characters allocated
within blocks have a code value and a description. The description must use
uppercase A through Z, hyphen, and digits 0 through 9 only. In DataStage NLS,
the blocks are allocated numbers starting from 1. The main blocks are shown in
Table C-5.
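As an illustration of how a code value relates to a block, the following Python sketch looks up the block that contains a given code point. The block ranges are taken from the current Unicode standard; numbering the blocks from 1 by their position in this short list is an assumption made for the example and does not reproduce the DataStage block numbers. The names `BLOCKS` and `block_of` are hypothetical.

```python
from bisect import bisect_right

# (first code point, last code point, block name).
# Only a few blocks are listed; the numbering below is illustrative.
BLOCKS = [
    (0x0000, 0x007F, "BASIC LATIN"),
    (0x0080, 0x00FF, "LATIN-1 SUPPLEMENT"),
    (0x0100, 0x017F, "LATIN EXTENDED-A"),
    (0x0370, 0x03FF, "GREEK AND COPTIC"),
    (0x0400, 0x04FF, "CYRILLIC"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(code_point: int):
    """Return (block number, block name), or None if no listed block matches."""
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        _, last, name = BLOCKS[i]
        if code_point <= last:
            return i + 1, name  # blocks are numbered starting from 1
    return None


print(block_of(0x00E9))  # code point of LATIN SMALL LETTER E WITH ACUTE
```

Because the list is sorted by starting code point, a binary search finds the candidate block, and the final range check rejects code points that fall in a gap between listed blocks.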
JEF character set A Fujitsu proprietary encoding of several thousand
characters. It includes the single-byte EBCDIK and
double-byte JIS character sets. The JEF character set
differs from all other character sets that DataStage
NLS supports, in that it uses a pair of shift characters
to toggle between single-byte and double-byte
encoding.
input map table Mapping tables used to define byte sequences that are
valid only on input. They are used to define deadkey
characters.
internal character set The character set that DataStage uses to store and
manipulate data. See also external character set and
Unicode.
locale The language, character set, and data formatting
conventions used by a group of people. In DataStage,
a locale comprises a set of conventions in specific
categories (Time, Numeric, Monetary, Ctype, and
Collate). See also territory.
main map table The main table that defines how a character set is
mapped between the internal and external character
sets.
national conventions A standard set of rules that defines how certain data
types such as numbers and dates are used in a terri-
tory.
National Language Support (NLS) See NLS.
NLS A program’s ability to use any language, data
formatting rule, or character set required by its
users anywhere in the world. Also referred to as
internationalization.
single-byte character set A character set whose code points have values 0
through 255, and can therefore be represented by a
single byte. Single-byte character sets are suitable for
some European, American, and Middle Eastern
languages. See also double-byte character set.
territory The area or region where a locale is used. This may
correspond to a geographical location, such as a