Character Sets and Encodings
Updated 2009-05-01 09:44
You can find a selection of more detailed articles using the links to the right.
Once you get some ideas from this page, you will probably just use Learn to
internationalize, or the site search.
WHAT'S IT ABOUT?
A character set is a collection of letters and symbols used in a writing system.
For example, the ASCII character set covers letters and symbols for English text,
ISO-8859-6 covers letters and symbols needed for many languages based on
the Arabic script, and the Unicode character set contains characters for most of
the living languages and scripts in the world.
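As a rough illustration of how the coverage of these character sets differs,
the sketch below tries to encode one Arabic word with three of Python's
built-in codecs. The sample word and the variable names are assumptions chosen
for the example, not part of the original article.

```python
# Sample Arabic word (an assumption for illustration): 5 letters.
word = "مرحبا"

# ISO-8859-6 covers the Arabic script, using one byte per character.
arabic_bytes = word.encode("iso-8859-6")

# UTF-8 (an encoding of Unicode) also covers it, using two bytes per letter here.
utf8_bytes = word.encode("utf-8")

# ASCII has no Arabic letters, so encoding fails.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(arabic_bytes), len(utf8_bytes), ascii_ok)
```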
A character encoding maps the characters of a character set to the bytes that
are stored or transmitted. There are many different character encodings. If the
wrong encoding is applied to the bytes in memory, the result will be
unintelligible text. It is therefore important, if people are to read your
content, that you correctly label the character encoding used.
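A minimal sketch of what happens when the wrong encoding is applied, assuming
the text was saved as UTF-8 but is read as ISO-8859-1. The sample word is an
assumption chosen for illustration.

```python
text = "café"

# Bytes as they would be stored when the text is saved as UTF-8.
data = text.encode("utf-8")

# Reading those bytes with the wrong encoding garbles the non-ASCII character.
wrong = data.decode("iso-8859-1")

# Reading them with the encoding they were saved in recovers the text.
right = data.decode("utf-8")

print(wrong)  # cafÃ©
print(right)  # café
```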
Learn more...
Character encodings for beginners explains some of the basic concepts about
character encodings, and why you should care.
CHOOSING AN ENCODING
Everyone developing content, whether content authors or programmers,
should use the UTF-8 character encoding, unless there are very special reasons
for using something else. (If you decide to not use UTF-8, you must choose one
of the few encodings that are interoperably implemented across all browsers.)
Learn more...
HTML & CSS authors
Choosing and applying a character encoding
Spec authors
Choosing character encodings
Server setup
You must also ensure that your data is actually saved in the encoding you have
chosen; it is not sufficient to just label it.
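The difference between labelling and saving can be sketched as follows. The
page content, file names, and the helper decodes_as are hypothetical. Note that
the check is only a heuristic: it can show that bytes are not valid UTF-8, but
a successful decode does not prove which encoding was intended.

```python
import tempfile
from pathlib import Path

# A page whose in-document label says UTF-8 (content is an assumption).
PAGE = (
    "<!DOCTYPE html>\n"
    '<html lang="fr">\n'
    '<head><meta charset="utf-8"><title>Résumé</title></head>\n'
    "<body><p>Résumé</p></body>\n"
    "</html>\n"
)

def decodes_as(raw: bytes, label: str) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly with the label."""
    try:
        raw.decode(label)
        return True
    except UnicodeDecodeError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.html"
    bad = Path(tmp) / "bad.html"
    # Saved in the encoding the label declares.
    good.write_text(PAGE, encoding="utf-8")
    # Same label inside the file, but saved in a different encoding.
    bad.write_text(PAGE, encoding="iso-8859-1")
    good_ok = decodes_as(good.read_bytes(), "utf-8")
    bad_ok = decodes_as(bad.read_bytes(), "utf-8")

print(good_ok, bad_ok)
```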
(Note that with XHTML, encoding declarations are not always straightforward;
they require an understanding of 'standards' vs. 'quirks' modes, and the impact of
the XML declaration.)
Content developers and webmasters may also need to ensure that the server
delivers content with the correct character encoding declarations, since server
settings can override in-document declarations.
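The override can be sketched roughly as below. The function names are
hypothetical, and the precedence shown is a simplification that assumes only
two sources of encoding information (the HTTP Content-Type header and an
in-document declaration) and ignores others, such as a byte-order mark.

```python
from email.message import Message
from typing import Optional

def charset_from_content_type(header_value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the charset in lower case, or None if the parameter is absent.
    return msg.get_content_charset()

def effective_encoding(header_value: str, in_document: Optional[str]) -> Optional[str]:
    """Simplified sketch: a charset in the HTTP header wins over the document's."""
    return charset_from_content_type(header_value) or in_document

# The server declares ISO-8859-1, so the in-document "utf-8" is overridden.
overridden = effective_encoding("text/html; charset=ISO-8859-1", "utf-8")

# The server declares no charset, so the in-document declaration is used.
fallback = effective_encoding("text/html", "utf-8")

print(overridden, fallback)
```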
Learn more...
HTML & CSS authors
Declaring the character encoding for HTML
Declaring the character encoding for a CSS style sheet
Spec developers
Identifying character encodings
Server setup
Setting the HTTP charset parameter
Setting character encoding information using .htaccess
ESCAPES
Escapes are a way of representing a character using only ASCII text. They
provide a way of representing characters that are not available in the
character encoding you are using, or a way of avoiding the direct use of a
character for other reasons (such as when it may conflict with syntax). You
should be clear on when and how these escapes should be used.
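Both uses of escapes can be sketched with Python's standard library, here
producing HTML numeric character references and named entities. The sample
strings are assumptions chosen for illustration.

```python
import html

name = "Dvořák"

# Characters not available in ASCII become numeric character references.
escaped = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

# A parser that understands the escapes recovers the original text.
restored = html.unescape(escaped)

# Characters that would conflict with HTML syntax are escaped as entities.
safe = html.escape("a < b & c")

print(escaped)   # Dvo&#345;&#225;k
print(restored)  # Dvořák
print(safe)      # a &lt; b &amp; c
```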
Learn more...
HTML & CSS authors
Using escapes to represent characters
SVG authors
WEB ADDRESSES
Web addresses can also include non-ASCII characters. The user does little other
than click on the appropriate link or enter the text as they see it; the heavy
lifting is done by the user agent, but you may be interested to know how this
works.
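A rough sketch of the kind of conversion a user agent performs: the host name
is converted to an ASCII form (Punycode) and the path is percent-encoded as
UTF-8 bytes. The example address is an assumption, and Python's built-in idna
codec follows an older version of the IDNA rules, so real user agents may
differ in edge cases.

```python
from urllib.parse import quote

# Hypothetical address parts containing non-ASCII characters.
host = "bücher.example"
path = "/wiki/日本語"

# Host name: converted label by label to an ASCII-compatible form.
ascii_host = host.encode("idna").decode("ascii")

# Path: each non-ASCII character becomes percent-encoded UTF-8 bytes.
ascii_path = quote(path)

print("https://" + ascii_host + ascii_path)
```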
Learn more...
HTML & CSS authors
For the history of document changes, see the news feed for substantive
changes, and the GitHub commit list for all changes since Jan 2016.