CDATA: Difference between revisions

Revision as of 10:09, 10 October 2021

The term CDATA, meaning character data, is used for distinct, but related, purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

CDATA sections in XML

In an XML document or external entity, a CDATA section is a piece of element content that is marked up to be interpreted literally, as textual data, not as marked-up content.^[1] A CDATA section begins with the sequence <![CDATA[ and ends at the first following occurrence of ]]>. It is merely an alternative syntax for expressing character data; there is no semantic difference between character data in a CDATA section and character data in standard syntax where, for example, "<" and "&" are represented by "&lt;" and "&amp;", respectively.
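The equivalence can be checked with any conforming XML parser. The sketch below uses Python's standard-library ElementTree, and the element name `code` is chosen only for illustration. It parses the same text twice, once written as a CDATA section and once with the predefined entity references, and shows that the parser reports identical character data.

```python
import xml.etree.ElementTree as ET

# The same character data, once as a CDATA section and once escaped with
# the predefined entity references &lt; and &amp;.
with_cdata = "<code><![CDATA[if (a < b && b < c) return;]]></code>"
escaped = "<code>if (a &lt; b &amp;&amp; b &lt; c) return;</code>"

# The parser strips the CDATA markers and resolves the references,
# so both elements end up with identical text content.
text1 = ET.fromstring(with_cdata).text
text2 = ET.fromstring(escaped).text
print(text1 == text2, repr(text1))  # True 'if (a < b && b < c) return;'
```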

Use of CDATA in program output

CDATA sections in XHTML documents are liable to be parsed differently by web browsers if they render the document as HTML, since HTML parsers do not recognise the CDATA start and end markers, nor do they recognise HTML entity references such as &lt; within <script> tags. This can cause rendering problems in web browsers and can lead to cross-site scripting vulnerabilities if used to display data from untrusted sources, since the two kinds of parser will disagree on where the CDATA section ends.
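A program that emits CDATA sections also has to handle the one sequence that cannot occur inside them: the terminator ]]>. A common workaround is to split each occurrence across two adjacent sections. The Python sketch below shows this approach; `to_cdata` is a hypothetical helper name. It only keeps the output well-formed for an XML parser. It does not resolve the HTML-parser disagreement described above, and it assumes the text contains only characters that are legal in XML.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA markup for XML output (hypothetical helper).

    "]]>" cannot appear inside a CDATA section, so each occurrence is
    split across two adjacent sections: the first ends with "]]" and the
    next one begins with ">".
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# A script body that happens to contain the terminator sequence.
payload = 'if (a < b && s == "]]>") { go(); }'
document = "<script>" + to_cdata(payload) + "</script>"

# An XML parser joins the adjacent sections back into the original text.
assert ET.fromstring(document).text == payload
print(document)
```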

Since it is useful to be able to use less-than signs (<) and ampersands (&) in web page scripts, and to a lesser extent styles, without having to remember to escape them, it is common to use CDATA markers around the text of inline <script> and <style> elements in XHTML documents. But so that the document can also be parsed by HTML parsers, which do not recognise the CDATA markers, the CDATA markers are usually commented out, as in this JavaScript example:

<script type="text/javascript">
//<![CDATA[
document.write("<");
//]]>
</script>

or this CSS example:

<style type="text/css">
/*<![CDATA[*/
body { background-image: url("marble.png?width=300&height=300") }     
/*]]>*/
</style>

This technique is only necessary when using inline scripts and stylesheets, and is language-specific. CSS stylesheets, for example, support only the second style of commenting out (/* ... */), but CSS also has less need for the < and & characters than JavaScript and so less need for explicit CDATA markers.
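The effect of commenting out the markers can be observed by feeding the JavaScript example to both kinds of parser. The Python sketch below uses the standard-library ElementTree as the XML parser and `html.parser.HTMLParser` as the HTML parser. The `ScriptText` class is a hypothetical helper that collects the raw text reported inside <script>. The XML parser consumes the CDATA markers and leaves two empty // comments. The HTML parser treats script content as raw text and keeps the markers verbatim, but each marker sits behind // and is therefore just a JavaScript line comment.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SNIPPET = """<script type="text/javascript">
//<![CDATA[
document.write("<");
//]]>
</script>"""


class ScriptText(HTMLParser):
    """Collects the text an HTML parser reports inside <script> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


# XML parser: markers consumed, leaving "//" on otherwise empty lines.
xml_text = ET.fromstring(SNIPPET).text

# HTML parser: script content is raw text, so the markers survive as-is.
html = ScriptText()
html.feed(SNIPPET)
html.close()
html_text = "".join(html.chunks)

print(repr(xml_text))
print(repr(html_text))
```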

CDATA in DTDs

CDATA-type attribute value

In Document Type Definition (DTD) files for SGML and XML, an attribute value may be designated as being of type CDATA: arbitrary character data. Within a CDATA-type attribute, character and entity reference markup is allowed and will be processed when the document is read.

For example, if an XML DTD contains

<!ATTLIST foo a CDATA #IMPLIED>

it means that elements named foo may optionally have an attribute named "a" which is of type CDATA. In an XML document that is valid according to this DTD, an element like this might appear:

<foo a="1 &amp; 2 are &lt; &#51; &#x0A;" />

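and an XML parser would interpret the "a" attribute's value as the character data "1 & 2 are < 3", followed by a space and a line-feed character. The line feed comes from the &#x0A; reference; character references are exempt from the whitespace normalization that turns literal line breaks in attribute values into spaces.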
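Python's standard-library ElementTree, which is built on the non-validating Expat parser, can reproduce this. The sketch below places the ATTLIST declaration and the element from the example into one document with an internal DTD subset, then reads the attribute back. The parsed value has its references resolved, and the &#x0A; reference survives as a real line feed.

```python
import xml.etree.ElementTree as ET

# The DTD declaration and element from the example, combined into one
# document using an internal DTD subset.
doc = """<!DOCTYPE foo [
  <!ATTLIST foo a CDATA #IMPLIED>
]>
<foo a="1 &amp; 2 are &lt; &#51; &#x0A;" />"""

value = ET.fromstring(doc).get("a")
print(repr(value))  # '1 & 2 are < 3 \n'
```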

CDATA-type entity

An SGML or XML DTD may also include entity declarations in which the token CDATA is used to indicate that the entity consists of character data. The character data may appear within the declaration itself or may be available externally, referenced by a URI. In either case, character reference and parameter entity reference markup is allowed in the entity, and will be processed as such when it is read.

<DISPLAY_NAME Attribute="Y"><![CDATA[PFTEST0__COUNTER_6__:4:199:, PFTEST0__COUNTER_7__:4:199:]]></DISPLAY_NAME>

<SVLOBJECT><LONG name="" val=""/><INTEGER name="" val=""/><LONG name="" val=""/></SVLOBJECT>

Criticism

In an XML context, CDATA is not a first-class citizen: it has no internal structure, which weakens its value for interoperability.

It is important to avoid indiscriminate use of CDATA in XML. Some software and XML editors, in order to be "XML compliant", wrap all content in CDATA sections rather than only a fraction of it, whereas CDATA was originally intended to be used only in "a certain portion of the document".

UTF-8 is the recommended character encoding for XML, and since the 2010s the widespread adoption of UTF-8 has made it possible to abandon the indiscriminate use of CDATA.
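A serializer can escape character data itself, so blanket CDATA wrapping is not needed for well-formed output. The Python sketch below illustrates this with the standard-library ElementTree; the element name `p` and the sample text are illustrative only. With Unicode output, the serializer escapes only the markup-significant characters and writes non-ASCII text unchanged. With ElementTree's default us-ascii encoding, non-ASCII characters would instead become character references.

```python
import xml.etree.ElementTree as ET

# Text containing markup-significant and non-ASCII characters, no CDATA.
p = ET.Element("p")
p.text = "a < b & café"

# encoding="unicode" returns a str: "<" and "&" are escaped,
# while "é" is written as-is.
serialized = ET.tostring(p, encoding="unicode")
print(serialized)  # <p>a &lt; b &amp; café</p>

# Parsing the output gives back the original text.
assert ET.fromstring(serialized).text == p.text
```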

External links

[1] CDATA Sections