Sabyasachi Moitra


XML

Sabyasachi Moitra
What is XML?
Extensible Markup Language.
A markup language much like HTML.
Designed to carry and store data, not to display it (displaying data is the job of HTML).
XML tags are not predefined. Programmers must define their own tags.
Information wrapped in tags.
Designed to be self-descriptive.
XML is not a replacement for HTML.

WEB TECHNOLOGY 2
Example
<employee>
<firstname>arjun</firstname>
<lastname>mondal</lastname>
<empno>123456</empno>
<phno>03322811280</phno>
</employee>
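
As a minimal sketch of how a program might read the record above, the following Python snippet parses it with the standard library's xml.etree.ElementTree. The names EMPLOYEE_XML and read_employee are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The employee record from the example above, held in a string.
EMPLOYEE_XML = """<employee>
<firstname>arjun</firstname>
<lastname>mondal</lastname>
<empno>123456</empno>
<phno>03322811280</phno>
</employee>"""


def read_employee(xml_text):
    """Return the children of the root element as a tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


record = read_employee(EMPLOYEE_XML)
print(record["firstname"], record["lastname"])
```

Every value comes back as text, so the leading zero of the phone number is preserved.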

XML Separates Data from HTML
If you need to display dynamic data in your HTML document, it will take a lot of
work to edit the HTML each time the data changes.
With XML, data can be stored in separate XML files. This way you can
concentrate on using HTML/CSS for display and layout, and be sure that
changes in the underlying data will not require any changes to the HTML.
With a few lines of JavaScript code, you can read an external XML file and
update the data content of your web page.
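
The slide names JavaScript for this step. The same idea can be sketched in Python: the data lives in XML, and a small function turns it into HTML, so a change in the data needs no change to the markup logic. The function name render_employee and the table layout are assumptions made for illustration.

```python
import html
import xml.etree.ElementTree as ET


def render_employee(xml_text):
    """Build an HTML table with one row per child element of the XML root."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        "<tr><th>{}</th><td>{}</td></tr>".format(
            html.escape(child.tag), html.escape(child.text or "")
        )
        for child in root
    )
    return "<table>" + rows + "</table>"


print(render_employee("<employee><firstname>arjun</firstname></employee>"))
```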

What is an XML Element?
An XML element is everything from (including) the element's start tag to
(including) the element's end tag.
An element may contain:
• other elements
• text
• attributes
• or, a mix of all of the above

XML Naming Rules
XML element names must follow these naming rules:
• Names may contain letters, digits, hyphens, underscores, and periods
• Names must start with a letter or an underscore, not with a digit or a punctuation character such as a hyphen or period
• Names should not start with the letters xml (or XML, or Xml, etc.), which are reserved
• Names cannot contain spaces
Any other name can be used; no words are reserved.
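
The rules above can be approximated with a short check. This is a simplified sketch that only considers ASCII letters, digits, hyphens, underscores, and periods; real XML names also allow many Unicode characters and the colon used by namespaces. The function name is_acceptable_name is hypothetical.

```python
import re

# Simplified pattern: an ASCII letter or underscore first, then letters,
# digits, underscores, periods, or hyphens. Spaces are never allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_acceptable_name(name):
    """Return True if name passes the simplified naming rules."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml, XML, Xml, ... are reserved
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["firstname", "emp-no", "1name", "first name", "XmlData"]:
    print(candidate, is_acceptable_name(candidate))
```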

XML Elements vs. Attributes
XML Elements

<student>
<course>MSC</course>
<firstname>AAAA</firstname>
<lastname>BBBB</lastname>
</student>

In the above example course is an element.

XML Attributes

<student course="MSC">
<firstname>AAAA</firstname>
<lastname>BBBB</lastname>
</student>

In the above example course is an attribute.

• There are no rules about when to use elements or when to use attributes.
• Attributes are handy in HTML. In XML try to avoid attributes.
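
The choice between the two forms changes how a program reads the value. A minimal sketch with Python's xml.etree.ElementTree, using the student records from this slide (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# course stored as a child element
as_element = ET.fromstring(
    "<student><course>MSC</course>"
    "<firstname>AAAA</firstname><lastname>BBBB</lastname></student>"
)
# course stored as an attribute of the student element
as_attribute = ET.fromstring(
    '<student course="MSC">'
    "<firstname>AAAA</firstname><lastname>BBBB</lastname></student>"
)

course_from_element = as_element.find("course").text  # child element lookup
course_from_attribute = as_attribute.get("course")    # attribute lookup
print(course_from_element, course_from_attribute)
```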

Well Formed XML Documents
A Well Formed XML document has correct XML syntax:
• XML documents must have a root element.
• XML elements must have a closing tag.
• XML tags are case sensitive.
• XML elements must be properly nested.
• XML attribute values must be quoted.
A well-formedness error is fatal: an XML parser stops processing the document and reports the error instead of trying to recover.
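
One way to see these rules in action is to hand a few documents to a parser. The sketch below uses Python's xml.etree.ElementTree, which raises ParseError on any well-formedness error; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "properly nested": "<a><b></b></a>",
    "improperly nested": "<a><b></a></b>",
    "missing closing tag": "<a><b></a>",
    "case mismatch": "<a></A>",
    "unquoted attribute value": "<a x=1></a>",
    "two root elements": "<a/><b/>",
}
for label, text in samples.items():
    print(label, is_well_formed(text))
```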

Valid XML Documents
A Valid XML document is a Well Formed XML document, which also
conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employee SYSTEM "Record.dtd">
<employee>
<firstname>arjun</firstname>
<lastname>mondal</lastname>
<empno>123456</empno>
<phno>03322821280</phno>
</employee>

The DOCTYPE declaration, in the example above, is a reference to an
external DTD file. The content of the file is shown in the next slide.

XML DTD
The purpose of a DTD is to define the structure of an XML document. It
defines the structure with a list of legal elements.

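Record.dtd
<!ELEMENT employee (firstname,lastname,empno,phno)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT empno (#PCDATA)>
<!ELEMENT phno (#PCDATA)>

An external DTD file contains only the declarations. The <!DOCTYPE ...> line belongs in the XML document that refers to the file.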
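
Python's standard library parser checks well-formedness but does not validate against a DTD, so real validation needs a validating parser. To illustrate what the content model (firstname,lastname,empno,phno) demands, here is a hand-written check that is only a sketch under that assumption, not a DTD validator. The names EXPECTED_CHILDREN and matches_employee_model are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child order required by: <!ELEMENT employee (firstname,lastname,empno,phno)>
EXPECTED_CHILDREN = ["firstname", "lastname", "empno", "phno"]


def matches_employee_model(xml_text):
    """Check the root name, the child sequence, and text-only children."""
    root = ET.fromstring(xml_text)
    if root.tag != "employee":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) children hold text only, so none may have child elements.
    return all(len(child) == 0 for child in root)


good = ("<employee><firstname>arjun</firstname><lastname>mondal</lastname>"
        "<empno>123456</empno><phno>03322821280</phno></employee>")
print(matches_employee_model(good))
```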

Document Type Definition (DTD)
Defines the legal building blocks of an XML document.
Defines the document structure with a list of legal elements and
attributes.
Can be declared inline inside an XML document, or as an external
reference (shown in the previous slides).

Internal DTD
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

Internal DTD (contd…)
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>

Internal DTD (contd…)
The DTD is interpreted as:
• !DOCTYPE note defines that the root element of this document is note.
• !ELEMENT note defines that the note element contains four elements, in this order: "to, from, heading, body".
• !ELEMENT to defines the to element to be of type "#PCDATA".
• !ELEMENT from defines the from element to be of type "#PCDATA".
• !ELEMENT heading defines the heading element to be of type "#PCDATA".
• !ELEMENT body defines the body element to be of type "#PCDATA".

#PCDATA means parsed character data (discussed later).

Why Use a DTD?
With a DTD, each of your XML files can carry a description of its own
format.
With a DTD, independent groups of people can agree to use a standard
DTD for interchanging data.
Your application can use a standard DTD to verify that the data you
receive from the outside world is valid.
You can also use a DTD to verify your own data.

The Building Blocks of XML Documents
Seen from a DTD point of view, all XML documents (and HTML
documents) are made up by the following building blocks:
• Elements
• Attributes
• Entities
• PCDATA
• CDATA

Elements
Elements are the main building blocks of both XML and HTML
documents.

XML Elements

<student>
<course>MSC</course>
<firstname>AAAA</firstname>
<lastname>BBBB</lastname>
</student>

HTML Elements

<body>
<p>MSC<br>
AAAA<br>
BBBB</p>
</body>

In both XML and HTML, elements can contain text, other elements, or be empty. An empty HTML element such as br is written <br>; an empty XML element must still be closed, for example <br/>.

Attributes
Attributes provide additional information about an element.
Attributes are always specified in the start tag.
Attributes usually come in name/value pairs like: name="value".

XML Attributes

<student course="MSC">
<firstname>AAAA</firstname>
<lastname>BBBB</lastname>
</student>

HTML Attributes

<body bgcolor="aqua">
<p>MSC<br>
AAAA<br>
BBBB</p>
</body>

Entities
Some characters have a special meaning in XML, like the less-than sign (<), which marks the start of an XML tag. To use such a character as data, write it as an entity reference.
In HTML, the no-breaking-space entity &nbsp; is used to insert an extra space in a document. It is not predefined in XML and must be declared before it is used.
Entities are expanded when a document is parsed by an XML parser.
An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
The following entities are predefined in XML:

Entity Reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
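
As a sketch of how the five predefined entities are applied in practice, Python's xml.sax.saxutils provides escape and unescape. By default they handle &, <, and >; the quote characters are passed in as extra mappings. The wrapper names to_xml_text and from_xml_text are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > by default; the two quotes are added here.
_QUOTES_TO_ENTITIES = {'"': "&quot;", "'": "&apos;"}
_ENTITIES_TO_QUOTES = {"&quot;": '"', "&apos;": "'"}


def to_xml_text(raw):
    """Replace the five special characters with their entity references."""
    return escape(raw, _QUOTES_TO_ENTITIES)


def from_xml_text(escaped):
    """Turn the five predefined entity references back into characters."""
    return unescape(escaped, _ENTITIES_TO_QUOTES)


sample = "<a href=\"x\">Tom & Jerry's</a>"
print(to_xml_text(sample))
```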

Example
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE note [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<footer>&writer;&nbsp;&copyright;</footer>
</note>
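
To check how the entities in this example expand, the document can be parsed with Python's xml.etree.ElementTree, whose underlying parser expands entities declared in the internal DTD. This is a minimal sketch; the variable names NOTE_XML and footer_text are illustrative.

```python
import xml.etree.ElementTree as ET

# The document from the example above, including its entity declarations.
NOTE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ENTITY nbsp "&#xA0;">
<!ENTITY writer "Writer: Donald Duck.">
<!ENTITY copyright "Copyright: W3Schools.">
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
<footer>&writer;&nbsp;&copyright;</footer>
</note>"""

# The three entity references are replaced by their declared values.
footer_text = ET.fromstring(NOTE_XML).find("footer").text
print(repr(footer_text))
```

The footer contains the two declared strings joined by a single no-break space (U+00A0).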

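Output
(Figure omitted: the parsed note, in which the footer reads "Writer: Donald Duck.", then a no-break space, then "Copyright: W3Schools.")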

PCDATA
Parsed Character Data
In a DTD, #PCDATA declares that an element contains text; it also appears in mixed-content declarations.
Think of character data as the text found between the start tag and the
end tag of an XML element.
PCDATA is text that WILL be parsed by a parser. The text will be
examined by the parser for entities and markup.
Tags inside the text will be treated as markup and entities will be
expanded.
However, parsed character data must not contain a literal & or < character;
these need to be represented by the &amp; and &lt; entities, respectively.
A literal > is usually written as &gt; as well.

CDATA
Character Data
CDATA is text that will NOT be parsed by a parser.
Tags inside the text will NOT be treated as markup and entities will not
be expanded, rather all will be treated / interpreted as characters.
If the numeric character reference &#240; appears in element content,
it will be interpreted as the single Unicode character U+00F0 (small
letter eth). But if the same text appears in a CDATA section, it will be read
as six characters: ampersand (&), hash mark (#), digit 2 (2), digit 4 (4),
digit 0 (0), semicolon (;).
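
The difference can be observed directly with a parser. A minimal sketch using Python's xml.etree.ElementTree; the element name sample and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# In ordinary element content the character reference is expanded.
parsed = ET.fromstring("<sample>&#240;</sample>").text

# Inside a CDATA section the same six characters are kept as they are.
literal = ET.fromstring("<sample><![CDATA[&#240;]]></sample>").text

print(len(parsed), len(literal))
```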

Declaring Attributes
Type            Description
CDATA           The value is character data
(en1|en2|..)    The value must be one from an enumerated list
ID              The value is a unique id
IDREF           The value is the id of another element
IDREFS          The value is a list of other ids
NMTOKEN         The value is a valid XML name
NMTOKENS        The value is a list of valid XML names
ENTITY          The value is an entity
ENTITIES        The value is a list of entities
NOTATION        The value is a name of a notation

Attribute Values
Value Explanation
value The default value of the attribute
#REQUIRED The attribute is required
#IMPLIED The attribute is optional
#FIXED The attribute value is fixed
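
A sketch of how a default value behaves: when an attribute is declared with a default in the internal DTD and is left out of the document, the parser behind Python's xml.etree.ElementTree supplies the default, while an #IMPLIED attribute that is left out is simply absent. The ATTLIST declarations and the rollno attribute below are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = """<!DOCTYPE student [
<!ELEMENT student (firstname)>
<!ELEMENT firstname (#PCDATA)>
<!ATTLIST student course CDATA "MSC">
<!ATTLIST student rollno CDATA #IMPLIED>
]>
<student><firstname>AAAA</firstname></student>"""

student = ET.fromstring(STUDENT_XML)

# course is not written in the document, so the declared default is used.
print(student.get("course"))
# rollno is optional (#IMPLIED) and not written, so there is no value.
print(student.get("rollno"))
```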

References
Courtesy of W3Schools – XML Tutorial. URL:
http://www.w3schools.com/xml/
Courtesy of TutorialsPoint – XML Tutorial. URL:
http://www.tutorialspoint.com/xml/
