XML: An Introduction
By
CHITRAKANT BANCHHOR
References
1. Robert Henderson, XML Made Simple, Elsevier.
2. XML in a Nutshell, O'Reilly.
3. Compiled from the XML tutorial at w3schools.com.
What is XML?
Background to XML
• Markup: the ASCII system works for unformatted text files such as
computer programs, but is not adequate for transmitting documents with
typographical information.
• In the late 1960s Charles Goldfarb and his team of IBM researchers were
studying electronic document systems and created a comprehensive language
that described text formatting and markup.
• This was known as GML (Generalized Markup Language); the name also matches
the initials of its creators (Goldfarb, Mosher, Lorie).
• The first working draft of the new Standard Generalized Markup Language
(SGML) was released in 1980, and it was finally accepted by the ISO in 1986.
• SGML is a metalanguage, that is, a language used to describe the structure of
another language.
• The markup languages discussed here derive from SGML: HTML was defined as
an application of SGML, and XML is a simplified subset of it.
The XML
• SGML is comprehensive and powerful, while HTML is simple to implement
and a standard recognized by over 100 million web users.
• But each has major disadvantages: SGML’s complexity is not needed for
typical applications, while HTML’s lack of data description makes it unsuitable
for data processing.
1. XML is not a programming language
• There is no such thing as an XML compiler that reads XML files and produces
executable code.
2. XML is not a network transport protocol
• Data sent across the network using HTTP, FTP, NFS, or some other protocol
might happen to be encoded in an XML format, but again there has to be some
software outside the XML document that actually does the sending.
3. XML is not a DBMS (database)
• The XML parser divides the document into different pieces (elements,
attributes, …..) and passes the contents piece by piece to the application.
• The application that receives data from the parser may be:
1. A web browser.
2. A word processor such as StarOffice Writer that loads the XML document for editing.
3. A database such as Microsoft SQL Server that stores the XML data in a new record.
4. A Java, C, or Python program.
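As a rough illustration of the last case, the sketch below plays the role of the receiving application in Python, using the standard library's xml.etree.ElementTree.iterparse. The sample document and its born attribute are made up for this example; the point is only that the parser reports start-tags, attributes, and content one piece at a time.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, held in memory instead of a file.
xml_text = '<person born="1912"><name>Harry Jacks</name></person>'

pieces = []  # everything the "application" receives from the parser

# iterparse reports a "start" event for every start-tag and an "end" event
# for every end-tag, in the order the parser meets them.
source = io.BytesIO(xml_text.encode("utf-8"))
for event, element in ET.iterparse(source, events=("start", "end")):
    if event == "start":
        # Attributes are already available when the start-tag is reported.
        pieces.append(("start", element.tag, dict(element.attrib)))
    else:
        # The element's text is complete once its end-tag is reported.
        pieces.append(("end", element.tag, (element.text or "").strip()))

for piece in pieces:
    print(piece)
```

The application never sees angle brackets or quotes, only the pieces the parser hands over.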
XML Fundamentals
XML Documents
• An XML document is text, so it can be opened with any program that knows
how to read a text file.
<person>
Harry Jacks
</person>
person.xml
• File names and the parser: the document above is stored as person.xml, but
it could just as well be called:
• person.txt,
• simply person, or
• there's some XML in this here file!
• The underlying operating system may or may not like these names, but an XML
parser won't care.
• XML document in a database: the document might not even be in a file at all.
It could be a record or a field in a database.
• Stored in several files: it could even be stored in more than one file.
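To make the database case concrete, here is a minimal sketch in Python that keeps the person document in a text field of an in-memory SQLite database and parses it from there. The table and column names are assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical in-memory database with one table holding XML in a text field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO documents (body) VALUES (?)",
             ("<person>Harry Jacks</person>",))

# Fetch the field and hand its text to the parser; no file name is involved.
(body,) = conn.execute("SELECT body FROM documents WHERE id = 1").fetchone()
person = ET.fromstring(body)
conn.close()

print(person.tag, "->", person.text)
```

The parser receives the same characters it would have read from person.xml, so the result is the same element.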
Elements, Tags, and Character Data
<person>
Harry Jacks        (the element’s content)
</person>
person.xml
• The whole block, from <person> to </person>, is one element.
Tag Syntax
• Start-tags begin with < and end-tags begin with </.
• Both of these are followed by the name of the element and are closed by >.
• Case sensitivity: XML, unlike HTML, is case sensitive. <Person> is not the
same as <PERSON> is not the same as <person>.
XML Trees
• XML documents form a tree structure that starts at "the root" and branches
to "the leaves".
<message>
<to>Minu</to>
<from>Rajat</from>
<heading>Reminder</heading>
<body>Do not forget me this weekend</body>
</message>
• The first line describes the root element of the document (like saying: "this
document is a message"): <message>
• The next four lines describe the four child elements of the root (to, from,
heading, and body).
• And finally the last line defines the end of the root element: </message>
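The root-and-children reading of the message can be checked with a short Python sketch. It uses the standard xml.etree.ElementTree module and a well-formed copy of the message document; the variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

message_xml = """<message>
<to>Minu</to>
<from>Rajat</from>
<heading>Reminder</heading>
<body>Do not forget me this weekend</body>
</message>"""

root = ET.fromstring(message_xml)

# Iterating over an element yields its direct children in document order.
children = [(child.tag, child.text) for child in root]

print("root:", root.tag)
for tag, text in children:
    print("  child:", tag, "=", text)
```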
XML Documents Form a Tree Structure
• XML documents must contain a root element. This element is "the parent" of
all other elements.
• The elements in an XML document form a document tree. The tree starts at
the root and branches to the lowest level of the tree.
<root>
<child>
<subchild>………….</subchild>
</child>
</root>
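The same root, child, subchild shape can also be built from a program instead of being typed by hand. The following is a small sketch using Python's xml.etree.ElementTree; the text placed in the subchild is a placeholder chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Build the root -> child -> subchild tree in memory.
root = ET.Element("root")
child = ET.SubElement(root, "child")        # child is attached to root
subchild = ET.SubElement(child, "subchild") # subchild is attached to child
subchild.text = "some text"                 # placeholder content

# Serialize the tree back to XML text.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

Every element except the root is created with a parent, which is why the result always has exactly one root.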
Attributes
• XML elements can have attributes.
• Names are separated from values by an equals sign and optional whitespace.
<person>
<name first="Rajat" last="Verma"/>
<profession value="Computer Scientist"/>
</person>
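A program usually reads attributes by name. The sketch below, in Python with the standard xml.etree.ElementTree module, parses a well-formed copy of the person document; the spaces around one equals sign are added here to show that such whitespace is optional, and the middle attribute is a made-up name used to show a missing attribute.

```python
import xml.etree.ElementTree as ET

person_xml = """<person>
<name first="Rajat" last="Verma"/>
<profession value = "Computer Scientist"/>
</person>"""

person = ET.fromstring(person_xml)
name = person.find("name")

# get() returns the attribute value, or None when the attribute is absent.
full_name = name.get("first") + " " + name.get("last")
profession = person.find("profession").get("value")
middle = name.get("middle")  # hypothetical attribute, not in the document

print(full_name, "/", profession, "/", middle)
```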
XML Elements
• An XML element is everything from (including) the element's start tag to
(including) the element's end tag.
• Names should be short and simple, like this: <book_title> not like this:
<the_title_of_the_book>.
XML Syntax Rules
1. All XML Elements Must Have a Closing Tag
2. XML Tags are Case Sensitive
• XML tags are case sensitive: the tag <Letter> is different from the tag <letter>.
• Opening and closing tags must be written with the same case.
3. XML Elements Must be Properly Nested
4. XML Documents Must Have a Root Element
• XML documents must contain one element that is the parent of all other
elements. This element is called the root element.
<root>
<child>
<subchild>………….</subchild>
</child>
</root>
5. XML Attribute Values Must be Quoted
• XML elements can have attributes in (name, value) pairs just like in HTML.
• In XML the attribute value must always be quoted. Of the following two
start-tags, the first is incorrect and the second is correct:
<message date=12/12/2005>
<message date="12/12/2005">
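The rules so far (closing tags, matching case, proper nesting, a single root, quoted attribute values) are all enforced by the parser. The sketch below is one way to observe this in Python; is_well_formed is a helper written for this example, and the snippets are hypothetical, one per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical snippets: each of the first five breaks exactly one rule.
checks = {
    "missing closing tag":  "<note><to>Minu</note>",
    "mismatched case":      "<Letter>hello</letter>",
    "improper nesting":     "<b><i>text</b></i>",
    "two root elements":    "<a/><b/>",
    "unquoted attribute":   "<note date=12/12/2005></note>",
    "well-formed document": '<note date="12/12/2005"><to>Minu</to></note>',
}

results = {label: is_well_formed(text) for label, text in checks.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```

Only the last snippet is accepted. An XML parser does not try to repair a broken document the way a web browser does with HTML.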
6. Entity References
• "<" inside an XML element generates an error because the parser interprets
it as the start of a new element.
• To avoid this error, replace the "<" character with the entity reference &lt;.
Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than
character is legal, but it is a good habit to replace it.
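Programs that generate XML normally leave this replacement to a library. A minimal sketch in Python, using escape from the standard xml.sax.saxutils module, follows; the message text is made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 & bonus > 0"  # hypothetical content

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw_text)
document = "<message>" + escaped + "</message>"

# The parser turns the entity references back into the original characters.
parsed_text = ET.fromstring(document).text

print(escaped)
print(parsed_text)
```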
7. Comments in XML
• The syntax for writing comments in XML is similar to that of HTML:
<!-- This is a comment -->
8. White-space is Preserved in XML
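A quick way to see this rule at work is to parse content containing runs of spaces and compare the result with the original. The sketch below uses Python's xml.etree.ElementTree; the sentence is a made-up example.

```python
import xml.etree.ElementTree as ET

original = "Hello     my name is    Rajat"  # hypothetical text with runs of spaces

body = ET.fromstring("<body>" + original + "</body>")

# The parser hands back the character data unchanged, spaces included.
text = body.text
print(repr(text))
```

A web browser displaying HTML would collapse each run of spaces to a single space; the XML parser leaves that decision to the application.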
1. sex stored as an attribute:
<person sex="female">
<first-name>Alma</first-name>
<last-name>Chikana</last-name>
</person>
2. sex stored as a child element (preferable approach):
<person>
<sex>female</sex>
<first-name>Alma</first-name>
<last-name>Chikana</last-name>
</person>
• The following three XML documents contain exactly the same information:
1. The date is stored in an attribute:
<message date="12/12/2005">
<to>Meenu</to>
<from>Rajat</from>
<heading>Reminder</heading>
<body>Don’t forget me this weekend!</body>
</message>
2. The date is stored in a child element:
<message>
<date>12/12/2005</date>
<to>Meenu</to>
<from>Rajat</from>
<heading>Reminder</heading>
<body>Don’t forget me this weekend!</body>
</message>
3. The date is stored in an expanded child element:
<message>
<date>
<day>12</day>
<month>12</month>
<year>2005</year>
</date>
<to>Meenu</to>
<from>Rajat</from>
<heading>Reminder</heading>
<body>Don’t forget me this weekend!</body>
</message>
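The difference between the three forms shows up when a program has to read the date. The sketch below is one possible reader in Python, using shortened copies of the three documents; read_date is a helper written for this example, and it assumes the day/month/year order used by the expanded form.

```python
import xml.etree.ElementTree as ET

# Shortened copies of the three documents (only the date and one child kept).
as_attribute = '<message date="12/12/2005"><to>Meenu</to></message>'
as_element = '<message><date>12/12/2005</date><to>Meenu</to></message>'
as_expanded = ('<message><date><day>12</day><month>12</month>'
               '<year>2005</year></date><to>Meenu</to></message>')

def read_date(xml_text):
    """Return (day, month, year) as strings, whichever form the date takes."""
    message = ET.fromstring(xml_text)
    if "date" in message.attrib:
        # Form 1: the attribute value is one string that must be split.
        return tuple(message.get("date").split("/"))
    date = message.find("date")
    if len(date) == 0:
        # Form 2: a date element with text only, which must also be split.
        return tuple(date.text.split("/"))
    # Form 3: each part is its own element, so no string splitting is needed.
    return tuple(date.find(part).text for part in ("day", "month", "year"))

dates = [read_date(doc) for doc in (as_attribute, as_element, as_expanded)]
print(dates)
```

All three calls return the same values, but only the expanded form lets the program read each part without knowing how the date string is formatted.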
• Avoid XML Attributes?
• Sometimes ID references are assigned to elements as attributes. These IDs
can be used to identify XML elements in much the same way as the id
attribute in HTML.
• Metadata (data about data) should be stored as attributes, and the data
itself should be stored as elements.
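As an illustration of an ID used as metadata, the sketch below looks up one message by its id attribute in Python. The messages wrapper element and the id values 501 and 502 are assumptions made up for this example; the lookup relies on the limited XPath support in xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the id attribute is metadata, the children are data.
messages_xml = """<messages>
<message id="501"><to>Meenu</to><from>Rajat</from></message>
<message id="502"><to>Rajat</to><from>Meenu</from></message>
</messages>"""

root = ET.fromstring(messages_xml)

# Select the message whose id attribute equals "502".
reply = root.find("message[@id='502']")
reply_to = reply.find("to").text

print(reply.get("id"), "is addressed to", reply_to)
```

The id identifies the message but is not part of its content, which is why it fits in an attribute while to and from remain elements.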
Thank You!