
XML: An Introduction

By
CHITRAKANT BANCHHOR
References
1. Robert Henderson, XML Made Simple, Elsevier.
2. Elliotte Rusty Harold and W. Scott Means, XML in a Nutshell, O'Reilly.
3. XML Tutorial, w3schools.com (material compiled from this tutorial).
What is XML?
Background to XML
• Markup: plain ASCII text works for unformatted text files such as
computer programs, but it is not adequate for transmitting documents that carry
typographical information.

• To display a newspaper or magazine article, we need to know how the text
is spaced, the size of the letters, the look of the font, and whether it is laid out in rows
or columns.

• To solve this problem, markup languages were created: an electronic form of
the typesetter's handwritten (marked-up) instructions.
SGML

• In the late 1960s, Charles Goldfarb and his team of IBM researchers were
studying electronic document systems and created a comprehensive language
that described text formatting and markup.

• This language was known as GML (Generalized Markup Language); the initials
are also those of its creators (Goldfarb, Mosher, and Lorie).

• The first working draft of the new Standard Generalized Markup Language
(SGML) was released in 1980, and the standard was finally accepted by the ISO in 1986.
• SGML is a metalanguage, that is, a language used to describe the structure of
another language.

• Hence, SGML is not itself a document language. It is better seen as a
platform-independent basis for building specialized markup languages that share a
common structure.

• Widely used markup languages descend from SGML: HTML was originally defined
as an SGML application, and XML is a simplified subset of SGML.
Why XML?
• SGML is comprehensive and powerful, while HTML is simple to implement
and is a standard recognized by over 100 million web users.

• But each has major disadvantages. SGML's complexity is not needed for
typical applications. HTML describes only how content looks, not what the data
means, which makes it of little use for data processing.

• To overcome these problems, the W3C set to work on creating a new
standard: XML (Extensible Markup Language).
What XML Is Not

1. XML is not a programming language.
2. XML is not a network transport protocol.
3. XML is not a database.
1. XML is not a programming language

• There is no such thing as an XML compiler that reads XML files and produces
executable code.
2. XML is not a network transport protocol

• XML doesn't send data across the network.

• Data sent across the network using HTTP, FTP, NFS, or some other protocol
might happen to be encoded in an XML format, but again there has to be some
software outside the XML document that actually does the sending.
3. XML is not a DBMS (database)

• XML cannot replace a database server such as Oracle or MySQL.

• A database can contain XML data, either as a VARCHAR, as a BLOB, or as some
custom XML data type, but the database itself is not an XML document.
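How XML Works
• An XML document is processed as follows:

1. The XML parser reads the document.
2. If the document breaks XML's syntax rules, the parser reports a parse error.
3. Otherwise, the parser divides the document into pieces: elements, attributes, and so on.
4. The parser passes the contents piece by piece to the application, a program that understands XML documents.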
• The application that receives data from the parser may be:

1. A web browser.
2. A word processor such as StarOffice Writer that loads the XML document for editing.
3. A database such as Microsoft SQL Server that stores the XML data in a new record.
4. A program written in Java, C, Python, or another language.
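The parser-to-application flow above can be sketched in Python. This is a minimal sketch that assumes the standard-library `xml.etree.ElementTree` module acts as the parser. The function `parse_and_walk`, which plays the application, is hypothetical, as are the sample documents and the `born` attribute.

```python
import xml.etree.ElementTree as ET

def parse_and_walk(text):
    """Act as a tiny 'application': let the parser check the document, then
    collect its pieces as (tag, attributes, text) tuples in document order.
    Raises ET.ParseError if the document is not well-formed."""
    root = ET.fromstring(text)          # the parser reads and checks the whole document
    pieces = []
    for elem in root.iter():            # root first, then its descendants
        pieces.append((elem.tag, dict(elem.attrib), (elem.text or "").strip()))
    return pieces

good = "<person born='1980'><name>Harry Jacks</name></person>"
bad = "<person><name>Harry Jacks</person>"   # <name> is never closed

print(parse_and_walk(good))
# [('person', {'born': '1980'}, ''), ('name', {}, 'Harry Jacks')]
try:
    parse_and_walk(bad)
except ET.ParseError as err:
    print("parse error:", err)
```

In this sketch, the application sees the document's pieces only if the parser accepts it. Otherwise it receives an error instead.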
XML Fundamentals
XML Documents

• An XML document contains text, never binary data.

• It can be opened with any program that knows how to read a text file.

A very simple yet complete XML document

<person>
Harry Jacks
</person>

person.xml
• File names: the document above is saved as person.xml, but it could equally
be saved as:

• person.txt,
• simply person, or
• there's some XML in this here file!

• The underlying operating system may or may not like these names, but an XML
parser won't care.
• In a database: the document might not even be in a file at all.
It could be a record or a field in a database.

• Generated by a program: it could be generated on the fly by a CGI program in
response to a browser query.

• Stored in several files: it could even be split across more than one file.
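To illustrate that a document need not live in a file, here is a small sketch that builds the person document in memory, as a generating program might, and then parses it straight from a string. It assumes Python's `xml.etree.ElementTree`, and `make_person` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def make_person(name):
    """Generate a person document on the fly, as a CGI or other program might."""
    person = ET.Element("person")
    person.text = name
    return ET.tostring(person, encoding="unicode")

doc = make_person("Harry Jacks")
print(doc)                 # <person>Harry Jacks</person>

# The parser does not care where the text came from: a file, a database
# field, or a string built in memory are all parsed the same way.
root = ET.fromstring(doc)
print(root.tag, root.text) # person Harry Jacks
```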
Elements, Tags, and Character Data

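<person>
Harry Jacks
</person>

person.xml
• The element is everything from the start-tag <person> to the end-tag </person>;
the text Harry Jacks is the element's content.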
Tag Syntax
• Start-tags begin with < and end-tags begin with </.

• Both of these are followed by the name of the element and are closed by >.

• Users are free to make up their own XML tags.

• Empty elements: XML has a special syntax for empty elements, i.e., elements
that have no content.

• Such an element can be represented by a single empty-element tag that
begins with < but ends with />, for example <name/>.

• Case sensitivity: XML, unlike HTML, is case sensitive. <Person>, <PERSON>,
and <person> are three different tags.
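The tag rules above can be checked against a real parser. This sketch assumes Python's `xml.etree.ElementTree`, and the sample tags are illustrative. It shows that an empty-element tag and an empty start/end pair produce the same element. It also shows that a case mismatch between start-tag and end-tag is rejected.

```python
import xml.etree.ElementTree as ET

# An empty-element tag and an empty start-tag/end-tag pair give the same element.
a = ET.fromstring("<name/>")
b = ET.fromstring("<name></name>")
print(a.tag, b.tag, a.text, b.text, len(a), len(b))  # name name None None 0 0

# XML is case sensitive, so </person> does not close <Person>.
try:
    ET.fromstring("<Person>Harry Jacks</person>")
except ET.ParseError as err:
    print("parse error:", err)
```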
XML Trees
• XML documents form a tree structure that starts at "the root" and branches
to "the leaves".

• An Example XML Document

<message>
  <to>Minu</to>
  <from>Rajat</from>
  <heading>Reminder</heading>
  <body>Do not forget me this weekend</body>
</message>
• The first line describes the root element of the document (like saying: "this
document is a message"): <message>

• The next 4 lines describe 4 child elements of the root (<message>):
• <to>
• <from>
• <heading>
• <body>

• And finally the last line defines the end of the root element: </message>
XML Documents Form a Tree Structure

• XML documents must contain a root element. This element is "the parent" of
all other elements.

• The elements in an XML document form a document tree. The tree starts at
the root and branches to the lowest level of the tree.

• All elements can have sub-elements (child elements):

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
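To see the tree structure directly, the message example can be parsed and walked recursively. This is a sketch assuming Python's `xml.etree.ElementTree`. The `outline` function is a hypothetical helper that lists element names indented by their depth in the tree.

```python
import xml.etree.ElementTree as ET

MESSAGE = """<message>
  <to>Minu</to>
  <from>Rajat</from>
  <heading>Reminder</heading>
  <body>Do not forget me this weekend</body>
</message>"""

def outline(elem, depth=0):
    """Return the element names of the tree, indented by depth."""
    lines = ["  " * depth + elem.tag]
    for child in elem:                 # direct children only
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(MESSAGE)          # the root element: <message>
print("\n".join(outline(root)))
print(root.find("from").text)          # Rajat
```

Each recursive call moves one level down from the root toward the leaves, mirroring the parent/child relationships described above.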
Attributes
• XML elements can have attributes.

• An attribute is a name-value pair attached to the element's start-tag.

• Names are separated from values by an equals sign and optional whitespace.

• Values are enclosed in single or double quotation marks.

<person first_year="2002" last_year="2005">
Rajat Verma
</person>
An XML document that describes a person using attributes

<person>
  <name first="Rajat" last="Verma"/>
  <profession value="Computer Scientist"/>
</person>
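An application reads attribute values by name. The following sketch assumes Python's `xml.etree.ElementTree` and its `Element.get` method, and it reuses the person document above. One value is deliberately written in single quotes to show that either quotation style is accepted.

```python
import xml.etree.ElementTree as ET

DOC = """<person>
  <name first="Rajat" last='Verma'/>
  <profession value="Computer Scientist"/>
</person>"""

root = ET.fromstring(DOC)
name = root.find("name")
print(name.get("first"), name.get("last"))   # Rajat Verma (single or double quotes both work)
print(root.find("profession").get("value"))  # Computer Scientist
print(name.get("middle"))                    # None: no such attribute
```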
XML Elements
• An XML element is everything from (including) the element's start tag to
(including) the element's end tag.

• An element can contain other elements, simple text, or a mixture of both.
Elements can also have attributes.
XML Naming Rules
• XML elements must follow these naming rules:

1. Names can contain letters, digits, hyphens, underscores, and periods
2. Names cannot start with a digit or a punctuation character (the underscore is allowed)
3. Names starting with the letters xml (or XML, or Xml, etc.) are reserved for XML standards and should not be used
4. Names cannot contain spaces
• Best Naming Practices:

• Make names descriptive. Names with an underscore separator are nice:
<first_name>, <last_name>.

• Names should be short and simple, like this: <book_title> not like this:
<the_title_of_the_book>.
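The naming rules can be approximated with a regular expression. This is only a simplified sketch. The hypothetical `check_name` helper accepts ASCII names only, while the real XML grammar also allows many non-ASCII letters. It also excludes the colon, which is reserved for namespaces, and it flags names beginning with "xml" in any case combination as reserved.

```python
import re

# Simplified, ASCII-only approximation of the naming rules; the real XML
# name grammar also permits many non-ASCII letters, which this ignores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def check_name(name):
    """Return 'ok', 'reserved', or 'invalid' for a candidate element name."""
    if not NAME_RE.fullmatch(name):
        return "invalid"
    if name.lower().startswith("xml"):
        return "reserved"
    return "ok"

for n in ["first_name", "book_title", "2nd_name", "first name", "-name", "xmlData"]:
    print(n, "->", check_name(n))
```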
XML Syntax Rules
1. All XML Elements Must Have a Closing Tag

• HTML: some elements may be written without a closing tag.

<p> This is a paragraph
<p> This is another paragraph

• XML: it is illegal to omit the closing tag.

<p> This is a paragraph </p>
<p> This is another paragraph </p>
2. XML Tags are Case Sensitive

• XML elements are defined using XML tags.

• XML tags are case sensitive: tag <Letter> is different from the tag <letter>.

• Opening and closing tags must be written with the same case.
3. XML Elements Must be Properly Nested

• In XML, all elements must be properly nested within each other:

• <outer_tag> <inner_tag> </inner_tag> </outer_tag>


4. XML Documents Must Have a Root Element

• XML documents must contain one element that is the parent of all other
elements. This element is called the root element.

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
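Rules 1 to 4 are enforced by every XML parser, so breaking any of them makes the document unreadable. This sketch assumes Python's `xml.etree.ElementTree`. The `is_well_formed` helper and the sample strings are hypothetical, and each sample breaks one rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "missing closing tag": "<p>This is a paragraph",
    "case mismatch":       "<Letter>Hi</letter>",
    "bad nesting":         "<outer_tag><inner_tag></outer_tag></inner_tag>",
    "two root elements":   "<root/><root/>",
    "correct":             "<root><child><subchild>text</subchild></child></root>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```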
5. XML Attribute Values Must be Quoted

• XML elements can have attributes in (name, value) pairs just like in HTML.

• In XML the attribute value must always be quoted. The first document is incorrect,
the second is correct:

Incorrect:
<message date=12/12/2005>
  <to> Meenu </to>
  <from> Rajat </from>
</message>

Correct:
<message date="12/12/2005">
  <to> Meenu </to>
  <from> Rajat </from>
</message>
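A quick parser check of rule 5 follows. It is a sketch assuming Python's `xml.etree.ElementTree`, with a hypothetical `accepts` helper. It also tests typographic ("curly") quotes, which word processors often insert. They are not valid attribute delimiters in XML, so only plain ' or " quotes work.

```python
import xml.etree.ElementTree as ET

def accepts(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

unquoted = "<message date=12/12/2005><to>Meenu</to></message>"
curly = "<message date=\u201c12/12/2005\u201d><to>Meenu</to></message>"  # typographic quotes
quoted = '<message date="12/12/2005"><to>Meenu</to></message>'

print(accepts(unquoted), accepts(curly), accepts(quoted))  # False False True
print(ET.fromstring(quoted).get("date"))                   # 12/12/2005
```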
6. Entity References

• Some characters have a special meaning in XML.

<message> if salary < 10,000 then </message>

• "<" inside an XML element generates an error because the parser interprets
it as the start of a new element.

• To avoid this error, replace the "<" character with an entity reference:

<message> if salary &lt; 10,000 then </message>


• There are 5 predefined entity references in XML:

&lt;     <    Less than
&gt;     >    Greater than
&amp;    &    Ampersand
&apos;   '    Apostrophe
&quot;   "    Quotation mark

Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than
character is legal, but it is a good habit to replace it.
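In practice, programs usually escape text before writing it into XML and let the parser turn the references back into characters. This sketch assumes Python's `xml.sax.saxutils.escape`, which replaces &, < and > by default, and `xml.etree.ElementTree` for parsing. The salary text is taken from the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "if salary < 10,000 then"
escaped = escape(text)                # replaces &, < and > with &amp;, &lt; and &gt;
doc = f"<message>{escaped}</message>"
print(doc)                            # <message>if salary &lt; 10,000 then</message>
print(ET.fromstring(doc).text)        # if salary < 10,000 then

# All five predefined entity references are understood by any XML parser.
print(ET.fromstring("<q>&lt;&gt;&amp;&apos;&quot;</q>").text)  # <>&'"
```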
7. Comments in XML
• The syntax for writing comments in XML is similar to that of HTML:

• <!-- This is a comment -->
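Comments are meant for human readers, and many parsers do not pass them to the application by default. This sketch assumes Python's `xml.etree.ElementTree`. Its default parser drops comments, and its `TreeBuilder(insert_comments=True)` option (Python 3.8 and later) keeps them as special comment nodes.

```python
import xml.etree.ElementTree as ET

doc = "<note><!-- This is a comment --><to>Meenu</to></note>"

root = ET.fromstring(doc)
print([child.tag for child in root])   # ['to']: the default parser drops the comment

# Ask the tree builder to keep comments as nodes (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root2 = ET.fromstring(doc, parser=parser)
print(root2[0].tag is ET.Comment, repr(root2[0].text))  # True ' This is a comment '
```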
8. White-space is Preserved in XML

• HTML collapses multiple white-space characters into a single space:

HTML:   Hello      my name is Happoo
Output: Hello my name is Happoo

• With XML, the white-space in a document is not truncated.
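The difference can be seen by parsing the greeting. This is a sketch assuming Python's `xml.etree.ElementTree`, and the `<greeting>` element name is hypothetical. The parser hands the application every space. The HTML-style collapsing has to be done explicitly, here with `split` and `join`.

```python
import xml.etree.ElementTree as ET

SOURCE_TEXT = "Hello      my name is Happoo"   # several spaces after "Hello"
doc = f"<greeting>{SOURCE_TEXT}</greeting>"

text = ET.fromstring(doc).text
print(repr(text))                     # all spaces are preserved by the XML parser
print(" ".join(text.split()))         # Hello my name is Happoo (HTML-style, done by hand)
```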
XML Elements vs. Attributes

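Example 1:
<person sex="female">
  <first-name>Alma</first-name>
  <last-name>Chikana</last-name>
</person>

Example 2:
<person>
  <sex>female</sex>
  <first-name>Alma</first-name>
  <last-name>Chikana</last-name>
</person>

• In example 1, sex is an attribute; in example 2, it is a child element. Both
documents carry the same information.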
Preferable approach

• The following three XML documents contain exactly the same information:

Document 1:
<message date="12/12/2005">
  <to>Meenu</to>
  <from>Rajat</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</message>

Document 2:
<message>
  <date>12/12/2005</date>
  <to>Meenu</to>
  <from>Rajat</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</message>

Document 3 uses an expanded date element (this is the preferable approach):

Document 3:
<message>
  <date>
    <day>12</day>
    <month>12</month>
    <year>2005</year>
  </date>
  <to>Meenu</to>
  <from>Rajat</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</message>
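One reason the expanded form is easier to process is that each part of the date is a separate, named element. A program can therefore read the parts directly without splitting a string or guessing whether "12/12" means day/month or month/day. This is a sketch assuming Python's `xml.etree.ElementTree` and `datetime`, using a shortened version of document 3.

```python
import datetime
import xml.etree.ElementTree as ET

DOC = """<message>
  <date>
    <day>12</day>
    <month>12</month>
    <year>2005</year>
  </date>
  <to>Meenu</to>
  <from>Rajat</from>
</message>"""

date_elem = ET.fromstring(DOC).find("date")
d = datetime.date(
    int(date_elem.findtext("year")),    # each field is named, so no order convention is needed
    int(date_elem.findtext("month")),
    int(date_elem.findtext("day")),
)
print(d.isoformat())  # 2005-12-12
```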
• Avoid XML Attributes?

• Some of the problems with using attributes are:

1. Attributes cannot contain multiple values (elements can).
2. Attributes cannot contain tree structures (elements can).
3. Attributes are not easily expandable (for future changes).

• Attributes are difficult to read and maintain.
– Use elements for data.
– Use attributes for information that is not relevant to the data.
XML Attributes for Metadata

• Sometimes ID references are assigned to elements.

• These IDs can be used to identify XML elements in much the same way as the
ID attribute in HTML.

• This example demonstrates this:


• The ID above is just an identifier, to identify the different notes.

• It is not a part of the note itself.

• Metadata (data about data) should be stored as attributes, and the data
itself should be stored as elements.
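The split between metadata and data can be illustrated with a small, hypothetical document. It is not the slide's original example, which is missing here. Two notes are told apart only by an `id` attribute, while their content lives in elements. The sketch assumes Python's `xml.etree.ElementTree`, whose limited XPath support includes `[@attr='value']` predicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the id attributes are metadata, the child elements are data.
DOC = """<messages>
  <note id="501"><to>Meenu</to><body>See you on Monday</body></note>
  <note id="502"><to>Rajat</to><body>Sure</body></note>
</messages>"""

root = ET.fromstring(DOC)
notes = {note.get("id"): note.findtext("body") for note in root.findall("note")}
print(notes)                                   # {'501': 'See you on Monday', '502': 'Sure'}
print(root.find("note[@id='502']/to").text)    # Rajat: the id is used only to locate the note
```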
Thank You!
