Stefano Ceri, Sara Comai, Ernesto Damiani, Piero Fraternali,
Stefano Paraboschi, Letizia Tanca
Email: ceri,comai,fraterna,parabosc,tanca@elet.polimi.it, edamiani@crema.unimi.it
Politecnico di Milano, Dipartimento di Elettronica e
Informazione
Università di Milano, Polo di Crema
We present XML-GL, a graphical query language for XML documents. XML-GL derives from GLog [1,2] a general purpose, logic- and graph-based language for querying structured and semi-structured data. Here we list a number of interesting requirements for a query language for XML documents, and give a few examples of XML-GL features addressing such requirements. 1
The use of XML will bring a major change in the structure of Web information, which will become more and more a collection of semistructured objects, i.e., pieces of content for which at least a partial representation of structure (known as schema in the database lexicon) is available. This evolution brings forth the necessity of novel languages for extracting information from XML content, much in the same way as traditional query languages (notably SQL) have been used for extracting information from structured data, e.g., relational database.
We believe that the definition of a query language for XML must be associated with the study and production of representation tools for metadata. Indeed, an XML document can be compliant to a Document Type Definition (DTD), that specifies the types of markup elements that can appear in the document, their attributes and their containment relationships. If an XML document adheres to a DTD, it is said to be valid. On the other hand, XML documents need not be valid; if they lack a DTD but respect some syntactic rules for tag placement they are said to be well-formed. The notion of DTD in XML well lends itself to be interpreted as a schema. Schema availability is very useful for query formulation as well as for query optimization; thus, we will list in the sequel requirements both for a query language and for a metadata representation language.
The specific novelty of XML-GL with respect to other proposals for querying XML documents, like XML-QL and XSL, is that queries are formulated visually using a graph-based formalism close to the structure of XML documents (e.g., comparable to the visual representations of XML documents offered by XML authoring tools like the Near & Far XML editor). However, XML-GL is not a visual interface over a conventional, textual query language, but a graph-based query language with both its syntax and semantics defined in terms of graph structures and operations.
Well-formed documents without a DTD can be queried in XML-GL in the same way as valid documents, thus addressing our first requirement. However, since no information about the structure of the document is available at query formulation time, querying of documents without DTD might require flexible interpretations of results. According to our second and third requirements, an XML query is addressed to all the documents in a site, and can be used for querying the DTD.
To present XML-GDM, we introduce an example of a DTD and an XML document conforming to it. Figure 1 represents a DTD which specifies the structure of documents containing book orders, together with the corresponding XML-GDM schema. A possible XML document conforming to this DTD is shown in Figure 2.
<!ELEMENT order (shipto, contact?, item+, date)>
<!ATTLIST order number PCDATA #REQUIRED>
<!ELEMENT shipto (fulladdress|reference)>
<!ELEMENT contact (reference|PCDATA)>
<!ELEMENT fulladdress (company, addressline+, city)>
<!ELEMENT reference EMPTY>
<!ATTLIST reference customer IDREF>
<!ELEMENT person (firstname?,lastname,fulladdress)>
<!ATTLIST person id ID>
<!ELEMENT company PCDATA>
<!ELEMENT addressline PCDATA>
<!ELEMENT city PCDATA>
<!ELEMENT date (day, month, year)>
<!ELEMENT day PCDATA>
<!ELEMENT month PCDATA>
<!ELEMENT year PCDATA>
<!ELEMENT item (book, quantity, discount?)>
<!ELEMENT book (isbn,title?,price)>
<!ELEMENT isbn PCDATA>
<!ELEMENT title PCDATA>
<!ELEMENT price PCDATA>
<!ELEMENT quantity PCDATA>
<!ELEMENT discount PCDATA>
<?xml version="1.0" standalone="no" encoding="UTF-8"?> <DOCTYPE ORDER SYSTEM "order.dtd"> <ORDER> <SHIPTO><REFERENCE customer="C00001"></REFERENCE></SHIPTO> <CONTACT>Tim Bell</CONTACT> <DATE><DAY>14</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE> <ITEM> <BOOK><ISBN>15536455</ISBN><TITLE>Introduction to XML</TITLE> <PRICE>24.95</PRICE> </BOOK> <QUANTITY>6</QUANTITY> <DISCOUNT>.40</DISCOUNT> </ITEM> <ITEM> <BOOK><ISBN>15532155</ISBN><TITLE>Introduction to Internet</TITLE> <PRICE>22.50</PRICE> </BOOK> <QUANTITY>10</QUANTITY> <DISCOUNT>.42</DISCOUNT> </ITEM> </ORDER> <ORDER> <SHIPTO> <FULLADDRESS><COMPANY>ASA</COMPANY> <ADDRESSLINE>18 Harvard str.</ADDRESSLINE> <CITY>Los Angeles</CITY> </FULLADDRESS> </SHIPTO> <CONTACT><REFERENCE customer="C00002"></REFERENCE></CONTACT> <DATE><DAY>20</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE> <ITEM> <BOOK><ISBN>15536455</ISBN><TITLE>Introduction to XML</TITLE> <PRICE>24.95</PRICE> </BOOK> <QUANTITY>6</QUANTITY> <DISCOUNT>.40</DISCOUNT> </ITEM> <ITEM> <BOOK><ISBN>15532155</ISBN><TITLE>Introduction to Internet</TITLE> <PRICE>22.50</PRICE> </BOOK> <QUANTITY>10</QUANTITY> <DISCOUNT>.42</DISCOUNT> </ITEM> </ORDER> <PERSON id="C00001"> <LASTNAME>Moore</LASTNAME><FULLADDRESS><COMPANY>ABC</COMPANY> <ADDRESSLINE>10 Michigan str.</ADDRESSLINE><CITY>Los Angeles</CITY></FULLADDRESS> </PERSON> <PERSON id="C00002"> <LASTNAME>Andrews</LASTNAME><FULLADDRESS><COMPANY>ASA</COMPANY><ADDRESSLINE>18 Harvard str.</ADDRESSLINE><CITY>Los Angeles</CITY></FULLADDRESS> </PERSON>
An XML-GL query can be applied either to a single XML document or to a set of documents, e.g., those composing a WWW site. The query produces a new XML document as the result. Thus, the application of a query results in a transformation of the source XML document(s) into a new XML document.
An XML-GL query consists of four parts:
Some examples illustrate the main features of XML-GL; the first one shows the simplest form of XML-GL query: the Match-Construct.
Select all the person elements from document orders.xml.
The query is represented in Figure 3. Assuming that orders.xml is the document of Figure 2, the query result is:
<RESULT>
<PERSON id="C00001">
<LASTNAME>Moore</LASTNAME>
<FULLADDRESS>
<COMPANY>ABC</COMPANY>
<ADDRESSLINE>10 Michigan str.</ADDRESSLINE>
<CITY>Los Angeles</CITY>
</FULLADDRESS>
</PERSON>
<PERSON id="C00002">
<LASTNAME>Andrews</LASTNAME>
<FULLADDRESS>
<COMPANY>ASA</COMPANY>
<ADDRESSLINE>18 Harvard str.</ADDRESSLINE>
<CITY>Los Angeles</CITY>
</FULLADDRESS>
</PERSON>
</RESULT>
The query is a pair of related graphs. The graph on the left describes the extract and match components; in this simple example, this graph includes the target object of the query (element person in this case), which may optionally contain the indication of the document or set of documents to which the query is targeted. Target documents may be identified using URLs with wildcards, so that the string ``http://www.elet.polimi.it/ceri/ord*.xml''inside the person box would target the query to all .xml documents of the ceri directory whose name starts with the string ``ord''. Without this identification, the query target is the whole site.
The graph on the right describes the construct and clip parts. In the case of Figure 3, the triangle denotes the simplest form of document construction: all elements found by the match part are inserted one after the other into the predefined result tag.
A more complex query follows:
Select all orders not containing any item priced more than 25$ and shipped to an address in Los Angeles (Figure 4).
The query result is the following:
<RESULT>
<ORDER>
<SHIPTO>
<FULLADDRESS><COMPANY>ASA</COMPANY>
<ADDRESSLINE>18 Harvard str.</ADDRESSLINE>
<CITY>Los Angeles</CITY>
</FULLADDRESS>
</SHIPTO>
<CONTACT><REFERENCE customer="C00002"></REFERENCE></CONTACT>
<DATE><DAY>20</DAY><MONTH>11</MONTH><YEAR>1998</YEAR></DATE>
<ITEM>
<BOOK><ISBN>15536455</ISBN><TITLE>Introduction to XML</TITLE>
<PRICE>24.95</PRICE>
</BOOK>
<QUANTITY>6</QUANTITY>
<DISCOUNT>.40</DISCOUNT>
</ITEM>
<ITEM>
<BOOK><ISBN>15532155</ISBN><TITLE>Introduction to Internet</TITLE>
<PRICE>22.50</PRICE>
</BOOK>
<QUANTITY>10</QUANTITY>
<DISCOUNT>.42</DISCOUNT>
</ITEM>
</ORDER>
</RESULT>
Besides the basic features shown by these examples, XML-GL addresses the remaining requirements by allowing:
Finally, the examples shown already demonstrate that readability is a major feature of XML-GL, since a graph-based formalism is well suited to the representation of hypermedia information.