This document outlines our requirements for a web query language for these products.
Example 1
Find all records with Creator equal to "J. Smith" and Date after "January 1997".
The metadata in the repositories will conform to a variety of domain specific metadata standards. We want the ability to be explicit about the origin of the metadata element being queried and the structure of our query values.
Example 2
Find records with VCARD Name equal to "J. Smith" AND with ISO 8601 encoded Dublin Core Date after "19970701".
We also would like to be able to specify the metadata fields to return in results and the number of records returned.
Example 3
Find records with VCARD Name equal to "J. Smith". Return the Dublin Core Date, Dublin Core Description and Dublin Core Identifier of the first 20 records.
Some metadata has nested structure. For example, metadata describing a film may contain metadata describing sequences within the film. The sequence metadata may contain metadata describing individual scenes and so on. The query language should support queries on metadata with nested structure:
Example 4
Return any Dublin Core Descriptions for the first, second and third MPEG7 Scenes from movies with Dublin Core Creator "Martin Scorsese".
Some information communities use distributed search engines to simultaneously query existing heterogeneous information sources. Such applications are enhanced if it is possible to dynamically discover the schema of the underlying information sources.
Example 5
What query attributes does this repository support?
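To make the examples concrete, the following is a minimal sketch of how Examples 2, 3 and 4 might be held in memory by a client. It is not a proposed syntax. The class names, the scheme labels ("DC", "VCARD", "MPEG7"), the relation names and the path notation for nested elements are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Attribute:
    """A metadata element qualified by the community (scheme) that defines it."""
    scheme: str                  # illustrative label, e.g. "DC" for Dublin Core
    name: str                    # element name within that scheme
    path: Tuple[str, ...] = ()   # enclosing nested elements (hypothetical notation)


@dataclass(frozen=True)
class Term:
    """One comparison: attribute, relation, value and optional value encoding."""
    attribute: Attribute
    relation: str                    # hypothetical relation names: "equal", "after"
    value: str
    encoding: Optional[str] = None   # declared structure of the value, if any


@dataclass
class Query:
    """A conjunction of terms plus a specification of what to return."""
    terms: List[Term]
    return_fields: List[Attribute] = field(default_factory=list)
    max_records: Optional[int] = None


# Example 2: explicit attribute origin and explicit value encoding.
example_2 = Query(terms=[
    Term(Attribute("VCARD", "Name"), "equal", "J. Smith"),
    Term(Attribute("DC", "Date"), "after", "19970701", encoding="ISO 8601"),
])

# Example 3: the fields to return and the size of the result set.
example_3 = Query(
    terms=[Term(Attribute("VCARD", "Name"), "equal", "J. Smith")],
    return_fields=[Attribute("DC", n) for n in ("Date", "Description", "Identifier")],
    max_records=20,
)

# Example 4: returned fields nested inside the first three scenes.
example_4 = Query(
    terms=[Term(Attribute("DC", "Creator"), "equal", "Martin Scorsese")],
    return_fields=[
        Attribute("DC", "Description", path=(f"MPEG7:Scene[{i}]",))
        for i in (1, 2, 3)
    ],
)
```

Keeping the scheme separate from the element name is what lets a server tell a Dublin Core Date from a Date defined by some other community.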
Multiple attribute sets. Different communities will require their own sets of attributes. For this reason, the query language must be flexible enough to allow attributes from different communities. It should be possible to develop the query language and the attribute sets separately. That is, the W3C should develop the query infrastructure and information communities should develop the attribute sets they require.
Sharing of Attributes. Communities will not want to reinvent the wheel every time they need a new attribute, so it must be possible to share attributes between communities. An important part of sharing is identifying the origin and definition of the attributes in a query.
Identifying the source of attributes also allows attributes from different communities to be mapped. For example, an application can know that Dublin Core Creator is the same as GILS Author and map a Dublin Core query onto a GILS database.
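The mapping described above can be sketched as a simple lookup table. This is an illustration only: the function name and the table layout are hypothetical, and the only equivalence taken from the text is Dublin Core Creator to GILS Author.

```python
from typing import Dict, List, Tuple

QualifiedName = Tuple[str, str]       # (attribute set, attribute name)
QueryTerm = Tuple[QualifiedName, str]  # (attribute, value)

# Hypothetical crosswalk; only this one pair is stated in the text.
DC_TO_GILS: Dict[QualifiedName, QualifiedName] = {
    ("DC", "Creator"): ("GILS", "Author"),
}


def map_query(
    terms: List[QueryTerm],
    crosswalk: Dict[QualifiedName, QualifiedName],
) -> Tuple[List[QueryTerm], List[QueryTerm]]:
    """Rewrite each term through the crosswalk.

    Returns (mapped, unmapped) so the caller can decide what to do with
    terms that have no known equivalent in the target attribute set.
    """
    mapped: List[QueryTerm] = []
    unmapped: List[QueryTerm] = []
    for attribute, value in terms:
        target = crosswalk.get(attribute)
        if target is None:
            unmapped.append((attribute, value))
        else:
            mapped.append((target, value))
    return mapped, unmapped


dc_query = [(("DC", "Creator"), "J. Smith"), (("DC", "Date"), "19970701")]
gils_terms, leftover = map_query(dc_query, DC_TO_GILS)
```

The lookup only works because each attribute carries its origin. An unqualified "Creator" could not be mapped reliably.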
Attribute Categories. Attributes tell the server how to interpret the values given in the query. There are a number of categories of attributes that an information community may wish to define. For example
Interoperability and Extensibility. A number of us hope that one day there will be a "Lowest Common Denominator" or "Cross Domain" attribute set that every metadata repository supports. This would allow a base level of interoperability across metadata repositories.
Information communities should be able to extend this base set of attributes for their private use.
Discovery of Attributes. It should be possible to discover the attributes (and possibly attribute definitions) being used by a metadata repository. This enhances interoperability by allowing an information client to configure itself to query newly discovered metadata repositories.
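As a rough sketch of how a client might use discovered attributes, the function below selects the repositories that advertise every attribute a query needs. How a repository advertises its attributes is not specified here. The function name, repository names and attribute labels are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

QualifiedName = Tuple[str, str]  # (attribute set, attribute name)


def repositories_for(
    query_attributes: Iterable[QualifiedName],
    advertised: Dict[str, Set[QualifiedName]],
) -> List[str]:
    """Return the repositories that support every attribute in the query.

    `advertised` maps a repository name to the attributes it reported
    when asked which query attributes it supports (as in Example 5).
    """
    needed = set(query_attributes)
    return [repo for repo, attrs in advertised.items() if needed <= attrs]


# Made-up discovery results for two repositories.
discovered = {
    "repo-a": {("DC", "Creator"), ("DC", "Date"), ("DC", "Description")},
    "repo-b": {("GILS", "Author")},
}
usable = repositories_for([("DC", "Creator"), ("DC", "Date")], discovered)
```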
Ease of Implementation. It should be easy to implement a search engine supporting the query language.
The DSTC recommends that the query language use HTTP as the transport mechanism and that the syntax of returned metadata records should be based on XML, possibly in RDF format.
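The sketch below shows what this recommendation could look like from a client, using only the Python standard library. The URL, the parameter names ("term", "max") and the XML element names are invented for illustration. No request syntax or result schema is defined in this document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_query_url(base_url: str, terms: List[Tuple[str, str]], max_records: int) -> str:
    """Encode a query as an HTTP GET URL (parameter names are hypothetical)."""
    params = [("term", f"{attribute}={value}") for attribute, value in terms]
    params.append(("max", str(max_records)))
    return base_url + "?" + urlencode(params)


def parse_records(xml_text: str) -> List[Dict[str, str]]:
    """Turn an XML result document into one dict of fields per record."""
    root = ET.fromstring(xml_text)
    return [
        {f.get("name", ""): (f.text or "") for f in record.findall("field")}
        for record in root.findall("record")
    ]


# A made-up response in an assumed XML layout.
SAMPLE_RESPONSE = """<records>
  <record>
    <field name="DC.Identifier">urn:example:1</field>
    <field name="DC.Date">19980101</field>
  </record>
</records>"""

url = build_query_url("http://repository.example/query", [("DC.Creator", "J. Smith")], 20)
records = parse_records(SAMPLE_RESPONSE)
```

Plain HTTP requests and XML responses need nothing beyond widely available libraries, which supports the ease-of-implementation requirement.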
Security/Authentication. Some customers require secure or authenticated access to their data or subsets of their data. The query infrastructure should support this.
Specification of Returned Results. The query language should allow a client to specify the format of the results, the fields to be returned, and the size of the result set.
Internationalisation. The query infrastructure should support queries and results described using the Unicode character set. Additionally, the query infrastructure should be able to identify the language of the query values and returned records.
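One way these two requirements could be met over HTTP and XML is sketched below. It assumes that query values are percent-encoded as UTF-8 and that returned records carry the standard `xml:lang` attribute. The record layout and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import quote

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def descriptions_by_language(xml_text: str) -> Dict[str, str]:
    """Map each language tag to the description text carrying that tag.

    Descriptions without xml:lang are filed under "und" (undetermined).
    """
    root = ET.fromstring(xml_text)
    return {d.get(XML_LANG, "und"): (d.text or "") for d in root.iter("description")}


# A made-up record with descriptions in two languages.
SAMPLE_RECORD = (
    "<record>"
    '<description xml:lang="en">A film</description>'
    '<description xml:lang="fr">Un film</description>'
    "</record>"
)

by_language = descriptions_by_language(SAMPLE_RECORD)

# A non-ASCII query value, percent-encoded as UTF-8 for use in a URL.
encoded_value = quote("Müller")
```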
The new Z39.50 Attribute Architecture provides a (non-web) infrastructure for supporting most of our requirements.
The Stanford STARTS project examined using HTTP/CGI as the transport for Z39.50 queries.
DSTC Pty Ltd
Resource Discovery Unit
Research Data Network CRC
Level 7, General Purpose South Building, The University of Queensland, Qld 4072, Australia.
Last modified: Wed Nov 18 15:12:12 EST 1998