Metadata Lessons from the iLumina Digital Library
They follow from the five-year effort to implement metadata standards for learning objects in the iLumina digital library of undergraduate teaching resources in science, technology, engineering, and mathematics education.
Standardized metadata, essential for enabling distributed access, is less critical when Web-based resources are full text, since they can be examined directly by search engines. But fewer and fewer resources on the Web, especially those with educational value, are text alone; many are composites of data types and formats. Even for full-text resources, metadata enables users to browse resources, pursuing smart federated searches.

The two main metadata schemas and standards used today by digital libraries of educational resources are Dublin Core (DC) and IEEE Learning Object Metadata (LOM) (see Table 1). LOM was released as IEEE 1484.12.1 in June 2002 [2]. DC was approved by the American National Standards Institute in September 2001 as ANSI/NISO Z39.85 and ratified by the International Organization for Standardization in January 2003 as ISO 15836 [5]. DC and LOM are two approaches to providing standardized metadata. DC takes a minimal approach, keeping elements simple, perhaps at the cost of expressive power. LOM is structural, offering rich description, perhaps at the cost of size and cataloging effort.

Here, we share our five years of experience with an implementation of LOM (with imports/exports of metadata in DC), drawing general lessons useful to anyone who wants to understand the practical challenges of using metadata to describe digital materials on the Web. We are also aware that metadata standards are evolving and that our practical implementation results, coupled with everyday use, will continue to shape the evolution of the standards.

ILUMINA DIGITAL LIBRARY
The iLumina digital library (www.ilumina-dlib.org), funded by the National Science Foundation, contains undergraduate teaching materials in science, technology, engineering, and mathematics. Main-
iLumina resources populate 44 different metadata elements (only a subset of the 78 possible unique elements in LOM). Table 2 also includes the mapping of LOM to DC (as suggested in [4]) when importing or exporting data to services based on DC. Simple DC is based on a 15-element set.

Many theoretical discussions focus on individual metadata standards, but few empirical studies focus on the patterns of use of the standards that would be useful for guiding implementation decisions. Although many elements require vocabularies as values for elements, few standards are available for using them, and few established policies are available for selecting them. LOM offers some best-practice vocabularies [4], though they are often provisional, and end users frequently find it necessary to establish their own terms and tax-

Figure. The iLumina home page user interface for accessing science and mathematics educational resources.

Table 1. Examples of Learning Technology Initiatives and their Metadata Schema.

Learning Technology Initiative                        Web Site               Metadata Schema
Educational Network Australia                         www.edna.edu.au        DC
ARIADNE Foundation for the European Knowledge Pool    www.ariadne-eu.org     LOM
Advanced Distributed Learning Initiative              www.adlnet.org         Sharable Content Object Reference Model
National Science Digital Library                      www.nsdl.org           Requires DC and three selected LOM data elements, developing crosswalks to other schemas, including LOM
Open Archives Initiative                              www.openarchives.org   Recommends DC and supports other schemas including LOM

tered as data providers with the Open Archives Initiative, which develops and promotes interoperability standards for content dissemination (www.openarchives.org). It found that of the 15 DC elements, two (identifier and creator) accounted for almost 50% of element use. Overall, the top seven DC elements (creator, identifier, title, date, type, subject, and description) accounted for over 70% of the elements used in the records; 50% of the data providers never populated any other elements; and
the least-used elements were accessed only 6% of the time. In this large data set, the use of DC metadata elements by data providers registered with the OAI was selective and sparse.

The main mechanisms for interoperability, explored in [1], are controlled vocabularies for metadata, taxonomies for classification, and thesauri and crosswalks between vocabularies and taxonomies. Prior research with the Open Archives repositories [7] indicates that most taxonomies use a controlled vocabulary rather than freeform data input for an element; most also use different controlled vocabularies [6]. Conducting effective federated searches across multiple repositories involves identifying the source for the vocabularies and developing a discipline-specific thesaurus. How to best implement or extend standardized vocabularies and taxonomies is an open question. Meanwhile, recognizing that LOM is still in the early stages of its development as a standard, the IEEE Learning Technology Standards Committee Metadata Working Group is investigating the experience of implementers, as well as users, in order to further refine the standard [3].

ILUMINA EXPERIENCE
The iLumina implementation continues to be driven by a joint application development team of users, digital library aficionados/academics, and IT specialists, including faculty, staff, and students at the University of North Carolina Wilmington. (The team meets monthly to review submitted resources and integration issues with the University’s Randall Library computer system.) The team initially used a rapid application development process model to develop a mockup, then implemented a prototype application for review. It provided feedback and revisions that were then incorporated into the next version. The team decided early to use a relational database with a schema to support all LOM elements, though only a subset would be populated. Leaving some LOM elements unpopulated means flexibility in the addition/removal of metadata elements to library services.

Table 2. iLumina element subset of LOM and corresponding DC elements.

LOM No.    LOM Element Name                 DC Mapping
1          General
1.2        Title                            dc:title
1.3        language                         dc:language
1.4        description                      dc:description
1.5        keyword                          dc:subject
1.7        structure (i)
2          lifecycle
2.2        status (i)
2.3        contribute
2.3.1      role (i)
2.3.2      Entity                           dc:creator, dc:publisher
2.3.3      Date                             dc:date
3          metametadata
3.1        catalogentry
3.1.2      Entry                            dc:identifier
4          technical
4.1        Format                           dc:format
4.2        Size
4.3        location                         dc:identifier
4.4        requirement
4.4.1      orComposite
4.4.1.1    type (i)
4.4.1.2    name (i)
4.4.1.3    minimumversion
4.6        otherplatformrequirements
5          educational
5.2        learningresourcetype (i)         dc:type
5.3        interactivitylevel (s)
5.5        intendedenduserrole (s)
5.8        difficulty (s)
6          Rights
6.1        Cost
6.2        copyrightandotherdescription
6.3        description                      dc:rights
7          relation
7.1        kind (s)
7.2        resource
7.2.1      identifier
7.2.1.1    Catalog
7.2.1.2    Entry                            dc:source, dc:relation
9          classification
9.2        taxonpath
9.2.1      Source
9.2.2      Taxon
9.2.2.1    Id
9.2.2.2    Entry                            dc:subject

Notes:
• The iLumina subset includes 44 elements in total (LOM has 78).
  – 30 “active” elements (LOM has 59), and active elements can be given values or populated.
• iLumina includes 14 placeholder elements or complex types (LOM has 19), the ones in light gray; placeholder elements contain other elements.
• An element name marked (i) uses an iLumina-controlled vocabulary. An element name marked (s) uses a subset of the LOM-controlled vocabulary.

Many team members from the various disciplines, including chemistry, biology, mathematics, physics, and computer science, have contributed digital resources to the library. It was during this population process that we learned about the importance of LOM metadata elements and how to use a standard, controlled vocabulary.

In order to catalog resources for iLumina, we developed sets of vocabularies and taxonomies based on the LOM specification. After an initial cataloging of a set of resources, it was evident that the educational learning resources would need more descriptive information than we originally anticipated. We thus created a modified LOM specification table to capture this information (www.ilumina-dlib.org/documents/vocabulary_comparison_chart.htm).

Decisions regarding implementation of the metadata elements and associated vocabularies were influenced by the fact that responsibility for cataloging would eventually shift from trained catalogers to less-experienced submitters of resources. With this in mind, the development team insisted that the controlled vocabulary use standard language. The result was two major changes: modification of the original
LOM vocabulary and an additional metadata element.

Most iLumina vocabularies are a modified set of the recommended LOM vocabulary (see Table 3). For example, educational.learningresourcetype

Table 3.

Number: 1.8
Name: structure
LOM Vocabulary: Collection, Mixed, Linear, Hierarchical, Networked, Branched, Parceled, Atomic
iLumina Vocabulary: Collection, Individual Learning Resource
Reason for Change: Simplification of language to increase consistency. Identified collections for featured browse.

Number: 2.2
Name: status
LOM Vocabulary: Draft, Final, Revised, Unavailable
iLumina Vocabulary: Submitted, In Review, Accepted, Unavailable
Reason for Change: Status is based on how a resource moves from submitted, to in-review, to accepted. This provides the ability to “hold” a resource and to “sort” resources for review purposes and to aid software in managing the review process. Unavailable status is used to disable resources from public view without having to delete them.
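
As an illustration only, the iLumina status vocabulary (Submitted, In Review, Accepted, Unavailable) can be sketched as a small state machine. This is not the iLumina software: the names `Status`, `ALLOWED`, `advance`, and `is_public` are hypothetical, and two rules are assumptions the article does not spell out, namely that any status may be switched to Unavailable and that only Accepted resources are publicly visible.

```python
from enum import Enum


class Status(Enum):
    """iLumina status vocabulary (element 2.2)."""
    SUBMITTED = "Submitted"
    IN_REVIEW = "In Review"
    ACCEPTED = "Accepted"
    UNAVAILABLE = "Unavailable"


# Forward path described for the status element:
# submitted -> in review -> accepted.
# ASSUMPTION: any status may be switched to Unavailable, and nothing
# leaves Unavailable automatically; the article does not state this.
ALLOWED = {
    Status.SUBMITTED: {Status.IN_REVIEW, Status.UNAVAILABLE},
    Status.IN_REVIEW: {Status.ACCEPTED, Status.UNAVAILABLE},
    Status.ACCEPTED: {Status.UNAVAILABLE},
    Status.UNAVAILABLE: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def is_public(status: Status) -> bool:
    """ASSUMPTION: only accepted resources are shown to the public."""
    return status is Status.ACCEPTED


if __name__ == "__main__":
    status = Status.SUBMITTED          # a newly contributed resource
    status = advance(status, Status.IN_REVIEW)
    status = advance(status, Status.ACCEPTED)
    print(status.value, is_public(status))
```

Keeping the allowed moves in one table means software can "hold" or "sort" resources by status without scattering the review rules across the code, and a resource can be hidden by setting it to Unavailable rather than deleting it.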
MIME type of the resource, though in keeping with the focus on standard language, we found it useful to categorize the MIME type list by media type (www.ilumina-dlib.org/documents/datacategories.htm). We referred to these categories as the technical.mediatype. They are often quite helpful to library users who may want an image but don’t care if it’s gif, jpeg, or in some other format. The media type is presented in the advanced search as a simple means of searching specific file types in iLumina. It is also included in the resource-contribution form where it functions as a filter to limit MIME type choices.

ILUMINA TAXONOMIES
To assist with the placement of resources within iLumina, we created three levels of taxonomies:

Classification Scheme. Chemists used a modified version of the Library of Congress taxonomy. Biologists developed their own taxonomy. And mathematicians created a common taxonomy for all educational levels (people.uncw.edu/hermanr/MathTax/index.htm).

Our experience with LOM also revealed an Achilles heel in the standard-specified (Internet Mail Consortium RFC 2426) way of dealing with the submission of directory information for electronic business cards; LOM includes the use of a vCard, or a standard way of providing vital directory information (such as name, street address, phone number, and email address) as its preferred format
turned out to be a good way to standardize the internal representation of directory information; it was also easier to integrate, parse, and maintain and was generally helpful in simplifying the software code used to find information in the directory (such as author last name).

The iLumina project hired students to catalog the library’s digital resources. Beginning early in the development effort, 2000–2001, the cataloging process sought out digital resources submitted by some of our members. It informed us about the arrangement of the input form, difficulties with vocabularies, errors in programming, and standardization of metadata appearance. We used this information to create the final versions of the metadata specification, input form, metadata page, and organization of iLumina’s resources. This work, completed in 2002, included an initial set of 200 learning resources contributed by the development team. Limiting the number of resources at this stage made it easier to make global changes to the metadata.

The metadata review process is simple and efficient. Resources submitted to iLumina are initially categorized as submitted and not available to the public. From a pending-items list, the iLumina librarian views the resource, metadata, and date submitted, then forwards the resource to the appropriate discipline editor for review. The discipline editor receives an email message with a link to the review materials. The review itself includes 22 questions in three categories: metadata, content, and technical. The reviewer then emails completed review forms to the discipline editor. The editor checks the status of the review and determines whether the resource is acceptable, accepted with revisions, or rejected. Once the resource is reviewed, it is tagged with its status. For a review flowchart, sample review checklist, and review summary see www.ilumina-dlib.org/documents/.

Usability is a major consideration in presenting information on the iLumina Web site. In addition to adhering to best practices for usability and accessibility, as well as to addressing feedback from end users, the iLumina Web site was scrutinized at our request in 2002 in two independent, outside usability studies, one by a group at Virginia Tech, the other by a group at the University of North Carolina Chapel Hill.

way. We find certain parts of LOM useful for describing resources to be added to the iLumina library, as both an individual collection and as part of a distributed digital library, the National Science Digital Library, funded by the National Science Foundation.

Do LOM benefits outweigh LOM costs, especially when compared to the DC minimal-metadata approach? The following paragraphs cover eight propositions based on the iLumina implementation. Several follow directly from the experience discussed earlier. Others are generalizations suggested, though not fully established, by our effort defining metadata language for incorporating library resources. We include them here because they represent broad claims that still need to be refined and tested by future implementations of metadata standards, including LOM and DC.

LOM elements. Many LOM elements are useful in describing learning resources; however, the most useful ones are also in (and mappable to) DC. The exception is the classification.taxon element, which is more expressive than subjective.

No evidence. A few of the LOM education elements (not shared by DC) may be valuable, but we found no compelling evidence for these additional fields. Though the NSDL Core Integration team has suggested adding three LOM educational elements to DC, iLumina has found little use for them. In particular, NSDL suggests using educational.interactivitytype (not populated in iLumina), educational.typicallearningtime (not populated in iLumina), and educational.interactivitylevel (populated as low, high, or unspecified in iLumina). Some users like being able to distinguish the high interactivity of a particular resource, meaning that more user interaction is required than just pressing a button or clicking a mouse. iLumina has also cataloged educational.difficulty and educational.intendedenduserrole, though it is unclear how useful this information is to the users of the library.

Other categories. Elements in other categories (not education-specific) appear to be equally important in describing resources in educational digital libraries, including NSDL. Although we cataloged technical.size, finding it useful as a caution alert for time-consuming downloads, it could be automated by