
METADATA LESSONS FROM THE iLumina DIGITAL LIBRARY

They follow from the five-year effort to implement metadata standards for learning objects in the iLumina digital library of undergraduate teaching resources in science, technology, engineering, and mathematics education.

By Barbara P. Heath, David J. McArthur, Marilyn K. McClelland, and Ronald J. Vetter

July 2005/Vol. 48, No. 7 COMMUNICATIONS OF THE ACM

Illustration by Serge Bloch

Try finding high-quality educational resources on the Web. The difficulty often stems from the lack of good metadata attached to the learning objects of interest. Content creators in any teaching repository must distinguish between the descriptions, or metadata, of a learning object and the learning object itself; doing so would help search engines quickly and accurately find the objects that meet the criteria specified by the user. For example, a learning object (such as a video clip depicting the chemical reaction of sodium and water) could be described by its format (MPEG), its running time (10 seconds), and the discipline for which it was created (chemistry). The terms MPEG, 10 seconds, and chemistry are metadata values for the video clip, while the terms format, running time, and discipline are metadata elements.

For users, metadata is critical to finding resources, especially as the collections housing them are increasingly distributed across the Web. Content creators must federate information about resources, making them accessible through centralized sites or portals. Standardized metadata, essential for enabling distributed access, is less critical when Web-based resources are full text, since they can be examined directly by search engines. But fewer and fewer resources on the Web, especially those with educational value, are text alone; many are composites of data types and formats. Even for full-text resources, metadata enables users to browse resources and pursue smart federated searches.

The two main metadata schemas and standards used today by digital libraries of educational resources are Dublin Core (DC) and IEEE Learning Object Metadata (LOM) (see Table 1). LOM was released as IEEE 1484.12.1 in June 2002 [2]. DC was approved by the American National Standards Institute in September 2001 as ANSI/NISO Z39.85 and ratified by the International Organization for Standardization in January 2003 as ISO 15836 [5]. DC and LOM are two approaches to providing standardized metadata. DC takes a minimal approach, keeping elements simple, perhaps at the cost of expressive power. LOM is structured, offering rich description, perhaps at the cost of record size and cataloging effort.

Here, we share our five years of experience with an implementation of LOM (with imports/exports of metadata in DC), drawing general lessons useful to anyone who wants to understand the practical challenges of using metadata to describe digital materials on the Web. We are also aware that metadata standards are evolving and that our practical implementation results, coupled with everyday use, will continue to shape the evolution of the standards.

ILUMINA DIGITAL LIBRARY
The iLumina digital library (www.ilumina-dlib.org), funded by the National Science Foundation, contains undergraduate teaching materials in science, technology, engineering, and mathematics. Maintained at the University of North Carolina Wilmington, it covers a range of granular resources, from individual data items (such as pictures and audio clips) to complex data items (such as complete books and online courses; see the figure). Our experience developing and running iLumina and adding metadata to items in its collections suggests that educators in these disciplines throughout the U.S. have created a wealth of digital resources for teaching and are happy to share them. iLumina and other such digital libraries (see Table 1) provide repositories where educators submit metadata for their materials, find related resources (without having to reinvent them), create new content (either individually or through collaboration), and collectively improve the quantity and quality of digital teaching resources.

We derive iLumina metadata from the IEEE LOM standard. Table 2 outlines the subset of LOM metadata elements we've identified through the iLumina browse, search, and search-results pages, along with their abstract structural organization. LOM consists of active elements and placeholder elements, or complex types; active elements hold metadata values, while placeholder elements contain other elements, and together they form a hierarchical metadata structure.
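To make the element/value distinction and the LOM-to-DC mapping concrete, here is a minimal Python sketch, assuming a LOM record held in memory as nested dictionaries: a dictionary is a placeholder element, and a string (or list of strings) is an active element holding values. The crosswalk entries are taken from Table 2. Everything else is hypothetical and for illustration only: the record layout, the function names `active_elements` and `lom_to_dc`, the role-based rule for choosing `dc:creator` versus `dc:publisher`, and the sample values for the sodium-and-water clip. It is not iLumina's implementation, which stored metadata in a relational database.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical in-memory LOM record: a dict value is a placeholder element
# (complex type) that only contains other elements; a string, or a list of
# strings, is an active element that holds metadata values. A list of dicts
# is a repeated placeholder, such as several lifecycle.contribute entries.
Record = Dict[str, Any]

# Part of the LOM-to-DC crosswalk in Table 2, keyed by dotted LOM path.
LOM_TO_DC = {
    "general.title": "dc:title",
    "general.language": "dc:language",
    "general.description": "dc:description",
    "general.keyword": "dc:subject",
    "lifecycle.contribute.date": "dc:date",
    "metametadata.catalogentry.entry": "dc:identifier",
    "technical.format": "dc:format",
    "technical.location": "dc:identifier",
    "educational.learningresourcetype": "dc:type",
    "rights.description": "dc:rights",
    "classification.taxonpath.taxon.entry": "dc:subject",
}

# Assumption: Table 2 maps lifecycle.contribute.entity to dc:creator or
# dc:publisher; this sketch lets the contributor's role decide which one.
ROLE_TO_DC = {"author": "dc:creator", "publisher": "dc:publisher"}


def active_elements(node: Record, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted path, value) for every active element below node."""
    for name, child in node.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):  # placeholder: descend into it
            yield from active_elements(child, path)
        elif isinstance(child, list) and child and all(isinstance(c, dict) for c in child):
            for item in child:  # repeated placeholder
                yield from active_elements(item, path)
        elif isinstance(child, list):  # multi-valued active element
            for value in child:
                yield path, value
        else:  # single-valued active element
            yield path, child


def lom_to_dc(record: Record) -> Dict[str, List[str]]:
    """Export a LOM record as simple DC; unmapped LOM elements are dropped."""
    dc: Dict[str, List[str]] = {}
    for path, value in active_elements(record):
        target = LOM_TO_DC.get(path)
        if target:
            dc.setdefault(target, []).append(value)
    # Contributor entities: DC target chosen by role (assumed rule).
    for contrib in record.get("lifecycle", {}).get("contribute", []):
        target = ROLE_TO_DC.get(contrib.get("role"))
        if target and "entity" in contrib:
            dc.setdefault(target, []).append(contrib["entity"])
    return dc


# Hypothetical record for the sodium-and-water video clip from the introduction.
clip: Record = {
    "general": {"title": "Sodium and water", "language": "en",
                "keyword": ["alkali metal", "reaction"]},
    "lifecycle": {"status": "Accepted",
                  "contribute": [{"role": "author", "entity": "A. Author",
                                  "date": "2003-05-01"}]},
    "technical": {"format": "video/mpeg", "size": "1048576"},
    "educational": {"learningresourcetype": "Demonstration", "difficulty": "easy"},
    "classification": {"taxonpath": {"taxon": {"entry": "Chemistry"}}},
}

dc_record = lom_to_dc(clip)
```

In this sketch, lifecycle.status, technical.size, and educational.difficulty have no DC counterpart in Table 2, so they disappear on export; that loss is the expressive power DC's minimal approach gives up. Both general.keyword and the classification taxon entry land in dc:subject, so the richer LOM structure is flattened into one DC element.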

Figure. The iLumina home page user interface for accessing science and mathematics educational resources.

iLumina resources populate 44 different metadata elements (only a subset of the 78 possible unique elements in LOM). Table 2 also includes the mapping of LOM to DC (as suggested in [4]) used when importing or exporting data to services based on DC. Simple DC is based on a 15-element set.

Many theoretical discussions focus on individual metadata standards, but few empirical studies examine the patterns of use of the standards that would be useful for guiding implementation decisions. Although many elements require vocabularies as values, few standards are available for using them, and few established policies are available for selecting them. LOM offers some best-practice vocabularies [4], though they are often provisional, and end users frequently find it necessary to establish their own terms and taxonomies.

Table 1. Examples of learning technology initiatives and their metadata schema.
Learning Technology Initiative | Web Site | Metadata Schema
Educational Network Australia | www.edna.edu.au | DC
ARIADNE Foundation for the European Knowledge Pool | www.ariadne-eu.org | LOM
Advanced Distributed Learning Initiative | www.adlnet.org | Sharable Content Object Reference Model
National Science Digital Library | www.nsdl.org | Requires DC and three selected LOM data elements; developing crosswalks to other schemas, including LOM
Open Archives Initiative | www.openarchives.org | Recommends DC and supports other schemas, including LOM
BiosciEdNet | www.biosciednet.org | LOM
Digital Library for Earth System Education | www.dlese.org | LOM
Gateway to Educational Materials | www.geminfo.org | DC plus GEM-defined pedagogical elements

The few empirical studies available today do not provide encouraging evidence that users adopt standards easily or systematically. For example, [9] examined the DC metadata and element use of more than 100 collections registered as data providers with the Open Archives Initiative (OAI), which develops and promotes interoperability standards for content dissemination (www.openarchives.org). It found that of the 15 DC elements, two (identifier and creator) accounted for almost 50% of element use. Overall, the top seven DC elements (creator, identifier, title, date, type, subject, and description) accounted for over 70% of the elements used in the records; 50% of the data providers never populated any other elements; and
the least-used elements were accessed only 6% of the time. In this large data set, the use of DC metadata elements by data providers registered with the OAI was selective and sparse.

The main mechanisms for interoperability, explored in [1], are controlled vocabularies for metadata, taxonomies for classification, and thesauri and crosswalks between vocabularies and taxonomies. Prior research with the Open Archives repositories [7] indicates that most taxonomies use a controlled vocabulary rather than freeform data input for an element; most also use different controlled vocabularies [6]. Conducting effective federated searches across multiple repositories involves identifying the source for the vocabularies and developing a discipline-specific thesaurus. How best to implement or extend standardized vocabularies and taxonomies is an open question. Meanwhile, recognizing that LOM is still in the early stages of its development as a standard, the IEEE Learning Technology Standards Committee Metadata Working Group is investigating the experience of implementers, as well as users, in order to further refine the standard [3].

Table 2. iLumina element subset of LOM and corresponding DC elements.
LOM No. | LOM Element Name | DC Mapping
1 | general |
1.2 | title | dc:title
1.3 | language | dc:language
1.4 | description | dc:description
1.5 | keyword | dc:subject
1.7 | structure (i) |
2 | lifecycle |
2.2 | status (i) |
2.3 | contribute |
2.3.1 | role (i) |
2.3.2 | entity | dc:creator, dc:publisher
2.3.3 | date | dc:date
3 | metametadata |
3.1 | catalogentry |
3.1.2 | entry | dc:identifier
4 | technical |
4.1 | format | dc:format
4.2 | size |
4.3 | location | dc:identifier
4.4 | requirement |
4.4.1 | orComposite |
4.4.1.1 | type (i) |
4.4.1.2 | name (i) |
4.4.1.3 | minimumversion |
4.6 | otherplatformrequirements |
5 | educational |
5.2 | learningresourcetype (i) | dc:type
5.3 | interactivitylevel (s) |
5.5 | intendedenduserrole (s) |
5.8 | difficulty (s) |
6 | rights |
6.1 | cost |
6.2 | copyrightandotherrestrictions |
6.3 | description | dc:rights
7 | relation |
7.1 | kind (s) |
7.2 | resource |
7.2.1 | identifier |
7.2.1.1 | catalog |
7.2.1.2 | entry | dc:source, dc:relation
9 | classification |
9.2 | taxonpath |
9.2.1 | source |
9.2.2 | taxon |
9.2.2.1 | id |
9.2.2.2 | entry | dc:subject
Notes:
• The iLumina subset includes 44 elements in total (LOM has 78).
– 30 "active" elements (LOM has 59); active elements can be given values, or populated.
– 14 placeholder elements, or complex types (LOM has 19); placeholder elements contain other elements.
• An element marked (i) uses an iLumina-controlled vocabulary; an element marked (s) uses a subset of the LOM-controlled vocabulary.

ILUMINA EXPERIENCE
The iLumina implementation continues to be driven by a joint application development team of users, digital library aficionados/academics, and IT specialists, including faculty, staff, and students at the University of North Carolina Wilmington. (The team meets monthly to review submitted resources and integration issues with the University's Randall Library computer system.) The team initially used a rapid application development process model to develop a mockup, then implemented a prototype application for review. The review provided feedback and revisions that were then incorporated into the next version. The team decided early to use a relational database with a schema to support all LOM elements, though only a subset would be populated. Leaving some LOM elements unpopulated provides flexibility in adding metadata elements to, or removing them from, library services.

Many team members from the various disciplines, including chemistry, biology, mathematics, physics, and computer science, have contributed digital resources to the library. It was during this population process that we learned about the importance of LOM metadata elements and how to use a standard, controlled vocabulary.

In order to catalog resources for iLumina, we developed sets of vocabularies and taxonomies based on the LOM specification. After an initial cataloging of a set of resources, it was evident that the educational learning resources would need more descriptive information than we originally anticipated. We thus created a modified LOM specification table to capture this information (www.ilumina-dlib.org/documents/vocabulary_comparison_chart.htm).

Decisions regarding implementation of the metadata elements and associated vocabularies were influenced by the fact that responsibility for cataloging would eventually shift from trained catalogers to less-experienced submitters of resources. With this in mind, the development team insisted that the controlled vocabulary use standard language. The result was two major changes: modification of the original

LOM vocabulary and an additional metadata element.

Most iLumina vocabularies are a modified set of the recommended LOM vocabulary (see Table 3). For example, educational.learningresourcetype involves a suggested vocabulary that proved to be of only limited use for the scientific resources cataloged in iLumina. Because not all learning resources being submitted could be limited to the suggested list, we included other resource types. Adding resource types is an iterative process; as new items were cataloged, we added new vocabulary terms until we were confident we had addressed the majority of resource-submission options (see www.ilumina-dlib.org/documents/vocabulary.htm for a complete table).

Table 3. Changes to the LOM-controlled vocabulary.
Number | Name | LOM Vocabulary | iLumina Vocabulary | Reason for Change
1.8 | structure | Collection, Mixed, Linear, Hierarchical, Networked, Branched, Parceled, Atomic | Collection, Individual Learning Resource | Simplification of language to increase consistency. Identified collections for featured browse.
2.2 | status | Draft, Final, Revised, Unavailable | Submitted, In Review, Accepted, Unavailable | Status is based on how a resource moves from submitted, to in-review, to accepted. This provides the ability to "hold" a resource, to "sort" resources for review purposes, and to aid software in managing the review process. Unavailable status is used to disable resources from public view without having to delete them.
4.8-1 | mediatype | (new element) | Audio, Animation, Chemical Structure, Database, Executable, Image, Java Applet, Math Application, Portable Document, Presentation, Software Source Code, Spreadsheet, Video, Web Page, Word Processing Document | To provide user-friendly language in the advanced search form. Also assists in limiting the choices for MIME type in the contribute form to increase metadata consistency.
5.2 | learningresourcetype | Exercise, Simulation, Questionnaire, Diagram, Figure, Graph, Index, Slide, Table, Narrative Text, Exam, Experiment, ProblemStatement, SelfAssessment | Course, Lesson, Book, Presentation, Example, Demonstration, Simulation, Lab, Exercise, Assessment, Project, Dataset, Syllabus, Lesson Plan, Teacher Tool, Learner Tool, Manager Tool | List is more inclusive for undergraduate science and mathematics resources.
7.1 | kind | Dublin Core: { IsPartOf, HasPart, IsVersionOf, HasVersion, IsFormatOf, HasFormat, References, IsReferencedBy, IsBasedOn, IsBasisFor, Requires, IsRequiredBy } | IsPartOf, IsBasedOn | Restricted as a tracking tool for determining the amount of repurposing in the library. Also assists in connecting large collections in which individual resources are cataloged.
9.2.2.2 | entry | | iLumina disciplines include biology, chemistry, computer science, mathematics, and physics. A taxonomy classification was developed for each. | A set of taxonomies was created, since the disciplines in iLumina lacked readily accepted taxonomies.

We also found it desirable to add a new metadata element to the original IMS Global Learning Consortium (LOM-based) [4] specification. Calling it technical.mediatype, we used it to assist with the presentation of information about the technical.format of a resource. Technical.format is the IMS specification element that describes the MIME type of the resource, though in keeping with the focus on standard language, we found it useful to categorize the MIME type list by media type (www.ilumina-dlib.org/documents/datacategories.htm). We referred to these categories as the technical.mediatype. They are often quite helpful to library users who may want an image but don't care if it's GIF, JPEG, or some other format. The media type is presented in the advanced search as a simple means of searching specific file types in iLumina. It is also included in the resource-contribution form, where it functions as a filter to limit MIME type choices.

ILUMINA TAXONOMIES
To assist with the placement of resources within iLumina, we created three levels of taxonomies: discipline, subject, and topic (www.ilumina-dlib.org/documents/ims_classifications.htm). For example, discipline representatives find or create specific taxonomies for their respective disciplines. Computer scientists defined their taxonomies based on the ACM/IEEE Computing Curricula 2001 Classification Scheme. Chemists used a modified version of the Library of Congress taxonomy. Biologists developed their own taxonomy. And mathematicians created a common taxonomy for all educational levels (people.uncw.edu/hermanr/MathTax/index.htm).

Our experience with LOM also revealed an Achilles heel in the standard-specified (Internet Mail Consortium RFC 2426) way of dealing with the submission of directory information for electronic business cards; LOM includes the use of a vCard, or a standard way of providing vital directory information (such as name, street address, phone number, and email address) as its preferred format


for personal data, rather than an XML syntax format. Although XPath expressions can be written to parse the vCard for desired data, we chose an xCard, or a vCard expressed in XML. Using an xCard turned out to be a good way to standardize the internal representation of directory information; it was also easier to integrate, parse, and maintain, and it generally simplified the software code used to find information in the directory (such as an author's last name).

The iLumina project hired students to catalog the library's digital resources. Beginning early in the development effort, in 2000–2001, the cataloging process sought out digital resources submitted by some of our members. It informed us about the arrangement of the input form, difficulties with vocabularies, errors in programming, and standardization of metadata appearance. We used this information to create the final versions of the metadata specification, input form, metadata page, and organization of iLumina's resources. This work, completed in 2002, included an initial set of 200 learning resources contributed by the development team. Limiting the number of resources at this stage made it easier to make global changes to the metadata.

The metadata review process is simple and efficient. Resources submitted to iLumina are initially categorized as submitted and are not available to the public. From a pending-items list, the iLumina librarian views the resource, metadata, and date submitted, then forwards the resource to the appropriate discipline editor for review. The discipline editor receives an email message with a link to the review materials. The review itself includes 22 questions in three categories: metadata, content, and technical. The reviewer then emails completed review forms to the discipline editor. The editor checks the status of the review and determines whether the resource is accepted, accepted with revisions, or rejected. Once the resource is reviewed, it is tagged with its status. For a review flowchart, sample review checklist, and review summary, see www.ilumina-dlib.org/documents/.

Usability is a major consideration in presenting information on the iLumina Web site. In addition to adhering to best practices for usability and accessibility, as well as addressing feedback from end users, the iLumina Web site was scrutinized at our request in 2002 by two independent, outside usability studies, one by a group at Virginia Tech, the other by a group at the University of North Carolina at Chapel Hill.

CONCLUSION
International efforts continue on the specification and use of the LOM standard, including its various data models; implementation efforts are also under way. We find certain parts of LOM useful for describing resources to be added to the iLumina library, both as an individual collection and as part of a distributed digital library, the National Science Digital Library, funded by the National Science Foundation.

Do LOM benefits outweigh LOM costs, especially when compared to the DC minimal-metadata approach? The following paragraphs cover eight propositions based on the iLumina implementation. Several follow directly from the experience discussed earlier. Others are generalizations suggested, though not fully established, by our effort defining metadata language for incorporating library resources. We include them here because they represent broad claims that still need to be refined and tested by future implementations of metadata standards, including LOM and DC.

LOM elements. Many LOM elements are useful in describing learning resources; however, the most useful ones are also in (and mappable to) DC. The exception is the classification.taxon element, which is more expressive than DC's subject element.

No evidence. A few of the LOM educational elements (not shared by DC) may be valuable, but we found no compelling evidence for these additional fields. Though the NSDL Core Integration team has suggested adding three LOM educational elements to DC, iLumina has found little use for them. In particular, NSDL suggests using educational.interactivitytype (not populated in iLumina), educational.typicallearningtime (not populated in iLumina), and educational.interactivitylevel (populated as low, high, or unspecified in iLumina). Some users like being able to distinguish the high interactivity of a particular resource, meaning that more user interaction is required than just pressing a button or clicking a mouse. iLumina has also cataloged educational.difficulty and educational.intendedenduserrole, though it is unclear how useful this information is to the users of the library.

Other categories. Elements in other categories (not education-specific) appear to be equally important in describing resources in educational digital libraries, including NSDL. Although we cataloged technical.size, finding it useful as a caution alert for time-consuming downloads, it could be automated by


programmatically looking at the file properties, then notifying users of the expected download time.

Non-LOM and non-DC elements. At least a few elements not in LOM and DC may be important. For example, because mediatype could be derived from technical.format, it may not have to be added as an element. It is useful when designing the user interface to map format into mediatype. However, iLumina still needs a format for interoperability. For example, a user must use .swf video, not .mov video, for Macromedia Flash videos.

Useful in the future. Although several LOM elements, including many in the educational category, are of limited value due to their semantic ambiguity, they may be valuable in the future, but only if they are first well defined and attract a user community that applies them systematically. For example, semantic density and difficulty are often highly dependent on context. Furthermore, iLumina narrowed its vocabulary for interactivity to low, high, or not specified; determining whether a resource is "medium" or "very high" (as allowed in LOM) is subjective. Moreover, technical.requirements are most useful for resources that require some sort of runtime environment (such as source code, executables, and plug-ins). Not all resource types benefit equally from the technical.requirements element.

Vocabularies. In terms of interoperability in distributed libraries, vocabularies represent a much greater challenge than specifying metadata elements, especially when trying to share metadata. Dealing with mismatched vocabularies is much more difficult, as we discovered. In order to achieve aggregation across repositories, we need a common controlled vocabulary and taxonomies. An NSDL working group researching vocabularies and establishing standards for elements (such as LearningResourceType and EducationalLevel) recently held a workshop and set up an online community to debate the issues (metamanagement.comm.nsdl.org/cgi-bin/wiki.pl?VocabDevel).

Less is more. Descriptive information should not all be encapsulated in a single metadata record or schema; for example, administrative records should be kept separate from object metadata, as should vCards and xCards. Any distinct schema information should be held separately, including review data (submitted, accepted, in-review) and annotations.

Cost/benefit balance. The automation of metadata record creation could change the cost/benefit balance between LOM and DC metadata. Also worth pointing out is that the high cost of creating LOM records could be reduced, making LOM more attractive for digital libraries relative to DC. However, such an efficiency would still not solve some of the more fundamental and semantic problems of defining and using metadata.

References
1. Duval, E., Forte, E., Cardinaels, K., Verhoeven, B., Van Durm, R., Hendriks, K., Forte, M., Ebel, N., Macowicz, M., Warkentyne, K., and Haenni, F. The ARIADNE Knowledge Pool System. Commun. ACM 44, 5 (May 2001), 73–78.
2. IEEE. Standard for Information Technology—Education and Training Systems—Learning Objects and Metadata. IEEE Standard 1484.12.1; ltsc.ieee.org/wg12/.
3. IEEE Learning Technology Standards Committee. Position Statement on 1484.12.1—2002 Learning Object Metadata (LOM) Standard Maintenance Revision; ltsc.ieee.org/wg12/index.html.
4. IMS Global Learning Consortium, Inc. IMS Learning Resource Metadata Best Practices and Implementation Guide, Version 1.0—Final Specification, 2001; www.imsproject.org/metadata/mdbest01.html.
5. ISO. Information and Documentation—The Dublin Core Metadata Element Set, ISO 15836: 2003; www.niso.org/international/SC4/n515.pdf.
6. Liu, X., Maly, K., Zubair, M., Hong, Q., Nelson, M., Knudson, F., and Holtkamp, I. Federated searching interface techniques for heterogeneous OAI repositories. J. Digital Inform. 2, 4 (May 2002); jodi.ecs.soton.ac.uk/Articles/v02/i04/Liu/.
7. Open Archives Initiative. The Open Archives Initiative Protocol for Metadata Harvesting, 2002; www.openarchives.org/OAI/openarchivesprotocol.htm.
8. Soergel, D. A framework for digital library research. D-Lib Magazine 8, 12 (Dec. 2002); www.dlib.org/dlib/december02/soergel/12soergel.html.
9. Ward, J. A quantitative analysis of unqualified Dublin Core metadata element set usage within data providers registered with the Open Archives Initiative. In Proceedings of the 2003 Joint Conference on Digital Libraries (Houston, May 27–31). IEEE Computer Society, Washington, D.C., 2003, 315–317.

Barbara P. Heath is president of East Main Educational Consulting, LLC, Southport, NC.
David J. McArthur is a program director in the Division of Undergraduate Education at the National Science Foundation, Arlington, VA.
Marilyn K. McClelland is a professor of computer information systems at North Carolina Central University in Durham.
Ronald J. Vetter is a professor and chair of the Department of Computer Science at the University of North Carolina Wilmington.

This material is based on work supported by the National Science Foundation under Grant No. 0002935. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

© 2005 ACM 0001-0782/05/0700 $5.00