DIGITAL LIBRARIES - GOOD OR BAD CHOICES ON ORGANIZING INFORMATION - Ubiquitous Computing and Communication Journal

DIGITAL LIBRARIES – GOOD OR BAD CHOICES ON ORGANIZING
INFORMATION
Adi-Cristina Mitea, Daniel Volovici, Antoniu Pitic

“Lucian Blaga” University of Sibiu - Computer Science Department, Romania
[email protected], [email protected], [email protected]
ABSTRACT
Digital documents, like physical ones, have to be classified and indexed in a
library for proper future exploitation. Classification and indexation are hard
tasks for librarians all over the world. A software system can ease their work
and make the process more accurate. In this paper we present methods for
classifying and indexing publications that are suitable for such a system, and
we analyze the storage and index capabilities of different database management
systems in order to use them as support for the classification, indexation and
retrieval processes in an integrated software system for libraries.
Furthermore, the problem of storing and retrieving the full content of a
publication is taken into consideration.
Keywords: digital library, classification, indexation, storage and index structures.
1 INTRODUCTION

Significant progress has been made in computers and information technologies
in the last decades. Today we have computers at work everywhere, from technical
to socio-human and services fields. The growth in computing and storage
capabilities, together with the possibility of interconnecting different
computer systems, has radically changed the way we perceive and interact with
many real-world concepts. One of them is the library. Computers and
information technology introduced a new concept, that of the digital library.
The classical management methods used in a public library had to be changed
and improved so that they can benefit from the new technologies. A digital
library not only has to store, in digital form, classical information about
books and other publications, such as author, title, publishing house,
publication year, ISBN, ISSN, table of contents, abstract/full text, etc., but
also has to offer users an easy, fast and accurate method for retrieving the
desired publications.

Often readers do not know the title and author of a book or publication they
need, or they want a publication that covers a specific field or subject of
interest. The librarian has to be able to deliver the right books to them. To
do that, the librarian can use classification and indexation methods.

Classifying and indexing publications in a library is a very important task
for a librarian, and it is essential for the future successful exploitation of
the library assets. Digital libraries can be interconnected, so it is very
important to have and use a similar method for classifying and indexing
documents everywhere. Another very hard problem is that libraries did not use,
and still do not use today, the same format to store their data. If data are
put in the same format, a distributed search across all connected libraries
becomes possible. With computer aid it is possible to automate the
classification and indexation process and also to retrieve publications which
match particular criteria from different interconnected digital libraries.
Library data, classification data and indexation data have to be stored in a
database for future processing, so it is very important to fully understand
their characteristics in order to make the best selection.

Application designers must decide whether to store binary large objects, in
our case the actual content of a digital library, in a filesystem or in a
database. Generally, this decision is based on factors such as application
simplicity, manageability or system performance.

2 CLASSIFICATION AND INDEXATION METHODS FOR LIBRARIES

Over the years librarians have developed different methods to classify and
index library publications, in order to manage the library content more easily
and to deliver the right books to readers. Many of these methods were lost
over time because they were difficult and laborious to apply, but some of
them are still known and applied in different libraries. Unfortunately, there
is at present no single method accepted and applied by every library in the
world. This weakness makes the
information exchange between libraries very hard in practice. If such a method
were adopted, computers could be used to manage the classification, indexation
and retrieval of library publications more easily. Our first step was to
analyze different classification and indexation methods developed for
libraries and to identify the best solutions from the point of view of a
future digital library which has to be connected with other digital libraries.
Below, we present three classification and indexation methods suitable for
digital libraries.

2.1 Universal Decimal Classification Method

A standard method for classifying publications is the Universal Decimal
Classification (UDC). UDC is a system of library classification developed by
the Belgian bibliographers Paul Otlet and Henri La Fontaine at the end of the
19th century. It is based on the Dewey Decimal Classification (DDC), but uses
auxiliary signs to indicate various special aspects of a subject and
relationships between subjects. It was developed as a classification system
for all human knowledge [1]. It can be used for so-called primary documents
like books, periodicals and audio-video documents, or for secondary documents
like catalogs, syllabuses, bibliographies, etc.

UDC offers the possibility to group together all the materials referring to
the same subject, expressed and localized in an unambiguous manner. Digits are
used as universal decimal codes, and this is very important because digits
have the same meaning in the entire world. In this case linguistic barriers do
not exist and international information exchange is possible.

Universal decimal classification can be considered a base for terminology
comparisons and can be used as an international code of terms in all domains.
In essence, UDC is a practical system for the numerical codification of
information, so that information can be easily retrieved regardless of the way
it is perceived. Human knowledge, seen as a unit, is divided into ten main
classes symbolized by decimal fractions. Each of these classes is divided into
ten subclasses by adding a new digit to the code. The rule of dividing a class
into ten subclasses by adding a new digit to the code is applied repeatedly,
following the principle of deriving from general to particular.

In practice, the subject of a document which needs to be classified is not
always simple and clearly delimited. This makes it necessary to have more than
one main UDC index for a document. Auxiliary universal decimal classification
indexes are also used.

The document subject can be a complex combination of multiple aspects, in
which case it is represented by different classification indexes tied
together, or it can be a particular aspect of a main universal decimal
classification aspect, which implies that auxiliary universal decimal
classification indexes are used.

2.2 Subject Headings Indexation Method

Document indexation is the process that describes the content of a document
with the aid of special terms called descriptors. The principles and rules for
selecting and validating descriptors and for indexing documents are subject to
standardization, with the aim of consistent and uniform information
processing. All the descriptors of one language together are called the
linguistic thesaurus of that language.

The linguistic thesaurus of a language is a standard list of descriptors,
alphabetically ordered, which indicates the semantic, hierarchical and
associative logical relationships between them.

In indexing, descriptors are the uniquely accepted forms; they have authority,
and this is the reason why vocabularies and linguistic thesauri are called
authority lists in the librarians' literature.

Nowadays there is more than one linguistic thesaurus used for indexing
purposes. The best known and most widely used are LCSH (Library of Congress
Subject Headings) in English [2] and RAMEAU (Répertoire d'Autorité-Matière
Encyclopédique et Alphabétique Unifié) in French [3]. Both are encyclopedic
linguistic thesauri, and each covers one and only one language. This is a
constraint which, for the moment, limits information exchange.

Specialized linguistic thesauri, with descriptors dedicated to a specific
domain, have also been developed by professional associations, research
centers or international organizations. Some of them are multilingual in order
to facilitate information exchange, but this is still at the beginning.

Successful access to documents is determined to a great extent by a correct
and complete analysis of their content. Access points to the subjects of
publications are called subject headings in the literature. Indexing through
subject headings means classifying publications by access points to their
subjects, that is, the principal subjects developed in the publication
content.

The process of indexation has to take into consideration the following
aspects:

• Subject headings concision – a subject heading has to express one and only
  one idea. Document subject headings have to express the document content in
  a concise and brief manner.
• Subject headings objectivity – subject headings have to reflect only the
  document content; they do not issue value judgments.
• Subject headings specificity – a document will not be indexed at the same
  time under General Terms and Specific Terms for the same subject heading.
• Subject headings coherence – documents will be indexed using subject headings
  which conform to the standard rules introduced by LCSH and RAMEAU for
  defining subject headings. These rules transform natural language into a
  special language for indexation.

Indexation has three phases:

• Document analysis – the document must be read (table of contents, abstract,
  full text, bibliography) with the aim of understanding its content and
  identifying the subjects which are treated in it and can be used for
  indexation.
• Subject headings selection – this process is governed by the special
  information needs expressed by the library's users and by the library and
  user profiles.
• Subject headings validation – for a successful indexation the selected
  subject headings have to be correct concepts. These subjects must be
  validated by comparison with widely accepted subjects which are present in
  the librarians' specific literature or in specialized scientific and
  academic databases. All of those are considered auxiliary indexation tools.

Subject headings are made of a principal entry called the heading-entrance
and one or more subheadings. The heading-entrance must express the essence of
the document content: the concept, the main notion, the phenomenon or the
process. A simple document may need only the heading-entrance to express its
content, but usually documents are complex and the heading-entrance is
followed by several subheadings to be more accurate. Those subheadings express
more information about the subject, space localization, time period and
pattern of the document.

The unit made of heading-entrance and subheadings is called a subject heading.
Its internal parts are separated by double stars (**) or double lines, like
this:

subject subheading**space localization subheading**time period subheading**pattern subheading

• Subject subheading – a document content aspect which is important and is not
  covered by the heading-entrance (for example, Automotive**Engine design)
• Space localization subheading – expresses the spatial localization implied
  by the document content, if this exists (for example,
  Agriculture**Corn plant**Romania)
• Time period subheading – expresses the temporal localization implied by the
  document content, if this exists (for example,
  Agriculture**Corn plant**Romania**19th century)
• Pattern subheading – expresses the presentation format of the document:
  article, poster, biography, bibliography, dictionary, etc. (for example,
  Agriculture**Corn plant**Romania**Poster)

2.3 UNIMARC Method

UNIMARC (UNIversal MAchine Readable Cataloging) is a standard format for
tagging a library's publications in machine-readable form [4]. Using it makes
information exchange between different libraries easy, because they all speak
the same language. UNIMARC is recommended by IFLA (International Federation of
Library Associations) to all public and private libraries.

In the UNIMARC format a publication is recorded by its content in a block of
subjects. This block describes the document content and is made of
classification fields and indexation fields.

Usually libraries working with the UNIMARC format use both methods,
classification and indexation, for analyzing and tagging publications. This
makes information retrieval more precise and efficient for both detailed and
specific information searches. The UNIMARC format also makes it possible to
locate the publication on the library shelf if quote indexes are used for
publications [5].

For Romanian public libraries it is very important to use classification and
indexation methods in parallel in the future, because until now only
classification through universal decimal codes has been used. The UNIMARC
format uses the subject headings method for indexation and is suitable for
recording data in a library database. Pre-coordination in indexation
presupposes access to authority lists and linguistic thesauri created in
advance, and is very difficult for librarians in the absence of those
indexation instruments in their native language. For example, we do not yet
have a complete linguistic thesaurus and authority lists defined in the
Romanian language. The Romanian National Library's specialists are working on
it. The librarian also has to translate the subject headings from his native
language into English or French to be able to validate them against LCSH or
RAMEAU, which are the best known and most widely used linguistic thesauri.
This is a very heavy burden for Romanian librarians, and our project intends
to develop a software tool to assist them.

A specialized information system would be developed for the automatic
indexation and classification of library assets: books, periodicals,
audio-video documents, catalogs, syllabuses, bibliographies, etc. Specialized
information systems could be used for an automatic indexation process if
library publications are in digital form.
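As an illustration of the subject heading structure described in Section 2.2,
the following minimal Python sketch composes a heading-entrance and its
optional subheadings into one string, using the ** separator from the examples
in the text. The class name, field names and helper function are hypothetical
and are not part of LCSH, RAMEAU or UNIMARC.

```python
from dataclasses import dataclass
from typing import Optional

SEPARATOR = "**"  # separator used in the paper's examples


@dataclass(frozen=True)
class SubjectHeading:
    """A heading-entrance plus optional subheadings (hypothetical model)."""

    entrance: str
    subject: Optional[str] = None   # subject subheading
    place: Optional[str] = None     # space localization subheading
    period: Optional[str] = None    # time period subheading
    pattern: Optional[str] = None   # pattern (presentation format) subheading

    def render(self) -> str:
        """Join the parts that are present, in the fixed order of the text."""
        parts = [self.entrance, self.subject, self.place, self.period, self.pattern]
        return SEPARATOR.join(part for part in parts if part)


def split_heading(text: str) -> list[str]:
    """Split a rendered heading into its ordered parts, trimming spaces."""
    return [part.strip() for part in text.split(SEPARATOR)]


if __name__ == "__main__":
    heading = SubjectHeading(
        "Agriculture", subject="Corn plant", place="Romania", period="19th century"
    )
    print(heading.render())
    print(split_heading("Automotive**Engine design"))
```

The sketch only splits a rendered heading back into ordered parts. It does not
try to decide which part is a place, a period or a pattern, because a heading
such as Agriculture**Corn plant**Romania**Poster omits the time period, so
position alone does not identify the role of each part.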
3 DBMS EVALUATION FOR DIGITAL LIBRARIES SUPPORT

An automatic classification and indexation system undoubtedly implies a
database. Such a system has to be able to store and manage all the data needed
in the process of classifying and indexing publications, such as universal
decimal codes with their principal and auxiliary indexes, the linguistic
thesaurus of a language, and subject headings with their heading-entrance and
subheadings, in order to process the digitized documents of libraries.
Classifying and indexing documents also produces outputs that have to be
stored in a database for better management.

In order to develop our system, we studied the characteristics of today's
database management systems at the physical level of the database
architecture. Our goal was to fully understand these characteristics and to
identify the best solutions to be implemented in support of a digital library.

The requirements of today's software systems are more and more complex and
variable, so the producers of database management systems (DBMS) have to face
new challenges. This is the reason why they permanently improve their systems,
implementing new capabilities. Storage structures and access methods of DBMS
have changed lately, and today system designers have to be able to choose the
best solution for the physical database model from a variety of possibilities.
Information systems usually rely on a database, and the success of future
applications is determined to a great extent by the database structure and the
way data accesses are made. System performance in the operational phase is
influenced to a large degree by the physical database design. If the designer
makes the wrong choices during the database design process, the data cannot be
easily accessed later when they are needed, and the success of the entire
system will be compromised. So it is very important to know and understand
these new capabilities of DBMS in order to choose the right solution for the
problem to be solved.

We evaluated the storage and index characteristics of three of today's most
important DBMS in order to support an automated library classification and
indexing system. Our analysis covered Oracle 10i, a product of Oracle
Corporation, SQL Server, a product of Microsoft Corporation, and DB2, a
product of IBM Corporation. We chose these products because they are the best
database management systems on the market today. We studied the storage
structures for data tables and the index structures implemented in these DBMS.

3.1 Oracle 10i

In Oracle 10i data can be stored in different types of tables. Table structure
characteristics differ from type to type and make each type more suitable for
a particular kind of application than the others. Oracle 10i can store data in
one of the following types of data tables [6]:

Heap tables store table rows in file data blocks as variable-length records.
This is the most common table structure. Data types can be system-defined or
user-defined. System-defined data types can be classical scalar types like
CHAR, NCHAR, VARCHAR2, NUMBER, DATE, etc., types for storing large data
objects like LONG, LONG RAW, LOB, BLOB, etc., collection data types like
VARRAY and TABLE (nested table), or the reference type REF used for
implementing object identity in an object-relational database. There are also
data type extensions called Data Cartridges that can be used for complex data
types like text, audio-video, images, time series, spatial data, etc. These
tables are suitable for storing publications data.

Partitioned tables store table rows in different data segments according to a
partitioning method. All rows with the same partitioning method value are
stored together in a data segment, and these segments can be stored in
different table spaces on the same or on different storage spaces. Partitioned
tables are useful for large data tables with a lot of concurrent processing.
System performance can be improved because queries can be directed only to
those partitions containing the relevant data, and DML (data manipulation
language) operations can be performed with a higher degree of parallelism on
different partitions. Join operations between tables can also benefit from
partitioned tables if the tables are partitioned by the same rule.

Oracle 10i offers several table partitioning methods designed to handle
different application scenarios:

• Range partitioning uses ranges of column values to map rows to partitions.
  Partitioning by range is particularly well suited for historical databases
  or for large databases in which an old data package must be replaced from
  time to time with a new one.
• Hash partitioning uses a hash function on the partitioning columns to stripe
  data into partitions. Hash partitioning is an effective means of evenly
  distributing data.
• List partitioning allows users to have explicit control over how rows map to
  partitions. This is done by specifying a list of discrete values for the
  partitioning column in the description of each partition.
• Range-hash partitioning uses a mixture of range and hash partitioning
  methods to map rows to partitions.
• Range-list partitioning uses a mixture of range and list partitioning
  methods to map rows to partitions.
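To make the row-to-partition mapping of the range, hash, list and range-hash
methods concrete, here is a minimal Python sketch. It models only the mapping
logic and is not Oracle code: the function names are hypothetical, the range
bounds are treated as exclusive upper limits by assumption, and CRC32 stands
in for whatever hash function a real DBMS uses internally.

```python
import zlib
from bisect import bisect_right


def range_partition(value, upper_bounds):
    """Return the index of the first partition whose exclusive upper bound
    is greater than value. upper_bounds must be sorted ascending."""
    idx = bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"{value!r} is not below any partition bound")
    return idx


def hash_partition(key, partition_count):
    """Spread keys over partition_count partitions with a stable hash.
    CRC32 is an assumption made for this sketch."""
    return zlib.crc32(str(key).encode("utf-8")) % partition_count


def list_partition(value, partition_values):
    """Return the name of the partition whose explicit value list holds value."""
    for name, values in partition_values.items():
        if value in values:
            return name
    raise ValueError(f"{value!r} is not listed in any partition")


def range_hash_partition(range_value, hash_key, upper_bounds, subpartition_count):
    """Composite mapping: pick a range partition, then a hash subpartition."""
    return (
        range_partition(range_value, upper_bounds),
        hash_partition(hash_key, subpartition_count),
    )


if __name__ == "__main__":
    year_bounds = [1990, 2000, 2010]          # hypothetical publication-year bounds
    languages = {"romance": {"ro", "fr"}, "germanic": {"en", "de"}}
    print(range_partition(1995, year_bounds))            # 1
    print(list_partition("ro", languages))               # romance
    print(range_hash_partition(1995, "some-isbn", year_bounds, 4))
```

A range-list variant would follow the same pattern, replacing the hash step
with the list lookup.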
Partitioned tables are suitable for a distributed digital library system or,
in the case of interconnected digital libraries, for a central metadata
repository.

Index-organized tables store table rows directly in an index structure. Leaf
nodes of the B-tree index store table rows directly, and this eliminates the
additional storage required for ROWIDs (row identifiers), which store the
addresses of rows in ordinary tables and are used in conventional indexes to
link the index values and the row data. Index-organized tables are built on
the primary key and provide fast access to table data for queries involving
exact match and/or range search on the primary key. Queries involving the
values of other columns are much slower. DML operations can also be slower
when they imply a reorganization of the index structure.

Index-organized tables are suitable for storing universal decimal
classification codes, linguistic thesaurus data or subject headings data of an
automated classification and indexation system.

Clustered tables store table rows while offering some degree of control over
how rows are stored. The Oracle server stores all rows that have the same
cluster key value in the same block if possible. When data are searched by
cluster key value, all records are together and can be obtained in a single
disk access. A clustered table can also be used to store related sets of rows
from different database tables within the same Oracle server block. This is
very efficient when database queries imply joins of those tables on the
cluster key. The cluster can be an index cluster or a hash cluster, according
to the way the row location is generated. For an automated classification and
indexation system, the universal decimal codes table and the linguistic
thesaurus table are good candidates for clustered tables.

System performance can also be improved if supplementary access data
structures like indexes are used. An index is a tree structure that allows
direct access to a row in a table. Indexes are built on an index key, which
can be a single-column key or a concatenated column key. An index can be
unique or non-unique.

Oracle implements different types of index structures [6]:

Normal key B-tree index – is built on a single-column key or a concatenated
one, with unique or non-unique values. The index can be created on ascending
or descending values of the index key. This is the most common index structure
and every database table could have several indexes created on index keys used
in search criteria.

Reverse key B-tree index – index key bytes are reversed before the index is
built. This structure is suitable for massively parallel data processing
because it reduces concurrency conflicts.

Bitmap index – the leaf nodes of the index tree contain a bitmap instead of
ROWIDs. Each bit in the bitmap corresponds to a table row, and if it is set,
the row contains the key value. Bitmap indexes are more compact, are suitable
for low-cardinality columns and are very useful when DML operations are rare.

Function-based index – a function is applied to the index key columns before
the index is created.

Bitmap join index – is an index structure which spans multiple tables and
improves the performance of join operations on those tables. A bitmap join
index can be used to avoid actual joins of tables, or to greatly reduce the
volume of data that must be joined, by performing restrictions in advance.
Queries using a bitmap join index can be sped up via bit-wise operations. Such
indexes are very useful for tables that are frequently joined.

Local partitioned index – is an index for a partitioned table which has the
same index key as the partition key. If database tables are partitioned, such
indexes are required.

Global partitioned index – is an index structure for a normal or partitioned
table, which is partitioned and stored separately using a partition key. It is
suitable for multiple concurrent accesses to the database.

Global non-partitioned index – is an index structure for a partitioned table.
If database tables are partitioned, such indexes are required.

3.2 DB2

DB2 offers two structure possibilities for storing data in a database [7].
These are:

Heap tables store table rows in no particular order in file data blocks. This
is the most common table structure. Classical scalar system-defined data types
or user-defined data types are possible. To manage new complex data types like
text, audio, video, images, spatial data, etc., IBM introduced DB2 Extenders.
These tables are suitable for storing publications data.

Partitioned tables are also present in DB2, but only the hash partitioning
method is available. This is a considerable limitation compared with Oracle's
partitioning capabilities.

Partitioned tables are suitable for a distributed digital library system or,
in the case of interconnected digital libraries, for a central metadata
repository.

Indexing capabilities in DB2 are somewhat more limited than in Oracle. DB2
supports the following index structures [7]:

Normal key B-tree index – is built on a single-column key or a concatenated
one, with unique or non-unique values. The index can be created on ascending
or descending values of the index key. DB2 does not have reverse key indexes,
but it allows reverse scans on normal key indexes. This is the most common
index structure and every database table could have several indexes created on
index keys used in search criteria.

Clustered indexes are built like index-organized
table structures, but they are an additional structure for a data table and
columns are duplicated in both the table and the index. They provide fast
access to table data for queries involving exact match and/or range search on
the index key because table rows are stored in the leaf nodes. Only one
clustered index per table can be created.

Bitmap index – DB2 supports only dynamic bitmap indexes, created at run time
by taking the ROWIDs from existing regular indexes and creating a bitmap out
of all the ROWIDs either by hashing or sorting. For this reason, they do not
provide the same query performance as static bitmap indexes, and databases do
not receive any of the space savings or index-creation time savings of static
bitmap indexes.

Function-based index – the index can be created based on the expression used
to derive the value of a generated column.

Local index – is an index for a partitioned table which has the same index key
as the partition key. Global indexes are not possible in DB2. If database
tables are partitioned, local indexes are required.

3.3 SQL Server

SQL Server also offers two structure possibilities for storing data in a
database [8], but it is much more restrictive than the other two DBMS.

Heap tables store table rows in no particular order in file data blocks. This
is the most common table structure in SQL Server. Data types can be classical
scalar system-defined data types or user-defined data types. For complex data
SQL Server has new data types like TEXT, NTEXT, IMAGE, etc. These tables are
suitable for storing publications data.

Member tables store table rows in a federated database architecture. SQL
Server does not support partitioning as generally defined in the database
industry. A federation of databases is a group of servers administered
independently, but which cooperate to share the processing load of a system.
The data are divided between the different servers and are stored in member
tables. Because federation servers do not share the same system catalog (each
database server has its own system catalog), system performance and
scalability are very low. When a user connects to a federated database, he is
connected to one server. If the requested data reside on a different server,
the retrieval takes significantly longer than retrieving data stored on the
local server, and all remote servers have to be consulted. To improve this
situation somewhat, SQL Server introduces the distributed partition view
concept. A distributed partition view joins horizontally partitioned data from
a set of member tables across one or more servers, making the data appear as
if they came from one table. The data can be partitioned between member tables
only on ranges of data values in one of the table columns.

Table 1: Analyzed DBMS characteristics

Feature                         Oracle 10i   DB2       SQL Server
Heap tables                     Yes          Yes       Yes
Partitioned tables              Yes          Yes       Partial
Hash partitioning               Yes          Yes       No
Range partitioning              Yes          No        No
List partitioning               Yes          No        No
Range-hash partitioning         Yes          No        No
Range-list partitioning         Yes          No        No
Index-organized tables          Yes          Partial   Partial
Clustered tables                Yes          No        No
Normal key B-tree index         Yes          Yes       Yes
Reverse key B-tree index        Yes          No        No
Bitmap index                    Yes          Yes       No
Function-based index            Yes          Yes       Yes
Bitmap join index               Yes          No        No
Local partitioned index         Yes          Yes       Yes
Global partitioned index        Yes          No        No
Global non-partitioned index    Yes          No        No

SQL Server implements far fewer index structures [8]:

Non-clustered index – it is a normal key B-tree index with a single-column key
or a concatenated one, with unique or non-unique values. The index can be
created on ascending or descending values of the index key. This is the most
common index structure and every database table could have several indexes
created on index keys used in search criteria.

Clustered indexes are built like index-organized table structures, but they
are an additional structure for a data table. They provide fast access to
table data for queries involving exact match and/or range search on the
primary key because table rows are stored in the leaf nodes of the primary key
index. Only one clustered index per table can be created.

Partitioned index – it is a local index on a member table. SQL Server does not
support global indexes. If a federated database architecture is used, such
indexes are required.

Function-based index – a function is applied to the index key columns before
the index is built.
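The function-based index, which Table 1 lists as available in all three
systems, can be illustrated with a small runnable sketch. SQLite (through
Python's standard sqlite3 module) is used here only as a convenient stand-in,
since it supports indexes on expressions; the syntax and behavior in Oracle,
DB2 and SQL Server differ, and the table, column and index names below are
hypothetical.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with an index on lower(title)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    # The index stores lower(title), not the raw title, so case-insensitive
    # lookups can be served from the index.
    conn.execute(
        "CREATE INDEX idx_publication_title_lower ON publication (lower(title))"
    )
    conn.executemany(
        "INSERT INTO publication (title) VALUES (?)",
        [("Digital Libraries",), ("DIGITAL LIBRARIES",), ("Database Systems",)],
    )
    return conn


def find_by_title(conn: sqlite3.Connection, title: str) -> list:
    """Case-insensitive title lookup written to match the indexed expression."""
    cursor = conn.execute(
        "SELECT id, title FROM publication "
        "WHERE lower(title) = lower(?) ORDER BY id",
        (title,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    catalog = build_catalog()
    print(find_by_title(catalog, "digital libraries"))
```

The query repeats the indexed expression lower(title) in its WHERE clause,
which is what allows a query planner to consider the index. Whether a given
DBMS actually uses it depends on that system's optimizer.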
Table 1 presents a synthesis of the characteristics found in the analyzed
database management systems.

4 STORING THE CONTENT

The purpose of a digital library is to provide a central location for
accessing information on a specific topic. An essential decision that has to
be made in the process of designing a digital library is the choice of how to
store the data.

4.1 File formats used in DL

An overview of the main concepts surrounding file formats in a digital library
environment, and of the importance of choosing a file format that suits the
needs of such a system, can be found in [9].

In the context of digital libraries, the file format is a set of
specifications on how to represent information on a physical drive or in a
database. File formats are targeted towards specific types of information, for
instance JPEG and TIFF for raster images, PDF for document exchange or TXT for
plain text.

A number of factors have to be taken into account before choosing one format
or another. A few formats have gained a larger share of use due to certain
advantages, and this widespread use is an advantage in itself. However, all
formats must be taken into account, also bearing in mind that acquisition and
storage can be done in a different format than distribution.

A series of criteria must be studied and correlated with the individual needs
of the client. It is also important to keep in mind future requirements and
prospects of expansion, so as to avoid the need for migration.

Migration is the transfer of data to newer system environments ([10], [11]).
This may include conversion of resources from one file format to another
(e.g., conversion of Microsoft Word to PDF or OpenDocument), or from one
operating system to another (e.g., Windows to Linux), so that the resource
remains fully accessible and functional.

Migration can be necessary as formats become obsolete, or as files need to be
transferred to another system. Resources that migrate run the risk of losing
some of their functionality, since newer formats might be incapable of
rendering all of it from the original format, or, more so, the converter
itself may

even though JPEG 2000 is deemed superior to JPEG, few migrate towards it, due
to the wide adoption of the latter.

4.2 BLOBs and external files

We have the choice of storing large objects as files in the filesystem, as
BLOBs (binary large objects) in a database, or as a combination of both. Only
folklore is available regarding the right path to take; often the design
decision is based on which technology the designer knows best. Most designers
will tell you that a database is probably best for small binary objects and
that files are best for large objects. A good study on the subject can be
found in [12]. The study indicates that if objects are larger than one
megabyte on average, NTFS has a clear advantage over SQL Server. If the
objects are under 256 kilobytes, the database has a clear advantage. Inside
this range, it depends on how write-intensive the workload is, and on the
storage age of a typical replica in the system. However, using a different
DBMS or file system can change the results.

Filesystems and databases take different approaches to modifying an existing
object. Filesystems are optimized for appending to or truncating a file.
In-place file updates are efficient, but when data are inserted or deleted in
the middle of a file, all contents after the modification must be completely
rewritten. Some databases completely rewrite modified BLOBs; this rewrite is
transparent to the application. To ameliorate the fact that the database
handles large fragmented objects poorly, the application could do its own
defragmentation or garbage collection.

Applications that store large objects in the filesystem face the question of
how to keep the database object metadata and the filesystem object data
synchronized. A common problem is the garbage collection of files that have
been “deleted” in the database but not in the filesystem. Operational issues
such as replication, backup, disaster recovery and fragmentation must also be
considered.

Storing BLOB data in the database offers a number of advantages, such as an
easier way to keep the BLOB data synchronized with the remaining items in the
row. BLOB data is backed up with the database. Having a single storage system
can ease administration. Full Text Search (FTS) operations can be performed
against columns that contain fixed or variable-length character data or
be unable to interpret the original format in its against formatted text-based data, for example
entirety. Conversion is often a concern with Microsoft Word or Microsoft Excel documents.
proprietary data formats. Therefore, migration is an A well thought out metadata strategy can remove
undesirable process, and a good choice of file the need for resources such as images, movies, and
formats can reduce the risk of ending up in the need even text documents to be stored in the database. The
of migrating data. associated metadata could be indexed and include
Generalised use of a specific format can be an pointers to resources stored on the file system.
argument in favour of migrating data to that format,
or against migrating data away from it. For example,
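The combined approach discussed in this section (small objects kept as BLOBs, large objects kept as files referenced from indexed metadata) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not part of the evaluated systems: the `resource` table, the functions `open_catalog`, `store`, `load` and `orphaned_files`, the use of SQLite, and the 256 KB cut-off, which is simply the lower bound reported in [12] and would have to be tuned for a given DBMS, file system and workload.

```python
import os
import sqlite3

# Assumption: illustrative cut-off only. [12] reports a clear database
# advantage below 256 KB and a clear filesystem advantage above 1 MB.
BLOB_THRESHOLD = 256 * 1024


def open_catalog(db_path):
    """Open the (hypothetical) metadata catalog and create its table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resource ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " content BLOB,"     # filled in for small objects
        " file_name TEXT)"   # pointer to a file, filled in for large objects
    )
    return conn


def store(conn, files_dir, title, data, threshold=BLOB_THRESHOLD):
    """Store small objects in the database and large ones as files."""
    if len(data) <= threshold:
        cur = conn.execute(
            "INSERT INTO resource (title, size, content) VALUES (?, ?, ?)",
            (title, len(data), data),
        )
    else:
        cur = conn.execute(
            "INSERT INTO resource (title, size) VALUES (?, ?)",
            (title, len(data)),
        )
        file_name = f"{cur.lastrowid}.bin"
        with open(os.path.join(files_dir, file_name), "wb") as handle:
            handle.write(data)
        conn.execute(
            "UPDATE resource SET file_name = ? WHERE id = ?",
            (file_name, cur.lastrowid),
        )
    conn.commit()
    return cur.lastrowid


def load(conn, files_dir, resource_id):
    """Return the object's bytes, wherever they are stored."""
    row = conn.execute(
        "SELECT content, file_name FROM resource WHERE id = ?",
        (resource_id,),
    ).fetchone()
    if row is None:
        raise KeyError(resource_id)
    content, file_name = row
    if file_name is None:
        return bytes(content)
    with open(os.path.join(files_dir, file_name), "rb") as handle:
        return handle.read()


def orphaned_files(conn, files_dir):
    """List files that no metadata row points to any more."""
    referenced = {
        name
        for (name,) in conn.execute(
            "SELECT file_name FROM resource WHERE file_name IS NOT NULL"
        )
    }
    return sorted(set(os.listdir(files_dir)) - referenced)
```

A periodic job could call `orphaned_files` and remove what it returns, which addresses the case of objects deleted in the database but not in the filesystem. The sketch does not make the file write and the metadata update atomic; a real system would also need to handle a failure between the two steps.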
5 CONCLUSIONS

The computer era has brought radical changes to our lives. The classical management methods used in a public library had to be changed and improved so that they can benefit from the new technologies. The classification and indexation process has to be tailored to be suitable for computer assistance. New methods are proposed, but today's public libraries, in Romania or around the world, do not have their data in a uniform format, so it is very difficult to make information exchange between them work properly. If data are put in a standard format, a distributed search across all connected libraries becomes possible. With computer assistance it will be possible to automate the classification and indexation process and to perform semantic searches in different interconnected digital libraries. Library data, classification data and indexation data have to be stored in a database for future processing, so it is very important to fully understand their characteristics in order to make the best selection. Our goal was to analyze different classification and indexation methods used today in public libraries and to identify the most suitable method for a computer automated system. We also evaluated the storage and index characteristics of three of today's most important database management systems in order to support an automated classifying and indexing library system and distributed semantic searches among interconnected libraries. The good and the bad choices were revealed for each particular data structure and access method. This study can also be useful for developers of other software applications who have to select the best DBMS for their future software system.
The choice between a DBMS and a filesystem for storing usual DL data was also considered.

ACKNOWLEDGEMENT
This work was partially supported by the Romanian National Council of Academic Research (CNCSIS) through the grant CNCSIS no. 12099/2008-2011.

6 REFERENCES

[1] Universal Decimal Classification Handbook, Central Library of the “Lucian Blaga” University of Sibiu, (1995).
[2] Library of Congress Subject Headings, Library of Congress, USA, (1999).
[3] RAMEAU-Repertoire d'Autorite - Matiere Encyclopedique et Alfabetique Unifie, France National Library, (2002).
[4] UNIMARC Handbook: Bibliographic format. Concise version, France National Library, (1994).
[5] UNIMARC Handbook: Authorities lists format, Manual UNIMARC : Format des notices d'Autorite, France National Library, (2004).
[6] Oracle 10i Technical Report. www.oracle.com
[7] DB2 UDB Technical Report. www.ibm.com
[8] SQL Server Technical Report. www.microsoft.com
[9] D. Volovici, A.G. Pitic, A. C. Mitea, A.E. Pitic: An analysis of file formats used in digital libraries, First International Conference on Information Literacy in Romania, Sibiu, (2010).
[10] J. Garrett, D. Waters, et al.: Preserving digital information: Report of the task force on archiving of digital information, Commission on Preservation and Access and the Research Libraries Group, (1996).
[11] H. M. Gladney: Principles for digital preservation, Communications of the ACM 49, (2006).
[12] R. Sears, C. van Ingen, J. Gray: To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?, Technical Report, MSR-TR-2006-45, (2006).
