Golden Rules for Repository Managers
Why the Golden Rules?
We are indexing the metadata of all kinds of academically relevant resources – such as journals, institutional repositories, digital collections etc. – which provide an OAI interface and use OAI-PMH for providing their contents (more information on OAI can be found on the pages of the Open Archives Initiative or in Wikipedia). The indexed data is stored on servers at the University of Bielefeld.
These "Golden Rules" help you optimize the delivery of your data via your OAI interface. If you adhere to these rules, indexing in BASE will be fast and problem-free. Documents from your source are then presented completely and in the best possible way in BASE, and all other services that index data via your OAI interface benefit as well.
You can check some of the points listed here with our OAI-PMH Validator OVAL.
If your source does not have its own OAI interface
If your source does not have an OAI interface, direct indexing of your source is currently not possible. In this case, upload your documents in aggregators (for example DataCite or Zenodo) or in specialized repositories already indexed in BASE (see our list of content providers) or register your Open Access journal in the DOAJ. We index these content providers regularly. If your documents are contained in such an aggregator, they can usually already be found in BASE. A separate registration of your source in BASE is then not necessary.
The best way for your documents to be indexed by us and found in BASE, however, is to operate your own OAI interface. Only then, for example, can the name of your source appear under "Content Provider" in the results list with supplementary information, and only then does your source appear as an independent entry in the list of content providers.
We also index open access content from academic publishers that is made available via the Crossref publishing platform. In this case, please let us know the name of your source and your Crossref ID via the contact form. We will then check whether indexing of your source is possible.
General OAI Interface
-
Functioning OAI interface
Your OAI interface is freely accessible, stable and constantly responding. The request for ListRecords in oai_dc format returns results without a timeout or output error. You should check the functionality of your OAI interface at regular intervals, for example using a browser. If your OAI interface does not function correctly, it is not possible to index your source. The format oai_dc is also mandatory.
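The request mentioned above can also be assembled programmatically for a regular check. The following is a minimal sketch, not an official BASE tool; the base URL repository.example.org is a placeholder and the function name is hypothetical.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Return the URL of a ListRecords request for the given OAI base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Placeholder base URL, used only for illustration.
print(build_list_records_url("https://repository.example.org/oai"))
# prints https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Opening the resulting URL in a browser should return an XML response with records rather than a timeout or an error page.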
-
Records per page
For each ListRecords response of your OAI interface, you should provide 50 to 1000 records per page. The so-called resumptionToken at the end of an OAI-PMH response file works and delivers the next 50 to 1000 records.
If fewer than 50 records per page are delivered, we need many individual calls to harvest your source. More than 1000 records per page, on the other hand, make the delivered files relatively large and increase the risk of aborts when harvesting the records. If the resumptionToken does not work, complete indexing is not possible.
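To check the page size yourself, you can count the records in one response and read the resumptionToken. The sketch below assumes a shortened, made-up response (a real page would carry 50 to 1000 records); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Shortened, made-up ListRecords response used only for illustration.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repository.example.org:1</identifier></header></record>
    <record><header><identifier>oai:repository.example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def summarize_page(xml_text: str) -> Tuple[int, Optional[str]]:
    """Return the number of records and the resumptionToken (None if absent or empty)."""
    root = ET.fromstring(xml_text)
    records = root.findall("oai:ListRecords/oai:record", OAI_NS)
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token_value = token.text.strip() if token is not None and token.text else None
    return len(records), token_value or None


def page_size_ok(record_count: int, is_last_page: bool) -> bool:
    """Apply the 50 to 1000 rule; the last page may hold fewer than 50 records."""
    return record_count <= 1000 and (is_last_page or record_count >= 50)


count, token = summarize_page(SAMPLE_PAGE)
print(count, token, page_size_ok(count, is_last_page=token is None))
```

For the shortened sample the check fails, because only two records are delivered although a further page is announced.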
-
Contact Persons
In the Identify data of your OAI interface, an e-mail address is specified in the adminEmail field, which can be used to contact the technical operator of the OAI interface. An e-mail address is available on the homepage of your source, which guarantees direct contact with the operator of the content provider.
Only if the e-mail addresses work and e-mails are read and answered can we contact you in the event of problems or questions.
Changes / Deletions / Updates
-
Identification of changes to metadata of individual records
Each subsequent change to a record must be marked in your OAI interface by delivering changed records during incremental harvesting. Generally, only records that have been newly created, modified, or deleted should be delivered during incremental harvesting.
All indexed data providers are regularly incrementally updated in BASE. This means that we query your OAI interface by date (from) for any updates since the last indexing run. If newly created, changed or deleted records are not delivered correctly, the BASE index cannot be updated: a new document is not indexed, a changed document remains unchanged and thus incorrect in the index, or a deleted document is not removed from the index.
If, on the other hand, all records are always redelivered by your OAI interface during incremental harvesting - i.e. also data records to which no change has been made - the indexing time is considerably extended as a result. Your source may then only be updated by us at larger intervals.
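On the repository side, the expected behavior amounts to a filter on the datestamp of each record. The following sketch only illustrates that selection logic; the Record class and the sample data are hypothetical and not part of any repository software. In OAI-PMH the from argument is an inclusive lower bound.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Record:
    identifier: str
    datestamp: date        # date of creation, last modification, or deletion
    deleted: bool = False  # deleted records stay in the interface, flagged as deleted


def records_since(records: List[Record], from_date: date) -> List[Record]:
    """Return only records created, changed, or deleted on or after from_date."""
    return [record for record in records if record.datestamp >= from_date]


# Made-up repository content.
ALL_RECORDS = [
    Record("oai:repository.example.org:1", date(2023, 5, 2)),
    Record("oai:repository.example.org:2", date(2024, 3, 1)),
    Record("oai:repository.example.org:3", date(2024, 3, 4), deleted=True),
]

for record in records_since(ALL_RECORDS, date(2024, 3, 1)):
    print(record.identifier, "deleted" if record.deleted else "active")
```

Record 1 is unchanged since the last run and is therefore not redelivered, while the deleted record 3 is still delivered so that it can be removed from the index.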
-
Deletion of records
If a document is deleted from your source, the record must be marked as deleted in the OAI interface and must be delivered during incremental harvesting. Under no circumstances may the record be completely removed from the OAI interface.
If a deleted document is not delivered and marked as deleted during incremental harvesting, the record cannot be deleted from the BASE index and the document remains incorrectly in the index. In this case, it can only be deleted by re-harvesting and re-indexing your source completely. This can only be done at larger intervals.
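In OAI-PMH, a deleted record is delivered with status="deleted" in its header. The sketch below lists the deleted identifiers in one response; the sample response and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Made-up response with one deleted and one active record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header status="deleted">
        <identifier>oai:repository.example.org:17</identifier>
        <datestamp>2024-03-01</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:repository.example.org:18</identifier>
        <datestamp>2024-03-02</datestamp>
      </header>
      <metadata/>
    </record>
  </ListRecords>
</OAI-PMH>"""


def deleted_identifiers(xml_text: str) -> List[str]:
    """Return the identifiers of all records whose header is marked as deleted."""
    root = ET.fromstring(xml_text)
    deleted = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            deleted.append(header.findtext("oai:identifier", default="", namespaces=OAI_NS))
    return deleted


print(deleted_identifiers(SAMPLE_RESPONSE))
# prints ['oai:repository.example.org:17']
```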
-
Information about fundamental changes
If the name of the content provider or the URL of the OAI interface should change (for example due to moving to another system), please let us know via our contact form. If necessary, let us know the old and new URL of the content provider and, if possible, the collection name of your source in BASE (you can find this in our list of content providers by clicking on the number in the Documents column of the content provider).
We check all content providers at irregular intervals and correct information (name, system, URL) if necessary. By actively informing us about changes, you ensure that your source is always fully and correctly captured and indexed by BASE. This information is then shared with the worldwide community via our OAI-PMH blog.
Contents / Metadata
-
Character Encoding
All content in your OAI interface (titles, author names, abstracts) is encoded in UTF-8. Other encodings or duplicate encodings cause errors in the display of hits from your source.
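A common form of duplicate encoding is UTF-8 text that was read as Latin-1 and then encoded as UTF-8 again, which turns "ü" into "Ã¼". The following heuristic is a sketch for spotting that one pattern only; it is not a complete encoding check and the function name is hypothetical.

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if text looks like UTF-8 that was misread as Latin-1."""
    try:
        # Undo the suspected misreading; this only succeeds for the typical garbling.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text


garbled = "M\u00c3\u00bcller"  # shows up as "MÃ¼ller" in a hit list
correct = "M\u00fcller"        # "Müller"
print(looks_double_encoded(garbled), looks_double_encoded(correct))
# prints True False
```

Text garbled via other code pages (for example Windows-1252) is not detected by this sketch.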
-
Separation of Multiple Entries in a Metadata Field
If you provide several entries in a metadata field (for example, the name of an author and their ORCID iD), separate them with a semicolon surrounded by spaces (space, semicolon, space). This separation enables us to index the information separately and make it searchable.
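The effect of this separator can be illustrated with a simple split. This is only a sketch of how such a field value could be taken apart, not a description of the BASE indexing code; the function name is hypothetical.

```python
from typing import List

SEPARATOR = " ; "  # space, semicolon, space


def split_entries(value: str) -> List[str]:
    """Split a metadata field value into its individual entries."""
    return [part.strip() for part in value.split(SEPARATOR) if part.strip()]


print(split_entries("Summann, Friedrich ; orcid:0000-0002-6297-3348"))
# prints ['Summann, Friedrich', 'orcid:0000-0002-6297-3348']
```

Because the separator includes the surrounding spaces, a semicolon without spaces inside an entry does not split it.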
-
Completeness of Metadata
BASE harvests the metadata of your source in the standard format oai_dc. Each record of your OAI interface should have metadata for a document that is as complete as possible and use standardized vocabularies. The specification of a functioning URL in <dc:identifier> is mandatory.
The more complete the metadata you provide, the easier it will be to find documents from your source in BASE. Standardized vocabularies help us to assign documents from your source to the correct document type, for example, or to the correct re-use rights. Documents that do not have a URL in the identifier are not indexed.
-
Notes on Individual Metadata Fields
Information                                  Element in oai_dc    Criteria
URL of the publication                       <dc:identifier>      Must be
Title                                        <dc:title>           Should be
Author                                       <dc:creator>         Should be
Publication type                             <dc:type>            Should be
Publication date                             <dc:date>            Should be
Language of the document                     <dc:language>        Should be
Access and re-use rights                     <dc:rights>          Should be
Reference / Citation                         <dc:source>          Should be
Other parties involved in the publication    <dc:contributor>     Can be
File format                                  <dc:format>          Can be
Description                                  <dc:description>     Can be
Keywords                                     <dc:subject>         Can be
Publisher                                    <dc:publisher>       Can be
Related Documents                            <dc:relation>        Can be
Content delimitation                         <dc:coverage>        Can be
-
URL of the publication <dc:identifier>
Each record contains a working URL in the field <dc:identifier> (starting with http:, https:, doi: or urn:nbn:de:). This leads, if possible, to the front door of the document (info page with bibliographic information and link to the full text) or directly to the Open Access full text in PDF format. If a record has several <dc:identifier> or if the full text is not offered in a common file format (HTML, PDF) or if it is not "Open Access", the first identifier should always lead to the front door of the document.
Provide persistent identifiers (DOI, handle, URN) that will continue to function even if the server is relocated and the URL is changed. Make sure that the DOIs etc. are registered and working with the appropriate registration agency. Especially for DSpace installations the handle has to be configured, otherwise it will lead to a "dummy-URL" (handle.net/123456789), which generates an error message (see www.handle.net/hnr_documentation.html).
Only documents whose identifiers begin with http:, https:, doi: or urn:nbn:de: and do not lead to a "dummy-URL" (123456789) are indexed. If a DOI etc. is not registered, the document is indexed, but the link in the BASE hit list leads to an error message. Content providers where most of the links do not work may be removed from the index.
Code Examples
- <dc:identifier>https://pub.uni-bielefeld.de/record/2710028</dc:identifier>
- <dc:identifier>http://hdl.handle.net/10760/12746</dc:identifier>
- <dc:identifier>https://doi.org/10.1108/07378830610715473</dc:identifier>
- <dc:identifier>doi:10.1108/07378830610715473</dc:identifier>
- <dc:identifier>https://nbn-resolving.de/urn:nbn:de:0070-pub-27663089</dc:identifier>
- <dc:identifier>urn:nbn:de:0070-pub-27663089</dc:identifier>
Note on ISBN, ISSN etc.
In the current DINI certificate 2019 (in German only) it is recommended to add information such as ISBN or ISSN in the field <dc:identifier>. However, currently only URLs are indexed in BASE in the field <dc:identifier>. Other specifications without URLs are not indexed and therefore cannot be found if these specifications are only made in the field <dc:identifier>. If you want to work both DINI-compliant and BASE-optimized, insert specifications such as ISBN in both <dc:identifier> and <dc:source>.
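The identifier rules in this section can be checked before delivery. The following sketch tests the accepted prefixes and the unconfigured DSpace handle prefix; how BASE detects a "dummy-URL" internally is not documented here, so the pattern below is an assumption, and the function name is hypothetical.

```python
import re

ACCEPTED_PREFIXES = ("http:", "https:", "doi:", "urn:nbn:de:")
# Assumption: a dummy handle shows up as the path segment "123456789".
DUMMY_HANDLE = re.compile(r"/123456789(/|$)")


def is_indexable_identifier(identifier: str) -> bool:
    """Return True if the value has an accepted prefix and no dummy handle."""
    value = identifier.strip()
    if not value.lower().startswith(ACCEPTED_PREFIXES):
        return False
    return DUMMY_HANDLE.search(value) is None


for candidate in (
    "https://pub.uni-bielefeld.de/record/2710028",
    "urn:nbn:de:0070-pub-27663089",
    "http://hdl.handle.net/123456789/42",
    "ISSN: 0928-0987",
):
    print(candidate, is_indexable_identifier(candidate))
```

The check says nothing about whether a URL or DOI actually resolves; that still has to be tested against the live system or the registration agency.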
-
Title <dc:title>
Enter the title in the <dc:title> field as in the original. If the publication has several titles (for example in different languages), repeat the field.
Code Example
- <dc:title>Advanced calculus: student handbook</dc:title>
-
Author <dc:creator>
Indicate in <dc:creator> those persons or institutions who are the authors of the publication. Specify author names according to the pattern Last Name, First Name. Specify the ORCID iD as part of the author name.
Encourage the dissemination of ORCID iDs (and other person identifiers, if applicable) to make authors uniquely identifiable (even if they have the same name). Encourage authors who publish in your source to register with ORCID to get an ORCID iD, and add the ORCID iDs in the metadata directly to the author. Separate the ORCID iD from the author name with space, semicolon, space, and either put "orcid:" in front of the number or give the full URL of the iD. If an ORCID iD exists, authors can also be found using the ORCID iD when searching in BASE.
Code Examples
- <dc:creator>Smit, J.H. (John) de</dc:creator>
- <dc:creator>Utrecht University. Department of Computer Sciences</dc:creator>
- <dc:creator>Summann, Friedrich ; orcid:0000-0002-6297-3348</dc:creator>
- <dc:creator>Summann, Friedrich ; https://orcid.org/0000-0002-6297-3348</dc:creator>
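Both notations shown above carry the same ORCID iD. The following sketch extracts the name and the bare iD from a <dc:creator> value; the regular expression and the function name are illustrative assumptions, and the checksum of the iD is not verified.

```python
import re
from typing import Optional, Tuple

# Accepts "orcid:<iD>" or the full ORCID URL; the last character may be "X".
ORCID_PATTERN = re.compile(
    r"^(?:orcid:|https?://orcid\.org/)(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$"
)


def parse_creator(value: str) -> Tuple[str, Optional[str]]:
    """Return (author name, ORCID iD or None) for a <dc:creator> value."""
    parts = [part.strip() for part in value.split(" ; ")]
    orcid = None
    for part in parts[1:]:
        match = ORCID_PATTERN.match(part)
        if match:
            orcid = match.group(1)
    return parts[0], orcid


print(parse_creator("Summann, Friedrich ; orcid:0000-0002-6297-3348"))
print(parse_creator("Summann, Friedrich ; https://orcid.org/0000-0002-6297-3348"))
print(parse_creator("Utrecht University. Department of Computer Sciences"))
```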
-
Publication type <dc:type>
In the <dc:type> field, enter the publication type of the document (for example article, chapter). If possible, use a standardized vocabulary, for example the info:eu-repo Vocabulary for Publication Types or the COAR Resource Type Vocabulary. The designations used by your source must be known to BASE so that we can correctly assign your documents to our document types.
Code Examples
- <dc:type>info:eu-repo/semantics/article</dc:type>
- <dc:type>journal article</dc:type>
- <dc:type>http://purl.org/coar/resource_type/c_6501</dc:type>
-
Publication date <dc:date>
Each record should contain in the <dc:date> field the publication year or date of the document in ISO 8601 format (according to the Gregorian calendar). Otherwise the restriction / sorting according to years of publication in BASE does not work correctly for your source.
The field <dc:date> should only be filled once. If there is no concrete publication date, estimate one. Imprecise dates such as "17th century" should be given as a single year, for example 1650.
Code Examples
- <dc:date>2000-12-25</dc:date>
- <dc:date>1978-02</dc:date>
- <dc:date>1650</dc:date>
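The three notations shown above (year, year and month, full date) can be validated with a short check. This sketch covers only these three forms of ISO 8601 and is not the validation used by BASE; the function name is hypothetical.

```python
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD
DATE_PATTERN = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_valid_dc_date(value: str) -> bool:
    """Return True for a year, a year and month, or a full calendar date."""
    match = DATE_PATTERN.match(value.strip())
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing parts default to 1 so that the calendar check still works.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ("2000-12-25", "1978-02", "1650", "17th century", "2001-13"):
    print(candidate, is_valid_dc_date(candidate))
```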
-
Language of the document <dc:language>
You provide information on the language of a document according to ISO 639 (2- or 3-letter code) in the <dc:language> field. Otherwise, language information is not output in BASE for documents from your source or is output incorrectly, and the restriction to one language does not work correctly for your source.
Code Examples
- <dc:language>eng</dc:language>
- <dc:language>deu</dc:language>
- <dc:language>en</dc:language>
- <dc:language>de</dc:language>
- <dc:language>nld/dut</dc:language>
-
Access and re-use rights <dc:rights>
-
Access rights (Access status)
The <dc:rights> field contains information on access to the full text according to the info-eu-repo-Access-Rights vocabulary or the COAR-Access-Rights vocabulary. Alternatively, Open Access documents can be made available in their own OAI set. The name of this set is then contained in the setSpec field of each record. Name the set as unambiguously as possible, for example open access.
For our users, information about access to a document in the hit list is of particular importance. If this information is not or only insufficiently available, information on access to documents from your source is output incompletely, not at all or incorrectly, and the restriction to certain types of access does not work correctly for your source.
Code Examples
- <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
- <dc:rights>closed access</dc:rights>
- <dc:rights>http://purl.org/coar/access_right/c_abf2</dc:rights>
-
Subsequent use rights (licenses)
Offer your authors the opportunity to place documents under a license. Use licenses that are as widely used as possible, such as Creative Commons licenses. Enter the corresponding license in your OAI interface in another <dc:rights> field.
If this information is not or only insufficiently available, information on the re-use of documents from your source is output incompletely, not at all or incorrectly and the restriction to re-use options does not work correctly for your source.
Code Examples
- <dc:rights>http://creativecommons.org/licenses/by-sa/2.0/uk/</dc:rights>
- <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
-
-
Reference / Citation <dc:source>
Information about the source or the citation (for example for articles the name, volume, issue of the journal) belongs in <dc:source>. Pay particular attention to including the ISSN of the journal. This information allows users to better find your documents in BASE.
Code Examples
- <dc:source>Ecology Letters (1461023X) vol.4 (2001)</dc:source>
- <dc:source>ISSN: 0928-0987</dc:source>
- <dc:source>Pieper D, Summann F.: Bielefeld Academic Search Engine (BASE). An end-user oriented institutional repository search service. Library Hi Tech. 2006; 24(4):614–619. ISSN 0737-8831.</dc:source>
Note on ISBN, ISSN etc.
In the current DINI certificate 2019 (in German only) it is recommended to add information such as ISBN or ISSN in the field <dc:identifier>. However, currently only URLs are indexed in BASE in the field <dc:identifier>. Other specifications without URLs are not indexed and therefore cannot be found if these specifications are only made in the field <dc:identifier>. If you want to work both DINI-compliant and BASE-optimized, insert specifications such as ISBN in both <dc:identifier> and <dc:source>.
-
Other parties involved in the publication <dc:contributor>
Indicate persons and institutions who have contributed to a publication without being an author (for example editor, reviewer) in <dc:contributor>. The recommendations given in the Author <dc:creator> section apply.
-
File format <dc:format>
In <dc:format> you should specify the file format of the publication. It is best to use the Internet Media Types (MIME types) registered with IANA for this purpose. The complete list can be found at: http://www.iana.org/assignments/media-types.
Code Examples
- <dc:format>video/quicktime</dc:format>
- <dc:format>application/pdf</dc:format>
-
Description <dc:description>
Use <dc:description> to describe the content of the publication (abstract).
-
Keywords <dc:subject>
In the field <dc:subject> both keywords and notations of classifications can be specified. If notations are used, the corresponding classification scheme should also be indicated (preferably as URI). The content should also be provided as human-readable text, preferably in English, in another <dc:subject> field.
Code Examples
- <dc:subject>info:eu-repo/classification/ddc/641</dc:subject>
- <dc:subject>Anatomy</dc:subject>
If no specific vocabulary is to be used, we recommend the general Dewey Decimal Classification (DDC): https://www.oclc.org/en/dewey/resources.html.
-
Publisher <dc:publisher>
The field <dc:publisher> indicates the publisher of the publication, which may be either an institution or a natural person. For university theses, the name of the university should be entered in this field. If the organization is hierarchically structured, the different hierarchical levels should be separated from each other by periods.
Code Examples
- <dc:publisher>Peter Langford</dc:publisher>
- <dc:publisher>Springer Fachmedien</dc:publisher>
- <dc:publisher>Loughborough University. Department of Computer Science</dc:publisher>
-
Related documents <dc:relation>
Related/referenced publications are indicated in the field <dc:relation>.
Code Example
- <dc:relation>http://hdl.handle.net/10</dc:relation>
-
Content delimitation <dc:coverage>
The field <dc:coverage> is used to describe the spatial and temporal limitation of the subject of the publication. This includes location information, geocoordinates, time information or the indication of a jurisdiction.
Code Examples
- <dc:coverage>Netherlands</dc:coverage>
- <dc:coverage>name=Western Australia; northlimit=-13.5; southlimit=-35.5; westlimit=112.5; eastlimit=129</dc:coverage>
- <dc:coverage>1800-1850</dc:coverage>
- <dc:coverage>52.031629, 8.541202</dc:coverage>
-
Beyond BASE and OAI ...
-
Web address of the repository
If possible, offer the start page under your own subdomain (without port and subdirectory). If the start page of the repository is accessible via a port (for example repository.domain.com:8080) or a subdirectory (repository.domain.com/xmlui), create a redirect from the subdomain (repository.domain.com).
Every change to the port or the subdirectory means that links to the content provider, for example in the BASE hit list under "Content provider" or from our list of content providers, no longer work.
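Whether a start page address follows this recommendation can be read off the URL itself. The sketch below flags an explicit port and a subdirectory; the host name repository.example.org is a placeholder and the function name is hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def start_page_issues(url: str) -> List[str]:
    """List the parts of a start page URL that the recommendation advises against."""
    parts = urlsplit(url)
    issues = []
    if parts.port is not None:
        issues.append("explicit port")
    if parts.path not in ("", "/"):
        issues.append("subdirectory")
    return issues


for candidate in (
    "https://repository.example.org/",
    "http://repository.example.org:8080/",
    "https://repository.example.org/xmlui",
):
    print(candidate, start_page_issues(candidate))
```

The URL must include the scheme (http:// or https://) for the parts to be recognized reliably.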
-
Use generic names for subdomains
Avoid version numbers in the subdomain or directory names (for example ojs3.domain.com or ojs.domain.com/ojs-3/).
Every software update can cause the URL to change or leave your URL with an incorrect version number, for example if you have updated the software from version 2 to version 3, but the URL is still ojs2.domain.com. As mentioned in the previous point, any change to the URL will cause links to the content provider to stop working.
-
If the URL (domain / subdomain) is changed, set redirects.
If the Internet address of your source changes (even by a single character), please set up a redirect from the old address to the new one. Also set up a redirect for the OAI interface so that it can still be reached.
Without a redirect, links lead nowhere. We check the addresses of the indexed content providers regularly. If we come across faulty links and there is no redirect, we will carry out a brief search for the new address of the content provider in individual cases. If this is unsuccessful, the content provider will be deleted from the index. Other search engines also delete sources that are no longer accessible from their index.
-
Title of the repository / journal in plain text
The name of the repository or the title of a journal should always be found in the source code of your website at a place in plain text, either in <title>, the heading (<h1>) or as an alternative text of a logo.
If the title is missing in plain text, entering the name correctly in our database is cumbersome. In addition, a name that is missing as plain text means that search engines such as Google find your source poorly or not at all when users search for its name.
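The three places named above (<title>, <h1> and the alternative text of a logo) can be checked automatically. The following is a minimal sketch using Python's standard HTML parser; the class name, function name and sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class NameFinder(HTMLParser):
    """Collect the text of <title> and <h1> and the alt text of images."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag whose text is currently being collected
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.texts.append(alt.strip())

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.texts.append(data.strip())


def name_in_plain_text(html: str, name: str) -> bool:
    """Return True if the name occurs in <title>, <h1> or an image alt text."""
    finder = NameFinder()
    finder.feed(html)
    return any(name.lower() in text.lower() for text in finder.texts)


GOOD_PAGE = '<html><head><title>Home</title></head><body><h1><img src="logo.png" alt="Example Repository"></h1></body></html>'
BAD_PAGE = '<html><head><title>Home</title></head><body><img src="logo.png"></body></html>'
print(name_in_plain_text(GOOD_PAGE, "Example Repository"), name_in_plain_text(BAD_PAGE, "Example Repository"))
# prints True False
```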
-
Offer start page also in English
Offer at least the start page of your repository in English. BASE has a global user community. With a website in English, you can give an international audience uncomplicated access to your documents.
If the start page is not available in German or English, we are usually dependent on automatic translation services to inform us about the content of your source. An English-language start page also leads to better findability of your source both in our list of content providers and in other general search engines.
-
"Contact" link with active e-mail address
An item "Contact" is linked from the start page of your source. There the functioning e-mail address of the webmaster / responsible person is mentioned. E-mails sent to these addresses are regularly read and answered by those responsible.
If a contact link is missing or e-mails are not read, it is hardly possible to contact you if there are problems for example concerning indexing your source or if queries arise. This may result in your source not being indexed.
-
Announcement of your source / indexing in search engines
Register your source in OAI directories (for example OpenDOAR, ROAR, re3data or Open Archives) and update your information in the directories if changes are made. In this way you make your source known worldwide and enable other search engines to index documents from your source.
Use a "search engine friendly" folder structure. Offer for example a sitemap through which all documents (front door / PDFs) can be reached directly, and announce this sitemap to search engines like Google via the appropriate registration tools. Use search engine friendly metatags (for example Google Scholar metatags).
Good findability of your source in general and academic search engines means that documents from your source are accessed and used more frequently. If we do not yet know your source, we may also find it during a search in OAI directories or search engines. Your source will then, if technically possible, be actively entered into our database and indexed.