Wikidata:Property proposal/BIBFRAME Hub ID

BIBFRAME Hub ID

Originally proposed at Wikidata:Property proposal/Creative work

   Done: BIBFRAME Hub ID (P11859) (Talk and documentation)
Description: identifier for a BIBFRAME Hub description from the Library of Congress Catalog, primarily generated by converting the Library's MARC Bibliographic records and MARC title authority records to BIBFRAME Hub descriptions
Represents: BIBFRAME Hub (Q107364908)
Data type: External identifier
Domain: item; work (Q386724), version, edition or translation (Q3331189)
Allowed values: [0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}
Example 1: Bound East for Cardiff (Q110985024) → 96d00ae2-c605-6fd0-a64b-dc950f0648fe
Example 2: Cat on a Hot Tin Roof (Q713979) → a95c89a5-9e56-a5ab-8bf4-7320ad676a9d
Example 3: Maus (Q59696) → 6ffd5037-87d5-2593-97c6-d9847a00b547
Example 4: Lucy in the Sky with Diamonds (Q723178) → f1588874-1ce4-5f3e-4ce2-7ceec4885588
Example 5: Mona Lisa (Q12418) → f63713db-caa9-18d4-d027-609a420acd40
Example 6: Journal of the Burma Research Society (Q12900604) → 19070ce2-4f1d-094f-2ce0-68cedd44ab6d
Example 7: Endangered Species Act of 1973 (Q2743374) → 42c198d7-532d-ee6d-e897-5677afd6c13c
Example 8: Blade Runner (The Final Cut) (Q113799157) → 5828c6d9-8c99-651e-2945-8dfe88f84595
Source: http://id.loc.gov/resources/hubs
Planned use: adding to items being edited or to new items being created
Expected completeness: always incomplete (Q21873886)
Formatter URL: https://id.loc.gov/resources/hubs/$1.html
See also: Library of Congress authority ID (P244)
Applicable "stated in"-value: Library of Congress Hubs (Q113798862)
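
As a rough illustration of how the "Allowed values" pattern and the "Formatter URL" above fit together, here is a minimal Python sketch. The function names `is_valid_hub_id` and `hub_url` are made up for this example; only the pattern and the URL template come from the proposal.

```python
import re

# Pattern copied from the "Allowed values" field of the proposal.
HUB_ID_PATTERN = re.compile(
    r"[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}"
)

# Formatter URL copied from the proposal; "$1" stands for the identifier.
FORMATTER_URL = "https://id.loc.gov/resources/hubs/$1.html"


def is_valid_hub_id(value: str) -> bool:
    """Return True if the whole string matches the allowed-values pattern."""
    return HUB_ID_PATTERN.fullmatch(value) is not None


def hub_url(hub_id: str) -> str:
    """Expand the formatter URL for a valid Hub ID, or raise ValueError."""
    if not is_valid_hub_id(hub_id):
        raise ValueError(f"not a valid BIBFRAME Hub ID: {hub_id!r}")
    return FORMATTER_URL.replace("$1", hub_id)


if __name__ == "__main__":
    # Example 4 from the proposal (Lucy in the Sky with Diamonds).
    print(hub_url("f1588874-1ce4-5f3e-4ce2-7ceec4885588"))
```

Note that the pattern as proposed accepts any lowercase letter, not only the hexadecimal digits a-f that appear in all eight examples, so it is somewhat looser than a strict UUID check.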

Motivation

The Library of Congress Hub descriptions from the Library of Congress Catalog were primarily generated by converting the Library's MARC Bibliographic records and MARC title authority records to BIBFRAME Hub descriptions. Hubs may link to other Hubs or BIBFRAME Works. BIBFRAME Hubs are abstract resources that function as bridges between two BIBFRAME Works. UWashPrincipalCataloger (talk) 01:20, 7 September 2022 (UTC)

Discussion

  •  Support --Emwille (talk) 02:28, 7 September 2022 (UTC)
  •  Support --Jimfhahn (talk) 12:08, 7 September 2022 (UTC)
    Is this still being considered? How do we follow up now? Jimfhahn (talk) 19:26, 10 April 2023 (UTC)
  •  Support --Mvajen (talk) 08:50, 7 September 2022 (UTC)
  •  Support --Metadataguy (talk) 22:37, 7 September 2022 (UTC)
  •  Question -- I have reservations about the value these entities and identifiers would add to Wikidata. The definition and implementation of the BIBFRAME Hub at the Library of Congress website are not well developed at this stage, and the relationship or "bridge" between the works accumulated by any given Hub is not readily available. Does Wikidata have statements to be made or items to be created to correspond to specific BIBFRAME Hubs? Or would Hubs quickly become confused with Works (BIBFRAME Works and Works from other schemas), and Hub IDs erroneously attached to items which do not represent the same "things"? --Crystal Clements, University of Washington Libraries (talk) 18:57, 13 September 2022 (UTC)
  •  Support We can use it to extract creative works, dates and authors/contributors. In response to @Clements.UWLib:, is there any better RDF representation of "Lucy in the sky with diamonds" than this below? --Vladimir Alexiev (talk) 12:03, 14 September 2022 (UTC)
  • @Vladimir Alexiev: I think Wikidata's representation of this song is better, because the classes used in P31 are well-established and defined. While "Lucy in the sky with diamonds" includes a relatively detailed description, what would Wikidata do with the BIBFRAME Hub identifier for this Hub for War and Peace, for instance? This RDF is more representative of the type of data included in bf:Hubs. They are being extracted from name/title authority data (which is generally very well-formed and defined) and MARC bibliographic records (which are inconsistent, often contradictory, and frequently incomplete; most of my job is creating and editing these bibliographic records, so I say this with respect). The authority records in the Library of Congress Name Authority File are much more detailed, serve a similar purpose to Wikidata's items for creative works, and are based on a well-defined conceptual model. BIBFRAME Hubs are very experimental, and my understanding of the class is that they can serve as bridges between any number of works to describe any number of relationships. For this reason, I believe that extracting creative works, dates, and authors/contributors from Hub data is not a good idea, and Wikidata ought to favor Name/title authority records from the Library of Congress Name Authority File instead, at least until we can see a more consistent pattern of RDF coming out of Hubs. Hubs themselves are extracted from name/title authority data and uncontrolled access points from MARC bibliographic records. --Crystal Clements, University of Washington Libraries (talk) 14:38, 14 September 2022 (UTC)
    @Clements.UWLib:
    "Lucy is better in WD": but I believe WD has a lot fewer Works than LoC in particular and libraries in general.
    What is it you dislike about the War and Peace cluster? From a brief examination:
    • It has a hundred expressions (eg audio-book, various editions, adaptations...). I don't know that WD tracks expressions actively, but it has some linkages to OpenLibrary, and that mostly tracks expressions.
    • And two books about it: this is valuable info for WD to set "main subject".
    Replies:
    • "Hubs can serve as bridges between any number of works to describe any number of relationships": give us evidence please.
    • "authority records in the Library of Congress [are better]": please give us links to records for Lucy in the Sky and War and Peace that have similar data to what we find in those two hubs.
    • "ought to favor Name/title authority records from the Library of Congress Name Authority File": WD already links to LCNAF, so the data of John Lennon should be taken from there. But the Hub states he's the main author of Lucy in the Sky, which is not found in LCNAF
    • "BIBFRAME Hubs are very experimental": that is ok, WD takes experimental. Accepting this property doesn't mean people will start pumping data right away: and then there are always patrollers to stop low-quality data imports
    • "until we can see a more consistent pattern of RDF coming out of Hubs": what is it you find inconsistent?
    I don't doubt your library experience, but for us to discard this dataset, you need to give us concrete examples of crap, and indicate that the crap is a significant percentage of all data.
    • We all know that VIAF has some percentage of crap, and a lot of the WorldCat Identifier pages based on VIAF have conflations... But still it provides a huge amount of good data, so we use it
    Cheers! Vladimir Alexiev (talk) 16:00, 15 September 2022 (UTC)
  • "What is it you dislike about the War & Peace Cluster?" What I dislike about the War and Peace cluster is that the "expressions" are not LRM/RDA (Library Reference Model/Resource Description & Access) Expressions, but BIBFRAME Works (which can be LRM Works or LRM Expressions), and assertions that a bf:Hub is expressed by another bf:Work are frequently inaccurate in this dataset. The first listed "expression" of this bf:Hub is a bf:Work representing an audio book. The audio book is recorded here as an expression of this bf:Hub (https://id.loc.gov/resources/hubs/969cc1d9-64a0-75c2-78aa-4cf93a86f6cd.html), which identifies both this audio book and a duplicate bf:Work for another LRM:Manifestation of the same LRM:Expression as values of "has Expression". Each of these ought to share the same bf:Work identifier and link out to separate Instances. But, because this data has been created in haste, they are falsely represented as separate bf:Works/RDA/LRM Expressions, and artificially drawn together as expressions of a bf:Hub, which is not conceptually well-defined anywhere. How can one verify that something is an expression of something which lacks a definition? The relationship between this Hub and the children under "Expression" is not "has expression" in reality. This example, which I found with one click, is not ready for use, and ingesting BIBFRAME Works and Hubs from id.loc.gov is going to result in false statements in Wikidata.
  • "Give us evidence please": this is the definition of a bf:Hub.
  • "please give us links to records for Lucy in the Sky and War and Peace that have similar data to what we find in those two hubs": Authority records for Lucy in the Sky and War and Peace do not have the same _amount_ of data as the Hub, but the data they do have is better defined, and has been created by a human being who is trained to conform to an internationally agreed-upon set of standards. More is not always better. In this case, it is my opinion that the larger amount of data is worse because it can result in false statements.
  • WD does have a lot fewer works than LOC and Libraries in general. The Library of Congress Name Authority File is a fabulous source of data about works and should absolutely be used in Wikidata.
  • "WD already links to LCNAF, so the data of John Lennon should be taken from there. But the Hub states he's the main author of Lucy in the Sky, which is not found in LCNAF" : The access point in the LCNAF begins with the authorized access point for John Lennon, as follows: "Lennon, John, 1940-1980. Lucy in the sky with diamonds". Additionally, it is asserted to be the same as VIAF 264196672, which does state the author as John Lennon in RDF.
  • "Accepting this property doesn't mean people will start pumping data right away: and then there are always patrollers to stop low-quality data imports": This point is well taken. Patrollers are going to be very busy once low-quality data starts rolling in. Does low-quality data get cleaned up when the quality is very low, such as the data from ORCiD that only includes an ORCiD and a label?
  • "what is it you find inconsistent?" Some data, such as the record for Lucy in the Sky, have been edited and fixed up by highly trained catalogers (Lucy in the Sky was published in 2012 by LC, and was enhanced in 2015 by someone from the University of Washington (I suspect this was User:UWashPrincipalCataloger)). A lot of data looks like this, which is classed as a Hub, which is supposed to be a bridge between two or more works, but is only related in any way to one Work. The purpose of this Hub being defined as such rather than as a bf:Work is a mystery, as it is not in fact a bridge between two bf:Works...most things that are works with a work-expression relationship to a single expression (bf:work) are classed as bf:Works. So, both the quality of metadata and the application of the rdf:Class bf:Hub vs. bf:Work are inconsistent.
  • "for us to discard this dataset, you need to give us concrete examples of crap, and indicate that the crap is a significant percentage of all data.": This data set is immense. I provided a few examples above, but don't have time to comb through and empirically prove that the crap is a significant percentage of all the data, but I strongly suspect this to be the case and wanted to caution the Wikidata community before they start to try to import it without lots of critical eyes combing through it.
  • "We all know that VIAF has some percentage of crap, and a lot of the WorldCat Identifier pages based on VIAF have conflations... But still it provides a huge amount of good data, so we use it": I agree that VIAF has conflations and many inaccuracies. However, they (1) fix these when they are pointed out to them and (2) serve as a mostly-accurate identifier Hub for the semantic web (as does Wikidata). BIBFRAME data on id.loc.gov is not set up to do either of these things very well, so the cost-benefit, to me, is different. On an unrelated note, I would be interested to find out why VIAF has not yet integrated bf:Hub data.
  • "And two books about it: this is valuable info for WD to set "main subject"." I think the presence of only two books about War & Peace is a great example of how inconsistent this dataset is, and begs the question, "What is a bf:Hub, anyway?" The Library of Congress has 119 locally-cataloged books in its catalog with the subject "Tolstoy, Leo, ǂc graf, ǂd 1828-1910. ǂt Voĭna i mir" (the title name authority for the creative work War & Peace) according to their MARC data in OCLC. Why does this Hub only have two? Where did this data come from?
  • "Lennon, John, 1940-1980": ah, that's a string not thing. I don't think we have these labels in WD with any distinguished status, so it won't be easy to match by them
  • http://viaf.org/viaf/264196672 has these problems:
    • AFAIK, very few VIAF Works have been ingested to WD, yet.
    • lacks the year of creation
    • the RDF doesn't link to McCartney
    • schema:author points to Lennon's VIAF record, but schema:creator is crap (another Lennon that's local to the song):
<http://viaf.org/viaf/264196672>
        rdf:type              schema:CreativeWork ;
        schema:alternateName  "Lucy in the sky with diamonds" ;
        schema:author         <http://viaf.org/viaf/196844> ;
        schema:creator        <http://viaf.org/viaf/264196672/#Agent/lennon_john> .

<http://viaf.org/viaf/sourceID/LC%7Cn+2012061856#skos:Concept>
        rdf:type        skos:Concept ;
        skos:altLabel   "McCartney, Paul. Lucy in the sky with diamonds" ;
        skos:prefLabel  "Lennon, John, 1940-1980. | Lucy in the sky with diamonds" ;
        foaf:focus      <http://viaf.org/viaf/264196672> .
# MARC 5xx's: Related Names includes a statement from DNB: 
# "500 1 _ ‡a McCartney, Paul ‡d 1942- ‡4 koma ‡4 https://d-nb.info/standards/elementset/gnd#composer ‡e Komponist"
  • "Data from ORCiD that only includes an ORCiD and a label?" If the name is unique enough, that still can be matched to a researcher on WD and is useful. Or do you mean some large dump of WD items with just these two fields? Link?
  • "singleton hubs": VIAF also includes singleton clusters and if these represent "local heroes" (people of local not world fame), that's still useful
  • https://id.loc.gov/resources/hubs/76f63d7d-ee5f-6969-d750-dfbfd86370c5.html "Visions of Europe (2014)": ok, that's crap
  • BTW I noticed that Bibframe Hubs are closed for crawling, eg googling "site:id.loc.gov/resources/hubs/ war" finds nothing. In contrast "site:viaf.org war" finds 30k records. Is this because of the low quality you describe?
  • "VIAF fix problems when they are pointed out to them": hmmm, I'm not so sure about that. There are a lot of errors reported on https://en.wikipedia.org/wiki/Wikipedia:VIAF/errors and https://www.wikidata.org/wiki/Wikidata:WikiProject_Authority_control/WorldCat_Identities_errors, and we have zero visibility as to whether and when VIAF fixes them. And I'm yet to know anyone having successful communication with them at bibchange@oclc.org


  • To summarize: I want to know more about the BIBFRAME Hub as an entity than "it's an abstract bridge between bf:Works", and more about the method of programmatically generating Hubs at id.loc.gov than "Generation Process: DLC marc2bibframe2 v1.8.0-SNAPSHOT-MTA; Status: prepublication; Encoding Level: minimal", before it starts getting ingested into Wikidata and bloating items about creative works with data that has demonstrated very little evidence of quality control. I neither support nor oppose the creation of this property, but looking at bf:Hub data on id.loc.gov has had me worried for a number of years about bf:Hubs showing up in my library's catalog (which I would oppose with every ounce of energy I could muster), and I thought I ought to share these concerns with the Wikidata community before they start drowning in crappy data they expected to be as high-quality as the authority data the Library of Congress publishes. --Crystal Clements, University of Washington Libraries (talk) 20:07, 15 September 2022 (UTC)

@UWashPrincipalCataloger, Jimfhahn, Clements.UWLib, Vladimir Alexiev: ✓ Done, though I recommend limiting data imports to known good data. Harej (talk) 20:16, 1 July 2023 (UTC)

Library of Congress RDF

Vladimir Alexiev (talk) 12:11, 14 September 2022 (UTC)

Go to http://id.loc.gov/resources/hubs/f1588874-1ce4-5f3e-4ce2-7ceec4885588, get N-Triples (Compact), prepend these prefixes:

@prefix bf:       <http://id.loc.gov/ontologies/bibframe/> .
@prefix bflc:     <http://id.loc.gov/ontologies/bflc/> .
@prefix dct:      <http://purl.org/dc/terms/> .
@prefix lc-agent: <http://id.loc.gov/rwo/agents/> .
@prefix lc-desc:  <http://id.loc.gov/vocabulary/descriptionConventions/> .
@prefix lc-dt:    <http://id.loc.gov/datatypes/> .
@prefix lc-gf:    <http://id.loc.gov/authorities/genreForms/> .
@prefix lc-hub:   <http://id.loc.gov/resources/hubs/> .
@prefix lc-lang:  <http://id.loc.gov/vocabulary/languages/> .
@prefix lc-mstat: <http://id.loc.gov/vocabulary/mstatus/> .
@prefix lc-note:  <http://id.loc.gov/vocabulary/mnotetype/> .
@prefix lc-org:   <http://id.loc.gov/vocabulary/organizations/> .
@prefix lc-rel:   <http://id.loc.gov/vocabulary/relators/> .
@prefix lc-work:  <http://id.loc.gov/resources/works/> .
@prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .

Convert to Turtle and you get decent info about that song, most importantly the creators (e.g. lc-agent:n80017868 is Lennon), which use LCNAF identifiers.

lc-hub:f1588874-1ce4-5f3e-4ce2-7ceec4885588
        rdf:type             bf:Hub , bf:Work ;
        bflc:aap             "Lennon, John, 1940-1980 Lucy in the sky with diamonds" ;
        bflc:aap-normalized  "lennonjohn19401980lucyintheskywithdiamonds" ;
        bflc:subjectOf       lc-work:17466979 ;
        bf:adminMetadata     [ rdf:type                   bf:AdminMetadata ;
                               bf:assigner                lc-org:dlc ;
                               bf:changeDate              "2015-08-28T07:39:45"^^xsd:dateTime ;
                               bf:creationDate            "2012-09-20"^^xsd:date ;
                               bf:descriptionConventions  lc-desc:rda , lc-desc:local ;
                               bf:descriptionLanguage     lc-lang:eng ;
                               bf:descriptionModifier     [ rdf:type  bf:Agent ;
                                                            bf:code   "WaU"
                                                          ] ;
                               bf:generationProcess       [ rdf:type           bf:GenerationProcess ;
                                                            rdfs:label         "DLC marc2bibframe2 v1.8.0-SNAPSHOT-MTA" ;
                                                            bf:generationDate  "2022-06-10T15:13:13.98774-04:00"^^xsd:dateTime
                                                          ] ;
                               bf:identifiedBy            [ rdf:type     bf:Local ;
                                                            rdf:value    "f1588874-1ce4-5f3e-4ce2-7ceec4885588" ;
                                                            bf:assigner  lc-org:dlc
                                                          ] ;
                               bf:status                  lc-mstat:c
                             ] ;
        bf:contribution      [ rdf:type  bflc:PrimaryContribution , bf:Contribution ;
                               bf:agent  lc-agent:n80017868 ;
                               bf:role   lc-rel:ctb
                             ] ;
        bf:contribution      [ rdf:type  bf:Contribution ;
                               bf:agent  lc-agent:n50012135 ;
                               bf:role   lc-rel:ctb
                             ] ;
        bf:genreForm         lc-gf:gf2014027103 , lc-gf:gf2014027024 ;
        bf:identifiedBy      [ rdf:type     bf:Local ;
                               rdf:value    "oca09307757" ;
                               bf:assigner  lc-org:ocolc
                             ] ;
        bf:identifiedBy      [ rdf:type   bf:Lccn ;
                               rdf:value  "n 2012061856"
                             ] ;
        bf:note              [ rdf:type    lc-note:descsource , bf:Note ;
                               rdfs:label  "Created from auth."
                             ] ;
        bf:note              [ rdf:type              lc-note:datasource , bf:Note ;
                               rdfs:label            "ecip galley (Lucy in the sky with diamonds, 1967)" ;
                               bf:preferredCitation  "Lucy in the mind of Lennon, c2013"
                             ] ;
        bf:note              [ rdf:type              lc-note:datasource , bf:Note ;
                               rdfs:label            "(\"Lucy in the Sky with Diamonds\" is a song written primarily by John Lennon and credited to Lennon-McCartney, for the Beatles' 1967 album Sgt. Pepper's Lonely Hearts Club Band; released 1 June 1967; recorded: 1 March 1967; genre: Psychedelic rock; writer: Lennon-McCartney; producer: George Martin)" ;
                               bf:preferredCitation  "Wikipedia, August 27, 2015"
                             ] ;
        bf:originDate        "1967"^^lc-dt:edtf ;
        bf:title             [ rdf:type      bf:Title ;
                               bf:mainTitle  "Lucy in the sky with diamonds"
                             ] ;
        dct:isPartOf         <http://id.loc.gov/resources/works> .
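
In practice an RDF toolkit would do the N-Triples to Turtle conversion. Purely to illustrate what the prefix declarations contribute to the listing above, the following stdlib-only Python sketch shortens full IRIs in a single N-Triples line. The names `compact_iri` and `compact_line` and the conservative local-name regex are assumptions of this sketch, the prefix map is a subset of the list above, and the code does not guard against IRI-like text inside string literals.

```python
import re

# Subset of the prefixes listed above, enough for the demo lines.
PREFIXES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "lc-agent": "http://id.loc.gov/rwo/agents/",
    "lc-hub": "http://id.loc.gov/resources/hubs/",
    "lc-work": "http://id.loc.gov/resources/works/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

# An IRI reference as written in N-Triples: <...> without spaces inside.
IRI_REF = re.compile(r"<([^<>\s\"]+)>")

# Deliberately conservative local names (a subset of what Turtle allows).
LOCAL_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")


def compact_iri(iri: str, prefixes: dict = PREFIXES) -> str:
    """Return prefix:local if a namespace matches, else the IRI in <...>."""
    # Try longer namespaces first so the most specific prefix wins.
    for prefix, ns in sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True):
        if iri.startswith(ns) and LOCAL_NAME.fullmatch(iri[len(ns):]):
            return f"{prefix}:{iri[len(ns):]}"
    return f"<{iri}>"


def compact_line(line: str) -> str:
    """Shorten every IRI reference in one N-Triples line."""
    return IRI_REF.sub(lambda m: compact_iri(m.group(1)), line)


if __name__ == "__main__":
    print(compact_line(
        "<http://id.loc.gov/resources/hubs/f1588874-1ce4-5f3e-4ce2-7ceec4885588> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://id.loc.gov/ontologies/bibframe/Hub> ."
    ))
```

An IRI that has no matching namespace, such as the object of dct:isPartOf above, is left in angle brackets, which is also how it appears in the Turtle listing.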

If you get "Verbose", you also get descriptions of the used terms (objects of triples), which is convenient. Eg:

lc-agent:n80017868  rdf:type  bf:Person , bf:Agent ;
        rdfs:label  "Lennon, John, 1940-1980" ;
        mads:isIdentifiedByAuthority lc-naf:n80017868 .
        
lc-gf:gf2014027024  rdf:type  bf:GenreForm ;
        rdfs:label  "Psychedelic rock music" ;
        bf:source   <http://id.loc.gov/vocabulary/genreFormSchemes/lcgft> .

lc-mstat:c  rdf:type  bf:Status ;
        rdfs:label  "changed" .
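
Since the contributors are given as LCNAF identifiers, they could be pulled out of the N-Triples download and compared with existing Library of Congress authority ID (P244) values. The Python sketch below shows one naive way to do that with the standard library only. The sample triples are hand-written from the Turtle above (the blank node labels `_:b0` and `_:b1` are invented), the function name `contributor_ids` is hypothetical, and a real import should use a proper RDF parser rather than a regex.

```python
import re

BF = "http://id.loc.gov/ontologies/bibframe/"
AGENT_NS = "http://id.loc.gov/rwo/agents/"
HUB = "http://id.loc.gov/resources/hubs/f1588874-1ce4-5f3e-4ce2-7ceec4885588"

# Hand-written N-Triples mirroring the two bf:contribution nodes above.
SAMPLE = f"""\
<{HUB}> <{BF}contribution> _:b0 .
_:b0 <{BF}agent> <{AGENT_NS}n80017868> .
_:b0 <{BF}role> <http://id.loc.gov/vocabulary/relators/ctb> .
<{HUB}> <{BF}contribution> _:b1 .
_:b1 <{BF}agent> <{AGENT_NS}n50012135> .
_:b1 <{BF}role> <http://id.loc.gov/vocabulary/relators/ctb> .
"""

# Subject, predicate, then everything up to the final " ." as the object.
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def contributor_ids(ntriples: str) -> list:
    """Collect the local IDs of all bf:agent objects in the agents namespace."""
    ids = []
    agent_predicate = f"<{BF}agent>"
    agent_prefix = f"<{AGENT_NS}"
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if match is None:
            continue  # blank line, comment, or something this sketch cannot read
        _subject, predicate, obj = match.groups()
        if predicate == agent_predicate and obj.startswith(agent_prefix) and obj.endswith(">"):
            ids.append(obj[len(agent_prefix):-1])
    return ids


if __name__ == "__main__":
    print(contributor_ids(SAMPLE))
```

The sketch only lists identifiers in document order; it does not distinguish the bflc:PrimaryContribution node from the other contribution, which would require following the rdf:type triples as well.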