Wikidata:Property proposal/Goodreads work ID

From Wikidata

Goodreads work ID


Originally proposed at Wikidata:Property proposal/Authority control

Description: This is a unique identifier for written works (Q47461344) on Goodreads. This property should not be confused with Goodreads book ID, which is a unique identifier for versions, editions, or translations (Q3331189). Any one written work, such as War and Peace, should have only one Goodreads work ID, but it may have multiple Goodreads book IDs. You can get this value from the "all editions" link on a Goodreads book page.
Represents: Goodreads (Q2359213)
Data type: External identifier
Domain: literary work (Q7725634)
Allowed values: [1-9]\d*
Example 1: The Art of Electronics (Q3985697) → 556821 (Goodreads version/edition ID (P2969) 556821 resolves to an edition of "History and the Idea of Progress", which has a Goodreads work ID of 5020760)
Example 2: War and Peace (Q161531) → 4912783 (Goodreads version/edition ID (P2969) 4912783 resolves to an edition of "Robotics in Alpe-Adria Region: Proceedings of the 2nd International Workshop (Raa '93), June 1993, Krems, Austria", which has a Goodreads work ID of 4978325)
Example 3: The Forever War (Q5406934) → 423 (Goodreads version/edition ID (P2969) 423 resolves to an edition of "Where I Was From", which has a Goodreads work ID of 1371028)
Example 4: Demian (Q860577) → 5334697 (Goodreads version/edition ID (P2969) 5334697 resolves to an edition of "Tell Me What to Eat If I Have Type II Diabetes", which has a Goodreads work ID of 5402183)
Source: https://www.goodreads.com/
Planned use: I want to use this when enhancing literary work (Q7725634) items. There is no exact plan for automated use at the moment. This could eventually be used to link different ISBNs to a single item.
Number of IDs in source: More than 5,000,000 and less than 6,000,000.
Expected completeness: always incomplete (Q21873886)
Formatter URL: https://www.goodreads.com/work/editions/$1
See also: OCLC work ID (P5331), OCLC control number (P243), LibraryThing work ID (P1085), Open Library ID (P648), Goodreads version/edition ID (P2969)
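The Allowed values and Formatter URL fields above can be checked mechanically. What follows is a minimal Python sketch, not part of any Wikidata tool. It assumes that the allowed-values pattern must match the whole value. The function names are hypothetical. The edition formatter URL is the one quoted for Goodreads version/edition ID (P2969) later in the discussion.

```python
import re

# Allowed values and formatter URL as given in this proposal.
# re.ASCII keeps \d limited to the digits 0-9.
WORK_ID_PATTERN = re.compile(r"[1-9]\d*", re.ASCII)
WORK_FORMATTER_URL = "https://www.goodreads.com/work/editions/$1"
# Formatter URL of Goodreads version/edition ID (P2969), for comparison.
EDITION_FORMATTER_URL = "https://www.goodreads.com/book/show/$1"


def is_valid_work_id(value: str) -> bool:
    """Return True if the whole value matches the allowed-values pattern."""
    # Assumption: the pattern applies to the full value, so fullmatch is used.
    return WORK_ID_PATTERN.fullmatch(value) is not None


def format_url(formatter: str, value: str) -> str:
    """Substitute an identifier value for the $1 placeholder of a formatter URL."""
    return formatter.replace("$1", value)


if __name__ == "__main__":
    # The same number points at two different Goodreads pages,
    # depending on which property's formatter URL is used.
    print(format_url(WORK_FORMATTER_URL, "556821"))
    print(format_url(EDITION_FORMATTER_URL, "556821"))
    print(is_valid_work_id("556821"), is_valid_work_id("0556821"))
```

The pattern check only rejects malformed values such as a leading zero. A well-formed book ID entered as a work ID passes it, which is why the work/edition distinction has to come from where the value was taken, not from the value itself.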

Motivation


I think more identifiers for literary work (Q7725634) will be useful, and this one seems decent. Iwan.Aucamp (talk) 11:37, 29 April 2020 (UTC)

WikiProject Books has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

Vladimir Alexiev Jonathan Groß Andy Mabbett Jneubert Sic19 Wikidelo ArthurPSmith PKM Ettorerizza Fuzheado Daniel Mietchen Iwan.Aucamp Epìdosis Sotho Tal Ker Bargioni Carlobia Pablo Busatto Matlin Msuicat Uomovariabile Silva Selva 1-Byte Alessandra.Moi CamelCaseNick Songceci moz AhavaCohen Kolja21 RShigapov Jason.nlw MasterRus21thCentury NGOgo Pierre Tribhou Ahatd JordanTimothyJames Silviafanti Back ache AfricanLibrarian M.roszkowski Rhagfyr 沈澄心 MrBenjo S.v.Mering Hiperterminal (talk) מקף Lovelano Ecravo Chado07 Soufiyouns

Notified participants of WikiProject Authority control

Discussion


 Oppose - there is already Goodreads version/edition ID (P2969), and it is not clear to me what creating this property would offer beyond it. This new property would link to a page that links to, and is linked from, the page for the Goodreads version/edition ID (P2969). For example, The Art of Electronics (Q3985697) has Goodreads version/edition ID (P2969) 569775, whose page links to the proposed Goodreads work ID 556821, which links back. Simon Cobb (User:Sic19 ; talk page) 16:56, 29 April 2020 (UTC)

@Sic19: Goodreads version/edition ID (P2969) is an identifier for items with instance of (P31) version, edition or translation (Q3331189), whereas Goodreads work ID would be an identifier for items with instance of (P31) written work (Q47461344). This is similar to the distinction between OCLC work ID (P5331) and OCLC control number (P243). Yes, OCLC work ID (P5331) will link back to the same value that is in OCLC control number (P243), but these belong on different Wikidata items (if they exist). This will also make it possible to look up the Wikidata item for a work from an ISBN even if the ISBN is not registered in Wikipedia: you can look up the ISBN on another service and link it back to an identifier that is on Wikidata. The same can be done with Open Library ID (P648), which has IDs for both works and editions, and with OCLC work ID (P5331) and OCLC control number (P243). The alternative would be to create 1,683 items with edition or translation of (P629) War and Peace (Q161531) and add Goodreads version/edition ID (P2969) and ISBN-13 (P212) to each individual item. Iwan.Aucamp (talk) 21:21, 29 April 2020 (UTC)
@Iwan.Aucamp: the type constraints for Goodreads version/edition ID (P2969) are instance of written work (Q47461344) or version, edition or translation (Q3331189), so there would be overlap between the properties. I agree with everything else you've said. I will withdraw my opposition to the proposed property, and most likely support it, if it is updated to clearly explain its relationship to Goodreads version/edition ID (P2969) and how the usage differs. It also seems reasonable to reach agreement about the necessary changes to Goodreads version/edition ID (P2969) before the new property is created. Fundamentally, the new property will be useful, but, for me, it is sensible to spend a little extra time now getting the details right instead of fixing problems later. Simon Cobb (User:Sic19 ; talk page) 20:51, 1 May 2020 (UTC)
@Sic19: There is already overlap between identifiers for written works and identifiers for editions. Both OCLC control number (P243) and ISBN-13 (P212), which are identifiers for editions and not works, are allowed and used on both written work (Q47461344) and version, edition or translation (Q3331189). I don't really have a serious problem with that. I think the semantics and usage instructions could be made clearer for both OCLC control number (P243) and ISBN-13 (P212), but they are not that unclear. To me it is clear that if a written work has ISBN-13 (P212), it is an identifier of an edition of that written work. It is also clear to me that if there is a Q-item for that specific edition, then the ISBN-13 (P212) belongs there instead of on the written work. It is also clear that when ISBN-13 (P212) is used on an edition, it should be with a qualifier.
I have created a discussion section for Goodreads version/edition ID (P2969) here. In it I have raised some inconsistencies with the property and suggested that we clarify its use a bit through documentation. If you have any other problems with it, please raise them there; otherwise, please provide feedback on the proposal.
I have expanded the description of this property and hope it is clear enough now. I want to avoid documenting other properties, or the ontology used for written works and editions, in this property. That should be, and is, documented on WikiProject Books. The domain of this property has been literary work (Q7725634) from the start, which already makes it pretty clear. Iwan.Aucamp (talk) 22:00, 9 May 2020 (UTC)

 Support Note that I'm not a books specialist. Still, I've done a bit of research, and here's what I found:

  • The Goodreads work ID (https://www.goodreads.com/work/editions/$1) is different from the book ID (https://www.goodreads.com/book/show/$1). The two seem to be very different concepts. For instance, the Free as in Freedom "edition" contains the same text in different formats, possibly translated into many languages. Free as in Freedom (2.0), by contrast, adds many comments from the subject of the biography to the same base text. The additional comments take up a significant part of the book (more than 10% if my memory is correct, maybe 20% or 30%).
  • Having more identifiers is a good thing, as it helps us better understand what constitutes a work and what constitutes an edition. External identifiers may not map exactly to Wikidata's definitions of edition and work. Even if both had exactly the same definitions, mistakes could occur in either data set, and having more identifiers helps to spot them. It's also probably impossible to have definitions that clearly distinguish between different things, and there is often some grey area. As I understand from the Functional Requirements for Bibliographic Records Wikipedia page, there is no crystal-clear boundary for deciding whether two expressions belong to the same work or to a different work inspired by the first. Collective works created in a decentralized fashion are probably hard to characterize. An example is stories that are passed between people and across generations without being written down (yet), and that drift apart in different places. For instance, as I understand it, the Commedia dell'arte "work" was a story told across people and generations, which was then written down by Carlo Goldoni.
  • However, the descriptions of both Goodreads work ID and Goodreads book ID should make it crystal clear which is which, so that people do not confuse them. As the URLs are different, there may also be ways to detect when someone uses a book ID instead of a work ID, or vice versa.

GNUtoo (talk) 15:05, 30 April 2020 (UTC)
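GNUtoo's point that the two URL shapes differ can be illustrated with a small Python sketch that sorts a Goodreads URL into one property or the other. It relies only on the path prefixes /work/editions/ and /book/show/ taken from the two formatter URLs. It also assumes that any text following the number, such as a title slug, can be ignored. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Path prefixes taken from the two formatter URLs; the capture group holds the ID.
# Assumption: anything after the digits (e.g. a title slug) is ignored.
_PATTERNS = {
    "Goodreads work ID": re.compile(r"/work/editions/([1-9]\d*)", re.ASCII),
    "Goodreads version/edition ID (P2969)": re.compile(r"/book/show/([1-9]\d*)", re.ASCII),
}


def classify_goodreads_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (property label, ID) for a Goodreads work or book URL, else None."""
    parsed = urlparse(url)
    if parsed.hostname not in ("www.goodreads.com", "goodreads.com"):
        return None
    for prop, pattern in _PATTERNS.items():
        m = pattern.match(parsed.path)  # must match from the start of the path
        if m:
            return prop, m.group(1)
    return None


if __name__ == "__main__":
    for url in (
        "https://www.goodreads.com/work/editions/556821",
        "https://www.goodreads.com/book/show/569775",
        "https://www.goodreads.com/author/show/1",
    ):
        print(url, classify_goodreads_url(url))
```

This only helps when the source URL is still available. A bare number copied from elsewhere carries no marker of which kind it is, which is the concern EncycloPetey raises below.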

 Comment How do we determine whether a specific identifier is for a work, an edition, or something else? --EncycloPetey (talk) 01:19, 10 May 2020 (UTC)

 Oppose - there is already Goodreads version/edition ID (P2969), and since there is no means to distinguish a work from an edition, there is no reason to have two properties. --EncycloPetey (talk) 18:42, 14 May 2020 (UTC)

  • @EncycloPetey: If you put a work identifier in as a book identifier, it will just be wrong. For example, take the Goodreads work identifier for The Art of Electronics (Q3985697), which is 556821 (clearly stated in the proposal). Resolve it using the formatter URL (P1630) of Goodreads version/edition ID (P2969), https://www.goodreads.com/book/show/$1, instead of the formatter URL given in the proposal. It will resolve to an edition of "History and the Idea of Progress" with ISBN-13 "9780801481826". If your reasoning here is valid, then why are OCLC work ID (P5331) and OCLC control number (P243) both justified? Iwan.Aucamp (talk) 18:38, 15 May 2020 (UTC)
    Yes, it would be wrong, but the discussion above has established that we have no means of making that distinction. --EncycloPetey (talk) 19:47, 15 May 2020 (UTC)
    @EncycloPetey: I really don't get your objection. Say I give you the value 24041 and ask whether it is a Tidal artist ID (P4576), Tropicos publication ID (P4904), C-SPAN organization ID (P4725), Gamebase64 ID (P4917), Goodreads author ID (P2963), Goodreads character ID (P6327) or a Goodreads character ID (P6327). What would your answer be? The right answer is that it is a valid value for all of those identifiers. But since they are not the same identifier, we have different properties for them, instead of suggesting that someone put values for Tidal artist ID (P4576) inside Goodreads character ID (P6327). Can you explain why you don't have similar objections to OCLC work ID (P5331) and OCLC control number (P243)? Unless, of course, you do have similar objections. In that case I would be grateful if you could raise them on whichever of those you want deprecated, to help clarify what the problem is. Iwan.Aucamp (talk) 21:14, 15 May 2020 (UTC)
    I don't understand what you're asking or how it applies to the current issue. OCLC (WorldCat) is a total mess. --EncycloPetey (talk) 22:28, 15 May 2020 (UTC)
    @EncycloPetey: Maybe I'm misunderstanding here, but it seems like you have a hang-up about creating properties with positive integer values, as there is no way to look at a positive integer value and decide what it identifies. If so, would you mind proposing this rule for all properties? And if you do mind, or do not think the rule should apply to all properties, then why do you want to enforce it here?
    You asked "how we distinguish the identifier value inserted into this property as being for a work, an edition, or something else". I said that we cannot distinguish: the value is a positive integer without distinguishing features. The same is true of the value for Tidal artist ID (P4576), Tropicos publication ID (P4904), C-SPAN organization ID (P4725), Gamebase64 ID (P4917), Goodreads author ID (P2963), Goodreads character ID (P6327) or a Goodreads character ID (P6327) (and hundreds of other properties). Yet we still manage to use them correctly, because we know they are not the same identifiers. Suppose I get a value for C-SPAN organization ID (P4725) from somewhere. My process for adding it to Wikidata is not to find any external identifier property that can take a positive integer, add it there, and seize up if there is more than one such property. My process is to add it to the property intended for C-SPAN organization ID (P4725).
    In my view, failing to do so is not a reason against having a property for C-SPAN organization ID (P4725) either. In such a case, someone should take it up with the editor adding incorrect values. With enough authority control and automation we can also check things very easily: we can check the structure against Goodreads, OCLC, and other databases to see if there are discrepancies. Both identifiers are numeric and generated as sequential positive integers; one is associated with works, the other with editions. So what I'm asking is: why do you expect the value of the first to contain something that distinguishes it from the value of the second?
    What is the use case we are trying to satisfy here? You just have a bunch of positive integers and you want to put them somewhere on Wikidata? For that you could query all external identifiers that accept positive integers and get their formatter URLs. You would then look up their metadata in the external database and match it to Wikidata. If the metadata matches, you add the identifier to the item. So even this quite quirky use case could be covered without the value itself containing anything that distinguishes it.
    You keep saying OCLC is a mess; please be more specific. Some parts of Wikidata are a mess, yet we don't stop editing it because of this. We work to make it better and clean up the mess. We describe what we think is wrong and try to get consensus on how to address it. Is the mess with OCLC that people mix up OCLC work ID (P5331) and OCLC control number (P243)? Iwan.Aucamp (talk) 11:12, 16 May 2020 (UTC)
    Within the OCLC database, they do not always distinguish between works, editions, and instances. That makes it a mess. A data item might be for one of those things or simultaneously two or more of those, and each work / edition might have multiple values. Hence, it is a mess. There are values in their database that cannot be aligned with our database or with any library database because of the sloppiness in the values within their database. I still don't understand how the rest of your comments apply here. When you say "Goodreads character ID (P6327) or a Goodreads character ID (P6327)", those are the same thing, and most of the items you mention have no bearing on distinguishing works from editions because they are not works or editions. --EncycloPetey (talk) 16:33, 16 May 2020 (UTC)

 Comment related: Wikidata_talk:WikiProject_Books#Clarifying_the_use_of_identifiers_for_editions_of_written_works_on_written_works_themselves. Iwan.Aucamp (talk) 18:10, 13 May 2020 (UTC)

Clearly a consensus has been established to separate the book from its editions; makes sense. --Hannes Röst (talk) 19:56, 24 June 2020 (UTC)
@Iwan.Aucamp, Sic19, GNUtoo, EncycloPetey, Hannes Röst, Bodhisattwa: Goodreads work ID (P8383) has been created. Pamputt (talk) 14:44, 27 June 2020 (UTC)[reply]