Wikidata:Property proposal/Goodreads work ID
Goodreads work ID
Originally proposed at Wikidata:Property proposal/Authority control
Description | This is a unique identifier for written works (Q47461344) on Goodreads. This property should not be confused with Goodreads book ID, which is a unique identifier for versions, editions, or translations (Q3331189). For any one written work like War and Peace there should be only one Goodreads work ID but there may be multiple Goodreads book IDs. You can get this value from the "all editions" link on a Goodreads book page. |
---|---|
Represents | Goodreads (Q2359213) |
Data type | External identifier |
Domain | literary work (Q7725634) |
Allowed values | [1-9]\d* |
Example 1 | The Art of Electronics (Q3985697) → 556821 (as a Goodreads version/edition ID (P2969), 556821 instead resolves to an edition of "History and the Idea of Progress", which has a Goodreads work ID of 5020760) |
Example 2 | War and Peace (Q161531) → 4912783 (as a Goodreads version/edition ID (P2969), 4912783 instead resolves to an edition of "Robotics in Alpe-Adria Region: Proceedings of the 2nd International Workshop (Raa '93), June 1993, Krems, Austria", which has a Goodreads work ID of 4978325) |
Example 3 | The Forever War (Q5406934) → 423 (as a Goodreads version/edition ID (P2969), 423 instead resolves to an edition of "Where I Was From", which has a Goodreads work ID of 1371028) |
Example 4 | Demian (Q860577) → 5334697 (as a Goodreads version/edition ID (P2969), 5334697 instead resolves to an edition of "Tell Me What to Eat If I Have Type II Diabetes", which has a Goodreads work ID of 5402183) |
Source | https://www.goodreads.com/ |
Planned use | I want to use this when enhancing literary work (Q7725634) items. No exact plan for automated use at the moment. This could eventually be used to link different ISBNs to a single Item. |
Number of IDs in source | More than 5000000, less than 6000000. |
Expected completeness | always incomplete (Q21873886) |
Formatter URL | https://www.goodreads.com/work/editions/$1 |
See also | OCLC work ID (P5331), OCLC control number (P243), LibraryThing work ID (P1085), Open Library ID (P648), Goodreads version/edition ID (P2969) |
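The "Allowed values" pattern and the "Formatter URL" in the table above can be combined into a quick check. The following is only a minimal sketch: the function name `work_url` is hypothetical, and restricting `\d` to ASCII digits is an assumption about the intended pattern, not something the proposal states.

```python
import re

# Both values are copied from the proposal table above.
# re.ASCII (an assumption) limits \d to the digits 0-9.
ALLOWED_VALUES = re.compile(r"[1-9]\d*", re.ASCII)
WORK_FORMATTER_URL = "https://www.goodreads.com/work/editions/$1"


def work_url(work_id: str) -> str:
    """Build the work page URL, or raise ValueError for a malformed ID."""
    if not ALLOWED_VALUES.fullmatch(work_id):
        raise ValueError(f"not a valid Goodreads work ID: {work_id!r}")
    return WORK_FORMATTER_URL.replace("$1", work_id)


# Example 1 from the table: The Art of Electronics (Q3985697).
print(work_url("556821"))
```

A value with a leading zero, such as "0123", fails the pattern and raises `ValueError`. Passing the pattern says nothing about whether the number is really a work ID rather than a book ID.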
Motivation
I think more identifiers for literary work (Q7725634) will be useful, and this one seems decent. Iwan.Aucamp (talk) 11:37, 29 April 2020 (UTC)
WikiProject Books has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
Notified participants of WikiProject Authority control
Discussion
Oppose - there is already Goodreads version/edition ID (P2969) and it is not clear to me how the creation of this property would offer anything extra. This new property would be linking to a page that links to, and is linked from, the location of Goodreads version/edition ID (P2969). For example, The Art of Electronics (Q3985697) → Goodreads version/edition ID (P2969) → 569775 links to the proposed Goodreads work ID 556821, which has a reciprocal link. Simon Cobb (User:Sic19 ; talk page) 16:56, 29 April 2020 (UTC)
- @Sic19: Goodreads version/edition ID (P2969) is an identifier for ?item instance of (P31) version, edition or translation (Q3331189), whereas Goodreads work ID would be an identifier for ?item instance of (P31) written work (Q47461344). This is similar to the distinction between OCLC work ID (P5331) and OCLC control number (P243). Yes, OCLC work ID (P5331) will link back to the same value that is in OCLC control number (P243), but these belong on different Wikidata items (if they exist). This will also facilitate the lookup of a Wikidata item relating to a work from an ISBN even if the ISBN is not registered in Wikipedia, as you can look up the ISBN on another service and link it back to an identifier that is on Wikidata (the same can be done with Open Library ID (P648), as there are IDs there for works and editions, and with OCLC work ID (P5331) and OCLC control number (P243)). The alternative would be to create 1,683 items with edition or translation of (P629) → War and Peace (Q161531) and add Goodreads version/edition ID (P2969) and ISBN-13 (P212) on each individual item. Iwan.Aucamp (talk) 21:21, 29 April 2020 (UTC)
- @Iwan.Aucamp:, the type constraints for Goodreads version/edition ID (P2969) are instance of written work (Q47461344) or version, edition or translation (Q3331189), so there would be overlap between the properties. I agree with everything else you've said and will withdraw my opposition (and most likely support) the proposed property if it is updated to clearly explain the relationship to Goodreads version/edition ID (P2969) and how the usage differs. It also seems reasonable that there is agreement about the necessary changes to Goodreads version/edition ID (P2969) before the new property is created. Fundamentally, the new property will be useful but, for me, it is sensible to spend a little extra time now getting the details right instead of fixing problems later. Simon Cobb (User:Sic19 ; talk page) 20:51, 1 May 2020 (UTC)
- @Sic19: There is already overlap between identifiers for written works and identifiers for editions. Both OCLC control number (P243) and ISBN-13 (P212) (which are identifiers for editions and not works) are allowed and used on both written work (Q47461344) and version, edition or translation (Q3331189), and I don't really have a serious problem with it. I think the semantics and usage instructions could be made clearer for both OCLC control number (P243) and ISBN-13 (P212), but it is not that unclear. To me it is clear that if a written work has ISBN-13 (P212), then it is an identifier of an edition of said written work. It is also clear to me that if there is a Q-item for that specific edition, then the ISBN-13 (P212) belongs there instead of on the written work. It is also clear that when ISBN-13 (P212) is used on an edition it should be with a qualifier.
- I have created a discussion section for Goodreads version/edition ID (P2969) here, where I have raised some inconsistencies with the property and suggested we clarify its use a bit through documentation. If you have any other problems with it, you should raise them; otherwise, please provide feedback on the proposal.
- I have expanded the description of this property and hope it is clear enough now. I want to avoid documenting other properties or the ontology used for written works and editions in this property - that should be and is documented on WikiProject books. The Domain of this property has been literary work (Q7725634) from the start - which already actually makes it pretty clear. Iwan.Aucamp (talk) 22:00, 9 May 2020 (UTC)
Support Note that I'm not a books specialist. Still I've done a bit of research and here's what I found:
- The Goodreads work ID (https://www.goodreads.com/work/editions/$1) is different from the book ID (https://www.goodreads.com/book/show/$1). The two seem to be very different concepts. For instance, the Free as in Freedom "edition" contains the same text in different formats, possibly translated into many languages, whereas Free as in Freedom (2.0) has, in addition to the same base text, many comments added by the person who is the subject of the biography. The additional comments take up a significant part of the book (more than 10% if my memory is correct, maybe 20% or 30%).
- Having more identifiers is a good thing, as it helps to better understand what constitutes a work and what constitutes an edition. External identifiers may not map exactly to the Wikidata definitions of edition and work. Even if both had exactly the same definitions, mistakes could occur in both data sets, and having more identifiers helps to spot these. It is also probably impossible to have definitions that clearly distinguish between different things, and there is often some grey area: as I understand from the Functional_Requirements_for_Bibliographic_Records Wikipedia page, there is no crystal clear boundary to decide whether two expressions are from the same work or from a different work that is inspired by the first one. Collective works that are created in a decentralized fashion, like stories that are told between people and generations but not (yet) written down, and that drift from each other in different places, are probably hard to characterize. For instance, as I understand it, the Commedia dell'arte "work" was a story that was told across people and generations (see the Commedia_dell'arte Wikipedia page) and was then written down by Carlo Goldoni.
- However, the descriptions of both the Goodreads work ID and Goodreads book ID properties should be crystal clear about which is which, so that people do not confuse the two. As the URLs are different, there may also be ways to check for mistakes if someone uses a book ID instead of a work ID and vice versa.
GNUtoo (talk) 15:05, 30 April 2020 (UTC)
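The URL-based check mentioned in the comment above could look roughly like this. It is a sketch under assumptions, not a description of any existing Wikidata tool: the helper name `classify_goodreads_url` is hypothetical, only the two URL shapes quoted in this discussion are recognised, and tolerating trailing text after the number is an assumption.

```python
import re
from typing import Optional, Tuple

# The two URL shapes quoted in this discussion:
#   https://www.goodreads.com/work/editions/$1  (work)
#   https://www.goodreads.com/book/show/$1      (book, i.e. edition)
_GOODREADS_URL = re.compile(
    r"https://www\.goodreads\.com/(work/editions|book/show)/([1-9][0-9]*)"
)
_KIND = {"work/editions": "work", "book/show": "book"}


def classify_goodreads_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (kind, id) for a recognised Goodreads URL, else None."""
    match = _GOODREADS_URL.match(url)  # anchored at the start only
    if match is None:
        return None
    return _KIND[match.group(1)], match.group(2)


# The same number means different things under the two URL shapes.
print(classify_goodreads_url("https://www.goodreads.com/work/editions/556821"))
print(classify_goodreads_url("https://www.goodreads.com/book/show/556821"))
```

An editor or bot that records the URL an identifier came from could use a check of this kind to decide which property the value belongs in, since the bare number does not carry that information.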
- @GNUtoo: Please see the reply to Simon Cobb. Iwan.Aucamp (talk) 22:00, 9 May 2020 (UTC)
Comment How do we determine whether a specific identifier is for a work, an edition, or something else? --EncycloPetey (talk) 01:19, 10 May 2020 (UTC)
- @EncycloPetey: In this case it is stated in the domain. The domain in this proposal is and has always been literary work (Q7725634). I guess part of the issue is that for Goodreads version/edition ID (P2969) there is a lack of clarity, but to be fair, the same lack of clarity exists for ISBN-13 (P212) and OCLC control number (P243). I have suggested that the documentation for Goodreads version/edition ID (P2969) be updated to clarify this (proposal here, no responses), though maybe another property to represent the domain independently of the constraints is needed. Again, though, this property is perfectly clear on the matter. I don't mind fixing other properties, but other properties will always be broken, and new properties are created in the face of that reality all the time. Iwan.Aucamp (talk) 18:10, 13 May 2020 (UTC)
- That doesn't answer the question I asked. I did not ask whether this property was for a work, an edition, or something else. What I asked was how we distinguish the identifier value inserted into this property as being for a work, an edition, or something else. --EncycloPetey (talk) 21:36, 13 May 2020 (UTC)
- @EncycloPetey: from just the value itself you cannot determine whether it is for a work, edition ("book"), character, author or series. Iwan.Aucamp (talk) 16:37, 14 May 2020 (UTC)
- Since there is no way to make the determination, why create a separate property? --EncycloPetey (talk) 18:41, 14 May 2020 (UTC)
- @EncycloPetey: Because it identifies something different? If I have a number, say 3000, and no other information, how do I know whether it is an OCLC work ID (P5331), LibraryThing work ID (P1085), OCLC control number (P243) or RfC ID (P892)? I don't, but we still have different properties because they are different identifiers. If I put an RFC ID in OCLC work ID (P5331), then it will just be wrong and resolve to the wrong thing. These are different identifier spaces, and none of them claims to be a UUID. Neither this proposal nor Goodreads version/edition ID (P2969) is a UUID. Iwan.Aucamp (talk) 18:38, 15 May 2020 (UTC)
- I don't actually understand why that even matters. If you just have a bunch of numbers with no other information, why would you use them for anything? Just discard them. Actual use cases, like scraping, would involve getting the identifier from a place where it is clear what it identifies. If I got the identifier from an RFC page, I would not put it in Goodreads version/edition ID (P2969), even though there is no way for me to tell from the value alone whether it is an RFC ID or a Goodreads version/edition ID (P2969). The exact same thing applies here: if you got the identifier from a work page or link on Goodreads, why would you be confused as to whether or not it identifies a work? Iwan.Aucamp (talk) 18:51, 15 May 2020 (UTC)
- By this standard almost all identifiers should be deprecated and we will be left more or less just with URI based identifiers. Iwan.Aucamp (talk) 18:54, 15 May 2020 (UTC)
- I asked if there was any way to tell them apart, and you said no. Most other identifiers have the means to distinguish works from editions. Those that do not have led to serious problems. Because there is no means of distinguishing them, there is no reason to have two properties. --EncycloPetey (talk) 19:59, 15 May 2020 (UTC)
- So did OCLC work ID (P5331) and OCLC control number (P243) lead to serious problems? And if so could you elaborate on what they were? Iwan.Aucamp (talk) 21:23, 15 May 2020 (UTC)
- The only identifier that I am aware of that encodes this information in the identifier value itself is Open Library ID (P648). Are there others? To me it seems most identifiers don't encode this in the identifier. Iwan.Aucamp (talk) 12:34, 16 May 2020 (UTC)
Oppose - there is already Goodreads version/edition ID (P2969), and since there is no means to distinguish a work from an edition, there is no reason to have two properties. --EncycloPetey (talk) 18:42, 14 May 2020 (UTC)
- @EncycloPetey: If you put the edition identifier as a book identifier it will just be wrong. For example, if you take the Goodreads work identifier for The Art of Electronics (Q3985697), which is 556821 (clearly stated in the proposal), and resolve it using the Goodreads version/edition ID (P2969) formatter URL (P1630) https://www.goodreads.com/book/show/$1 (instead of the formatter URL in the proposal), it will resolve to an edition of "History and the Idea of Progress" with an ISBN-13 of "9780801481826". If your reasoning here is valid, then why are OCLC work ID (P5331) and OCLC control number (P243) both justified? Iwan.Aucamp (talk) 18:38, 15 May 2020 (UTC)
- Yes, it would be wrong, but the discussion above has established that we have no means of making that distinction. --EncycloPetey (talk) 19:47, 15 May 2020 (UTC)
- @EncycloPetey: I really don't get your objection. Say I give you a value 24041 and ask you is it a Tidal artist ID (P4576), Tropicos publication ID (P4904), C-SPAN organization ID (P4725), Gamebase64 ID (P4917), Goodreads author ID (P2963), Goodreads character ID (P6327) or a Goodreads character ID (P6327)? What would your answer be? I mean the right answer is that it is a valid value for all of those identifiers, but since they are not the same identifier we have different properties for them instead of suggesting someone put values for Tidal artist ID (P4576) inside Goodreads character ID (P6327). Can you answer why you don't have similar objections to OCLC work ID (P5331) and OCLC control number (P243)? Unless of course you do have similar objections, in which case I would be grateful if you could raise them on whichever of those you want deprecated to help clarify what the problem is. Iwan.Aucamp (talk) 21:14, 15 May 2020 (UTC)
- I don't understand what you're asking or how it applies to the current issue. OCLC (WorldCat) is a total mess. --EncycloPetey (talk) 22:28, 15 May 2020 (UTC)
- @EncycloPetey: Maybe I'm misunderstanding here, but it seems like you have some hang-up about creating properties that have positive integer values, as there is no way to look at a positive integer value and decide what it identifies. If so, do you mind proposing this rule for all properties? And if you do mind, or do not think the rule should apply to all properties, then why do you want to enforce this rule here?
- You asked "how we distinguish the identifier value inserted into this property as being for a work, an edition, or something else.". I said that we cannot distinguish: the value is a positive integer without distinguishing features, the same as the value for Tidal artist ID (P4576), Tropicos publication ID (P4904), C-SPAN organization ID (P4725), Gamebase64 ID (P4917), Goodreads author ID (P2963), Goodreads character ID (P6327) or a Goodreads character ID (P6327) (and 100s of other properties). Yet we still manage to use them correctly, because we know they are not the same identifiers. When I get a value for C-SPAN organization ID (P4725) from somewhere, my process for adding it to Wikidata is not to find any external identifier property that can take a positive integer, add it there, and seize up if there is more than one such property. My process is to add it to the property that is intended for C-SPAN organization ID (P4725).
- Failure to do so is not, in my view, a reason not to have a property for C-SPAN organization ID (P4725) either. In such a case someone should take it up with the editor adding incorrect values. With enough authority control and automation we can also very easily check things: we can check the structure against Goodreads, OCLC, and other databases to see if there are discrepancies. So what I'm asking is: why do you expect that there should be something about a value for a numeric identifier, which is generated as sequential positive integers and associated with works, that distinguishes it from a value for a numeric identifier, which is generated as sequential positive integers and associated with editions?
- What is the use case we are trying to satisfy here? You just have a bunch of positive integers and you want to put them somewhere on Wikidata? For that you can maybe query all external identifiers that accept positive integers, get their formatter URLs, look up their metadata in the external database, match it to Wikidata, and then, if the metadata matches, add the identifier to the item. So even this quite quirky use case could be covered without having something in the value itself which distinguishes it.
- You keep saying OCLC is a mess; please be more specific here. Some parts of Wikidata are a mess, yet we don't stop editing it because of this. We work to make it better, cleaning up the mess. We describe what we think is wrong and try to get consensus on how to address it. Is the mess with OCLC that people mix up OCLC work ID (P5331) and OCLC control number (P243)? Iwan.Aucamp (talk) 11:12, 16 May 2020 (UTC)
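The first step of the "bunch of positive integers" workflow described above can be sketched as follows. This is illustrative only: the `IdProperty` type and `candidate_urls` function are hypothetical, the property list is hand-written rather than queried from Wikidata, and using the pattern `[1-9]\d*` for Goodreads version/edition ID (P2969) is an assumption, since only the proposed property's pattern is given on this page. The metadata lookup and matching steps are not shown.

```python
import re
from typing import Dict, List, NamedTuple


class IdProperty(NamedTuple):
    label: str
    pattern: str        # allowed-values regex
    formatter_url: str  # "$1" is replaced by the identifier value


# Hand-written stand-in for a query over external-identifier properties.
# The P2969 pattern is assumed; both formatter URLs are quoted on this page.
PROPERTIES: List[IdProperty] = [
    IdProperty("Goodreads work ID (proposed)", r"[1-9]\d*",
               "https://www.goodreads.com/work/editions/$1"),
    IdProperty("Goodreads version/edition ID (P2969)", r"[1-9]\d*",
               "https://www.goodreads.com/book/show/$1"),
]


def candidate_urls(value: str,
                   properties: List[IdProperty] = PROPERTIES) -> Dict[str, str]:
    """Map each property whose pattern accepts `value` to its resolved URL."""
    return {
        prop.label: prop.formatter_url.replace("$1", value)
        for prop in properties
        if re.fullmatch(prop.pattern, value, re.ASCII)
    }


# A bare integer is syntactically valid for both properties, so the
# pattern alone cannot decide; each URL still has to be looked up.
for label, url in candidate_urls("556821").items():
    print(label, "->", url)
```

The result shows why the value alone is not enough: both properties accept 556821, and only the metadata behind each resolved URL can tell which item it describes.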
- Within the OCLC database, they do not always distinguish between works, editions, and instances. That makes it a mess. A data item might be for one of those things or simultaneously two or more of those, and each work / edition might have multiple values. Hence, it is a mess. There are values in their database that cannot be aligned with our database or with any library database because of the sloppiness in the values within their database. I still don't understand how the rest of your comments apply here. When you say "Goodreads character ID (P6327) or a Goodreads character ID (P6327)", those are the same thing, and most of the items you mention have no bearing on distinguishing works from editions because they are not works or editions. --EncycloPetey (talk) 16:33, 16 May 2020 (UTC)
Comment related: Wikidata_talk:WikiProject_Books#Clarifying_the_use_of_identifiers_for_editions_of_written_works_on_written_works_themselves. Iwan.Aucamp (talk) 18:10, 13 May 2020 (UTC)
- clearly a consensus has been established to separate the book from its editions, makes sense. --Hannes Röst (talk) 19:56, 24 June 2020 (UTC)
- Support - Definitely. -- Bodhisattwa (talk) 04:50, 14 May 2020 (UTC)
- Support There seems to be a lot of conceptual mix-up higher up in the thread, but as far as I understand there would be a property for the Goodreads work ID (e.g. 556821 for The Art of Electronics) which links to all 14 editions of the book, each of which can be addressed with the existing Goodreads version/edition ID (P2969) (e.g. 569775, 7831175, etc.). That is just consistent modelling and common sense. --Hannes Röst (talk) 19:56, 24 June 2020 (UTC)
- @Iwan.Aucamp, Sic19, GNUtoo, EncycloPetey, Hannes Röst, Bodhisattwa: Goodreads work ID (P8383) has been created. Pamputt (talk) 14:44, 27 June 2020 (UTC)