Wikidata:Property proposal/LAGL author ID

From Wikidata

LAGL author ID


Originally proposed at Wikidata:Property proposal/Authority control

   Done: LAGL author ID (P12869) (Talk and documentation)
Description: identifier for an Ancient Greek or Latin author in the LAGL Catalog of Authors and Works
Data type: External identifier
Example 1: Acesias (Q3604255) -> urn:cts:greekLit:lagl0001
Example 2: Sulla (Q483783) -> urn:cts:latinLit:phi0652
Example 3: Xenophon (Q129772) -> urn:cts:greekLit:tlg0032

Motivation

Epìdosis Geraki Azertus Alexander Doria Shisma Sp!ros Xena the Rebel Girl Alexmar983 DerHexer Lykos EncycloPetey Jahl de Vautban JBradyK Mathieu Kappler Ahc84 Liber008 JASHough User:Tolanor User:Jonathan Groß

Notified participants of WikiProject Ancient Greece. The main aim of the LAGL Catalog is to provide permanent identifiers for Ancient Greek and Latin authors (and works: I think I will soon propose a property for works as well), especially those whose works are lost or fragmentary and which for this reason are poorly covered by existing databases, since these mainly deal with authors of fully or mostly preserved works. As of now the database assigns new IDs to nearly 300 Ancient Greek authors (and I think Latin authors will be added in a similar way in the future), and all the authors are already linked to Wikidata, so importing all the IDs will be very easy and will help build a reliable framework for referring unambiguously to fragmentary Ancient Greek authors. The database also provides a very useful feature: an easy link to the passage(s) of the ancient source(s) mentioning each author. --Epìdosis 20:19, 27 June 2024 (UTC)

Discussion

  •  Comment Some of the authors on LAGL have a TLG prefix, and I'm guessing that adding Latin authors and works will require including the greekLit/latinLit prefix as well. Also, should this property be approved, will we still have a use for Digital Athenaeus (P12600)? If we are going to use CTS URNs anyway, it would probably be a good idea to have a single property for them. --Jahl de Vautban (talk) 21:22, 27 June 2024 (UTC)
    @Jahl de Vautban: you are perfectly right; I have just spoken with @Digitalphilologist: and she said that the property for Digital Athenaeus (P12600) is obsolete and that she has just requested its deletion; in fact, all Digital Athenaeus IDs are now a subset of the LAGL Catalog IDs. I have also edited the examples above to include all the possible types of IDs. --Epìdosis 15:30, 29 June 2024 (UTC)
    @Epìdosis, Digitalphilologist: it looks better now, but as I understand it CTS URNs are universal and thus not limited to LAGL; at least the last two examples also resolve in Perseus [1] [2], and the last one in Scaife [3]. I don't fully understand the technical details, but I think we should take advantage of that and go straight for a more universal approach, consisting of 1) a CTS URN author/work property (perhaps two, if we want to distinguish between them) and 2) a CTS URN resolver property, stored on the project's item, holding the base URL to which the author/work value could be appended. In my opinion this would allow much wider use. --Jahl de Vautban (talk) 17:01, 29 June 2024 (UTC)
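The resolver idea above can be sketched as follows, assuming a "resolver" is nothing more than a base URL to which the full CTS URN is appended. Only the Perseus Catalog pattern (https://catalog.perseus.org/catalog/ followed by the URN) is attested in this thread; the `RESOLVERS` table and the `resolve_urn` function are hypothetical illustrations, not part of any Wikidata, Perseus or LAGL API.

```python
# Hypothetical table: resolver name -> base URL.
# Only the Perseus Catalog pattern is attested in this discussion;
# other resolvers would be added here by whoever maintains the table.
RESOLVERS = {
    "perseus_catalog": "https://catalog.perseus.org/catalog/",
}


def resolve_urn(urn: str, resolver: str) -> str:
    """Build a link by appending a CTS URN to the chosen resolver's base URL."""
    if not urn.startswith("urn:cts:"):
        raise ValueError(f"not a CTS URN: {urn!r}")
    try:
        base = RESOLVERS[resolver]
    except KeyError:
        raise ValueError(f"unknown resolver: {resolver!r}") from None
    return base + urn


print(resolve_urn("urn:cts:latinLit:phi0978", "perseus_catalog"))
# -> https://catalog.perseus.org/catalog/urn:cts:latinLit:phi0978
```

In this design the stored value never changes and only the base URL varies; the catch, raised later in the thread, is that not every resolver knows every URN.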
    @Jahl de Vautban: yes, the issue of CTS URNs is fairly complex, and you are perfectly right to raise it! First, I have now created an embryonic item, CTS URN (Q126941791): unique identifier for authors and literary works, for initial reference. The use of CTS URNs originated in the study of Ancient Greek and Latin literature, although, since CTS URN is a more general specification (https://cite-architecture.github.io/ctsurn_spec/), it could expand (and sometimes has expanded) beyond it. Being a specification, the CTS URN is not tied to a single database: each database can construct its own CTS URNs by following the rules of the specification. The pattern is urn:cts:CTSNAMESPACE:WORK:PASSAGE, and of course we are only interested in the part urn:cts:CTSNAMESPACE:WORK. Although the specification doesn't say so explicitly, since in Perseus CTS URNs the WORK component is built as AUTHORID.WORKID, we can (broadly speaking) treat urn:cts:CTSNAMESPACE:AUTHORID as a CTS URN for authors.
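A minimal sketch of the structure just described, splitting urn:cts:CTSNAMESPACE:WORK:PASSAGE into its parts and truncating it to the author-level form. It assumes the dot-separated WORK layout of the CTS URN specification, where the first segment (the spec calls it the "text group") plays the role of AUTHORID here. The class and function names are hypothetical, and the sample URN is only illustrative.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str          # e.g. "greekLit"
    textgroup: str          # the "author" part, e.g. "tlg0032"
    work: Optional[str]     # e.g. "tlg001", if present
    passage: Optional[str]  # e.g. "1.1.1", if present


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split urn:cts:NAMESPACE:WORK[:PASSAGE] into its parts."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 else None
    # WORK is dot-separated: textgroup[.work[.version[.exemplar]]]
    pieces = work_component.split(".")
    if not namespace or not pieces[0]:
        raise ValueError(f"incomplete CTS URN: {urn!r}")
    work = pieces[1] if len(pieces) > 1 else None
    return CtsUrn(namespace, pieces[0], work, passage)


def author_urn(urn: str) -> str:
    """Reduce any CTS URN to the author-level form urn:cts:NAMESPACE:AUTHORID."""
    parsed = parse_cts_urn(urn)
    return f"urn:cts:{parsed.namespace}:{parsed.textgroup}"


print(author_urn("urn:cts:greekLit:tlg0032.tlg001:1.1.1"))
# -> urn:cts:greekLit:tlg0032
```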
    CTS URNs are not unique, even within the same database: e.g. in the most important collection of CTS URNs for Ancient Greek and Latin texts, the one offered by the Perseus Catalog, we find both https://catalog.perseus.org/catalog/urn:cts:latinLit:phi0978 and https://catalog.perseus.org/catalog/urn:cts:latinLit:stoa0232 for Pliny the Elder (Q82778): 1st-century Roman military commander and writer. This is because the same author has different numbers in different collections: strictly speaking, CTS URNs are not unique IDs, but only unique within one collection of texts (e.g. Stoa vs PHI). The LAGL project (about which @Digitalphilologist: of course knows much more than me) has chosen to use CTS URNs based on TLG for Ancient Greek authors (so the same as Perseus), and CTS URNs based on PHI whenever possible, or otherwise on Stoa, for Latin authors (so the same as Perseus, but always unique, unlike Perseus); however, for Greek authors absent from TLG and for Latin authors absent from both PHI and Stoa, it has chosen to create new CTS URNs based on LAGL itself. So yes, the issue is very complex. I would articulate my proposal as follows:
    1. since CTS URN is a specification that can be used to produce different but equally valid IDs (insofar as they follow the rules of the CTS URN specification), I think having one single "CTS URN" property is not feasible, mainly because some IDs would only work with a certain formatter URL (e.g. linking to the Perseus Catalog) and others with a different one (e.g. linking to the LAGL Catalog)
    2. I would generally keep separate properties for CTS URNs for authors and for works, mainly because separate properties allow us to set a more precise format constraint (Q21502404) and subject type constraint (Q21503250) for each
    3. given these two premises, I think we may want to create a total of 4 properties: 1) Perseus Catalog CTS URN for authors, 2) Perseus Catalog CTS URN for works, 3) LAGL CTS URN for authors, 4) LAGL CTS URN for works. Since works in LAGL are still experimental (whilst authors in LAGL are now stable), I would postpone number 4 and keep this proposal for number 3 (I have now put "urn:cts:" in the ID instead of in the formatter URL); finally, I can propose numbers 1 and 2 separately (we can easily mass-import values for number 1 on the basis of our Perseus author ID (P7041) values).
    Do you agree with my proposal? Of course we can discuss further improvements! --Epìdosis 21:00, 29 June 2024 (UTC)
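The LAGL precedence rule described above (Greek: TLG, else LAGL; Latin: PHI, else Stoa, else LAGL) can be sketched as a small selection function. This is a paraphrase of the thread, not LAGL's own code; the function name and its parameters are hypothetical, and the numbers are assumed to be bare digit strings such as "0652".

```python
from typing import Optional


def lagl_author_urn(language: str, *, tlg: Optional[str] = None,
                    phi: Optional[str] = None, stoa: Optional[str] = None,
                    lagl: Optional[str] = None) -> str:
    """Pick an author's CTS URN following the precedence described in the thread.

    Greek: TLG if available, else a LAGL-minted number.
    Latin: PHI if available, else Stoa, else a LAGL-minted number.
    """
    if language == "greek":
        namespace, candidates = "greekLit", [("tlg", tlg), ("lagl", lagl)]
    elif language == "latin":
        namespace = "latinLit"
        candidates = [("phi", phi), ("stoa", stoa), ("lagl", lagl)]
    else:
        raise ValueError(f"unsupported language: {language!r}")
    # The first collection that has a number for this author wins.
    for prefix, number in candidates:
        if number:
            return f"urn:cts:{namespace}:{prefix}{number}"
    raise ValueError("no identifier available for this author")


# Pliny the Elder has both phi0978 and stoa0232 in Perseus; this rule picks PHI.
print(lagl_author_urn("latin", phi="0978", stoa="0232"))
# -> urn:cts:latinLit:phi0978
```

Because PHI always wins over Stoa, each Latin author gets exactly one URN, which is the uniqueness advantage over Perseus mentioned above.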
    @Epìdosis: Thanks for the thorough reply! What really concerns me is the storage of duplicate values: except for fragmentary authors for whom no identifier exists, and for whom LAGL would indeed be needed, Sulla and Xenophon above will respectively use the same identifier as PHI Latin Texts author ID (P6941) and TLG author ID (P3576), albeit with a different resolver. I don't ultimately think that some authors having two identifiers is problematic in itself, insofar as the values are different, but I didn't realise that my previous proposal would also need to supersede those two properties. In theory we could store only the IDs created specifically for LAGL, but then I don't know how one would construct a working query targeting the whole site across the different identifiers. The fact that some identifiers might only work with certain formatters is certainly a huge liability. So  Support the current proposal, I guess, though I'm still bothered by the duplicate values issue. As for your Perseus proposal, I think that would create even more duplicate values, because the only thing resolvable through it that we don't store yet is the Stoa identifiers. --Jahl de Vautban (talk) 18:48, 2 July 2024 (UTC)
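The redundancy described above can be made concrete with a small check that, given a LAGL author ID, reports which existing property (if any) already holds the same number. This is a sketch under a loudly stated assumption: that TLG author ID (P3576) and PHI Latin Texts author ID (P6941) store the bare number without its "tlg"/"phi" prefix (e.g. "0652"). The mapping and the function name are hypothetical.

```python
from typing import Optional, Tuple

# ASSUMPTION: these properties store the same number without the prefix.
PREFIX_TO_PROPERTY = {
    "tlg": "P3576",  # TLG author ID
    "phi": "P6941",  # PHI Latin Texts author ID
}


def redundant_with(lagl_value: str) -> Optional[Tuple[str, str]]:
    """Return (property, bare number) if the value restates an existing ID, else None."""
    prefix_and_number = lagl_value.rsplit(":", 1)[-1]  # e.g. "phi0652"
    prefix = prefix_and_number.rstrip("0123456789")     # e.g. "phi"
    number = prefix_and_number[len(prefix):]            # e.g. "0652"
    if not prefix or not number:
        raise ValueError(f"unexpected LAGL author ID: {lagl_value!r}")
    prop = PREFIX_TO_PROPERTY.get(prefix)
    return (prop, number) if prop else None


print(redundant_with("urn:cts:latinLit:phi0652"))   # Sulla
# -> ('P6941', '0652')
print(redundant_with("urn:cts:greekLit:lagl0001"))  # Acesias
# -> None
```

Under this assumption, only the lagl- values (and the stoa- values, which no property in this thread stores yet) carry information not already on Wikidata.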
    @Jahl de Vautban: sure, I completely agree; storing redundant values was also my initial concern, which is why I initially proposed only LAGL values. But after reading your comment about CTS URNs and discussing it with @Digitalphilologist:, I changed my mind and realised that being able to link to all the CTS URNs contained in the LAGL Catalog was more important than avoiding redundant values. Yes, since CTS URNs are mostly based on existing IDs (TLG, PHI etc.), they unfortunately imply storing redundant values; however, since CTS URNs are very commonly used in digital humanities projects on classical antiquity, I do think that having a fairly complete set of CTS URNs like the one provided by the LAGL Catalog is the best choice. So, for now I will hold off on proposing properties for the CTS URNs provided by the Perseus Catalog, both because they can easily be extracted from our Perseus author ID (P7041) values and because they would be highly redundant with those of the LAGL Catalog. Epìdosis 21:08, 2 July 2024 (UTC)
  •  Support I have just spoken with @Epìdosis: and I confirm that I'm requesting the deletion of the Digital Athenaeus property. The LAGL Catalog will include the IDs of the Digital Athenaeus and will also include Latin authors. This is why the prefix before the number (e.g. lagl0003) is important. Thank you. --Digitalphilologist (talk) 15:35, 29 June 2024 (UTC)
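Since the prefix carries the collection, a format constraint (Q21502404) for this property could check that namespace and prefix agree. The regular expression below is a hypothetical candidate, not the constraint actually set on P12869: it assumes exactly four digits after the prefix, as in every example in this proposal, and would need adjusting if LAGL, Stoa or the imported Digital Athenaeus IDs use other shapes. `fullmatch` is used so that the whole value must match.

```python
import re

# Hypothetical format pattern: Greek allows tlg or lagl, Latin allows
# phi, stoa or lagl, always followed by four digits (an assumption).
LAGL_AUTHOR_ID = re.compile(
    r"urn:cts:(?:greekLit:(?:tlg|lagl)|latinLit:(?:phi|stoa|lagl))\d{4}"
)


def is_valid_lagl_author_id(value: str) -> bool:
    """True if the entire value matches the hypothetical pattern."""
    return LAGL_AUTHOR_ID.fullmatch(value) is not None


for value in ("urn:cts:greekLit:lagl0001",   # example 1
              "urn:cts:latinLit:phi0652",    # example 2
              "lagl0003",                    # missing urn:cts: and namespace
              "urn:cts:greekLit:phi0652"):   # PHI is a Latin collection
    print(value, is_valid_lagl_author_id(value))
```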
  • @Epìdosis, Jahl de Vautban, Digitalphilologist: ✓ Done as LAGL author ID (P12869). Regards, ZI Jony (Talk) 03:44, 5 July 2024 (UTC)
    great, thanks! Digitalphilologist (talk) 08:15, 8 July 2024 (UTC)