Property talk:P2190
Documentation
former identifier for a person's appearances on C-SPAN as strings - string IDs should be converted to numeric and moved to P10660 to prevent link rot
List of violations of this constraint: Database reports/Constraint violations/P2190#Format, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P2190#Unique value, SPARQL (every item), SPARQL (by value)
List of violations of this constraint: Database reports/Constraint violations/P2190#Entity types
List of violations of this constraint: Database reports/Constraint violations/P2190#Scope, SPARQL
This property is being used by: Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)
Over-strict constraints
C-SPAN disambiguates IDs by appending a number, for example, Arthur Miller (Q80596) -> arthurmiller02. Can someone tweak the constraints accordingly, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:40, 21 July 2016 (UTC)
- It seems that the regular expression for this ID (C-SPAN person ID) is now (judging by what C-SPAN seems to use):
- [a-zA-Z]+(0[2-9]|1[0-5]|)
- So, for example, a name like 'Richard Smith' would be represented with C-SPAN person ID (by C-SPAN) as 'RichardSmith'. In the past C-SPAN would have represented this name with all lower-case ('richardsmith'). But it seems that C-SPAN now (or for quite some time) uses both lower-case person-IDs (as originally) as well as mixed-case person-IDs.
- Even if this property (P2190) is being deprecated or eventually eliminated, should the constraining regular expression be expanded to what I have above (accepting mixed case)? Many of the (very annoying) false constraint violations that appear all over the Wikidata database are almost more of an obstacle to getting correct data into the database than an aid! L.Smithfield (talk) 23:04, 4 May 2023 (UTC)
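As a quick check of the pattern proposed above, here is a minimal Python sketch. It assumes the format constraint has to match the whole value, so it uses `re.fullmatch`; the helper name `matches_proposed_format` and the sample values are illustrative only and are not part of the property's configuration.

```python
import re

# Pattern quoted in the comment above. The empty last alternative makes the
# two-digit disambiguation suffix (02-15) optional.
PROPOSED = re.compile(r"[a-zA-Z]+(0[2-9]|1[0-5]|)")


def matches_proposed_format(value: str) -> bool:
    """Return True if the whole value matches the proposed P2190 format."""
    return PROPOSED.fullmatch(value) is not None


# Sample values: string IDs taken from this page, plus one numeric ID and one
# made-up suffix ("01") that the pattern is expected to reject.
samples = [
    "arthurmiller02",
    "richardsmith",
    "RichardSmith",
    "BoisfeuilletBoJonesJr",
    "1009899",
    "arthurmiller01",
]

for sample in samples:
    print(sample, matches_proposed_format(sample))
```

Under that assumption the mixed-case and lower-case string IDs pass, while the purely numeric value and the `01` suffix fail.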
Should this property change its Identifier from the string to numeric ID
The current property ID is a string such as "BoisfeuilletBoJonesJr", but an underlying numeric ID, such as 1009899, is now available. Using it would make the ID format more compatible with existing uses such as the US Legislature ID fields, and would make it possible to increment over the ID space to build a more complete catalog in mix'n'match or elsewhere. The highest numeric ID tested with a valid response is 1009899, which is a substantial increase over the current 9563 IDs in mix'n'match catalog 3092. Migration from the string ID to the numeric ID should be as simple as resolving the string ID and extracting the numeric ID from the resulting URL. Not all string-based IDs resolve, including the example from the property proposal.
- Example 1: the property example link for Vivian Schiller is https://www.c-span.org/person/?vivianschiller, which returns 404, while the numeric form is https://www.c-span.org/person/?127636/VivianSchiller
- Example 2: https://www.c-span.org/person/?1009899/BoisfeuilletBoJonesJr
Wolfgang8741 (talk) 04:59, 23 February 2022 (UTC)
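The "extract the ID from the URL" step described above could be sketched as follows. This is an illustration only, not the script actually used: it assumes the resolved URL has the shape shown in the two examples (numeric ID first in the query string, followed by a slash and the name), and the function name `numeric_id_from_url` is hypothetical. Fetching the page and following redirects is left out.

```python
from typing import Optional
from urllib.parse import urlsplit


def numeric_id_from_url(resolved_url: str) -> Optional[str]:
    """Return the leading numeric ID from a C-SPAN person URL, or None.

    Assumes URLs shaped like the examples above, e.g.
    https://www.c-span.org/person/?127636/VivianSchiller
    """
    query = urlsplit(resolved_url).query       # "127636/VivianSchiller"
    first_part = query.split("/", 1)[0]        # "127636"
    if first_part.isascii() and first_part.isdigit():
        return first_part
    return None                                # string-only URL or no query


print(numeric_id_from_url("https://www.c-span.org/person/?127636/VivianSchiller"))
print(numeric_id_from_url("https://www.c-span.org/person/?1009899/BoisfeuilletBoJonesJr"))
print(numeric_id_from_url("https://www.c-span.org/person/?vivianschiller"))
```

A URL that still carries only the string form yields `None`, which would flag it for a manual check.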
- Do you have a list of old and new IDs which would make it possible to convert them all quickly? If the process is going to take a long time, it may be easier to propose a new property with the new scheme and then gradually migrate them. — Martin (MSGJ · talk) 08:28, 23 February 2022 (UTC)
- I can write a script to compile the numeric IDs and ease the transition. Having slept on it and read your response, though, I think proposing a new property and migrating may be the more stable approach; proposing deprecation of the string property once parity has been reached would give more clarity to any automated integrations. A new property would also allow coordinating with templates that use this property, such as EN Wikipedia's Template:C-SPAN, which relies on the string; a hard fork of the ID might make imports more difficult. Thank you for the feedback, I'll propose a new property to replace or complement the string. Wolfgang8741 (talk) 15:25, 23 February 2022 (UTC)
- Just to note, the property already has numeric IDs in use alongside the string-based ones, as shown in https://w.wiki/4sYj. In practice, the move from strings to numeric IDs cleans up the ID space and allows for cleaner constraints and tool integration. Wolfgang8741 (talk) 15:32, 23 February 2022 (UTC)
- For the scope of the issue: from the query https://w.wiki/4sYz, sorted by ID format, 5547 values use the string format and 452 use the numeric format. I also found H. Joel Deckard (Q1305216), whose string did resolve to a numeric ID, but the page returned 404 and a search was required to find the newer ID. Filed a ticket with C-SPAN about needing to redirect the old ID. Wolfgang8741 (talk) 16:24, 23 February 2022 (UTC)
- @MSGJ I'm happy to report that 5418 of the 5547 string IDs resolved, and the corresponding numeric ID was extracted from the URL. Unfortunately, 129 returned 404 or another response error and will need a manual check to locate a corresponding pair, if one is available. Do you have a suggestion for how to go about applying these and deprecating the string IDs? I would assume QuickStatements would make it easy to add the IDs, but I've not worked with automated editing of existing statements yet. The matched IDs are in a TSV with the header (Qid, Qurl, P2190string, CSpanUrl, P2190numeric); the list needing a manual check has the same headings minus P2190numeric and CSpanUrl, and contains the requested URL. Now that we have the corresponding IDs and they are ready to apply, what should be done next?
- Thinking further, deprecating the string format while keeping this property may be reasonable given the high match rate. Marking the existing strings as deprecated when the numeric ID is applied should be fine, along with constraint configuration and assistive text.
- I haven't yet understood how to configure mix'n'match's scraping. The existing catalog, https://mix-n-match.toolforge.org/#/catalog/3092, should be updated to use the numeric ID derived from the existing string match, to reduce the number of additional strings being added. Wolfgang8741 (talk) 00:50, 27 February 2022 (UTC)
I support migrating to the numeric form. BrokenSegue (talk) 23:11, 23 February 2022 (UTC)
- @BrokenSegue @MSGJ It looks like there is consensus for the move to a numeric ID, and multiple weeks have gone by without further comment after posting to Project Chat. I moved forward with an upload of the resolved numeric IDs to P2190, since they are backward compatible with the existing strings. I haven't deprecated en masse before; can one of you help deprecate the strings once the two batches are complete (batch1 and batch2)? String IDs added prior to 26 February 2022 that have no numeric match are most likely broken, or the ID could not be resolved manually, and thus should be deprecated as well. Wolfgang8741 (talk) 14:24, 14 March 2022 (UTC)
- Proposed Split of Numeric format to a New C-SPAN ID submitted for review. Wolfgang8741 (talk) 18:51, 30 March 2022 (UTC)
- The numeric property has been accepted as C-SPAN person numeric ID (P10660), and I'm starting to import the numeric values to P10660. Following the import to P10660, numeric values will be removed from P2190. Constraints have been rolled back to indicate that P2190 should contain only the string format and P10660 only the numeric format. Given that P10660 was approved only recently and not all templates have moved to it, P2190 will need to indicate the transition, and notice will be needed to help the transition before P2190 can be fully deprecated. Wolfgang8741 (talk) 21:32, 21 April 2022 (UTC)
The plan to support both the string person ID (P2190) and the new numeric person ID (P10660) is a good plan (I think). Retaining both for the transition process in particular was very wise (in my opinion). But can the format regular expression constraint for the old person ID (P2190) please be revised to accept mixed-case letters? It seems that somewhere along the line, C-SPAN switched to using mixed-case letters for these IDs. As an aside (and in my opinion), Wikidata does not need any more false constraint violations than it already has (all over the database). Thanks for any consideration. L.Smithfield (talk) 23:27, 4 May 2023 (UTC)
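The split discussed in this section (string values stay on P2190, numeric values go to P10660) can be illustrated with a small Python sketch. It assumes that a value made up only of ASCII digits is a numeric ID and that anything else is a string ID; the actual format constraints on both properties may be stricter, and the helper name `target_property` and the sample list are hypothetical.

```python
import re
from collections import Counter

# Assumption: a numeric C-SPAN ID consists only of ASCII digits.
NUMERIC_ID = re.compile(r"[0-9]+")


def target_property(value: str) -> str:
    """Return the property a value would belong to under the split."""
    return "P10660" if NUMERIC_ID.fullmatch(value) else "P2190"


# Sample values taken from IDs mentioned on this page.
values = ["vivianschiller", "127636", "BoisfeuilletBoJonesJr", "1009899"]

# Tally values per target property, similar in spirit to sorting the
# query results by ID format.
counts = Counter(target_property(v) for v in values)
for prop, count in sorted(counts.items()):
    print(prop, count)
```

Run over a full export of P2190 values, the same tally would give the string versus numeric counts for the property.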
- United States of America-related properties
- All Properties
- Properties with external-id-datatype
- Properties used on 10000+ items
- Properties with format constraints
- Properties with unique value constraints
- Properties with constraints on items using them
- Properties with entity type constraints
- Properties with scope constraints