Wikidata:Property proposal/OpenStates Person ID
OpenStates person ID
Originally proposed at Wikidata:Property proposal/Person
Description | identifier for person entries in OpenStates.org |
---|---|
Represents | person |
Data type | External identifier |
Domain | human (Q5) |
Allowed values | ocd-person\b\/[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12} |
Example 1 | Rick Outman (Q7331626) → ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32 |
Example 2 | Adam Hollier (Q76373561) → ocd-person/5417a310-a276-4035-8dc8-f536c61db49d |
Example 3 | Jeff Irwin (Q16216607) → ocd-person/a1c1163d-af8d-4b88-9365-07717d4a45c3 |
Example 4 | Kevin Daley (Q16730764) → ocd-person/cf7ed8de-2df4-41ff-9dcb-489980b031a0 |
Source | https://openstates.org/data/ |
Planned use | Cross-linking of Wikidata and OpenStates.org data. Entities are first resolved via Mix'n'match, then verified against party, given name, family name, and district, after which external IDs are synced. The resolved Wikidata QID will be added as a new field in OpenStates for cross-linking between Wikidata and OpenStates entities. |
Number of IDs in source | 7,997 as of 2022-01-28 |
Expected completeness | always incomplete (Q21873886) |
Robot and gadget jobs | Check properties for consistency and cross-check changes in the data |
Applicable "stated in"-value | Open States (Q54449686) |
Distinct-values constraint | yes |
Wikidata project | The ID intersects with the goal of WikiProject Every Politician, covering people who hold positions in US government at the federal, state, and local levels. |
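The allowed-values pattern in the table can be sanity-checked against examples 1-4 with a few lines of Python. This is a sketch that assumes the pattern is applied as a whole-value match, as Wikidata format constraints are; the dictionary and function names are illustrative only.

```python
import re

# Allowed-values pattern copied verbatim from the proposal table.
# Assumption: the constraint must match the whole value, so re.fullmatch is used.
OCD_PERSON_PATTERN = re.compile(
    r"ocd-person\b\/[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-"
    r"[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}"
)

# Examples 1-4 from the proposal: Wikidata item -> OpenStates UUID.
EXAMPLE_IDS = {
    "Q7331626": "ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32",   # Rick Outman
    "Q76373561": "ocd-person/5417a310-a276-4035-8dc8-f536c61db49d",  # Adam Hollier
    "Q16216607": "ocd-person/a1c1163d-af8d-4b88-9365-07717d4a45c3",  # Jeff Irwin
    "Q16730764": "ocd-person/cf7ed8de-2df4-41ff-9dcb-489980b031a0",  # Kevin Daley
}


def is_valid_ocd_person_id(value: str) -> bool:
    """Return True if the whole value satisfies the proposed allowed-values pattern."""
    return OCD_PERSON_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for qid, value in EXAMPLE_IDS.items():
        print(qid, value, is_valid_ocd_person_id(value))
```

The `\b` word boundaries and the escaped `\/` are redundant here but harmless; a computed slug-hash such as `rick-outman-4kyQcmzxj3evAoxO2Tx3OU` does not satisfy this pattern, so a different pattern would be needed if that becomes the main value.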
Motivation
OpenStates collects and cleans data on common political figures and releases it as CC0. This data does not currently interface with Wikidata, but it aligns directly with the goals of WikiProject Every Politician. Adding a property for an OpenStates person ID enables bidirectional linking, import, and cross-checking of the data by both the Wikidata community and OpenStates. This may help with updates after each U.S. election or other event. It may also make related Wikidata entries and Wikimedia properties easier to use through the existing notable consumers of OpenStates. Wolfgang8741 (talk) 22:32, 31 January 2022 (UTC)
Discussion
A point of discussion to complement this proposal is how best to capture both the UUID and the computed ID used in the web URL. The UUID is ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32, while the ID in the web URL is a base62 slug-hash computed from the full name and the UUID, so it can only be computed reliably when the full name from the OpenStates entry is available. An example instance: https://openstates.org/person/rick-outman-4kyQcmzxj3evAoxO2Tx3OU/. It seems important to capture the UUID and then compute or import the OpenStates computed ID to link back to the person's page on the OpenStates website. This could also be done through a second property such as OpenStates Computed ID.
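A minimal sketch of how such a slug-hash could be computed is shown below. It assumes the hash is the UUID's 128-bit integer written in base62 (most significant digit first) with the alphabet 0-9, A-Z, a-z, and that the name slug resembles a Django-style `slugify`; both are inferences from the example URLs, not a confirmed description of OpenStates' code, and all function names are hypothetical.

```python
import re
import unicodedata
import uuid

# Assumed base62 alphabet: digits, then upper-case, then lower-case letters.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer in base62, most significant digit first."""
    if n < 0:
        raise ValueError("base62_encode expects a non-negative integer")
    if n == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(BASE62_ALPHABET[rem])
    return "".join(reversed(digits))


def slugify(name: str) -> str:
    """Approximate slug: fold to ASCII, lower-case, turn spaces/hyphen runs into one hyphen."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    ascii_name = re.sub(r"[^\w\s-]", "", ascii_name.lower())
    return re.sub(r"[-\s]+", "-", ascii_name).strip("-_")


def computed_id(full_name: str, ocd_person_id: str) -> str:
    """Hypothetical helper: build '<name-slug>-<base62(uuid)>' from a name and an ocd-person ID."""
    prefix, _, raw_uuid = ocd_person_id.partition("/")
    if prefix != "ocd-person" or not raw_uuid:
        raise ValueError(f"not an ocd-person ID: {ocd_person_id!r}")
    return f"{slugify(full_name)}-{base62_encode(uuid.UUID(raw_uuid).int)}"


if __name__ == "__main__":
    print(computed_id("Rick Outman", "ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32"))
```

The output can be compared against the example URL above. Because the name slug is part of the value, a change in the recorded full name would change the computed ID even though the UUID stays the same.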
- Examples 1-4 map each item (QID) to the person's UUID and should require a qualifier holding the OpenStates Computed ID, so users can follow a link back to the human-readable OpenStates web page. Examples 5-8 correspond to examples 1-4 respectively.
- Examples 5-8 show each item linked to the equivalent slug-hash, which can be used in a URL that resolves to the person's OpenStates page. Using it as a qualifier would need a second property proposal, with the formatter URL https://openstates.org/person/$1, giving external links such as https://openstates.org/person/adam-hollier-2Yg9lmx4Foq1aIJLnahN77/
- Example 5: Rick Outman (Q7331626) → ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32, with qualifier rick-outman-4kyQcmzxj3evAoxO2Tx3OU for the link https://openstates.org/person/rick-outman-4kyQcmzxj3evAoxO2Tx3OU/
- Example 6: Adam Hollier (Q76373561) → ocd-person/5417a310-a276-4035-8dc8-f536c61db49d, with qualifier adam-hollier-2Yg9lmx4Foq1aIJLnahN77 for the link https://openstates.org/person/adam-hollier-2Yg9lmx4Foq1aIJLnahN77/
- Example 7: Jeff Irwin (Q16216607) → ocd-person/a1c1163d-af8d-4b88-9365-07717d4a45c3, with qualifier jeff-irwin-4vE0LVSe6s1WAHbPFtfK5r for the link https://openstates.org/person/jeff-irwin-4vE0LVSe6s1WAHbPFtfK5r/
- Example 8: Kevin Daley (Q16730764) → ocd-person/cf7ed8de-2df4-41ff-9dcb-489980b031a0, with qualifier kevin-daley-6JXNcWtsjl88K15PkmktsW for the link https://openstates.org/person/kevin-daley-6JXNcWtsjl88K15PkmktsW/
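The pairing in examples 5-8 (UUID as main value, slug-hash as qualifier, formatter URL for the link) can be modelled with a small record type. This is a hypothetical sketch: the class and field names are illustrative, and it assumes the formatter URL works by plain substitution of the qualifier value for `$1`.

```python
from dataclasses import dataclass

# Formatter URL from the discussion; "$1" is replaced by the identifier value.
FORMATTER_URL = "https://openstates.org/person/$1"


@dataclass(frozen=True)
class OpenStatesStatement:
    """Hypothetical model of one statement: UUID main value plus slug-hash qualifier."""

    qid: str
    ocd_person_id: str
    computed_id: str

    def link(self) -> str:
        # Per the discussion, only the slug-hash appears in the OpenStates web URL.
        return FORMATTER_URL.replace("$1", self.computed_id)


EXAMPLES = [
    OpenStatesStatement("Q7331626", "ocd-person/9c525554-fb85-45f3-87fa-cf6a3208ce32",
                        "rick-outman-4kyQcmzxj3evAoxO2Tx3OU"),
    OpenStatesStatement("Q76373561", "ocd-person/5417a310-a276-4035-8dc8-f536c61db49d",
                        "adam-hollier-2Yg9lmx4Foq1aIJLnahN77"),
    OpenStatesStatement("Q16216607", "ocd-person/a1c1163d-af8d-4b88-9365-07717d4a45c3",
                        "jeff-irwin-4vE0LVSe6s1WAHbPFtfK5r"),
    OpenStatesStatement("Q16730764", "ocd-person/cf7ed8de-2df4-41ff-9dcb-489980b031a0",
                        "kevin-daley-6JXNcWtsjl88K15PkmktsW"),
]

if __name__ == "__main__":
    for statement in EXAMPLES:
        print(statement.qid, statement.link())
```

Plain substitution yields links without the trailing slash that appears in the example URLs above.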
Consensus on this approach could enable further data linking and integration with OpenStates. Before this proposal, OpenStates was contacted on its Slack to check the feasibility of integrating a Wikidata ID, and the maintainers are open to adding the Wikidata identifier back to each corresponding entry. Wolfgang8741 (talk) 22:32, 31 January 2022 (UTC)
- Second property for the computed ID sounds ok to me. ArthurPSmith (talk) 19:00, 1 February 2022 (UTC)
- It would be preferable to determine a single identifier with a formatter URL that could be included in Wikidata.
- BTW what happens when the spelling of a person's name is slightly changed? Does that recalculate everything? --- Jura 18:27, 3 February 2022 (UTC)
- @Jura1 I agree a single identifier with a formatter URL would be ideal. According to a maintainer of OpenStates I exchanged messages with, a changed slug redirects to the canonical slug-hash ID URL. For now, this means any dumps used for the import need to compute this ID from the full name key and the ID key, applying the slug and base62 conversions to each respective field. The OpenStates API can look up data given just the base62 hash. Should I refactor the proposal around the slug-hash ID and a formatter URL? Example 5 would become Rick Outman (Q7331626) → rick-outman-4kyQcmzxj3evAoxO2Tx3OU, with the formatted link https://openstates.org/person/rick-outman-4kyQcmzxj3evAoxO2Tx3OU/. And yes, the slug may contain more than two hyphens depending on the individual's name, for example: https://github.com/openstates/people/blob/main/data/mi/legislature/Cynthia-A-Johnson-8579a776-ec6d-4239-b1a2-5e89dd3cee49.yml Wolfgang8741 (talk) 22:58, 9 February 2022 (UTC)
- @Jura1 @ArthurPSmith Most of the data is now resolved to Wikidata items in Mix'n'match catalog 5046, pending final approval of the proposal's ID format. In practice I found that the redirects have a bug, so I still encourage including a qualifier property in the ocd-person/ format: it gives access to the underlying data, whereas the computed URL ID is not included in the raw data. From a constraint perspective, we can make the computed ID the primary ID and the ocd-person ID the qualifier. Wolfgang8741 (talk) 16:53, 22 February 2022 (UTC)
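Because a name slug can itself contain hyphens, one way to recover the ocd-person UUID from a computed ID is to split on the last hyphen and decode the base62 part. This is a sketch under the assumption that the hash is the UUID's 128-bit integer in base62 with the alphabet 0-9, A-Z, a-z (inferred from the example URLs, not confirmed by OpenStates); the helper names are hypothetical.

```python
import uuid

# Assumed base62 alphabet: digits, then upper-case, then lower-case letters.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_decode(text: str) -> int:
    """Decode a base62 string (most significant digit first) to an integer."""
    n = 0
    for ch in text:
        digit = BASE62_ALPHABET.find(ch)
        if digit < 0:
            raise ValueError(f"invalid base62 character: {ch!r}")
        n = n * 62 + digit
    return n


def split_computed_id(computed: str) -> tuple[str, str]:
    """Split '<name-slug>-<hash>' on the LAST hyphen, since slugs may contain hyphens."""
    slug, sep, hash_part = computed.rpartition("-")
    if not sep or not slug or not hash_part:
        raise ValueError(f"not a slug-hash ID: {computed!r}")
    return slug, hash_part


def to_ocd_person_id(computed: str) -> str:
    """Hypothetical helper: recover 'ocd-person/<uuid>' from a computed slug-hash ID."""
    _, hash_part = split_computed_id(computed)
    value = base62_decode(hash_part)
    if value >= 1 << 128:
        raise ValueError("decoded value does not fit in a 128-bit UUID")
    return f"ocd-person/{uuid.UUID(int=value)}"


if __name__ == "__main__":
    print(to_ocd_person_id("rick-outman-4kyQcmzxj3evAoxO2Tx3OU"))
```

Splitting from the right keeps multi-hyphen slugs such as `cynthia-a-johnson` intact, and the decoded UUID could serve as the ocd-person qualifier value discussed above.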
WikiProject every politician has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
- Support I literally came here to propose this and then noticed it was already in progress. Having multiple IDs is frustrating. I think I support having the computed ID be the primary ID (because it's linkable) and having the ocd-person ID be secondary (or really we could maybe skip it for now). BrokenSegue (talk) 01:08, 10 July 2022 (UTC)
- Support from me as well. ChristianKl ❪✉❫ 11:24, 5 December 2022 (UTC)
- @Wolfgang8741, BrokenSegue, ChristianKl: Done --Tinker Bell ★ ♥ 16:26, 22 December 2022 (UTC)