Wikidata:Property proposal/relative statement frequency
relative statement frequency
Originally proposed at Wikidata:Property proposal/Generic
Not done
Motivation
nature of statement (P5102) is used to describe many things about statements, one of them being the relative frequency with which a statement is present on instances of a class. This proposal introduces a dedicated property for this relationship to replace nature of statement (P5102), so that users can easily find and select a value for describing it. It also restricts this property to use on classes, unlike nature of statement (P5102), which can be used anywhere and could document relationships using the values of this property with unknown or badly modeled meanings. Lectrician1 (talk) 03:53, 16 December 2022 (UTC)
Discussion
- WikiProject Properties has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. --Tinker Bell ★ ♥ 15:59, 26 December 2022 (UTC)
- Support, but for allowed values, instead of a discrete list, it would be better if it were any item with instance of (P31) relative frequency (Q5374249). Josh Baumgartner (talk) 16:04, 26 December 2022 (UTC)
- @Joshbaumgartner I'm restricting it to allowed values to prevent users from creating many options that may be very similar in definition and thus make people confused. If you have any other suggestions for allowed values, I'd be happy to add them. Lectrician1 (talk) 16:33, 26 December 2022 (UTC)
- The selected value should match the reference exactly for things like this, so one would need to be able to create values to match what the source material says. For example, if I had a source that says "Humans have eyes most of the time," none of your list would be a match for that. I could guess and pick the closest one by my estimation, but that would not be true to the original source at that point. One problem I have with your list is that the items not only lack instance of (P31) relative frequency (Q5374249), but also lack any statements that quantify what exactly they imply. Thus they are just a list of subjective words that could mean anything (well, not literally anything, but what exactly they do mean is left to subjective opinion). Having the allowed values limited to items with instance of (P31) relative frequency (Q5374249) means that new ones can be created to match source material claims as needed. The fact that they have to be created and properly assigned an instance of (P31) value will, I think, deter most folks from creating new ones willy-nilly. Josh Baumgartner (talk) 16:57, 26 December 2022 (UTC)
- Question @Lectrician1: Can you give a few examples of how this is currently solved using nature of statement (P5102)? Thanks, Yellowcard (talk) 16:17, 26 December 2022 (UTC)
- Added in 4 and 5 @Yellowcard Lectrician1 (talk) 16:39, 26 December 2022 (UTC)
- Weak oppose Reading the comments, I think this property would perhaps be more appropriate as a sense datatype where the lexeme is an adjective. The property would thus be "adjective used to describe frequency". I can also see the need for a related property that has an item datatype, but only if the allowed values are those specified in a particular standard, e.g. for risk assessments. For example, [1] provides a sample risk matrix for the European Space Agency (Q42262) and defines various likelihoods. Other organisations around the world define their own likelihoods, often reusing the same adjective for each likelihood but with different criteria (e.g. "rare occurrence" could mean once every 100 years in one standard, or once every 10000 years in another standard). I'm less interested in pursuing the item datatype property without being able to provide some decent examples of how it would be used (with sources). --Dhx1 (talk) 03:35, 27 December 2022 (UTC)
- Using a sense datatype would make statements language dependent, which is unnecessary for the information that we are trying to model. — The Erinaceous One 🦔 05:05, 27 December 2022 (UTC)
- My issue is that rarely (Q28962310) (and other example allowed values) are currently not defined in any meaningful way. Are the 5 allowed values specified meant to represent quintile (Q3176606)s of a normal distribution (Q133871) regardless of language? Or are allowed values meant to be items which should have described by source (P1343) technical standard (Q317623) (e.g. risk management standard used by NASA, or another organisation) and may therefore have a wide range of definitions and mapping to quantitative probabilities? Dhx1 (talk) 13:52, 27 December 2022 (UTC)
- Oppose in the current form. The proposed values are very vague. Having vague values and disallowing more precise ones, like the ESA risk assessment statuses that Dhx1 speaks about, seems like a bad modeling decision.
- Storing information for every anatomical structure about how often it appears in humans would add thousands of statements to Q5. It would make more sense to store the information in the items of the individual human anatomical structures.
- It seems to me that the examples in this proposal were created without looking at any data sources. For creating a new property like this, it would be worthwhile to first look at the data sources and then think about how that data could best be represented on Wikidata. ChristianKl ❪✉❫ 16:22, 17 January 2023 (UTC)
- @Dhx1 @ChristianKl @The-erinaceous-one What if we had like "percentage chance levels" in order to make the values make more sense? For example, 10, 20, 50, 80, 90.
- Also, it's extremely difficult to find sources for these examples. Look at them.
- I still think it's important that we find a way to document that these statements vary in frequency. We need to find some way to do this. Lectrician1 (talk) 06:15, 20 March 2023 (UTC)
- Finding sources about how many humans actually have human eyes isn't easy. At the same time the scarcity of sources makes it much more valuable to have data because it's not something that someone can currently easily look up elsewhere.
- I'd be fine with quantity as a datatype for the percentage. ChristianKl ❪✉❫ 11:25, 20 March 2023 (UTC)
- @Lectrician1, Yellowcard, Dhx1, Joshbaumgartner, The-erinaceous-one, ChristianKl: Not done: no consensus at this time for creation of this property. --99of9 (talk) 00:53, 9 June 2023 (UTC)