Wikidata:Request a query/Archive/2022/02
This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion. |
search for books in a book series
I would like to be able to provide an author, character, or book name and get a list of book series and the books in order. This could be three separate queries if need be. One for author, one for character, and one for book title.
I have tried alot of different queries including dbpedia.org queries but wikidata seems like a better place for this kind of stuff.
The following query works but I only get 3 titles back when I know there 20+ books by this author. Is this just that wikidata doesn't have the data where as wikipedia does? Or is there something wrong with my query. This query is just getting book titles by author. I believe I can figure out how to get books in order once I can get all the books by an author.
https://www.wikidata.org/wiki/Wikidata:WikiProject_Books The above link seems to indicate that I should be able to get books before and after a particular book using p155 follows p156 followed by; and p179 part of the series.
I have also tried limiting the search to wdt:P31/wdt:P279* wd:Q571. Which I think would limit the items to books but this seems to remove some of John Sandfords books from the result set.
Thanks much for any help that can be provided.
SELECT distinct ?book ?bookLabel ?authorLabel WHERE {
?author rdfs:label "John Sandford"@en . ?book wdt:P50 ?author .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
- I don't think there are items for more than the 3 books by John Sandford which you have found. Here is a query which can order the books after part of the series (P179) and series ordinal (P1545) if these are present. You can try it with other authors. --Dipsacus fullonum (talk) 10:10, 1 February 2022 (UTC)Try it!
SELECT ?book ?bookLabel ?seriesLabel ?number ?followsLabel ?followed_byLabel WHERE { ?book wdt:P50 wd:Q1701679 . # book by John Sandford OPTIONAL { ?book p:P179 ?series_statement . ?series_statement ps:P179 ?series . OPTIONAL { ?series_statement pq:P1545 ?number . } OPTIONAL { ?series_statement pq:P155 ?follows . } OPTIONAL { ?series_statement pq:P156 ?followed_by . } } SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } } ORDER BY ?series ?number
- Reminder that there's a What links here link on all items, in this case usefully confirming that only four items point to this author. --Tagishsimon (talk) 11:07, 1 February 2022 (UTC)
- Thanks much for the query and comments. When I look at the wikipedia article for John Sandford it contains all the books in the 'Prey' series. However, as you point out it appears that wikidata does not contain this information. Do you know why that would be? I thought wikidata was built from wikipedia.
- Another question I have for you is that I changed authors to Lee Child. In this case I see most of not all of his books. However, the labels for followedby and follows are empty. However, when I look at the items individually those items have values. Do you know why the labels are not being output in the results?
- Thanks
- Brady
- SELECT ?book ?bookLabel ?seriesLabel ?number ?followsLabel ?followed_byLabel
- WHERE
- {
- ?book wdt:P50 wd:Q333719 . # book by Lee Child
- OPTIONAL
- {
- ?book p:P179 ?series_statement .
- ?series_statement ps:P179 ?series .
- OPTIONAL { ?series_statement pq:P1545 ?number . }
- OPTIONAL { ?series_statement pq:P155 ?follows . }
- OPTIONAL { ?series_statement pq:P156 ?followed_by . }
- }
- SERVICE wikibase:label {
- bd:serviceParam wikibase:language "en" .
- }
- }
- ORDER BY ?series ?number
Bradyguy18 (talk) 17:34, 1 February 2022 (UTC)
- @Bradyguy18: The query looks for follows (P155) and followed by (P156) used as qualifiers to a statement with part of the series (P179). But for the Lee Child book items, follows (P155) and followed by (P156) are instead used in independent statements, which the query doesn'ẗ look for. It means that the context for the values for P155 or P156 isn't stated, and I think that is a problem. But the query can of course be changed to also find the values for P155 and P156 when used independently, e.g. like this: --Dipsacus fullonum (talk) 19:15, 1 February 2022 (UTC)Try it!
SELECT ?book ?bookLabel ?seriesLabel ?number ?followsLabel ?followed_byLabel WHERE { ?book wdt:P50 wd:Q333719 . # book by Lee Child OPTIONAL { ?book p:P179 ?series_statement . ?series_statement ps:P179 ?series . OPTIONAL { ?series_statement pq:P1545 ?number . } OPTIONAL { ?series_statement pq:P155 ?follows_qualifier . } OPTIONAL { ?series_statement pq:P156 ?followed_by_qualifier . } } OPTIONAL { ?book wdt:P155 ?follows_main . } OPTIONAL { ?book wdt:P156 ?followed_by_main . } BIND (COALESCE(?follows_qualifier, ?follows_main) AS ?follows) BIND (COALESCE(?followed_by_qualifier, ?followed_by_main) AS ?followed_by) SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } } ORDER BY ?series xsd:integer(?number)
- @Bradyguy18: The query looks for follows (P155) and followed by (P156) used as qualifiers to a statement with part of the series (P179). But for the Lee Child book items, follows (P155) and followed by (P156) are instead used in independent statements, which the query doesn'ẗ look for. It means that the context for the values for P155 or P156 isn't stated, and I think that is a problem. But the query can of course be changed to also find the values for P155 and P156 when used independently, e.g. like this:
- Reminder that there's a What links here link on all items, in this case usefully confirming that only four items point to this author. --Tagishsimon (talk) 11:07, 1 February 2022 (UTC)
Atypical uses of P642
Items with any statement qualified with of (P642), where the property of the main statement is not instance of (P31), subclass of (P279), or URL (P2699). Thanks! Swpb (talk) 17:07, 1 February 2022 (UTC)
- @Swpb: There are too many to list all. Here is a query limited to 1000 results: --Dipsacus fullonum (talk) 19:35, 1 February 2022 (UTC)Try it!
SELECT ?item ?itemLabel ?statement ?prop ?propLabel ?valueLabel ?forLabel WITH { SELECT ?item ?claim ?statement ?for WHERE { ?item ?claim ?statement . ?statement pq:P642 ?for . FILTER (?claim != p:P31 && ?claim != p:P279 && ?claim != p:P2699) } LIMIT 1000 } AS %query WHERE { INCLUDE %query ?prop wikibase:claim ?claim . ?prop wikibase:statementProperty ?sp . ?statement ?sp ?value . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . } }
- Thanks! Swpb (talk) 20:27, 1 February 2022 (UTC)
Films: One film per row; cells with > 1 value
I've adapted the following miserably little query of films made in 2017. I'm looking for 5 core items in columns: country, genre, language, director and script writer. Each film should have ONE row (at present, some have 20!). All 5 columns can have more than one value eg the film could be in more than one language. I've tried adding two things: GROUP_CONCAT and Group by, but the query fails. The separator should be a $$$ rather than a comma (end result = csv file). Adding GROUP_CONCAT to every one of the 5 items seems repetitive, and I'm sure there's a simpler way of doing this? If there's documentation on GROUP_CONCAT, please let me know.
SELECT DISTINCT ?item ?itemLabel ?languageLabel ?countryLabel ?directorLabel ?genreLabel ?scriptLabel (GROUP_CONCAT(DISTINCT(str(?genre)); separator="$$$ ") as ?genre)
WHERE {
?item wdt:P31 wd:Q11424.
?item wdt:P577 ?pubdate.
OPTIONAL { ?item wdt:P495 ?country }
OPTIONAL { ?item wdt:P136 ?genre }
OPTIONAL { ?item wdt:P364 ?language }
OPTIONAL { ?item wdt:P57 ?director }
OPTIONAL { ?item wdt:P58 ?script }
FILTER((?pubdate >= "2017-01-01T00:00:00Z"^^xsd:dateTime) && (?pubdate <= "2017-12-31T00:00:00Z"^^xsd:dateTime))
SERVICE wikibase:label { bd:serviceParam wikibase:language "cy,en,de,fr,ru,bg,cs,po,pl,nl,sl,eun,ro,oc,az,ast,my,tl,ga,br,it,cs,hu,ca,es,no,nn,gl". }
}
GROUP BY ?genre ?item ?itemLabel
ORDER BY ?item
Many thanks! Llywelyn2000 (talk) 07:28, 2 February 2022 (UTC)
- @Llywelyn2000: Probably this sort of thing, presuming it is the labels for country, language &c you're after. But yes, you need to repeatedly groupconcat for each element for which you want all values on a single row. And b/c the query has to groupconcat labels, the label service needs to be told to produce them early enough, hence all of the rdfs:label stuff in the label service call - one clause for each label, including the non-groupconcat'd ?itemLabel.
- --Tagishsimon (talk) 08:25, 2 February 2022 (UTC)Try it!
SELECT DISTINCT ?item ?itemLabel (GROUP_CONCAT(DISTINCT ?languageLabel_ ; separator="$$$ ") as ?languageLabel) (GROUP_CONCAT(DISTINCT ?countryLabel_ ; separator="$$$ ") as ?countryLabel) (GROUP_CONCAT(DISTINCT ?directorLabel_ ; separator="$$$ ") as ?directorLabel) (GROUP_CONCAT(DISTINCT ?genreLabel_ ; separator="$$$ ") as ?genreLabel) (GROUP_CONCAT(DISTINCT ?scriptLabel_ ; separator="$$$ ") as ?scriptLabel) (GROUP_CONCAT(DISTINCT(str(?genre)); separator="$$$ ") as ?genre) WHERE { ?item wdt:P31 wd:Q11424. ?item wdt:P577 ?pubdate. OPTIONAL { ?item wdt:P495 ?country } OPTIONAL { ?item wdt:P136 ?genre } OPTIONAL { ?item wdt:P364 ?language } OPTIONAL { ?item wdt:P57 ?director } OPTIONAL { ?item wdt:P58 ?script } FILTER((?pubdate >= "2017-01-01T00:00:00Z"^^xsd:dateTime) && (?pubdate <= "2017-12-31T00:00:00Z"^^xsd:dateTime)) SERVICE wikibase:label { bd:serviceParam wikibase:language "cy,en,de,fr,ru,bg,cs,po,pl,nl,sl,eun,ro,oc,az,ast,my,tl,ga,br,it,cs,hu,ca,es,no,nn,gl". ?item rdfs:label ?itemLabel . ?language rdfs:label ?languageLabel_ . ?country rdfs:label ?countryLabel_ . ?director rdfs:label ?directorLabel_ . ?genre rdfs:label ?genreLabel_ . ?script rdfs:label ?scriptLabel_ . } } GROUP BY ?item ?itemLabel ORDER BY ?item
- (after edit conflict) @Llywelyn2000: The solution is to use GROUP_CONCAT for all 5 columns where you want to aggregate the values as you say. I know of no special documentation on GROUP_CONCAT besides the SPARQL definition itself (https://www.w3.org/TR/sparql11-query/) and various tutorials. But don't GROUP BY the variable which you will aggregate. That will put different values in different rows. Also don't reuse variable names after "AS". I added an underscore in some variable names to avoid that in the query below. Finally you cannot aggregate over variables created by the label service in automatic mode, but have to switch to the manual mode. This is explained in the user manual in section mw:Wikidata Query Service/User Manual#Label service.
- --Dipsacus fullonum (talk) 08:31, 2 February 2022 (UTC)Try it!
SELECT DISTINCT ?item ?itemLabel (GROUP_CONCAT(DISTINCT ?countryLabel; separator="$$$ ") AS ?country) (GROUP_CONCAT(DISTINCT ?genreLabel; separator="$$$ ") AS ?genre) (GROUP_CONCAT(DISTINCT ?languageLabel; separator="$$$ ") AS ?language) (GROUP_CONCAT(DISTINCT ?directorLabel; separator="$$$ ") AS ?director) (GROUP_CONCAT(DISTINCT ?scriptLabel; separator="$$$ ") AS ?script) WHERE { ?item wdt:P31 wd:Q11424 . ?item wdt:P577 ?pubdate . hint:Prior hint:rangeSafe true . OPTIONAL { ?item wdt:P495 ?country_ } OPTIONAL { ?item wdt:P136 ?genre_ } OPTIONAL { ?item wdt:P364 ?language_} OPTIONAL { ?item wdt:P57 ?director_ } OPTIONAL { ?item wdt:P58 ?script_ } FILTER ((?pubdate >= "2017-01-01T00:00:00Z"^^xsd:dateTime) && (?pubdate <= "2017-12-31T00:00:00Z"^^xsd:dateTime)) SERVICE wikibase:label { bd:serviceParam wikibase:language "cy,en,de,fr,ru,bg,cs,po,pl,nl,sl,eun,ro,oc,az,ast,my,tl,ga,br,it,cs,hu,ca,es,no,nn,gl" . ?item rdfs:label ?itemLabel . ?country_ rdfs:label ?countryLabel . ?language_ rdfs:label ?languageLabel . ?genre_ rdfs:label ?genreLabel . ?director_ rdfs:label ?directorLabel . ?script_ rdfs:label ?scriptLabel . } } GROUP BY ?item ?itemLabel ORDER BY ?item
- @Dipsacus fullonum, Tagishsimon: Whow! Many, many thanks both of you! This will be really handy, and thanks for the explanation! Best regards to both of you! Llywelyn2000 (talk) 08:48, 2 February 2022 (UTC)
Extract columns from regex
Given the following query
SELECT ?item ?id ?name
WHERE {
?item p:P396 ?st .
?st ps:P396 ?id .
?st pq:P1810 ?name . FILTER(REGEX(?name, "<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*)"))
}
you can see that my regex contains two subgroups, ([\\S\\s]+\\d+[\\S\\s]*) and ([\\S\\s]*). Is there a way to assign two variables to these two subgroups, so that I can display them as columns in the results of the query? Thanks, --Epìdosis 13:41, 4 February 2022 (UTC)
- I have found (unexpectedly) an inelegant but somehow efficient solution:
SELECT ?item ?id ?name ?first ?second
WHERE {
?item p:P396 ?st .
?st ps:P396 ?id .
?st pq:P1810 ?name . FILTER(REGEX(?name, "<(.+\\d+.*)-(.*)"))
BIND(REPLACE(STR(?name),".+<","") AS ?first_cl)
BIND(REPLACE(STR(?first_cl),"-.+>","") AS ?first)
BIND(REPLACE(STR(?name),".+<.+-","") AS ?second_cl)
BIND(REPLACE(STR(?second_cl),">","") AS ?second)
}
- probably it could be written in a better way, so suggestions are still welcome! --Epìdosis 14:25, 4 February 2022 (UTC)
- @Epìdosis: You can extract matched substrings by using "$n" as the replacement string i the replace function. You will need to add "^.*" and ".*$" to the search pattern to ensure the whole string is matched and replaced:
- --Dipsacus fullonum (talk) 15:31, 4 February 2022 (UTC)Try it!
SELECT ?item ?id ?name ?first ?second WHERE { ?item p:P396 ?st . ?st ps:P396 ?id . ?st pq:P1810 ?name . FILTER REGEX(?name, "<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*)") BIND (REPLACE (?name, "^.*<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*).*$", "$1") AS ?first) BIND (REPLACE (?name, "^.*<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*).*$", "$2") AS ?second) }
- @Dipsacus fullonum: That's perfect! I had tried to use something like "$n", but I didn't know exactly how to do it. Thanks! --Epìdosis 15:36, 4 February 2022 (UTC)
Need help to search for New York Times journalists that are alive
I tried this query after reading some documentation and examples, but I'm not getting what I want. Here's the query code:
SELECT ?item ?itemLabel
WHERE {
?item wdt:P31 wd:Q5. # instance of human
?item wdt:P106 wd:Q1930187. # occupation is journalist
?item wdt:P108 wd:Q9684. # employer is the New York Times (NYT)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } #itemLabel is in English
}
ORDER BY DESC(?P569) # supposed to order the list by date of birth
My query gives a list where the date of birth is completely random. #2 in the list can be born in 1951, and further down in the list, someone is born in 1970.
I'd like to know if it's possible to 1)query only living people, who are journalists, and are employed by NYT or 2)query all journalists employed by NYT, but in descending order (so it goes from youngest to oldest)
Thanks! Tet (talk) 02:38, 4 February 2022 (UTC)
- The reason the list was in random order was because you have to retrieve the data for the variable you sort on. I'm assuming you want only current employees listed, so I added that requirement as well.
- Try it!
SELECT DISTINCT ?item ?itemLabel ?dob ?age WHERE { ?item wdt:P106 wd:Q1930187; p:P108/ps:P108 wd:Q9684; p:P108 ?st. OPTIONAL { ?item wdt:P569 ?dob. BIND(xsd:integer((NOW() - ?dob) / 365.25) AS ?age) } MINUS { # Former employee ?st ps:P108 wd:Q9684; pq:P582 []. } MINUS { ?item wdt:P570 ?dod. } # Living deprived SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } ORDER BY DESC(?dob)
- —Infrastruktur (talk) 07:10, 4 February 2022 (UTC)
- Infrastruktur wrote "I'm assuming you want only current employees listed, so I added that requirement as well."
- Well, not exactly. You removed persons that only have P108 statements with The New York Times (Q9684) as value and an end date. Journalists who also had other employers, was still listed even if they no longer is at NYT, like. e.g. Sewell Chan (Q7458218). I suggest this query instead: Try it!
SELECT DISTINCT ?item ?itemLabel ?dob ?age WHERE { ?item wdt:P106 wd:Q1930187; p:P108 ?NYT_statement. ?NYT_statement ps:P108 wd:Q9684; OPTIONAL { ?item wdt:P569 ?dob. BIND(xsd:integer((NOW() - ?dob) / 365.25) AS ?age) } MINUS { # Former employee ?NYT_statement pq:P582 []. } MINUS { ?item wdt:P570 ?dod. } # Living deprived SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } ORDER BY DESC(?dob)
You can also add check for a value for date of death (P570), but I didnẗ do that.--Dipsacus fullonum (talk) 09:33, 4 February 2022 (UTC)- There already is a check for a death date, so I struck the last comment. --Dipsacus fullonum (talk) 09:40, 4 February 2022 (UTC)
- Thank you both! Tet (talk) 19:20, 4 February 2022 (UTC)
- Nice catch. Also there's a couple of journos with birthdate given as a century, so look out for anyone around 22 or 122 years old, as their age is probably wrong. Infrastruktur (talk) 07:13, 5 February 2022 (UTC)
All NGOs with revenue in 2017
Hello wizzards of SPARQL, since OpenRefine has problems with qualifiers, I'm in need of a list of all NGOs (instance of (P31) is nonprofit organization (Q163740) and its subclasses) which have a total revenue (P2139) with the qualifier point in time (P585) being "2017" or some date in 2017. Thank you! --Newt713 (talk) 13:40, 7 February 2022 (UTC)
- @Newt713: This, I think:
- --Tagishsimon (talk) 14:21, 7 February 2022 (UTC)Try it!
SELECT ?item ?itemLabel ?revenue ?unitLabel ?time WHERE { ?item wdt:P31/wdt:P279* wd:Q163740 . ?item p:P2139 ?stat. ?stat psv:P2139/wikibase:quantityAmount ?revenue . ?stat psv:P2139/wikibase:quantityUnit ?unit. ?stat pq:P585 ?time . hint:Prior hint:rangeSafe true. FILTER("2017-00-00"^^xsd:dateTime <= ?time && ?time < "2018-00-00"^^xsd:dateTime) SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } }
- @Tagishsimon: Perfect. That's it! Now I can exclude all those from my import. Thank you for your fast and profound help! --Newt713 (talk) 15:01, 7 February 2022 (UTC)
limit OPTIONAL
SELECT DISTINCT ?item ?birth ?pob
WHERE {
VALUES ?item { wd:Q6701 }
OPTIONAL {?item wdt:P569 ?birth .}
OPTIONAL {?item wdt:P19 ?pob .}
}
This selects P569 (date of birth) and P19 (place of birth) from Q6701 (Jacob Grimm). But it returns two results, because there are two entries for P19 with the same rank. Is there a way to limit OPTIONAL to the first result? It's important that only P19 is limited because for P569 I want two results if there two entries with the same rank for the date. --Slomox (talk) 14:58, 7 February 2022 (UTC)
- @Slomox: You'd probably solve that issue by aggregating the results on ?item and ?birth, and then GROUP_CONCAT() or SAMPLE() the ?pob. There'd be more work to do if you wanted the ?pobLabel - can lookat that if you wish.
- --Tagishsimon (talk) 16:45, 7 February 2022 (UTC)Try it!
SELECT DISTINCT ?item ?birth (GROUP_CONCAT(?pob) as ?placeOfBirth) WHERE { VALUES ?item { wd:Q6701 } OPTIONAL {?item wdt:P569 ?birth .} OPTIONAL {?item wdt:P19 ?pob .} } group by ?item ?birth
- @Slomox: (After edit conflict) "Is there a way to limit OPTIONAL to the first result?" There is no first result as there is no defined order. You can limit to one random result by grouping on all other variables in the select clause:
- Try it!
SELECT DISTINCT ?item ?birth (SAMPLE(?pob) AS ?sample_pob) WHERE { VALUES ?item { wd:Q6701 } OPTIONAL {?item wdt:P569 ?birth .} OPTIONAL {?item wdt:P19 ?pob .} } GROUP BY ?item ?birth
- Another way is to convert the optional clause into a subquery with a limit:
- --Dipsacus fullonum (talk) 16:59, 7 February 2022 (UTC)Try it!
SELECT DISTINCT ?item ?birth ?pob WHERE { VALUES ?item { wd:Q6701 } OPTIONAL {?item wdt:P569 ?birth .} { SELECT ?pob WHERE { VALUES ?item { wd:Q6701 } OPTIONAL {?item wdt:P19 ?pob .} } LIMIT 1 }
- Thank you to both of you! I went with Tagishsimon's solution and it solved my problem. --Slomox (talk) 17:38, 7 February 2022 (UTC)
Group by the domain of the URL
Want to transform the following query just shows everything from URL upto (but not including) the item following the last slash and groups them by it so that for example <http://data.thenextweb.com/tnw/entity/belgium>becomes http://data.thenextweb.com/tnw/entity/ so that we can see who is using this URL service and what URL we need to create URL pattern matches for, I have some regex skill but havn't had any luck applying them to SPARQL
# This query finds all usages of property P6363 AS a statement.
SELECT ?item ?itemLabel ?statement ?rank ?snacktype ?value
{
?item p:P6363 ?statement .
?statement wikibase:rank ?rank .
OPTIONAL { ?statement ps:P6363 ?dbvalue }
BIND(COALESCE(IF(wikibase:isSomeValue(?dbvalue), "unknown", "concrete"), "novalue") AS ?snacktype)
BIND(IF(wikibase:isSomeValue(?dbvalue), "", ?dbvalue) AS ?value)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
LIMIT 1000
Back ache (talk) 15:51, 8 February 2022 (UTC)
- No fancy SPARQL needed in this case I guess.
- Infrastruktur (talk) 17:44, 8 February 2022 (UTC)Try it!
SELECT DISTINCT ?value { ?item wdt:P6363 ?dbvalue . BIND(URI(REPLACE(STR(?dbvalue), "^(https?://.+/)(.*?)$", "$1")) AS ?value) SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } } ORDER BY ?value
- @Back ache: It can be even shorter. You can create a new IRI value from an existing IRI with all characters after the last slash removed with:
BIND (IRI(REPLACE(STR(?value), "^(.*/).*$", "$1")) AS ?path)
- You didn't tell how to aggregate other variables in a group, so I didn't have enough information to write a query with a GROUP BY section. BTW, your query above links to WCQS (Wikimedia Commons Query Service). I suppose that is an error. Remove the template argument
|project=sdc
to make a link to WDQS instead. --Dipsacus fullonum (talk) 18:00, 8 February 2022 (UTC)
get linked wikipedia pages
I want to write a query where I can get all the linked Wikipedia pages' URL and languages as mentioned on the wikidata page. For example, https://www.wikidata.org/wiki/Q155164 mentions 73 entries for Wikipedia pages with different languages at the end. I need a query to get that table
- The data model is described at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Sitelinks and suggests somethng like:
- --Tagishsimon (talk) 06:27, 9 February 2022 (UTC)Try it!
SELECT ?item ?itemLabel ?article ?sitelink ?platform ?group ?language WHERE { VALUES ?item {wd:Q155164} ?article schema:about ?item ; schema:isPartOf ?platform ; schema:name ?sitelink ; schema:inLanguage ?language . ?platform wikibase:wikiGroup ?group . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } }
Query failing
Hello, the below query has worked fine for years with Listeria, but for some times it encounters a timeout. Can someone help me optimizing it please?
SELECT ?item ?INSEE ?itemLabel (CONCAT("[[Wikidata:WikiProject France/Églises/",?INSEE,"\u007C",?itemLabel,"]]") AS ?LIEN) ?depCount (IF(?_withCoord > ?depCount, 0, ?depCount - ?_withCoord) AS ?withoutCoord) (IF(?_withImg > ?depCount, 0, ?depCount - ?_withImg) AS ?withoutImg) (ROUND((IF(?_withCoord > ?depCount, ?depCount, ?_withCoord) / ?depCount) * 100 * 100) / 100 AS ?withCoord) (ROUND((IF(?_withImg > ?depCount, ?depCount, ?_withImg) / ?depCount) * 100 * 100) / 100 AS ?withImg)
WHERE {
{
SELECT ?item (COUNT(DISTINCT ?item2) AS ?depCount) (COUNT(DISTINCT ?coord) AS ?_withCoord) (COUNT(DISTINCT ?img) AS ?_withImg) WHERE {
?item2 wdt:P131+ wd:Q142.
hint:Prior hint:gearing "forward".
?item2 wdt:P31/wdt:P279* wd:Q16970.
hint:Prior hint:gearing "forward".
?item2 wdt:P131 ?commune.
?commune p:P131 ?stmt .
?stmt ps:P131 ?item .
MINUS { ?stmt pq:P582 [] } .
?item wdt:P2586 [] .
OPTIONAL { ?item2 wdt:P625 ?coord } .
OPTIONAL { ?item2 wdt:P18 ?img } .
} GROUP BY ?item
} .
?item wdt:P2586 ?INSEE .
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" } .
}
ORDER BY ?INSEE
Thanks! Ayack (talk) 14:09, 9 February 2022 (UTC)
- @Ayack It seems to help a bit to remove the « minus » checking that the statement has no end date and to rely on the preferred rank. It assumes any former commune has a better ranked statement (even with no value Help as preferred if needed) but this seems workable.
- What seems to be the most efficient however is to rely on the commune of France (Q484170) to find all the communes. It seems to cut by about more than a half the time of the query. Are the results OK ?
- author TomT0m / talk page 15:48, 9 February 2022 (UTC)Try it!
SELECT ?item ?INSEE ?itemLabel (CONCAT("[[Wikidata:WikiProject France/Églises/",?INSEE,"\u007C",?itemLabel,"]]") AS ?LIEN) ?depCount (IF(?_withCoord > ?depCount, 0, ?depCount - ?_withCoord) AS ?withoutCoord) (IF(?_withImg > ?depCount, 0, ?depCount - ?_withImg) AS ?withoutImg) (ROUND((IF(?_withCoord > ?depCount, ?depCount, ?_withCoord) / ?depCount) * 100 * 100) / 100 AS ?withCoord) (ROUND((IF(?_withImg > ?depCount, ?depCount, ?_withImg) / ?depCount) * 100 * 100) / 100 AS ?withImg) WHERE { { SELECT ?item (COUNT(DISTINCT ?item2) AS ?depCount) (COUNT(DISTINCT ?coord) AS ?_withCoord) (COUNT(DISTINCT ?img) AS ?_withImg) WHERE { ?commune wdt:P31/wdt:P279* wd:Q484170 . ?item2 wdt:P31/wdt:P279* wd:Q16970. hint:Prior hint:gearing "forward". ?item2 wdt:P131 ?commune. ?commune wdt:P131 ?item . ?item wdt:P2586 ?INSEE . OPTIONAL { ?item2 wdt:P625 ?coord } . OPTIONAL { ?item2 wdt:P18 ?img } . } GROUP BY ?item } . ?item wdt:P2586 ?INSEE . SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" } . } ORDER BY ?INSEE
- @Ayack: Another possibility is to just remove It isn't necessary as only French departments have INSEE department code (P2586). --Dipsacus fullonum (talk) 16:18, 9 February 2022 (UTC)
?item2 wdt:P131+ wd:Q142. hint:Prior hint:gearing "forward".
- Thank you both. It's now working in WDQ but the list is still not updated in WD. I was thinking it was due to timeout but it seems actually to be a Listeria issue: [1] :(. Ayack (talk) 16:44, 9 February 2022 (UTC)
- It might be the quote in one of the column name. The new Rust listeria is more sensitive to that kind on thing, I tried to remove it. author TomT0m / talk page 16:48, 9 February 2022 (UTC)
- @Ayack That was that, it worked. author TomT0m / talk page 16:49, 9 February 2022 (UTC)
- Great! Thanks @TomT0m! Ayack (talk) 16:51, 9 February 2022 (UTC)
- Thank you both. It's now working in WDQ but the list is still not updated in WD. I was thinking it was due to timeout but it seems actually to be a Listeria issue: [1] :(. Ayack (talk) 16:44, 9 February 2022 (UTC)
- @Ayack: Another possibility is to just remove
Billionaires
I thought it might be fun to see what sort of data WD contained concerning billionaires. Based on net worth (P2218), I am able to generate a list of billionaires, but there are quite a few problems.
First, I do not know how to control for units. As a result the #1 and #3 results are in CZ koruna and Indonesian rupiah respectively. Could someone explain how to control for units? (i.e. only include those that are listed in, say, dollars)
The other problem is related to multiple entries (2019, 2021, etc.) for net worth (many people show up more than once), although I'll need to look into this problem more carefully, as E. Musk also has multiple entries for net worth but does not appear multiple times.
I very much appreciate any help with the first problem (defining units in the query) and any ideas regarding the second. SashiRolls (talk) 16:30, 9 February 2022 (UTC)
- @SashiRolls: You get the currency from the full value representation, see mw:Wikibase/Indexing/RDF Dump Format. There is only one value for Elon Musk in your results because you ask for statements with best rank only by using the wdt: prefix. This is also in the RDF document. This query list best rank values in United States dollar (Q4917) only. You can replace wd:Q4917 with a variable to display the currency in the results. --Dipsacus fullonum (talk) 16:51, 9 February 2022 (UTC)Try it!
#billionaire SELECT ?Billionaire ?BillionaireLabel ?numLabel WHERE { ?Billionaire p:P2218 [a wikibase:BestRank ; psv:P2218 [wikibase:quantityAmount ?num ; wikibase:quantityUnit wd:Q4917]] FILTER (?num > 1000000000) SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } ORDER BY DESC(?num)
- @Dipsacus fullonum:, thank you very much for the super-speedy response! I will read through the document you've given me as I've had similar types of questions in the past. SashiRolls (talk) 18:28, 9 February 2022 (UTC)
P18-lacking Wikidata items used as Commons depictions
Item commercial art gallery (Q56856618) lacks an image (P18), despite being used as a "depicts" statement value in the Structured Data of this Commons picture.
How to find other occurrences of this?
- Ideally sorted by popularity (items that are most used as Commons depictions).
- Ideally SPARQL, but I am OK with command-line tools too.
- If required by timeouts, splitting should preferably be done by coordinates.
It is the opposite of this query.
Thanks a lot! Syced (talk) 04:27, 8 February 2022 (UTC)
- @Syced: This is not easy as there are over 12 millions statements with depicts (P180) in WCQS. Splitting by coordinates isn't easy either. Many Commons pictures doesn't have coordinates, and those with coordinates can be both coordinates of the point of view (P1259) (20 millions pictures) and coordinates of depicted place (P9149) (5 million pictures). Which coordinates do you mean, and what about pictures with depicts (P180) but no coordinates? --Dipsacus fullonum (talk) 13:00, 8 February 2022 (UTC)
- @Syced: But presuming you can segment, there are iirc 1.6m distinct depects values on WCQS, and federation (at least as I code it) seems to be able to cope with checking 80-100k items before timeout, so that maybe gives you a measure of task. fwiw, I'm seeing 15k WD items with no image, from 80k distinct WCQS depicts values. (The temptation is to start filling in WD P18's blind, based on an adapted version of this query, a GIGO recipe if ever there were one.)
- --Tagishsimon (talk) 14:15, 8 February 2022 (UTC)Try it!
SELECT ?P180 WITH { SELECT DISTINCT ?P180 WHERE { ?file wdt:P180 ?P180 . } limit 80000 } as %i WHERE { INCLUDE %i SERVICE <https://query.wikidata.org/sparql> { OPTIONAL {?P180 wdt:P18 ?img . } FILTER (!BOUND(?img)) } }
- Unfortunately, depicts (P180) statements are often automatically imported based on image (P18) but when image (P18) statements are removed (to fix mistakes etc), few people bother to remove the depict statements from Commons images. Vojtěch Dostál (talk) 16:41, 8 February 2022 (UTC)
Thanks all! Tagishsimon's SPARQL is a great start, I actually started using it with success :-) Vojtěch: Sure, this is for one-by-one manual selection, not batch. @Dipsacus fullonum:: P1259 would be OK, P9149 would be OK too :-) Even only items with no coordinates would be interesting, especially if limited to the most "popular" items. Thanks a lot! Syced (talk) 03:03, 10 February 2022 (UTC)
- @Syced: Here is query using geographical search in a box. You can change the two specified coordinates for the corners of the box as you like, and also change the coordinate predicate (here coordinates of the point of view (P1259)). Credit: The query is based on query by Tagishsimon above. I noted that one of the first found item, when I ran the query, was a Wikimedia disambiguation page (Jeep Wrangler Unlimited (Q6172932)). So the query not only can give missing P18 values in Wikidata, but also wrong values in Commons. --Dipsacus fullonum (talk) 12:23, 10 February 2022 (UTC)Try it!
SELECT ?P180 ?file WITH { SELECT ?P180 ?file WHERE { ?file wdt:P180 ?P180 . SERVICE wikibase:box { ?file wdt:P1259 ?location . bd:serviceParam wikibase:cornerWest "Point(-121.872777777 37.304166666)"^^geo:wktLiteral . bd:serviceParam wikibase:cornerEast "Point(-121.486111111 38.575277777)"^^geo:wktLiteral . } } } AS %i WHERE { INCLUDE %i SERVICE <https://query.wikidata.org/sparql> { OPTIONAL { ?P180 wdt:P18 ?img } FILTER (!BOUND(?img)) } }
SELECT ?P180 ?l ?d ?file ?image
WITH
{
SELECT ?P180 ?file
WHERE
{
?file wdt:P180 ?P180 .
SERVICE wikibase:box
{
?file wdt:P1259 ?location .
bd:serviceParam wikibase:cornerWest "Point(-121.872777777 37.304166666)"^^geo:wktLiteral .
bd:serviceParam wikibase:cornerEast "Point(-121.486111111 38.575277777)"^^geo:wktLiteral .
}
}
} AS %i
WHERE
{
INCLUDE %i
SERVICE <https://query.wikidata.org/sparql>
{
OPTIONAL { ?P180 wdt:P18 ?img }
FILTER (!BOUND(?img))
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". ?P180 rdfs:label ?l . ?P180 schema:description ?d }
}
?file schema:url ?image.
}
Nice query, above the same with images, labels and descriptions. --- Jura 12:45, 10 February 2022 (UTC)
Conservation status in speciesbox
The Norwegian Biodiversity Information Centre have revised The Norwegian Red Lists (Norwegian Conservation Status List). I fail in my efforts in writing a query for the conservation status listed in no-wiki. I want to list species: Norwegian name, Scientific Name, and the Norwegian Conservation Status if possible, and the article URL.
I want to look only into https://no.wikipedia.org/wiki/ and list only species with a Speciesbox no:Mal:Taksoboks and I want listed only those entries containing the Norwegian Conservation Status present, containing a value in the Speciesbox. Example: For the status «VU», the entry in the Speciesbox will be:
| rødlistestatusno={{rødliste|VU|no}}
Please look to no:Granmeis for the result. If it is not possible to list the Norwegian Conservation Status, then a list of species with a Speciesbox is better than nothing. Magnefl (talk) 20:54, 9 February 2022 (UTC)
- Hej Magne. Denne søkning vil give sårbare arter med taksoboks på norsk Wikipedia. Du kan udskifte "VU" i søkningen for at få andre statuser. --Dipsacus fullonum (talk) 22:00, 9 February 2022 (UTC)Try it!
SELECT ?item ?tittel ?vitenskapelig_navn ?artikkel WHERE { SERVICE wikibase:mwapi { bd:serviceParam wikibase:endpoint "no.wikipedia.org" . bd:serviceParam wikibase:api "Generator" . bd:serviceParam mwapi:generator "search" . bd:serviceParam mwapi:gsrsearch 'hastemplate:"Taksoboks" insource:"rødlistestatusno Rødliste VU no"' . # Note that inline search ignores "greyspace", so the search will match "rødlistestatusno={{Rødliste|VU|no}}" bd:serviceParam mwapi:gsrlimit "max" . bd:serviceParam mwapi:gsrnamespace "0" . ?tittel wikibase:apiOutput mwapi:title . ?item wikibase:apiOutputItem mwapi:item . } FILTER BOUND(?item) OPTIONAL { ?item wdt:P225 ?vitenskapelig_navn } ?artikkel schema:about ?item . ?artikkel schema:isPartOf <https://no.wikipedia.org/> . }
- Ah, praktfullt! Det er til stor hjelp :) Magnefl (talk) 07:18, 10 February 2022 (UTC)
Instances of people with more than one value for death date
Hello! I'm wondering whether there's a way to search for people with two death date values, in other words conflicting information? Thanks! -- Zanimum (talk) 19:53, 29 January 2022 (UTC)
- @Zanimum: This should do it. I've used
because there are so many humans in Wikidata that if it weren't used, the query would result in Query Timeout Limit Reached, but feel free to change the value forLIMIT
.LIMIT
- -- Rdrg109 (talk) 22:42, 29 January 2022 (UTC)Try it!
SELECT ?item ?a ?b { ?item wdt:P31 wd:Q5; wdt:P570 ?a; wdt:P570 ?b. FILTER(?a != ?b). } LIMIT 1
- @Zanimum: Another way to avoid timeout is to limit the query to deal with just one or a few country of citizenship (P27) at a time. In the below query I have a filter to just France (Q142). I have also limited to date of death (P570) given with at least "day precision".
- Try it!
SELECT ?item ?itemLabel ?a ?b WITH { SELECT ?item ?a ?b { ?item wdt:P27 wd:Q142 . #limit to just people from France to avoid timeout ?item wdt:P31 wd:Q5; p:P570 [ wikibase:rank ?ra; psv:P570 [ wikibase:timeValue ?a; wikibase:timePrecision ?tpa ]]; p:P570 [ wikibase:rank ?rb; psv:P570 [ wikibase:timeValue ?b; wikibase:timePrecision ?tpb ]]. FILTER(?a < ?b) FILTER(?ra != wikibase:DeprecatedRank && ?rb != wikibase:DeprecatedRank && ?ra = ?rb) FILTER(?tpa>=11) #limit to just dates with at least day-precision FILTER(?tpb>=11) } } AS %i WHERE { INCLUDE %i SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } ORDER BY DESC(?a) ?itemLabel
- --Larske (talk) 12:45, 30 January 2022 (UTC)
- Maybe you want to limit to best rank (by replacing "wikibase:rank ?ra" with "a wikibase:BestRank")? --- Jura 12:54, 30 January 2022 (UTC)
- Thanks Jura, that is much better. In the query above the filtering is not correct. Here is an updated version of the query:
- Try it!
SELECT ?item ?itemLabel ?a ?b WITH { SELECT ?item ?a ?b { ?item wdt:P27 wd:Q142 . #limit to just people from France to avoid timeout ?item wdt:P31 wd:Q5; p:P570 [ a wikibase:BestRank; psv:P570 [ wikibase:timeValue ?a; wikibase:timePrecision ?tpa ]]; p:P570 [ a wikibase:BestRank; psv:P570 [ wikibase:timeValue ?b; wikibase:timePrecision ?tpb ]]. FILTER(?a < ?b) FILTER(?tpa>=11) #limit to just dates with at least day-precision FILTER(?tpb>=11) } } AS %i WHERE { INCLUDE %i SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } ORDER BY DESC(?a) ?itemLabel
- --Larske (talk) 13:38, 30 January 2022 (UTC)
- BTW as precision higher than 11 shouldn't exist yet, you could directly use 11. --- Jura 13:47, 30 January 2022 (UTC)
Apologies on my belated reply, but thank you, @Rdrg109:, @Larske:, @Jura1:, I'm very grateful for all of your help! -- Zanimum (talk) 22:05, 12 February 2022 (UTC)
Filtering for Values Within a Particular Class
Hello! I currently have this script to display all humans with medical conditions--
SELECT ?person ?personLabel ?condition ?conditionLabel
WHERE
{
?person wdt:P31 wd:Q5.
?person wdt:P1050 ?condition.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
--but am hoping to filter it down to only people with sexually transmitted infections. I know there's a FILTER function in SPARQL, but I can't quite figure out how to use it properly, especially when trying to display all values that are a subclass of another value. I'd appreciate any help somebody can offer; thank you very much!
- Add to the query. --Dipsacus fullonum (talk) 07:40, 11 February 2022 (UTC)
?condition wdt:P279* wd:Q12198.
- If you only want one line per person you may want to group the conditions with GROUP_CONCAT like this (giving 151 lines instead of 154):
- Try it!
SELECT ?person ?personLabel (GROUP_CONCAT(?conditionlabel;SEPARATOR=', ') AS ?conditions) WHERE { ?person wdt:P31 wd:Q5. ?person wdt:P1050 ?condition. ?condition wdt:P279+ wd:Q12198 . ?condition rdfs:label ?conditionlabel . FILTER(LANG(?conditionlabel)='en') SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } GROUP BY ?person ?personLabel ORDER BY ?personLabel
- --Larske (talk) 07:48, 11 February 2022 (UTC)
- Thank you so much! Especially for reading my mind and adding advice on the concatenation; that will be incredibly useful for other queries I'm working on. Tjg6ph (talk) 14:59, 11 February 2022 (UTC)
Newer items
I want to find all items created within the last 30 days that have "genre: K-POP" and "instance of: extended play". Lectrician1 (talk) 16:43, 12 February 2022 (UTC)
- Please give QID's for K-POP and extended play so we don't have to guess. Nothing obvious with these spellings seems to match. --Dipsacus fullonum (talk) 17:31, 12 February 2022 (UTC)
- @Dipsacus fullonum K-pop (Q213665) extended play (Q169930) Lectrician1 (talk) 17:33, 12 February 2022 (UTC)
- @Lectrician1: No such items are created in the last 30 days. I changed the limit to 40 days in the query below to prove it works:
- @Dipsacus fullonum K-pop (Q213665) extended play (Q169930) Lectrician1 (talk) 17:33, 12 February 2022 (UTC)
SELECT ?item ?itemLabel ?created ?age ?title ?q_number
WHERE
{
?item p:P136/ps:P136 wd:Q213665 .
?item p:P31/ps:P31 wd:Q169930 .
BIND (xsd:integer(SUBSTR(STR(?item), 33)) AS ?q_number)
FILTER (?q_number > 110000000) # QQ110000000 is from 2021-12-08
BIND (SUBSTR(STR(?item), 32) AS ?title)
SERVICE wikibase:mwapi
{
bd:serviceParam wikibase:endpoint "www.wikidata.org" .
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam mwapi:generator "allpages" .
bd:serviceParam mwapi:gapfrom ?title .
bd:serviceParam mwapi:gapto ?title .
bd:serviceParam mwapi:prop "revisions" .
bd:serviceParam mwapi:rvslots "main" .
bd:serviceParam mwapi:rvprop "timestamp" .
bd:serviceParam mwapi:rvdir "newer" .
bd:serviceParam mwapi:rvlimit "1" .
?created wikibase:apiOutput "revisions/rev/@timestamp" .
bd:serviceParam wikibase:limit "once" .
}
BIND (NOW() - xsd:dateTime(?created) AS ?age)
FILTER (?age <= 40) # Created in the last 40 days
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
ORDER BY ?age
--Dipsacus fullonum (talk) 21:47, 12 February 2022 (UTC)
top paths: P171*1..25
(((((((((((((((((((((((((wdt:P171|(wdt:P171/wdt:P171))|((wdt:P171/wdt:P171)/wdt:P171))|(((wdt:P171/wdt:P171)/wdt:P171)/wdt:P171))|(wdt:P171/wdt:P171*4))|(wdt:P171/wdt:P171*5))|(wdt:P171/wdt:P171*6))|(wdt:P171/wdt:P171*7))|(wdt:P171/wdt:P171*8))|(wdt:P171/wdt:P171*9))|(wdt:P171/wdt:P171*10))|(wdt:P171/wdt:P171*11))|(wdt:P171/wdt:P171*12))|(wdt:P171/wdt:P171*13))|(wdt:P171/wdt:P171*14))|(wdt:P171/wdt:P171*15))|(wdt:P171/wdt:P171*16))|(wdt:P171/wdt:P171*17))|(wdt:P171/wdt:P171*18))|(wdt:P171/wdt:P171*19))|(wdt:P171/wdt:P171*20))|(wdt:P171/wdt:P171*21))|(wdt:P171/wdt:P171*22))|(wdt:P171/wdt:P171*23))|(wdt:P171/wdt:P171*24))|(wdt:P171/wdt:P171*25))
I found that on wikitech:User:AKhatun/Wikidata_Subgraph_Query_Analysis#Paths_2. It was used 112,475 in sample period.
I'm curious when this is needed. Supposedly it aims to limit wdt:P171+ to 25 level. --- Jura 12:11, 5 February 2022 (UTC)
- Urk, that's a very long property chain, too bad blazegraph doesn't support the property{n,m} notation. But I believe a shorter variant that could work is:
- wdt:P171/wdt:P171?/wdt:P171?/wdt:P171?/wdt:P171?
- That ought to give the same result as wdt:P171{1,5} and avoids much of the repetion.
- I used something similar once when listing instance of subclasses of researchers, go far enough down the tree and it will start listing sculpture artists and other nonsense. Infrastruktur (talk) 14:02, 5 February 2022 (UTC)
- Also, maybe it is simply intended to prevent timeouts? Infrastruktur (talk) 14:24, 5 February 2022 (UTC)
- Or as I recently discovered the software doesn't safeguard against loops, so limiting recursion might be necessary if you run into loops, otherwise the query will never finish. Wikidata:WikiProject Ontology/Problems/Loops Infrastruktur (talk) 20:21, 14 February 2022 (UTC)
- @Infrastruktur: Loops in the data should never lead to undefined or infinite results in SPARQL (See section 9.4 Arbitrary Length Path Matching in specification at https://www.w3.org/TR/sparql11-query/). If you are right that the Wikidata SPARQL engine (Blazegraph) doesn't guard against loops, it would be a violation of the SPARQL 1.1 specification, but I don't believe that you are right. Here is an example which could have inifinite loops using the symmetric property spouse (P26): It runs fine with a finite number of results as it should. Can you give an example to support your claim that some queries will never finish? I have never seen that. --Dipsacus fullonum (talk) 23:13, 14 February 2022 (UTC)Try it!
SELECT ?item ?itemLabel ?spouse_recursive ?spouse_recursiveLabel WHERE { ?item wdt:P31 wd:Q5 . ?item wdt:P39 wd:Q11696 . ?item wdt:P26+ ?spouse_recursive . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } }
- You're right. I made the incorrect assumption that because it's possible to create loops in the class hierarchy that the graph software too would have no checks for this. Thanks for the demo. Infrastruktur (talk) 23:51, 14 February 2022 (UTC)
- @Infrastruktur: Loops in the data should never lead to undefined or infinite results in SPARQL (See section 9.4 Arbitrary Length Path Matching in specification at https://www.w3.org/TR/sparql11-query/). If you are right that the Wikidata SPARQL engine (Blazegraph) doesn't guard against loops, it would be a violation of the SPARQL 1.1 specification, but I don't believe that you are right. Here is an example which could have inifinite loops using the symmetric property spouse (P26):
preferred format for id
Query wise (on current Wikidata Query Service), which format would be the best one? See Wikidata:Property proposal/Finnish real property ID.
The idea is that we store in one format and can request any of the three. --- Jura 15:02, 13 February 2022 (UTC)
- I would say choose the format that is most often used. All conversions can be done. The most complicated part is removing or inserting leading zeros. Here is one way to do it:
- Try it!
# Conversion from format1 to format2 and format3 SELECT ?format1 ?format2 ?format3 ?num1 ?num2 ?num3 ?num4 { VALUES ?format1 { "91-2-2008-3" "398-1-1100-1" "529-102-11-1" "148-402-27-1" } BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$1") AS ?num1) BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$2") AS ?num2) BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$3") AS ?num3) BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$4") AS ?num4) BIND (CONCAT( SUBSTR(CONCAT("00", ?num1), STRLEN(?num1), 3), "-", SUBSTR(CONCAT("00", ?num2), STRLEN(?num2), 3), "-", SUBSTR(CONCAT("000", ?num3), STRLEN(?num3), 4), "-", SUBSTR(CONCAT("000", ?num4), STRLEN(?num4), 4)) AS ?format2) BIND (CONCAT( SUBSTR(CONCAT("00", ?num1), STRLEN(?num1), 3), SUBSTR(CONCAT("00", ?num2), STRLEN(?num2), 3), SUBSTR(CONCAT("000", ?num3), STRLEN(?num3), 4), SUBSTR(CONCAT("000", ?num4), STRLEN(?num4), 4)) AS ?format3) }
- Try it!
# Conversion from format2 to format3 and format1 SELECT ?format1 ?format2 ?format3 { VALUES ?format2 { "091-002-2008-0003" "398-001-1100-0001" "529-102-0011-0001" "148-402-0027-0001" } BIND (REPLACE(?format2, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$1$2$3$4") AS ?format3) BIND (REPLACE(?format2, "0{0,2}(\\d+)-0{0,2}(\\d+)-0{0,3}(\\d+)-0{0,3}(\\d+)", "$1-$2-$3-$4") AS ?format1) }
- Try it!
# Conversion from format3 to format2 and format1 SELECT ?format1 ?format2 ?format3 ?num1 ?num2 ?num3 ?num4 { VALUES ?format3 { "09100220080003" "39800111000001" "52910200110001" "14840200270001" } BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$1-$2-$3-$4") AS ?format2) BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$1") AS ?num1) BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$2") AS ?num2) BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$3") AS ?num3) BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$4") AS ?num4) BIND (CONCAT( STR(xsd:integer(?num1)), "-", STR(xsd:integer(?num2)), "-", STR(xsd:integer(?num3)), "-", STR(xsd:integer(?num4))) AS ?format1) }
- --Dipsacus fullonum (talk) 16:40, 13 February 2022 (UTC)
# Conversion from format2 to format3 and format1
SELECT ?format1 ?format2 ?format3
{
VALUES ?format2 { "091-002-2008-0003" "398-001-1100-0001" "529-102-0011-0001" "148-402-0027-0001" }
BIND (REPLACE(?format2, "-", "") AS ?format3)
}
Thanks for your help. I think that is good advice, it limits conversions. Above a slight shorter one for #2 to #3. --- Jura 17:18, 13 February 2022 (UTC)
{{Autofix}}
for Finnish real property ID (P10364) from format1 to format2 might be more complicated, but I suppose we can sort that out if/when it's needed. --- Jura 18:07, 13 February 2022 (UTC)
Get all data of a statement with his ID
Hello,
I am the author of WikibaseIntegrator, a fork of WikidataIntegrator.
In WBI you can speed up your search by querying the SPARQL server and later compare the data, this helps when you repeatedly work on the same property with different values.
The script can also compare the data on the SPARQL server with your own and return whether or not a write is required.
My current query is complex, I'm trying to get as much information as possible, and the SPARQL server doesn't like that (Error 500, timeout).
So I want to split my logic in two: have a first query to get only the ID and value of the statement, and later, if I want more information about this statement (if I want to update it or not), I ask for more information with the ID to the SPARQL server.
Is it possible to get everything the SPARQL server has on a statement through its ID?
Thanks, Myst (talk) 14:30, 6 February 2022 (UTC)
- @Myst: By & large, yes. Here's an example using a named subquery, in which the initial query gives us two statement_IDs for Q79160#P1435, and in which the second query unpacks the statement value, and the qualifier and reference properties and values. (In practise this query would call for some aggregation and OPTIONALity.)
- --Tagishsimon (talk) 14:58, 6 February 2022 (UTC)Try it!
SELECT ?propertyLabel ?value ?statement_ID ?statement_ref_ID ?ref_propertyLabel ?ref_value with { SELECT ?statement_ID ?statement_ref_ID WHERE { wd:Q79160 p:P1435 ?statement_ID. } } as %i WHERE { INCLUDE %i ?statement_ID ?predicate ?value. ?property wikibase:qualifier|wikibase:statementProperty ?predicate . ?statement_ID prov:wasDerivedFrom ?statement_ref_ID. ?statement_ref_ID ?ref_predicate ?ref_value . ?ref_property wikibase:reference ?ref_predicate . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } order by ?statement_ID ?statement_ref_ID
- @Myst: To get everything on a statement the query will also need to ask for statement rank (deprecated, normal, or preferred), bestRank status (yes or no), all the full values (with additional values like precision, calendar model, geo globe etc. dependent on value type), and all normalized values. That can be voluminous but not very complicated. See mw:Wikibase/Indexing/RDF Dump Format for details. --Dipsacus fullonum (talk) 15:26, 6 February 2022 (UTC)
- Hello @Tagishsimon, thank you for the answer (and sorry for my late reply).
- I tried your code and it seems good for what I want, my only issue is to reuse only the second part without the "INCLUDE %i" and set directly the statement_ID I want to retrieve. I didn't found how to do this. Myst (talk) 18:06, 18 February 2022 (UTC)
- Remove the named subquery and replace "INCLUDE %i" with something like:
- VALUES ?statement_ID { wds:Q79160-774EEFBA-E4DF-41D7-8E4A-BC1729AA76CF }
- You can have one or any number of statement ID's in the brackets, each with the wds: prefix. --Dipsacus fullonum (talk) 18:45, 18 February 2022 (UTC)
- Thank you! Myst (talk) 08:08, 19 February 2022 (UTC)
- Remove the named subquery and replace "INCLUDE %i" with something like:
In search of RINOs
as in Rapper in the name only. I came across Irving Rapper and wonder how many other people are rappers in the name only and by name I mean the wikidata label. It would make for a good example in Wikidata:SPARQL_query_service/queries/examples, too, as queries involving label filters are sparse there. --Frlgin (talk) 18:43, 15 February 2022 (UTC)
- A regular query will time out, but using the search API is much more efficient.
- —Infrastruktur (talk) 19:25, 15 February 2022 (UTC)Try it!
SELECT DISTINCT ?item ?label WHERE { SERVICE wikibase:mwapi { bd:serviceParam wikibase:endpoint "www.wikidata.org"; wikibase:api "Generator"; mwapi:generator "search"; mwapi:gsrsearch "inlabel:Rapper"; mwapi:gsrlimit "max". ?item wikibase:apiOutputItem mwapi:title. } ?item wdt:P31 wd:Q5. MINUS { ?item wdt:P106 wd:Q2252262. } ?item rdfs:label ?label. FILTER(LANG(?label) = "en") # Comment out to search non-english labels as well FILTER(CONTAINS(LCASE(?label), "rapper")) }
User which created the most Items in a "type of" category
Is it possible to find the Wikidata users, that created the most Items in a category? Or is that out of the scope of what queries can do?
Users that created Items where instance of (P31) is nonprofit organization (Q163740), ordered by amount of created Items in that category. Thanks --Newt713 (talk) 21:28, 19 February 2022 (UTC)
- @Newt713: You can get revision data in WDQS, including who created the first revision of any item, by using the MWAPI service. However to do that, you need a separate API call for each item, so it wont be realistic to do it for categories larger than up to around 1000 items. There are 27000+ items with instance of (P31) nonprofit organization (Q163740), that is far too many. You need to choose a smaller category, or find another tool. --Dipsacus fullonum (talk) 22:01, 19 February 2022 (UTC)
- Thank you! I'll look into it. --Newt713 (talk) 11:48, 20 February 2022 (UTC)
- @Newt713: A week ago I made a query which gets the creation date for select items by using MWAPI (see #Newer items above). It would not be hard to modify it to get the creator instead of the creation date. But it will time out if you run it with many items at once, like e.g. all nonprofit organizations. --Dipsacus fullonum (talk) 12:50, 20 February 2022 (UTC)
- @Dipsacus fullonum: That looks good. Perhaps if we add an additional filter by country (P17)? --Newt713 (talk) 12:56, 20 February 2022 (UTC)
- @Newt713: A week ago I made a query which gets the creation date for select items by using MWAPI (see #Newer items above). It would not be hard to modify it to get the creator instead of the creation date. But it will time out if you run it with many items at once, like e.g. all nonprofit organizations. --Dipsacus fullonum (talk) 12:50, 20 February 2022 (UTC)
- Thank you! I'll look into it. --Newt713 (talk) 11:48, 20 February 2022 (UTC)
Optimization request: African coups and attempts
SELECT ?coup ?coupLabel ?success ?time ?countryLabel WHERE {
?coup wdt:P31 ?type .
filter(?type in (wd:Q45382, wd:Q25906438, wd:Q4460509))
bind (?type != wd:Q25906438 as ?success)
?coup wdt:P17 ?country.
?country wdt:P30 wd:Q15.
optional {?coup wdt:P585 ?time}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} order by desc(?time)
My query to find coups and attempts in Africa works but is very slow (right now 50 seconds if not in cache). Can someone rewrite this to be faster? Thanks.--37.201.182.206 22:41, 19 February 2022 (UTC)
- Mainly, use VALUES, which provides a 'solution sequence' for the named variable, which is JOINed with the rest of the query - in other words, the variable is constrained to values in the VALUES list. Documentation: https://www.w3.org/TR/sparql11-query/#inline-data . FILTERs are always expensive & generally to be avoided.
- --Tagishsimon (talk) 23:06, 19 February 2022 (UTC)Try it!
SELECT ?coup ?coupLabel ?success ?time ?countryLabel WHERE { VALUES ?type {wd:Q45382 wd:Q25906438 wd:Q4460509} ?coup wdt:P31 ?type . bind (?type != wd:Q25906438 as ?success) ?coup wdt:P17 ?country. ?country wdt:P30 wd:Q15. optional {?coup wdt:P585 ?time} SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } order by desc(?time)
Query to count the number of identifiers for taxa
which have the identifier Australian Faunal Directory ID (P6039) That is the query would count all the database identifiers for such an item, and for Enteles vigorsii (Q107221668) would get a count of 9 (counting both iNaturalist ids). MargaretRDonald (talk) 05:17, 16 February 2022 (UTC)
- @MargaretRDonald: Here is a first attempt. As there are more than 130,000 objects in the result, an attempt to include the labels in the result will timeout.
- Try it!
SELECT ?item (COUNT(?id) AS ?count_of_external_identifiers) WHERE { ?item wdt:P6039 [] . OPTIONAL { ?item ?id [] . ?p wikibase:directClaim ?id . ?p wikibase:propertyType wikibase:ExternalId . } } GROUP BY ?item
- By dividing the result in several chunks as in the following query, experiment with OFFSET and LIMIT, you may be able to find the labels.
- Try it!
SELECT ?item ?itemLabel (COUNT(?id) AS ?count_of_external_identifiers) WITH { SELECT ?item { ?item wdt:P6039 [] . } OFFSET 50000 LIMIT 10000 } AS %i WHERE { INCLUDE %i OPTIONAL { ?item ?id ?value . ?p wikibase:directClaim ?id . ?p wikibase:propertyType wikibase:ExternalId } SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } GROUP BY ?item ?itemLabel
- --Larske (talk) 06:59, 16 February 2022 (UTC)
- Using OFFSET and LIMIT clauses without also using ORDER BY makes no sense since the results will not be in any defined order. So I propose to add in the named subquery
ORDER BY ?item
%i
above. - Also be aware that if some items have more than one best-rank statement with Australian Faunal Directory ID (P6039), then that item's external identifiers will be counted more than once (if the P6039 statements are found in the same chunk), and/or the item will in several chunks.
- By the way Blazegraph (the SPARQL engine) comes with a service intended to return only a part (called slice) of the results matching a triple which you can also use instead of the subquery: Try it!
SELECT ?item ?itemLabel (COUNT(?id) AS ?count_of_external_identifiers) WHERE { SERVICE bd:slice { ?item wdt:P6039 [] . bd:serviceParam bd:slice.offset 50000 . bd:serviceParam bd:slice.limit 10000 . } OPTIONAL { ?item ?id ?value . ?p wikibase:directClaim ?id . ?p wikibase:propertyType wikibase:ExternalId } SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } GROUP BY ?item ?itemLabel
- Finally you can speed up the query by only adding the labels after grouping: Try it!
SELECT ?item ?itemLabel ?count_of_external_identifiers WITH { SELECT ?item (COUNT(?id) AS ?count_of_external_identifiers) WHERE { SERVICE bd:slice { ?item wdt:P6039 [] . bd:serviceParam bd:slice.offset 50000 . bd:serviceParam bd:slice.limit 10000 . } OPTIONAL { ?item ?id ?value . ?p wikibase:directClaim ?id . ?p wikibase:propertyType wikibase:ExternalId } } GROUP BY ?item } AS %without_labels WHERE { INCLUDE %without_labels SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } }
- --Dipsacus fullonum (talk) 10:57, 16 February 2022 (UTC)
- Using OFFSET and LIMIT clauses without also using ORDER BY makes no sense since the results will not be in any defined order. So I propose to add
- It could be easier to use the triple that counts them automatically
- ?item wikibase:identifiers ?ids
- Property_talk:P6039 has a few links to queries for those with the most or the least. --- Jura 15:05, 16 February 2022 (UTC)
- @Jura1: Yes, it would be easier using
wikibase:identifiers
if that gives the count. Unfortunately I don't know that predicate. Where is it documented? There is no mention in the RDF Dump Format document. --Dipsacus fullonum (talk) 16:24, 16 February 2022 (UTC)- There must be a ticket in phabricator. I think it was added back when wikibase:statements was included. --- Jura 20:42, 16 February 2022 (UTC)
- I found phab:T144476 "[feature request] add count of external-ids to rdf (Wikidata)" (Closed, Resolved). I wonder why the development people want to use their time to develop new features, and then don't update the RDF documentation so the new features can be used by all. Why make it, and then more or less keep it secret? It simply makes no sense to me. It also have used a lot of resources unnecesary that users for years have manually counted the external identiers because the documentation isn't updated. Aw! --Dipsacus fullonum (talk) 21:18, 16 February 2022 (UTC)
- It was somewhat buggy at the beginning with delayed or missing updates of counts. Also, it took forever until it was available for all items. --- Jura 21:27, 16 February 2022 (UTC)
- @Jura1: Yes, it would be easier using
- Thank you all. @Larske: @Dipsacus fullonum: @Jura: Amazing work. However, I managed to misspecify what I am wanting. Would it be possible to get not the item but a summary of the items, and break it down so that not all 130,000+ objects are interrogated but just birds (using say ?item wdt:P171* wd:Q5113.) To give a table like the following:
- Thank you all. @Larske: @Dipsacus fullonum: @Jura: Amazing work. However, I managed to misspecify what I am wanting. Would it be possible to get not the item but a summary of the items, and break it down so that not all 130,000+ objects are interrogated but just birds (using say ?item wdt:P171* wd:Q5113.) To give a table like the following:
No of- external ids count
1 20000
2 1900
3 1750
....
40 4
Hopefully the full (sub)group can be summarised. Alternatively the table could consist of two lines: number of elements having more than one external id and those having just one. MargaretRDonald (talk) 20:14, 16 February 2022 (UTC)
- Try the chart for sitelinks on property talk and change "sitelinks" to "identifiers". --- Jura 21:04, 16 February 2022 (UTC)
- Like this:
- Try it!
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_bird_objects) WHERE { ?item wdt:P171* wd:Q5113 . ?item wdt:P6039 [] ; wikibase:identifiers ?number_of_external_identifiers . } GROUP BY ?number_of_external_identifiers ORDER BY ?number_of_external_identifiers
- --Larske (talk) 01:35, 17 February 2022 (UTC)
- I changed
COUNT(?item)
toCOUNT(DISTINCT ?item
in the query by Larske above because otherwise items with more than one best-rank P6039 statements will be overcounted. --Dipsacus fullonum (talk) 08:19, 17 February 2022 (UTC)- Thank you so much @Larske:. I have now almost got what I want. However my query on insect (Q1390) timed out (of course).... And I am hoping you might be able to find a solution. Thanks, again. MargaretRDonald (talk) 20:10, 19 February 2022 (UTC)
- @MargaretRDonald: To find a solution, you need to tell what you want. What was the query that timed out? --Dipsacus fullonum (talk) 21:40, 19 February 2022 (UTC)
- Thanks, @Dipsacus fullonum:. MargaretRDonald (talk) 00:57, 20 February 2022 (UTC)Try it!
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_insect_objects) WHERE { ?item wdt:P171* wd:Q1390 . ?item wdt:P6039 [] ; wikibase:identifiers ?number_of_external_identifiers . } GROUP BY ?number_of_external_identifiers ORDER BY ?number_of_external_identifiers
- @MargaretRDonald: There are so many insect items that it is impossible to make a list of them and then join it with the items with P6039 values, as the SPARQL engine tried. I tried finding the items with P6039 values first and then checking which are insects. That also timed out, but I could split the query into two.
- Items with 1 to 7 external identiers: Try it!
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_insect_objects) WITH { SELECT ?item ?number_of_external_identifiers WHERE { ?item wdt:P6039 [] . ?item wikibase:identifiers ?number_of_external_identifiers . hint:Prior hint:rangeSafe true . FILTER (?number_of_external_identifiers <= 7 ) } } AS %P6039 WHERE { INCLUDE %P6039 ?item wdt:P171* wd:Q1390 . hint:Prior hint:gearing "forward" . } GROUP BY ?number_of_external_identifiers ORDER BY ?number_of_external_identifiers
- Items with more than 7 external identiers: Try it!
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_insect_objects) WITH { SELECT ?item ?number_of_external_identifiers WHERE { ?item wdt:P6039 [] . ?item wikibase:identifiers ?number_of_external_identifiers . hint:Prior hint:rangeSafe true . FILTER (?number_of_external_identifiers > 7 ) } } AS %P6039 WHERE { INCLUDE %P6039 ?item wdt:P171* wd:Q1390 . hint:Prior hint:gearing "forward" . } GROUP BY ?number_of_external_identifiers ORDER BY ?number_of_external_identifiers
- --Dipsacus fullonum (talk) 09:02, 20 February 2022 (UTC)
- Thank, you so much for this, I have now managed to query my data very adequately, thanks to your efforts (and everyone else's). @Dipsacus fullonum:. MargaretRDonald (talk) 19:17, 20 February 2022 (UTC)
- @MargaretRDonald: To find a solution, you need to tell what you want. What was the query that timed out? --Dipsacus fullonum (talk) 21:40, 19 February 2022 (UTC)
- Thank you so much @Larske:. I have now almost got what I want. However my query on insect (Q1390) timed out (of course).... And I am hoping you might be able to find a solution. Thanks, again. MargaretRDonald (talk) 20:10, 19 February 2022 (UTC)
- I changed
URL search strings for all the instances of X missing property Y
Create URLs for a list of items of instance x that is missing property y
The URL is to be a combination of "search formatter URL" (P4354) and the labelname
The idea being to have something reusable that someone interested in filling in a particular id can find entries that currently do not have it.
For example "Instances of" (P31) surname (Q101352) without property "MyHeritage Surname ID" (P5452)
Would create a clickable list like
https://lastnames.myheritage.com/last-name/presley
https://lastnames.myheritage.com/last-name/springsteen
https://lastnames.myheritage.com/last-name/sinatra
Back ache (talk) 04:19, 21 February 2022 (UTC)
- @Back ache: You're looking for something like this, I think:
- Try it!
SELECT ?item ?item_label ?check_url WHERE { ?item wdt:P31 wd:Q101352 . MINUS {?item wdt:P5452 [] } . ?item rdfs:label ?item_label FILTER(lang(?item_label) = 'en') . wd:P5452 wdt:P1630 ?fmt . BIND(IRI(REPLACE(str(?item_label), '(^.*)', ?fmt)) AS ?check_url) . } LIMIT 50
I've limited the number of results shown to 50, as wikidata has a lot of surnames.
I'm a little concerned to discover the number of MyHeritage records with family name = "Family-Name" : [2], but I guess people get called all sorts of things. :-) Jheald (talk) 12:09, 21 February 2022 (UTC)
@Jheald:
- I agree hopefully using querys like this, alongside things like Wikidata:Tools/Wikidata for Web they'll create higher quality links, thanks for your help Back ache (talk) 12:44, 21 February 2022 (UTC)
- @Back ache: Here's also a slightly tweaked version of the query, which looks at the first million family name (P734) statements it can find, and then lists which surnames are most frequently occurring in that set that don't have a P734 :
- Try it!
SELECT ?count ?name ?name_label ?check_url WITH { SELECT ?item ?name WHERE { ?item wdt:P734 ?name } LIMIT 1000000 } AS %uses WITH { SELECT (COUNT(*) AS ?count) ?name WHERE { INCLUDE %uses } GROUP BY ?name HAVING (?count > 4) } AS %names WHERE { INCLUDE %names MINUS {?name wdt:P5452 [] } . ?name rdfs:label ?name_label FILTER(lang(?name_label) = 'en') . wd:P5452 wdt:P1630 ?fmt . BIND(IRI(REPLACE(str(?name_label), '(^.*)', ?fmt)) AS ?check_url) . } ORDER BY DESC(?count)
- -- Jheald (talk) 13:17, 21 February 2022 (UTC)
- So if I was trying to a variation for humans without a surname property, how do I only include reuslts with a space (to avoid people with one name) and extract only that which follows the last space to be used in link.
So that the query would exclude the name "Prince" and for "Elvis Presley" only use "Presley" as the varibable in the link
- Try it!
SELECT ?item ?item_label ?check_url WHERE { ?item wdt:P31 wd:Q5 . MINUS {?item wdt:P734 [] } . ?item rdfs:label ?item_label FILTER(lang(?item_label) = 'en') . wd:P5452 wdt:P1630 ?fmt . BIND(IRI(REPLACE(str(?item_label), '(^.*)', ?fmt)) AS ?check_url) . FILTER regex(?item_label, " ", "i") } LIMIT 50
Back ache (talk) 11:27, 23 February 2022 (UTC)
selecting a subset of items linked to property:P7400
hi!LibraryThing page for L.L.Zamenhof has at the top right a link persondata.toolforge.org gnd 118643495 and of course also to wikidata. because the wd item has both an entry for property LibraryThing Author ID and a dewiki entry as a consequence at the persondata.toolserver link from above you can find a link labeled "Eintrag in Librarything".
note: for now (10:06, 23 February 2022 (UTC)) 8,950 P4700 entries are available at wikidata. i hope that the multilingual LibraryThing community will more by the thousends for many of the 2,000,000 LT authors. let's hope "a dream comes trough"
my SPARK skills are very limited. i managed to generate https://w.wiki/4sQJ .
please generate a query selecting
- all wd items with property P7400
where
- a dewiki is available
- a eowiki is available
- Property:P8557 OR Q12565 is a value for Property:P463
the output should Show
- wd item labeled with Englisch wd item label
- LT as label with the P4700 value
- eo as label for the eowiki link to Esperanto Wikipedia
- de as label for the dewiki link to German Wikipedia
if possible the output should be sorted by Englisch labels of the wd items thanks in advance for all your eforts gangleri aka lery raynhart 10:06, 23 February 2022 (UTC) 88.128.88.112
- 10:13, 23 February 2022 (UTC) 88.128.88.112
Esperanto wikidata kaj LibraryThing
saluton! https://w.wiki/4sVZ
la rezulto montras aron da pagcoj rilatantajn al nomojn kiujn vi konas, kies nomojn vi devintus scii.
cxiu pagxo entenas Property:P4700. se vi iras al LibraryThing povas esti ke vi ankaux trovas en la supra dekstra parto ligon nomatan persondata.toolforge . estas ligo al la secxilo de la germanlinga vikidio. bv. noti ke tie estas ligo nomata "Eintrag in LibraryThing".
bv. noti ke la persondata.toolforge pagxo listigas ligojn al aliaj personoj menciataj en la germanlingva artikolo en vikipedio.
PETO: bv. pliampleksigi la sercxon tiamsnire ke aliaj e-istoj,
interlingvistoj aperu. eblas aldoni Membrojn de la Akademio de Esperanto, de la Lingva komitato, Honorajn membrojn de UEA, UEA membrojn ktp.
antauxdankon gangleri aka lery raynhart 10:59, 23 February 2022 (UTC) 88.128.88.112
- Jen serĉo por esperantistoj kun valoro por LibraryThing author ID (P7400): Try it!
SELECT ?item ?itemLabel WHERE { ?item wdt:P31 wd:Q5 . # ?item eastas homo ?item wdt:P7400 [] . # ?item havas LibraryThing-valoron ?item wdt:P106 wd:Q860918 . # ?item estas esperantisto SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],eo,en". } }
- Se vi volis ion alian, bonvolu diri la deziron pli klare. --Dipsacus fullonum (talk) 11:58, 23 February 2022 (UTC)
- Question for all: Why is everything after "PETO:" bold? Jeg kan ikke se årsagen. (Demando por ĉiuj: Kial estas ĉio post "PETO:" graseta? Mi ne vidas la kialon.)
aldono: Alaudo · talk @Alaudo ,
Andy · talk @Andy ,
Blahma · talk @Blahma ,
Castelobranco · talk @Castelobranco ,
Chuck SMITH @Chuck_SMITH ,
Darkweasel94 · talk @Darkweasel94 ,
DidiWeidmann · talk @DidiWeidmann ,
KuboF · talk @KuboF ,
Marcos · talk @Marcos ,
Narvalo · talk @Narvalo ,
PaulP · talk @PaulP ,
Pino @Pino ,
Roboto de Marcos @Roboto_de_Marcos ,
Thomas Guibal @Thomas_Guibal ,
ThomasPusch · talk @ThomasPusch ,
Tlustulimu · talk @Tlustulimu ,
Umbert' · talk @Umbert' ,
Yekrats · talk @Yekrats ,
לערי ריינהארט · talk
@לערי_ריינהארט , Maha · talk @Maha ,
Ziko · talk @Ziko
dankon Dipsacus fullonum pro la helpo! 14:20, 23 February 2022 (UTC) 62.54.176.88
grandega laboro! bonvolu krei kroman sercxon kie la rezulto entenas ankaux la LibraryThing ligojn kaj ligojn al enwiki kaj al dewiki se tiaj sitelinks ekzistas.
bv. plilargxigi la demandon sercxsnte memnrojm de UEA, Honorajn membrojn de UEA, de la Akademio de Esperanto, de la Lingva Komitato, pagxojn lihitsjn al Nia diligenta kolegaro, ligitajn al Originala literaturo, denaskajn esperantistpkn, lingvokapablo Esperanto verlintojn en Espersnto, Ido, Volapuko, aliaj planingvoj. antauxdankon! amike reinhardt aka lery raynhart 14:44, 23 February 2022 (UTC) 62.54.176.88
What are exactly sitelinks on this search?
SELECT ?item ?itemLabel ?itemDescription (GROUP_CONCAT(?p31label;SEPARATOR=', ') AS ?type) (YEAR(?p577) AS ?year) ?sitelinks ?sitelink ?article WITH {
SELECT DISTINCT ?item ?p577 ?sitelinks WHERE { ?item wdt:P31 wd:Q7725634 . ?item wdt:P407 wd:Q652 . ?item wdt:P577 ?p577 . hint:Prior hint:rangeSafe true . FILTER ("1980-00-00"^^xsd:dateTime <= ?p577 && ?p577 < "2021-00-00"^^xsd:dateTime) # change the year limits if needed ?item wikibase:sitelinks ?sitelinks .
} } AS %i WHERE {
INCLUDE %i ?item wdt:P31 [rdfs:label ?p31label] . FILTER(LANG(?p31label)='en') # change the language code if needed
OPTIONAL { ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name ?sitelink . } SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} GROUP BY ?item ?itemLabel ?itemDescription ?type ?p577 ?sitelinks ?sitelink ?article ORDER BY DESC(?sitelinks) ?itemLabel
What are exactly sitelinks here? Are they the links of other pages on the same entity translated in different languages?
- Well, the above query has syntax errors, so there nothing means anything. But it contains the triple
?item wikibase:sitelinks ?sitelinks
. The predicatewikibase:sitelinks
is used to record how many sitelinks to MediaWikia projects an item has. That may be links to Wikipedia in any language, links to Wikinews in any language, links to Wikisource in any language and several other projects. --Dipsacus fullonum (talk) 11:28, 23 February 2022 (UTC)
- How can I get how many links a page is translated to? Like, each item translated into how many languages 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 12:58, 23 February 2022 (UTC)
- You cannot. There are no records saying if pages in different languages are translations of each other or independent pages about the item. --Dipsacus fullonum (talk) 13:36, 23 February 2022 (UTC)
- So if I have item "The Godfather" I can't see in how many languages that wikipedia page has been translated to? 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 13:56, 23 February 2022 (UTC)
- That is correct. There are no records on translations. You can however see in how many languages there is a Wikipedia page about the item. --Dipsacus fullonum (talk) 15:08, 23 February 2022 (UTC)
- How can I do it? 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 15:53, 23 February 2022 (UTC)
- --Dipsacus fullonum (talk) 19:31, 23 February 2022 (UTC)Try it!
SELECT ?item (COUNT(?wikipedia_page) AS ?wikipedia_pages) { VALUES ?item { wd:Q47703 } ?wikipedia_page schema:about ?item . ?wikipedia_page schema:isPartOf/wikibase:wikiGroup "wikipedia" . } GROUP BY ?item
- How can I do it? 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 15:53, 23 February 2022 (UTC)
- That is correct. There are no records on translations. You can however see in how many languages there is a Wikipedia page about the item. --Dipsacus fullonum (talk) 15:08, 23 February 2022 (UTC)
- So if I have item "The Godfather" I can't see in how many languages that wikipedia page has been translated to? 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 13:56, 23 February 2022 (UTC)
- You cannot. There are no records saying if pages in different languages are translations of each other or independent pages about the item. --Dipsacus fullonum (talk) 13:36, 23 February 2022 (UTC)
Item With Most References
How can I get a list of the items with the most total references? More specifically, the sum of references on every statement of an item. AntisocialRyan (Talk) 16:22, 23 February 2022 (UTC)
- @AntisocialRyan: It would take far too long to count the number of references in all items by using SPARQL to avoid timeout. I think that you will have to analyze a database dump to do that. --Dipsacus fullonum (talk) 18:00, 23 February 2022 (UTC)
- Could it be narrowed down to a specific instance of? Like movies with the most references? AntisocialRyan (Talk) 19:42, 23 February 2022 (UTC)
Help optimizing query
I need help optimizing this query to take less than timeout.
SELECT (COUNT(DISTINCT ?director_de_cine) AS ?count) WHERE {
?director_de_cine (wdt:P106*) wd:Q2526255;
wdt:P21 wd:Q6581072.
?sitelink schema:about ?article;
schema:isPartOf <https://es.wikipedia.org/>.
}
thanks :-) —Ismael Olea (talk) 16:36, 23 February 2022 (UTC)
- @Olea: Mainly, your formulation
?sitelink schema:about ?article;
does not tie sitelinks to the ?director_de_cine variable, and so WDQS wanders off looking at all sitelinks for all items. I think, too, you might have wanted?director_de_cine wdt:P106/wdt:P279* wd:Q2526255
, (has an occupation of, or an occupation which is a subclass of). - --Tagishsimon (talk) 17:07, 23 February 2022 (UTC)Try it!
SELECT (COUNT(DISTINCT ?director_de_cine) AS ?count) WHERE { ?director_de_cine wdt:P106/wdt:P279* wd:Q2526255; wdt:P21 wd:Q6581072. ?article schema:about ?director_de_cine ; schema:isPartOf <https://es.wikipedia.org/> . }
How many languages there is a Wikipedia page about Italian writers' works 1980-2021
Hi, I have this query and I wanted to know if it's correct to get how many links (in the sense of translated pages) there are about each entity, which is a literary work of an Italian writer between 1980 and 2021.
SELECT ?item ?itemLabel ?itemDescription (GROUP_CONCAT(?p31label;SEPARATOR=', ') AS ?type) (YEAR(?p577) AS ?year) ?sitelinks ?sitelink ?article WITH {
SELECT DISTINCT ?item ?p577 ?sitelinks WHERE {
?item wdt:P31 wd:Q7725634 .
?item wdt:P407 wd:Q652 .
?item wdt:P577 ?p577 . hint:Prior hint:rangeSafe true .
FILTER ("1980-00-00"^^xsd:dateTime <= ?p577 && ?p577 < "2021-00-00"^^xsd:dateTime) # change the year limits if needed
?item wikibase:sitelinks ?sitelinks .
} } AS %i WHERE {
INCLUDE %i
?item wdt:P31 [rdfs:label ?p31label] . FILTER(LANG(?p31label)='en') # change the language code if needed
OPTIONAL { ?article schema:about ?item ;
schema:isPartOf <https://en.wikipedia.org/> ;
schema:name ?sitelink .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} GROUP BY ?item ?itemLabel ?itemDescription ?type ?p577 ?sitelinks ?sitelink ?article ORDER BY DESC(?sitelinks) ?itemLabel
- No, that is not correct. The query tells the number of sitelinks to Wikimedia pages that each item has, but that doesn't say anything about translated pages. There is no information about translations, so that cannot be queried. --Dipsacus fullonum (talk) 13:36, 24 February 2022 (UTC)
- So this query:
- SELECT ?item (COUNT(?wikipedia_page) AS ?wikipedia_pages)
- {
- VALUES ?item { wd:Q47703 }
- ?wikipedia_page schema:about ?item .
- ?wikipedia_page schema:isPartOf/wikibase:wikiGroup "wikipedia" .
- }
- GROUP BY ?item
- Cannot be done for each entity in my query? 137.204.150.17 13:49, 24 February 2022 (UTC)
Video games with most "has part" statements.
AntisocialRyan (Talk) 16:15, 24 February 2022 (UTC)
- @AntisocialRyan: Something like:
- --Tagishsimon (talk) 18:49, 24 February 2022 (UTC)Try it!
SELECT ?item ?itemLabel (COUNT(?part) as ?count) WHERE { ?item wdt:P31/wdt:P279* wd:Q7889 . ?item wdt:P527 ?part . SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } GROUP BY ?item ?itemLabel ORDER BY DESC(?count)
Reality check
I presume I'm doing something wrong with the SPARQL, but can't see it. I'm looking for normal rank statements that have a 'reason for preferred rank' qualifier. On the face of it, some are being found. But when I check the statement in the WD UI, it's set to preferred rank.
SELECT distinct ?item ?itemLabel ?statement ?property ?propertyLabel ?rank
WHERE
{
?statement pq:P7452 []. hint:Prior hint:runFirst true. # statement has a qualifier "reason for preferred rank"
?statment wikibase:rank wikibase:NormalRank . # statement is of normal rank
?statment wikibase:rank ?rank. # get the rank anyway
?item ?predicate ?statement . # link the statement to an item
?property wikibase:claim ?predicate . # ensure the predicate is p:
} limit 100
--Tagishsimon (talk) 12:14, 25 February 2022 (UTC)
- I've spent hours trying to track down this bug in the past, check the spelling of statement in lines 5 & 6 Piecesofuk (talk) 15:46, 25 February 2022 (UTC)
- @Tagishsimon: It a spelling error. You have two different variables ?statement and ?statment.
- Thank you both. Obligatory Doh! --Tagishsimon (talk) 17:22, 25 February 2022 (UTC)
Splitting pairs
SELECT ?a ?aLabel ?b ?bLabel
WHERE {
?a wdt:P1696 ?b .
?b wdt:P1696 ?a .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I'm having problems finding a solution to this problem: For every A that is an inverse and complementary property of B, split the pairs and discard half. For instance father/child is a pair so discard child/father. Infrastruktur (talk) 08:44, 26 February 2022 (UTC)
- @Infrastruktur: All things considered, this?
- --Tagishsimon (talk) 11:42, 26 February 2022 (UTC)Try it!
SELECT ?a ?aLabel ?b ?bLabel WHERE { ?a wdt:P1696 ?b . ?b wdt:P1696 ?a . filter (str(?a) < str(?b)) SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } }
- Thanks, clever solution. Infrastruktur (talk) 12:07, 26 February 2022 (UTC)
Each work that has won an Italian book prize
I would like to have all the works, authors and year of Italian literary awards/prize winners, divided by award. Is it possible?
- It's certainly not all of them, but it's all that we have so far (550 results at the moment):
- Try it!
SELECT DISTINCT ?award ?awardLabel ?year ?winner ?winnerLabel ?work ?workLabel WHERE { ?award wdt:P31/wdt:P279* wd:Q378427 . ?award wdt:P17 wd:Q38 . ?award ^ps:P166 ?statement . ?statement ^p:P166 ?winner . ?winner wdt:P31 wd:Q5 . OPTIONAL{?statement pq:P1686 ?work .} OPTIONAL{?statement pq:P585 ?point_in_time . BIND(YEAR(?point_in_time) AS ?year)} SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],it,en" .} BIND(xsd:integer(SUBSTR(STR(?award),33,99)) AS ?qid) } ORDER BY ASC(?qid) DESC(?year)
- As you can see, there is plenty more information to be added!
- EDIT: I should note that not all of these prizes are awarded for a particular work, so some gaps will be expected.
- --Quesotiotyo (talk) 19:50, 28 February 2022 (UTC)
- Here is another version that requires that the literary work has the award received (P166) as property instead of being stated as a qualifier for the award received (P166) property of the author.
- Try it!
SELECT DISTINCT ?awardLabel (YEAR(?p585) AS ?year) ?literary_work ?literary_workLabel ?authorLabel WITH { SELECT ?literary_work ?award ?author ?p585 WHERE { ?award wdt:P31/wdt:P279* wd:Q378427 . ?award wdt:P17 wd:Q38 . ?literary_work p:P166 ?p166stm . ?p166stm ps:P166 ?award . OPTIONAL { ?p166stm pq:P585 ?p585 } OPTIONAL { ?literary_work wdt:P50 ?author } } } AS %i WHERE { INCLUDE %i ?literary_work wdt:P31/wdt:P279* wd:Q386724 . SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". } } ORDER BY ?awardLabel ?year ?STR(?literary_workLabel)
- --Larske (talk) 22:59, 28 February 2022 (UTC)
Use differents colors
Hi,
I want to have the ?en in one color and ?fr in another in this query. Can someone help me ? (this page really need love...)
#defaultView:Map
select ?item ?coord ?en ?fr where {?item wdt:P17 wd:Q16 ; wdt:P131* wd:Q1965 ; wdt:P625 ?coord . {{?en schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> .} UNION {?fr schema:about ?item ; schema:isPartOf <https://fr.wikipedia.org/> .}}}
Simon Villeneuve (talk) 03:06, 28 February 2022 (UTC)
- @Simon Villeneuve: Something along these lines, I think. Setting ?rgb is, I think, optional; it'll select colours for you if you don't specify them.
#defaultView:Map{"hide":["?rgb","?layer"]}
hides the ?rgb and ?layer variable from the mouse-click pop-up. Documentation, such as it is, here: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_Help/Result_Views#Map
- Try it!
#defaultView:Map{"hide":["?rgb","?layer"]} SELECT ?item ?coord ?en ?fr ?layer ?rgb WHERE { ?item wdt:P17 wd:Q16; (wdt:P131*) wd:Q1965; wdt:P625 ?coord. { { ?en schema:about ?item; schema:isPartOf <https://en.wikipedia.org/>. } UNION { ?fr schema:about ?item; schema:isPartOf <https://fr.wikipedia.org/>. } } BIND (if(BOUND(?en) && BOUND(?fr),"ff0000",if(BOUND(?en),"00ff00",if(BOUND(?fr),"0000FF","000000" ))) as ?rgb) # set the dot colour BIND (if(BOUND(?en) && BOUND(?fr),"EN & FR",if(BOUND(?en),"EN",if(BOUND(?fr),"FR","NIL" ))) as ?layer) # set the layer }
- Slightly unclear to me why this is not a better query, but you may have your reasons:
- Try it!
#defaultView:Map{"hide":["?rgb","?layer"]} SELECT ?item ?coord ?en ?fr ?layer ?rgb WHERE { ?item wdt:P17 wd:Q16; (wdt:P131*) wd:Q1965; wdt:P625 ?coord. OPTIONAL { ?en schema:about ?item; schema:isPartOf <https://en.wikipedia.org/>. } OPTIONAL { ?fr schema:about ?item; schema:isPartOf <https://fr.wikipedia.org/>. } FILTER(BOUND(?en) || BOUND(?fr)) BIND (if(BOUND(?en) && BOUND(?fr),"ff0000",if(BOUND(?en),"00ff00",if(BOUND(?fr),"0000FF","000000" ))) as ?rgb) BIND (if(BOUND(?en) && BOUND(?fr),"EN & FR",if(BOUND(?en),"EN",if(BOUND(?fr),"FR","NIL" ))) as ?layer) }
--Tagishsimon (talk) 09:23, 28 February 2022 (UTC)
- @Tagishsimon: Thank you very much !
Yes, the second one is much better !
I saw the "Result_View#Map" page and, as I said, it need "love". I mean by that that there's too less informations about how use the options like "layer", especially the BIND part.
Thank you again. As usual, you find a way to solve my problems and more ! Simon Villeneuve (talk) 12:37, 28 February 2022 (UTC)- Too little information; agreed. That & other pages (*cough* MWAPI documentation *cough*) could do with more worked examples. Still too much of a black art right now. On that second query, you'll also get items with no EN/FR site link by commenting out
FILTER(BOUND(?en) || BOUND(?fr))
, but you've probably spotted that. Many more of them than items that do have one or other or both sitelinks. --Tagishsimon (talk) 14:55, 28 February 2022 (UTC)
- Too little information; agreed. That & other pages (*cough* MWAPI documentation *cough*) could do with more worked examples. Still too much of a black art right now. On that second query, you'll also get items with no EN/FR site link by commenting out