Wikidata:Request a query/Archive/2022/02

From Wikidata
Jump to navigation Jump to search

search for books in a book series

I would like to be able to provide an author, character, or book name and get a list of book series and the books in order. This could be three separate queries if need be. One for author, one for character, and one for book title.

I have tried alot of different queries including dbpedia.org queries but wikidata seems like a better place for this kind of stuff.

The following query works but I only get 3 titles back when I know there 20+ books by this author. Is this just that wikidata doesn't have the data where as wikipedia does? Or is there something wrong with my query. This query is just getting book titles by author. I believe I can figure out how to get books in order once I can get all the books by an author.

https://www.wikidata.org/wiki/Wikidata:WikiProject_Books The above link seems to indicate that I should be able to get books before and after a particular book using p155 follows p156 followed by; and p179 part of the series.

I have also tried limiting the search to wdt:P31/wdt:P279* wd:Q571. Which I think would limit the items to books but this seems to remove some of John Sandfords books from the result set.

Thanks much for any help that can be provided.

SELECT distinct ?book ?bookLabel ?authorLabel WHERE {

 ?author rdfs:label "John Sandford"@en .
 
 ?book wdt:P50 ?author .
 SERVICE wikibase:label {
   bd:serviceParam wikibase:language "en" .
 }

}

I don't think there are items for more than the 3 books by John Sandford which you have found. Here is a query which can order the books after part of the series (P179) and series ordinal (P1545) if these are present. You can try it with other authors.
SELECT ?book ?bookLabel ?seriesLabel ?number ?followsLabel ?followed_byLabel
WHERE
{
  ?book wdt:P50 wd:Q1701679 . # book by John Sandford
  OPTIONAL
  {
    ?book p:P179 ?series_statement .
    ?series_statement ps:P179 ?series .
    OPTIONAL { ?series_statement pq:P1545 ?number . }
    OPTIONAL { ?series_statement pq:P155 ?follows . }
    OPTIONAL { ?series_statement pq:P156 ?followed_by . }
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY ?series ?number
Try it!
--Dipsacus fullonum (talk) 10:10, 1 February 2022 (UTC)
Reminder that there's a What links here link on all items, in this case usefully confirming that only four items point to this author. --Tagishsimon (talk) 11:07, 1 February 2022 (UTC)
Thanks much for the query and comments. When I look at the wikipedia article for John Sandford it contains all the books in the 'Prey' series. However, as you point out it appears that wikidata does not contain this information. Do you know why that would be? I thought wikidata was built from wikipedia.
Another question I have for you is that I changed authors to Lee Child. In this case I see most of not all of his books. However, the labels for followedby and follows are empty. However, when I look at the items individually those items have values. Do you know why the labels are not being output in the results?
Thanks
Brady
SELECT ?book ?bookLabel ?seriesLabel ?number ?followsLabel ?followed_byLabel
WHERE
{
?book wdt:P50 wd:Q333719 . # book by Lee Child
OPTIONAL
{
?book p:P179 ?series_statement .
?series_statement ps:P179 ?series .
OPTIONAL { ?series_statement pq:P1545 ?number . }
OPTIONAL { ?series_statement pq:P155 ?follows . }
OPTIONAL { ?series_statement pq:P156 ?followed_by . }
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
ORDER BY ?series ?number
Bradyguy18 (talk) 17:34, 1 February 2022 (UTC)
@Bradyguy18: The query looks for follows (P155) and followed by (P156) used as qualifiers to a statement with part of the series (P179). But for the Lee Child book items, follows (P155) and followed by (P156) are instead used in independent statements, which the query doesn'ẗ look for. It means that the context for the values for P155 or P156 isn't stated, and I think that is a problem. But the query can of course be changed to also find the values for P155 and P156 when used independently, e.g. like this:
SELECT ?book ?bookLabel ?seriesLabel ?number ?followsLabel ?followed_byLabel
WHERE
{
  ?book wdt:P50 wd:Q333719 . # book by Lee Child
  OPTIONAL
  {
    ?book p:P179 ?series_statement .
    ?series_statement ps:P179 ?series .
    OPTIONAL { ?series_statement pq:P1545 ?number . }
    OPTIONAL { ?series_statement pq:P155 ?follows_qualifier . }
    OPTIONAL { ?series_statement pq:P156 ?followed_by_qualifier . }
  }
  OPTIONAL { ?book wdt:P155 ?follows_main . }
  OPTIONAL { ?book wdt:P156 ?followed_by_main . }
  BIND (COALESCE(?follows_qualifier, ?follows_main) AS ?follows) 
  BIND (COALESCE(?followed_by_qualifier, ?followed_by_main) AS ?followed_by) 
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY ?series xsd:integer(?number)
Try it!
--Dipsacus fullonum (talk) 19:15, 1 February 2022 (UTC)

Atypical uses of P642

Items with any statement qualified with of (P642), where the property of the main statement is not instance of (P31), subclass of (P279), or URL (P2699). Thanks! Swpb (talk) 17:07, 1 February 2022 (UTC)

@Swpb: There are too many to list all. Here is a query limited to 1000 results:
SELECT ?item ?itemLabel ?statement ?prop ?propLabel ?valueLabel ?forLabel
WITH
{
  SELECT ?item ?claim ?statement ?for
  WHERE
  {
    ?item ?claim ?statement .
    ?statement pq:P642 ?for .
    FILTER (?claim != p:P31 && ?claim != p:P279 && ?claim != p:P2699)
  }
  LIMIT 1000
} AS %query
WHERE
{
  INCLUDE %query
  ?prop wikibase:claim ?claim .
  ?prop wikibase:statementProperty ?sp .
  ?statement ?sp ?value .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  }
}
Try it!
--Dipsacus fullonum (talk) 19:35, 1 February 2022 (UTC)
Thanks! Swpb (talk) 20:27, 1 February 2022 (UTC)

Films: One film per row; cells with > 1 value

I've adapted the following miserably little query of films made in 2017. I'm looking for 5 core items in columns: country, genre, language, director and script writer. Each film should have ONE row (at present, some have 20!). All 5 columns can have more than one value eg the film could be in more than one language. I've tried adding two things: GROUP_CONCAT and Group by, but the query fails. The separator should be a $$$ rather than a comma (end result = csv file). Adding GROUP_CONCAT to every one of the 5 items seems repetitive, and I'm sure there's a simpler way of doing this? If there's documentation on GROUP_CONCAT, please let me know.

SELECT DISTINCT ?item ?itemLabel ?languageLabel ?countryLabel ?directorLabel ?genreLabel ?scriptLabel (GROUP_CONCAT(DISTINCT(str(?genre)); separator="$$$ ") as ?genre)

WHERE {
  ?item wdt:P31 wd:Q11424.
  ?item wdt:P577 ?pubdate.
    OPTIONAL { ?item wdt:P495 ?country }
    OPTIONAL { ?item wdt:P136 ?genre }
    OPTIONAL { ?item wdt:P364 ?language }
    OPTIONAL { ?item wdt:P57 ?director }
    OPTIONAL { ?item wdt:P58 ?script }
 
  FILTER((?pubdate >= "2017-01-01T00:00:00Z"^^xsd:dateTime) && (?pubdate <= "2017-12-31T00:00:00Z"^^xsd:dateTime))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "cy,en,de,fr,ru,bg,cs,po,pl,nl,sl,eun,ro,oc,az,ast,my,tl,ga,br,it,cs,hu,ca,es,no,nn,gl". }
}

GROUP BY ?genre ?item ?itemLabel
ORDER BY ?item
Try it!

Many thanks! Llywelyn2000 (talk) 07:28, 2 February 2022 (UTC)

@Llywelyn2000: Probably this sort of thing, presuming it is the labels for country, language &c you're after. But yes, you need to repeatedly groupconcat for each element for which you want all values on a single row. And b/c the query has to groupconcat labels, the label service needs to be told to produce them early enough, hence all of the rdfs:label stuff in the label service call - one clause for each label, including the non-groupconcat'd ?itemLabel.
SELECT DISTINCT ?item ?itemLabel 
(GROUP_CONCAT(DISTINCT ?languageLabel_ ; separator="$$$ ") as ?languageLabel) 
(GROUP_CONCAT(DISTINCT ?countryLabel_ ; separator="$$$ ") as ?countryLabel)
(GROUP_CONCAT(DISTINCT ?directorLabel_ ; separator="$$$ ") as ?directorLabel)
(GROUP_CONCAT(DISTINCT ?genreLabel_ ; separator="$$$ ") as ?genreLabel)
(GROUP_CONCAT(DISTINCT ?scriptLabel_ ; separator="$$$ ") as ?scriptLabel)  
(GROUP_CONCAT(DISTINCT(str(?genre)); separator="$$$ ") as ?genre)

WHERE {
  ?item wdt:P31 wd:Q11424.
  ?item wdt:P577 ?pubdate.
    OPTIONAL { ?item wdt:P495 ?country }
    OPTIONAL { ?item wdt:P136 ?genre }
    OPTIONAL { ?item wdt:P364 ?language }
    OPTIONAL { ?item wdt:P57 ?director }
    OPTIONAL { ?item wdt:P58 ?script }
 
  FILTER((?pubdate >= "2017-01-01T00:00:00Z"^^xsd:dateTime) && (?pubdate <= "2017-12-31T00:00:00Z"^^xsd:dateTime))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "cy,en,de,fr,ru,bg,cs,po,pl,nl,sl,eun,ro,oc,az,ast,my,tl,ga,br,it,cs,hu,ca,es,no,nn,gl". 
                                           ?item rdfs:label ?itemLabel .
                                           ?language rdfs:label ?languageLabel_ .
                                           ?country rdfs:label ?countryLabel_ .
                                           ?director rdfs:label ?directorLabel_ .
                                           ?genre rdfs:label ?genreLabel_ .
                                           ?script rdfs:label ?scriptLabel_ .
                         }
}

GROUP BY ?item ?itemLabel
ORDER BY ?item
Try it!
--Tagishsimon (talk) 08:25, 2 February 2022 (UTC)
(after edit conflict) @Llywelyn2000: The solution is to use GROUP_CONCAT for all 5 columns where you want to aggregate the values as you say. I know of no special documentation on GROUP_CONCAT besides the SPARQL definition itself (https://www.w3.org/TR/sparql11-query/) and various tutorials. But don't GROUP BY the variable which you will aggregate. That will put different values in different rows. Also don't reuse variable names after "AS". I added an underscore in some variable names to avoid that in the query below. Finally you cannot aggregate over variables created by the label service in automatic mode, but have to switch to the manual mode. This is explained in the user manual in section mw:Wikidata Query Service/User Manual#Label service.
SELECT DISTINCT ?item ?itemLabel
  (GROUP_CONCAT(DISTINCT ?countryLabel; separator="$$$ ") AS ?country)
  (GROUP_CONCAT(DISTINCT ?genreLabel; separator="$$$ ") AS ?genre)
  (GROUP_CONCAT(DISTINCT ?languageLabel; separator="$$$ ") AS ?language)
  (GROUP_CONCAT(DISTINCT ?directorLabel; separator="$$$ ") AS ?director)
  (GROUP_CONCAT(DISTINCT ?scriptLabel; separator="$$$ ") AS ?script)
WHERE {
  ?item wdt:P31 wd:Q11424 .
  ?item wdt:P577 ?pubdate . hint:Prior hint:rangeSafe true .
  OPTIONAL { ?item wdt:P495 ?country_ }
  OPTIONAL { ?item wdt:P136 ?genre_ }
  OPTIONAL { ?item wdt:P364 ?language_}
  OPTIONAL { ?item wdt:P57 ?director_ }
  OPTIONAL { ?item wdt:P58 ?script_ }
 
  FILTER ((?pubdate >= "2017-01-01T00:00:00Z"^^xsd:dateTime) && (?pubdate <= "2017-12-31T00:00:00Z"^^xsd:dateTime))
  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "cy,en,de,fr,ru,bg,cs,po,pl,nl,sl,eun,ro,oc,az,ast,my,tl,ga,br,it,cs,hu,ca,es,no,nn,gl" .
    ?item rdfs:label ?itemLabel .
    ?country_ rdfs:label ?countryLabel .
    ?language_ rdfs:label ?languageLabel .
    ?genre_ rdfs:label ?genreLabel .
    ?director_ rdfs:label ?directorLabel .
    ?script_ rdfs:label ?scriptLabel .
  }
}
GROUP BY ?item ?itemLabel
ORDER BY ?item
Try it!
--Dipsacus fullonum (talk) 08:31, 2 February 2022 (UTC)
@Dipsacus fullonum, Tagishsimon: Whow! Many, many thanks both of you! This will be really handy, and thanks for the explanation! Best regards to both of you! Llywelyn2000 (talk) 08:48, 2 February 2022 (UTC)

Extract columns from regex

Given the following query

SELECT ?item ?id ?name
WHERE {
  ?item p:P396 ?st .
  ?st ps:P396 ?id .
  ?st pq:P1810 ?name . FILTER(REGEX(?name, "<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*)"))
}
Try it!

you can see that my regex contains two subgroups, ([\\S\\s]+\\d+[\\S\\s]*) and ([\\S\\s]*). Is there a way to assign two variables to these two subgroups, so that I can display them as columns in the results of the query? Thanks, --Epìdosis 13:41, 4 February 2022 (UTC)

I have found (unexpectedly) an inelegant but somehow efficient solution:
SELECT ?item ?id ?name ?first ?second
WHERE {
  ?item p:P396 ?st .
  ?st ps:P396 ?id .
  ?st pq:P1810 ?name . FILTER(REGEX(?name, "<(.+\\d+.*)-(.*)"))
  BIND(REPLACE(STR(?name),".+<","") AS ?first_cl)
  BIND(REPLACE(STR(?first_cl),"-.+>","") AS ?first)
  BIND(REPLACE(STR(?name),".+<.+-","") AS ?second_cl)
  BIND(REPLACE(STR(?second_cl),">","") AS ?second)
}
Try it!
probably it could be written in a better way, so suggestions are still welcome! --Epìdosis 14:25, 4 February 2022 (UTC)
@Epìdosis: You can extract matched substrings by using "$n" as the replacement string i the replace function. You will need to add "^.*" and ".*$" to the search pattern to ensure the whole string is matched and replaced:
SELECT ?item ?id ?name ?first ?second
WHERE {
  ?item p:P396 ?st .
  ?st ps:P396 ?id .
  ?st pq:P1810 ?name .
  FILTER REGEX(?name, "<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*)")
  BIND (REPLACE (?name, "^.*<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*).*$", "$1") AS ?first)
  BIND (REPLACE (?name, "^.*<([\\S\\s]+\\d+[\\S\\s]*)-([\\S\\s]*).*$", "$2") AS ?second)
}
Try it!
--Dipsacus fullonum (talk) 15:31, 4 February 2022 (UTC)
@Dipsacus fullonum: That's perfect! I had tried to use something like "$n", but I didn't know exactly how to do it. Thanks! --Epìdosis 15:36, 4 February 2022 (UTC)

Need help to search for New York Times journalists that are alive

I tried this query after reading some documentation and examples, but I'm not getting what I want. Here's the query code:

SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q5. # instance of human
  ?item wdt:P106 wd:Q1930187. # occupation is journalist
  ?item wdt:P108 wd:Q9684. # employer is the New York Times (NYT)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } #itemLabel is in English
} 
ORDER BY DESC(?P569) # supposed to order the list by date of birth
Try it!

My query gives a list where the date of birth is completely random. #2 in the list can be born in 1951, and further down in the list, someone is born in 1970.

I'd like to know if it's possible to 1)query only living people, who are journalists, and are employed by NYT or 2)query all journalists employed by NYT, but in descending order (so it goes from youngest to oldest)

Thanks! Tet (talk) 02:38, 4 February 2022 (UTC)

The reason the list was in random order was because you have to retrieve the data for the variable you sort on. I'm assuming you want only current employees listed, so I added that requirement as well.
SELECT DISTINCT ?item ?itemLabel ?dob ?age
WHERE {
  ?item wdt:P106 wd:Q1930187;
        p:P108/ps:P108 wd:Q9684;
        p:P108 ?st.
  OPTIONAL {
    ?item wdt:P569 ?dob.
    BIND(xsd:integer((NOW() - ?dob) / 365.25) AS ?age)
  }
  MINUS { # Former employee
    ?st ps:P108 wd:Q9684;
        pq:P582 [].
  }
  MINUS { ?item wdt:P570 ?dod. } # Living deprived
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?dob)
Try it!
Infrastruktur (talk) 07:10, 4 February 2022 (UTC)
Infrastruktur wrote "I'm assuming you want only current employees listed, so I added that requirement as well."
Well, not exactly. You removed persons that only have P108 statements with The New York Times (Q9684) as value and an end date. Journalists who also had other employers, was still listed even if they no longer is at NYT, like. e.g. Sewell Chan (Q7458218). I suggest this query instead:
SELECT DISTINCT ?item ?itemLabel ?dob ?age
WHERE {
  ?item wdt:P106 wd:Q1930187;
        p:P108 ?NYT_statement.
  ?NYT_statement ps:P108 wd:Q9684;
  OPTIONAL {
    ?item wdt:P569 ?dob.
    BIND(xsd:integer((NOW() - ?dob) / 365.25) AS ?age)
  }
  MINUS { # Former employee
    ?NYT_statement pq:P582 [].
  }
  MINUS { ?item wdt:P570 ?dod. } # Living deprived
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?dob)
Try it!
You can also add check for a value for date of death (P570), but I didnẗ do that. --Dipsacus fullonum (talk) 09:33, 4 February 2022 (UTC)
There already is a check for a death date, so I struck the last comment. --Dipsacus fullonum (talk) 09:40, 4 February 2022 (UTC)
Thank you both! Tet (talk) 19:20, 4 February 2022 (UTC)
Nice catch. Also there's a couple of journos with birthdate given as a century, so look out for anyone around 22 or 122 years old, as their age is probably wrong. Infrastruktur (talk) 07:13, 5 February 2022 (UTC)

All NGOs with revenue in 2017

Hello wizzards of SPARQL, since OpenRefine has problems with qualifiers, I'm in need of a list of all NGOs (instance of (P31) is nonprofit organization (Q163740) and its subclasses) which have a total revenue (P2139) with the qualifier point in time (P585) being "2017" or some date in 2017. Thank you! --Newt713 (talk) 13:40, 7 February 2022 (UTC)

@Newt713: This, I think:
SELECT ?item ?itemLabel ?revenue ?unitLabel ?time
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q163740 . 
  ?item p:P2139 ?stat.
  ?stat psv:P2139/wikibase:quantityAmount ?revenue . 
  ?stat psv:P2139/wikibase:quantityUnit ?unit. 
  ?stat pq:P585 ?time . hint:Prior hint:rangeSafe true.
  FILTER("2017-00-00"^^xsd:dateTime <= ?time &&
         ?time < "2018-00-00"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
}
Try it!
--Tagishsimon (talk) 14:21, 7 February 2022 (UTC)
@Tagishsimon: Perfect. That's it! Now I can exclude all those from my import. Thank you for your fast and profound help! --Newt713 (talk) 15:01, 7 February 2022 (UTC)

limit OPTIONAL

SELECT DISTINCT ?item ?birth ?pob
WHERE {
  VALUES ?item { wd:Q6701 }
  OPTIONAL {?item wdt:P569 ?birth .}
  OPTIONAL {?item wdt:P19 ?pob .}
}
Try it!

This selects P569 (date of birth) and P19 (place of birth) from Q6701 (Jacob Grimm). But it returns two results, because there are two entries for P19 with the same rank. Is there a way to limit OPTIONAL to the first result? It's important that only P19 is limited because for P569 I want two results if there two entries with the same rank for the date. --Slomox (talk) 14:58, 7 February 2022 (UTC)

@Slomox: You'd probably solve that issue by aggregating the results on ?item and ?birth, and then GROUP_CONCAT() or SAMPLE() the ?pob. There'd be more work to do if you wanted the ?pobLabel - can lookat that if you wish.
SELECT DISTINCT ?item ?birth (GROUP_CONCAT(?pob) as ?placeOfBirth)
WHERE {
  VALUES ?item { wd:Q6701 }
  OPTIONAL {?item wdt:P569 ?birth .}
  OPTIONAL {?item wdt:P19 ?pob .}
} group by ?item ?birth
Try it!
--Tagishsimon (talk) 16:45, 7 February 2022 (UTC)
@Slomox: (After edit conflict) "Is there a way to limit OPTIONAL to the first result?" There is no first result as there is no defined order. You can limit to one random result by grouping on all other variables in the select clause:
SELECT DISTINCT ?item ?birth (SAMPLE(?pob) AS ?sample_pob)
WHERE {
  VALUES ?item { wd:Q6701 }
  OPTIONAL {?item wdt:P569 ?birth .}
  OPTIONAL {?item wdt:P19 ?pob .}
}
GROUP BY ?item ?birth
Try it!
Another way is to convert the optional clause into a subquery with a limit:
SELECT DISTINCT ?item ?birth ?pob
WHERE
{
  VALUES ?item { wd:Q6701 }
  OPTIONAL {?item wdt:P569 ?birth .}
  {
    SELECT ?pob
    WHERE
    {
      VALUES ?item { wd:Q6701 }
      OPTIONAL {?item wdt:P19 ?pob .}
    }
    LIMIT 1
  }
Try it!
--Dipsacus fullonum (talk) 16:59, 7 February 2022 (UTC)
Thank you to both of you! I went with Tagishsimon's solution and it solved my problem. --Slomox (talk) 17:38, 7 February 2022 (UTC)

Group by the domain of the URL

Want to transform the following query just shows everything from URL upto (but not including) the item following the last slash and groups them by it so that for example <http://data.thenextweb.com/tnw/entity/belgium>becomes http://data.thenextweb.com/tnw/entity/ so that we can see who is using this URL service and what URL we need to create URL pattern matches for, I have some regex skill but havn't had any luck applying them to SPARQL

# This query finds all usages of property P6363 AS a statement.

SELECT ?item ?itemLabel ?statement ?rank ?snacktype ?value
{
  ?item p:P6363 ?statement .
  ?statement wikibase:rank ?rank .
  OPTIONAL { ?statement ps:P6363 ?dbvalue }

  BIND(COALESCE(IF(wikibase:isSomeValue(?dbvalue), "unknown", "concrete"), "novalue") AS ?snacktype)
  BIND(IF(wikibase:isSomeValue(?dbvalue), "", ?dbvalue) AS ?value)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
LIMIT 1000
Try it!

Back ache (talk) 15:51, 8 February 2022 (UTC)

No fancy SPARQL needed in this case I guess.
SELECT DISTINCT ?value
{
  ?item wdt:P6363 ?dbvalue .

  BIND(URI(REPLACE(STR(?dbvalue), "^(https?://.+/)(.*?)$", "$1")) AS ?value)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
ORDER BY ?value
Try it!
Infrastruktur (talk) 17:44, 8 February 2022 (UTC)
@Back ache: It can be even shorter. You can create a new IRI value from an existing IRI with all characters after the last slash removed with:
BIND (IRI(REPLACE(STR(?value), "^(.*/).*$", "$1")) AS ?path)
You didn't tell how to aggregate other variables in a group, so I didn't have enough information to write a query with a GROUP BY section. BTW, your query above links to WCQS (Wikimedia Commons Query Service). I suppose that is an error. Remove the template argument |project=sdc to make a link to WDQS instead. --Dipsacus fullonum (talk) 18:00, 8 February 2022 (UTC)

get linked wikipedia pages

I want to write a query where I can get all the linked Wikipedia pages' URL and languages as mentioned on the wikidata page. For example, https://www.wikidata.org/wiki/Q155164 mentions 73 entries for Wikipedia pages with different languages at the end. I need a query to get that table

The data model is described at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Sitelinks and suggests somethng like:
SELECT ?item ?itemLabel ?article ?sitelink ?platform ?group ?language
WHERE 
{
  VALUES ?item {wd:Q155164} 
  ?article schema:about ?item ;
           schema:isPartOf ?platform ; 
           schema:name ?sitelink ;
           schema:inLanguage ?language .
  ?platform wikibase:wikiGroup ?group .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
}
Try it!
--Tagishsimon (talk) 06:27, 9 February 2022 (UTC)

Query failing

Hello, the below query has worked fine for years with Listeria, but for some times it encounters a timeout. Can someone help me optimizing it please?

SELECT ?item ?INSEE ?itemLabel (CONCAT("[[Wikidata:WikiProject France/Églises/",?INSEE,"\u007C",?itemLabel,"]]") AS ?LIEN) ?depCount (IF(?_withCoord > ?depCount, 0, ?depCount - ?_withCoord) AS ?withoutCoord) (IF(?_withImg > ?depCount, 0, ?depCount - ?_withImg) AS ?withoutImg) (ROUND((IF(?_withCoord > ?depCount, ?depCount, ?_withCoord) / ?depCount) * 100 * 100) / 100 AS ?withCoord) (ROUND((IF(?_withImg > ?depCount, ?depCount, ?_withImg) / ?depCount) * 100 * 100) / 100 AS ?withImg)
WHERE {
  {
    SELECT ?item (COUNT(DISTINCT ?item2) AS ?depCount) (COUNT(DISTINCT ?coord) AS ?_withCoord) (COUNT(DISTINCT ?img) AS ?_withImg) WHERE {
      ?item2 wdt:P131+ wd:Q142.
      hint:Prior hint:gearing "forward".
      ?item2 wdt:P31/wdt:P279* wd:Q16970.
      hint:Prior hint:gearing "forward".
      ?item2 wdt:P131 ?commune.
      ?commune p:P131 ?stmt .
      ?stmt ps:P131 ?item .
      MINUS { ?stmt pq:P582 [] } .
      ?item wdt:P2586 [] .
      OPTIONAL { ?item2 wdt:P625 ?coord } .
      OPTIONAL { ?item2 wdt:P18 ?img } .
    } GROUP BY ?item
  } .
  ?item wdt:P2586 ?INSEE .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" } .
}
ORDER BY ?INSEE
Try it!

Thanks! Ayack (talk) 14:09, 9 February 2022 (UTC)

@Ayack It seems to help a bit to remove the « minus » checking that the statement has no end date and to rely on the preferred rank. It assumes any former commune has a better ranked statement (even with no value Help as preferred if needed) but this seems workable.
What seems to be the most efficient however is to rely on the commune of France (Q484170)  View with Reasonator View with SQID to find all the communes. It seems to cut by about more than a half the time of the query. Are the results OK ?
SELECT ?item ?INSEE ?itemLabel (CONCAT("[[Wikidata:WikiProject France/Églises/",?INSEE,"\u007C",?itemLabel,"]]") AS ?LIEN) ?depCount (IF(?_withCoord > ?depCount, 0, ?depCount - ?_withCoord) AS ?withoutCoord) (IF(?_withImg > ?depCount, 0, ?depCount - ?_withImg) AS ?withoutImg) (ROUND((IF(?_withCoord > ?depCount, ?depCount, ?_withCoord) / ?depCount) * 100 * 100) / 100 AS ?withCoord) (ROUND((IF(?_withImg > ?depCount, ?depCount, ?_withImg) / ?depCount) * 100 * 100) / 100 AS ?withImg)
WHERE {
  {
    SELECT ?item (COUNT(DISTINCT ?item2) AS ?depCount) (COUNT(DISTINCT ?coord) AS ?_withCoord) (COUNT(DISTINCT ?img) AS ?_withImg) WHERE {
      ?commune wdt:P31/wdt:P279* wd:Q484170 .
      ?item2 wdt:P31/wdt:P279* wd:Q16970.
      hint:Prior hint:gearing "forward".
      ?item2 wdt:P131 ?commune.
      ?commune wdt:P131 ?item .
      ?item wdt:P2586 ?INSEE .
      
      OPTIONAL { ?item2 wdt:P625 ?coord } .
      OPTIONAL { ?item2 wdt:P18 ?img } .
    } GROUP BY ?item
  } .
  ?item wdt:P2586 ?INSEE .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" } .
}
ORDER BY ?INSEE
Try it!
author  TomT0m / talk page 15:48, 9 February 2022 (UTC)
@Ayack: Another possibility is to just remove
      ?item2 wdt:P131+ wd:Q142.
      hint:Prior hint:gearing "forward".
It isn't necessary as only French departments have INSEE department code (P2586). --Dipsacus fullonum (talk) 16:18, 9 February 2022 (UTC)
Thank you both. It's now working in WDQ but the list is still not updated in WD. I was thinking it was due to timeout but it seems actually to be a Listeria issue: [1] :(. Ayack (talk) 16:44, 9 February 2022 (UTC)
It might be the quote in one of the column name. The new Rust listeria is more sensitive to that kind on thing, I tried to remove it. author  TomT0m / talk page 16:48, 9 February 2022 (UTC)
@Ayack That was that, it worked. author  TomT0m / talk page 16:49, 9 February 2022 (UTC)
Great! Thanks @TomT0m! Ayack (talk) 16:51, 9 February 2022 (UTC)

Billionaires

I thought it might be fun to see what sort of data WD contained concerning billionaires. Based on net worth (P2218), I am able to generate a list of billionaires, but there are quite a few problems.

First, I do not know how to control for units. As a result the #1 and #3 results are in CZ koruna and Indonesian rupiah respectively. Could someone explain how to control for units? (i.e. only include those that are listed in, say, dollars)

The other problem is related to multiple entries (2019, 2021, etc.) for net worth (many people show up more than once), although I'll need to look into this problem more carefully, as E. Musk also has multiple entries for net worth but does not appear multiple times.

I very much appreciate any help with the first problem (defining units in the query) and any ideas regarding the second. SashiRolls (talk) 16:30, 9 February 2022 (UTC)

@SashiRolls: You get the currency from the full value representation, see mw:Wikibase/Indexing/RDF Dump Format. There is only one value for Elon Musk in your results because you ask for statements with best rank only by using the wdt: prefix. This is also in the RDF document. This query list best rank values in United States dollar (Q4917) only. You can replace wd:Q4917 with a variable to display the currency in the results.
#billionaire
SELECT  ?Billionaire ?BillionaireLabel ?numLabel
WHERE
{
  ?Billionaire p:P2218 [a wikibase:BestRank ; psv:P2218 [wikibase:quantityAmount ?num ; wikibase:quantityUnit wd:Q4917]]
  FILTER (?num > 1000000000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?num)
Try it!
--Dipsacus fullonum (talk) 16:51, 9 February 2022 (UTC)
@Dipsacus fullonum:, thank you very much for the super-speedy response! I will read through the document you've given me as I've had similar types of questions in the past. SashiRolls (talk) 18:28, 9 February 2022 (UTC)

P18-lacking Wikidata items used as Commons depictions

Item commercial art gallery (Q56856618) lacks an image (P18), despite being used as a "depicts" statement value in the Structured Data of this Commons picture.

How to find other occurrences of this?

  • Ideally sorted by popularity (items that are most used as Commons depictions).
  • Ideally SPARQL, but I am OK with command-line tools too.
  • If required by timeouts, splitting should preferably be done by coordinates.

It is the opposite of this query.

Thanks a lot! Syced (talk) 04:27, 8 February 2022 (UTC)

@Syced: This is not easy as there are over 12 millions statements with depicts (P180) in WCQS. Splitting by coordinates isn't easy either. Many Commons pictures doesn't have coordinates, and those with coordinates can be both coordinates of the point of view (P1259) (20 millions pictures) and coordinates of depicted place (P9149) (5 million pictures). Which coordinates do you mean, and what about pictures with depicts (P180) but no coordinates? --Dipsacus fullonum (talk) 13:00, 8 February 2022 (UTC)
@Syced: But presuming you can segment, there are iirc 1.6m distinct depects values on WCQS, and federation (at least as I code it) seems to be able to cope with checking 80-100k items before timeout, so that maybe gives you a measure of task. fwiw, I'm seeing 15k WD items with no image, from 80k distinct WCQS depicts values. (The temptation is to start filling in WD P18's blind, based on an adapted version of this query, a GIGO recipe if ever there were one.)
SELECT ?P180 WITH { SELECT DISTINCT ?P180 WHERE 
{
  ?file wdt:P180 ?P180 .
} limit 80000  } as %i
WHERE
{
  INCLUDE %i
  SERVICE <https://query.wikidata.org/sparql> {
  OPTIONAL {?P180 wdt:P18 ?img . }
  FILTER (!BOUND(?img))
  }
}
Try it!
--Tagishsimon (talk) 14:15, 8 February 2022 (UTC)
Unfortunately, depicts (P180) statements are often automatically imported based on image (P18) but when image (P18) statements are removed (to fix mistakes etc), few people bother to remove the depict statements from Commons images. Vojtěch Dostál (talk) 16:41, 8 February 2022 (UTC)

Thanks all! Tagishsimon's SPARQL is a great start, I actually started using it with success :-) Vojtěch: Sure, this is for one-by-one manual selection, not batch. @Dipsacus fullonum:: P1259 would be OK, P9149 would be OK too :-) Even only items with no coordinates would be interesting, especially if limited to the most "popular" items. Thanks a lot! Syced (talk) 03:03, 10 February 2022 (UTC)

@Syced: Here is query using geographical search in a box. You can change the two specified coordinates for the corners of the box as you like, and also change the coordinate predicate (here coordinates of the point of view (P1259)). Credit: The query is based on query by Tagishsimon above. I noted that one of the first found item, when I ran the query, was a Wikimedia disambiguation page (Jeep Wrangler Unlimited (Q6172932)). So the query not only can give missing P18 values in Wikidata, but also wrong values in Commons.
SELECT ?P180 ?file
WITH
{
  SELECT ?P180 ?file
  WHERE 
  {
    ?file wdt:P180 ?P180 .
    SERVICE wikibase:box
    {
      ?file wdt:P1259 ?location .
      bd:serviceParam wikibase:cornerWest "Point(-121.872777777 37.304166666)"^^geo:wktLiteral .
      bd:serviceParam wikibase:cornerEast "Point(-121.486111111 38.575277777)"^^geo:wktLiteral .
    }
  }
} AS %i
WHERE
{
  INCLUDE %i
  SERVICE <https://query.wikidata.org/sparql>
  {
    OPTIONAL { ?P180 wdt:P18 ?img }
    FILTER (!BOUND(?img))
  }
}
Try it!
--Dipsacus fullonum (talk) 12:23, 10 February 2022 (UTC)


SELECT ?P180 ?l ?d ?file ?image
WITH
{
  SELECT ?P180 ?file
  WHERE 
  {
    ?file wdt:P180 ?P180 .
    SERVICE wikibase:box
    {
      ?file wdt:P1259 ?location .
      bd:serviceParam wikibase:cornerWest "Point(-121.872777777 37.304166666)"^^geo:wktLiteral .
      bd:serviceParam wikibase:cornerEast "Point(-121.486111111 38.575277777)"^^geo:wktLiteral .
    }
  }
} AS %i
WHERE
{
  INCLUDE %i
  SERVICE <https://query.wikidata.org/sparql>
  {
    OPTIONAL { ?P180 wdt:P18 ?img }
    FILTER (!BOUND(?img))
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". ?P180 rdfs:label ?l . ?P180 schema:description ?d }
  }
  ?file schema:url ?image.
}
Try it!

Nice query, above the same with images, labels and descriptions. --- Jura 12:45, 10 February 2022 (UTC)

Conservation status in speciesbox

The Norwegian Biodiversity Information Centre have revised The Norwegian Red Lists (Norwegian Conservation Status List). I fail in my efforts in writing a query for the conservation status listed in no-wiki. I want to list species: Norwegian name, Scientific Name, and the Norwegian Conservation Status if possible, and the article URL. I want to look only into https://no.wikipedia.org/wiki/ and list only species with a Speciesbox no:Mal:Taksoboks and I want listed only those entries containing the Norwegian Conservation Status present, containing a value in the Speciesbox. Example: For the status «VU», the entry in the Speciesbox will be:

| rødlistestatusno={{rødliste|VU|no}}

Please look to no:Granmeis for the result. If it is not possible to list the Norwegian Conservation Status, then a list of species with a Speciesbox is better than nothing. Magnefl (talk) 20:54, 9 February 2022 (UTC)

Hej Magne. Denne søkning vil give sårbare arter med taksoboks på norsk Wikipedia. Du kan udskifte "VU" i søkningen for at få andre statuser.
SELECT ?item ?tittel ?vitenskapelig_navn ?artikkel
WHERE
{
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "no.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "search" .
    bd:serviceParam mwapi:gsrsearch 'hastemplate:"Taksoboks" insource:"rødlistestatusno Rødliste VU no"' .
     # Note that inline search ignores "greyspace", so the search will match "rødlistestatusno={{Rødliste|VU|no}}"
    bd:serviceParam mwapi:gsrlimit "max" .
    bd:serviceParam mwapi:gsrnamespace "0" .
    ?tittel wikibase:apiOutput mwapi:title .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  FILTER BOUND(?item)
  OPTIONAL { ?item wdt:P225 ?vitenskapelig_navn }
  ?artikkel schema:about ?item .
  ?artikkel schema:isPartOf <https://no.wikipedia.org/> .
}
Try it!
--Dipsacus fullonum (talk) 22:00, 9 February 2022 (UTC)
Ah, praktfullt! Det er til stor hjelp :) Magnefl (talk) 07:18, 10 February 2022 (UTC)

Instances of people with more than one value for death date

Hello! I'm wondering whether there's a way to search for people with two death date values, in other words conflicting information? Thanks! -- Zanimum (talk) 19:53, 29 January 2022 (UTC)

@Zanimum: This should do it. I've used LIMIT because there are so many humans in Wikidata that if it weren't used, the query would result in Query Timeout Limit Reached, but feel free to change the value for LIMIT.
SELECT ?item ?a ?b {
  ?item wdt:P31 wd:Q5;
        wdt:P570 ?a;
        wdt:P570 ?b.
  FILTER(?a != ?b).
}
LIMIT 1
Try it!
-- Rdrg109 (talk) 22:42, 29 January 2022 (UTC)
@Zanimum: Another way to avoid timeout is to limit the query to deal with just one or a few country of citizenship (P27) at a time. In the below query I have a filter to just France (Q142). I have also limited to date of death (P570) given with at least "day precision".
SELECT ?item ?itemLabel ?a ?b WITH {
SELECT ?item ?a ?b {
  ?item wdt:P27 wd:Q142 . #limit to just people from France to avoid timeout
  ?item wdt:P31 wd:Q5; 
        p:P570 [ wikibase:rank ?ra; psv:P570 [ wikibase:timeValue ?a; wikibase:timePrecision ?tpa ]]; 
        p:P570 [ wikibase:rank ?rb; psv:P570 [ wikibase:timeValue ?b; wikibase:timePrecision ?tpb ]]. 
  FILTER(?a < ?b)
  FILTER(?ra != wikibase:DeprecatedRank && ?rb != wikibase:DeprecatedRank && ?ra = ?rb) 
  FILTER(?tpa>=11) #limit to just dates with at least day-precision
  FILTER(?tpb>=11)
}
} AS %i WHERE {
  INCLUDE %i
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?a) ?itemLabel
Try it!
--Larske (talk) 12:45, 30 January 2022 (UTC)
Maybe you want to limit to best rank (by replacing "wikibase:rank ?ra" with "a wikibase:BestRank")? --- Jura 12:54, 30 January 2022 (UTC)
Thanks Jura, that is much better. In the query above the filtering is not correct. Here is an updated version of the query:
SELECT ?item ?itemLabel ?a ?b WITH {
SELECT ?item ?a ?b {
  ?item wdt:P27 wd:Q142 . #limit to just people from France to avoid timeout
  ?item wdt:P31 wd:Q5; 
        p:P570 [ a wikibase:BestRank; psv:P570 [ wikibase:timeValue ?a; wikibase:timePrecision ?tpa ]]; 
        p:P570 [ a wikibase:BestRank; psv:P570 [ wikibase:timeValue ?b; wikibase:timePrecision ?tpb ]]. 
  FILTER(?a < ?b)
  FILTER(?tpa>=11) #limit to just dates with at least day-precision
  FILTER(?tpb>=11)
}
} AS %i WHERE {
  INCLUDE %i
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?a) ?itemLabel
Try it!
--Larske (talk) 13:38, 30 January 2022 (UTC)
BTW as precision higher than 11 shouldn't exist yet, you could directly use 11. --- Jura 13:47, 30 January 2022 (UTC)

Apologies on my belated reply, but thank you, @Rdrg109:, @Larske:, @Jura1:, I'm very grateful for all of your help! -- Zanimum (talk) 22:05, 12 February 2022 (UTC)

Filtering for Values Within a Particular Class

Hello! I currently have this script to display all humans with medical conditions--

SELECT ?person ?personLabel ?condition ?conditionLabel
WHERE
{
  ?person wdt:P31 wd:Q5.
  ?person wdt:P1050 ?condition.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!

--but am hoping to filter it down to only people with sexually transmitted infections. I know there's a FILTER function in SPARQL, but I can't quite figure out how to use it properly, especially when trying to display all values that are a subclass of another value. I'd appreciate any help somebody can offer; thank you very much!

Add
  ?condition wdt:P279* wd:Q12198.
to the query. --Dipsacus fullonum (talk) 07:40, 11 February 2022 (UTC)
If you only want one line per person you may want to group the conditions with GROUP_CONCAT like this (giving 151 lines instead of 154):
SELECT ?person ?personLabel (GROUP_CONCAT(?conditionlabel;SEPARATOR=', ') AS ?conditions) WHERE
{
  ?person wdt:P31 wd:Q5.
  ?person wdt:P1050 ?condition.
  ?condition wdt:P279+ wd:Q12198 .
  ?condition rdfs:label ?conditionlabel . FILTER(LANG(?conditionlabel)='en')
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?person ?personLabel
ORDER BY ?personLabel
Try it!
--Larske (talk) 07:48, 11 February 2022 (UTC)
Thank you so much! Especially for reading my mind and adding advice on the concatenation; that will be incredibly useful for other queries I'm working on. Tjg6ph (talk) 14:59, 11 February 2022 (UTC)

Newer items

I want to find all items created within the last 30 days that have "genre: K-POP" and "instance of: extended play". Lectrician1 (talk) 16:43, 12 February 2022 (UTC)

Please give QID's for K-POP and extended play so we don't have to guess. Nothing obvious with these spellings seems to match. --Dipsacus fullonum (talk) 17:31, 12 February 2022 (UTC)
@Dipsacus fullonum K-pop (Q213665) extended play (Q169930) Lectrician1 (talk) 17:33, 12 February 2022 (UTC)
@Lectrician1: No such items are created in the last 30 days. I changed the limit to 40 days in the query below to prove it works:
SELECT ?item ?itemLabel ?created ?age ?title ?q_number
WHERE
{
  ?item p:P136/ps:P136 wd:Q213665 .
  ?item p:P31/ps:P31 wd:Q169930 .
  BIND (xsd:integer(SUBSTR(STR(?item), 33)) AS ?q_number)
  FILTER (?q_number > 110000000) # QQ110000000 is from 2021-12-08
  BIND (SUBSTR(STR(?item), 32) AS ?title)
  SERVICE wikibase:mwapi
  {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "allpages" .
    bd:serviceParam mwapi:gapfrom ?title .
    bd:serviceParam mwapi:gapto ?title .
    bd:serviceParam mwapi:prop "revisions" .
    bd:serviceParam mwapi:rvslots "main" .
    bd:serviceParam mwapi:rvprop "timestamp" .
    bd:serviceParam mwapi:rvdir "newer" .
    bd:serviceParam mwapi:rvlimit "1" .
    ?created wikibase:apiOutput "revisions/rev/@timestamp" .
    bd:serviceParam wikibase:limit "once" .
  }
  BIND (NOW() - xsd:dateTime(?created) AS ?age)
  FILTER (?age <= 40) # Created in the last 40 days
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
ORDER BY ?age
Try it!

--Dipsacus fullonum (talk) 21:47, 12 February 2022 (UTC)

top paths: P171*1..25

(((((((((((((((((((((((((wdt:P171|(wdt:P171/wdt:P171))|((wdt:P171/wdt:P171)/wdt:P171))|(((wdt:P171/wdt:P171)/wdt:P171)/wdt:P171))|(wdt:P171/wdt:P171*4))|(wdt:P171/wdt:P171*5))|(wdt:P171/wdt:P171*6))|(wdt:P171/wdt:P171*7))|(wdt:P171/wdt:P171*8))|(wdt:P171/wdt:P171*9))|(wdt:P171/wdt:P171*10))|(wdt:P171/wdt:P171*11))|(wdt:P171/wdt:P171*12))|(wdt:P171/wdt:P171*13))|(wdt:P171/wdt:P171*14))|(wdt:P171/wdt:P171*15))|(wdt:P171/wdt:P171*16))|(wdt:P171/wdt:P171*17))|(wdt:P171/wdt:P171*18))|(wdt:P171/wdt:P171*19))|(wdt:P171/wdt:P171*20))|(wdt:P171/wdt:P171*21))|(wdt:P171/wdt:P171*22))|(wdt:P171/wdt:P171*23))|(wdt:P171/wdt:P171*24))|(wdt:P171/wdt:P171*25))
Try it!

I found that on wikitech:User:AKhatun/Wikidata_Subgraph_Query_Analysis#Paths_2. It was used 112,475 in sample period.

I'm curious when this is needed. Supposedly it aims to limit wdt:P171+ to 25 level. --- Jura 12:11, 5 February 2022 (UTC)

Urk, that's a very long property chain, too bad blazegraph doesn't support the property{n,m} notation. But I believe a shorter variant that could work is:
wdt:P171/wdt:P171?/wdt:P171?/wdt:P171?/wdt:P171?
That ought to give the same result as wdt:P171{1,5} and avoids much of the repetion.
I used something similar once when listing instance of subclasses of researchers, go far enough down the tree and it will start listing sculpture artists and other nonsense. Infrastruktur (talk) 14:02, 5 February 2022 (UTC)
Also, maybe it is simply intended to prevent timeouts? Infrastruktur (talk) 14:24, 5 February 2022 (UTC)
Or as I recently discovered the software doesn't safeguard against loops, so limiting recursion might be necessary if you run into loops, otherwise the query will never finish. Wikidata:WikiProject Ontology/Problems/Loops Infrastruktur (talk) 20:21, 14 February 2022 (UTC)
@Infrastruktur: Loops in the data should never lead to undefined or infinite results in SPARQL (See section 9.4 Arbitrary Length Path Matching in specification at https://www.w3.org/TR/sparql11-query/). If you are right that the Wikidata SPARQL engine (Blazegraph) doesn't guard against loops, it would be a violation of the SPARQL 1.1 specification, but I don't believe that you are right. Here is an example which could have inifinite loops using the symmetric property spouse (P26):
SELECT ?item ?itemLabel ?spouse_recursive ?spouse_recursiveLabel
WHERE
{
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P39 wd:Q11696 .
  ?item wdt:P26+ ?spouse_recursive .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
Try it!
It runs fine with a finite number of results as it should. Can you give an example to support your claim that some queries will never finish? I have never seen that. --Dipsacus fullonum (talk) 23:13, 14 February 2022 (UTC)
You're right. I made the incorrect assumption that because it's possible to create loops in the class hierarchy that the graph software too would have no checks for this. Thanks for the demo. Infrastruktur (talk) 23:51, 14 February 2022 (UTC)

preferred format for id

Query wise (on current Wikidata Query Service), which format would be the best one? See Wikidata:Property proposal/Finnish real property ID.

The idea is that we store in one format and can request any of the three. --- Jura 15:02, 13 February 2022 (UTC)

I would say choose the format that is most often used. All conversions can be done. The most complicated part is removing or inserting leading zeros. Here is one way to do it:
# Conversion from format1 to format2 and format3
SELECT ?format1 ?format2 ?format3 ?num1 ?num2 ?num3 ?num4
{
  VALUES ?format1 { "91-2-2008-3" "398-1-1100-1" "529-102-11-1" "148-402-27-1" }

  BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$1") AS ?num1)
  BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$2") AS ?num2)
  BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$3") AS ?num3)
  BIND (REPLACE(?format1, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$4") AS ?num4)
  
  BIND (CONCAT(
    SUBSTR(CONCAT("00", ?num1), STRLEN(?num1), 3), "-",
    SUBSTR(CONCAT("00", ?num2), STRLEN(?num2), 3), "-",
    SUBSTR(CONCAT("000", ?num3), STRLEN(?num3), 4), "-",
    SUBSTR(CONCAT("000", ?num4), STRLEN(?num4), 4)) AS ?format2)
  
  BIND (CONCAT(
    SUBSTR(CONCAT("00", ?num1), STRLEN(?num1), 3),
    SUBSTR(CONCAT("00", ?num2), STRLEN(?num2), 3), 
    SUBSTR(CONCAT("000", ?num3), STRLEN(?num3), 4), 
    SUBSTR(CONCAT("000", ?num4), STRLEN(?num4), 4)) AS ?format3)
}
Try it!
# Conversion from format2 to format3 and format1
SELECT ?format1 ?format2 ?format3
{
  VALUES ?format2 { "091-002-2008-0003" "398-001-1100-0001" "529-102-0011-0001" "148-402-0027-0001" }

  BIND (REPLACE(?format2, "(\\d+)-(\\d+)-(\\d+)-(\\d+)", "$1$2$3$4") AS ?format3)

  BIND (REPLACE(?format2, "0{0,2}(\\d+)-0{0,2}(\\d+)-0{0,3}(\\d+)-0{0,3}(\\d+)", "$1-$2-$3-$4") AS ?format1)
}
Try it!
# Conversion from format3 to format2 and format1
SELECT ?format1 ?format2 ?format3 ?num1 ?num2 ?num3 ?num4
{
  VALUES ?format3 { "09100220080003" "39800111000001" "52910200110001" "14840200270001" }

  BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$1-$2-$3-$4") AS ?format2)

  BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$1") AS ?num1)
  BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$2") AS ?num2)
  BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$3") AS ?num3)
  BIND (REPLACE(?format3, "(\\d{3})(\\d{3})(\\d{4})(\\d{4})", "$4") AS ?num4)

  BIND (CONCAT(
    STR(xsd:integer(?num1)), "-",
    STR(xsd:integer(?num2)), "-",
    STR(xsd:integer(?num3)), "-",
    STR(xsd:integer(?num4))) AS ?format1)
}
Try it!
--Dipsacus fullonum (talk) 16:40, 13 February 2022 (UTC)
# Conversion from format2 to format3 and format1
SELECT ?format1 ?format2 ?format3
{
  VALUES ?format2 { "091-002-2008-0003" "398-001-1100-0001" "529-102-0011-0001" "148-402-0027-0001" }
  
  BIND (REPLACE(?format2, "-", "") AS ?format3)

}
Try it!

Thanks for your help. I think that is good advice, it limits conversions. Above a slight shorter one for #2 to #3. --- Jura 17:18, 13 February 2022 (UTC)

{{Autofix}} for Finnish real property ID (P10364) from format1 to format2 might be more complicated, but I suppose we can sort that out if/when it's needed. --- Jura 18:07, 13 February 2022 (UTC)

Get all data of a statement with his ID

Hello,

I am the author of WikibaseIntegrator, a fork of WikidataIntegrator.

In WBI you can speed up your search by querying the SPARQL server and later compare the data, this helps when you repeatedly work on the same property with different values.

The script can also compare the data on the SPARQL server with your own and return whether or not a write is required.

My current query is complex, I'm trying to get as much information as possible, and the SPARQL server doesn't like that (Error 500, timeout).

So I want to split my logic in two: have a first query to get only the ID and value of the statement, and later, if I want more information about this statement (if I want to update it or not), I ask for more information with the ID to the SPARQL server.

Is it possible to get everything the SPARQL server has on a statement through its ID?

Thanks, Myst (talk) 14:30, 6 February 2022 (UTC)

@Myst: By & large, yes. Here's an example using a named subquery, in which the initial query gives us two statement_IDs for Q79160#P1435, and in which the second query unpacks the statement value, and the qualifier and reference properties and values. (In practise this query would call for some aggregation and OPTIONALity.)
SELECT ?propertyLabel ?value ?statement_ID ?statement_ref_ID ?ref_propertyLabel ?ref_value with { SELECT ?statement_ID ?statement_ref_ID
WHERE 
{
  wd:Q79160 p:P1435 ?statement_ID.
  } } as %i
WHERE
{
  INCLUDE %i
  ?statement_ID ?predicate ?value.
  ?property wikibase:qualifier|wikibase:statementProperty ?predicate .
  
  ?statement_ID prov:wasDerivedFrom ?statement_ref_ID.
  ?statement_ref_ID ?ref_predicate ?ref_value .
  ?ref_property wikibase:reference ?ref_predicate .
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
} order by ?statement_ID ?statement_ref_ID
Try it!
--Tagishsimon (talk) 14:58, 6 February 2022 (UTC)
@Myst: To get everything on a statement the query will also need to ask for statement rank (deprecated, normal, or preferred), bestRank status (yes or no), all the full values (with additional values like precision, calendar model, geo globe etc. dependent on value type), and all normalized values. That can be voluminous but not very complicated. See mw:Wikibase/Indexing/RDF Dump Format for details. --Dipsacus fullonum (talk) 15:26, 6 February 2022 (UTC)
Hello @Tagishsimon, thank you for the answer (and sorry for my late reply).
I tried your code and it seems good for what I want, my only issue is to reuse only the second part without the "INCLUDE %i" and set directly the statement_ID I want to retrieve. I didn't found how to do this. Myst (talk) 18:06, 18 February 2022 (UTC)
Remove the named subquery and replace "INCLUDE %i" with something like:
VALUES ?statement_ID { wds:Q79160-774EEFBA-E4DF-41D7-8E4A-BC1729AA76CF }
You can have one or any number of statement ID's in the brackets, each with the wds: prefix. --Dipsacus fullonum (talk) 18:45, 18 February 2022 (UTC)
Thank you! Myst (talk) 08:08, 19 February 2022 (UTC)

In search of RINOs

as in Rapper in the name only. I came across Irving Rapper and wonder how many other people are rappers in the name only and by name I mean the wikidata label. It would make for a good example in Wikidata:SPARQL_query_service/queries/examples, too, as queries involving label filters are sparse there. --Frlgin (talk) 18:43, 15 February 2022 (UTC)

A regular query will time out, but using the search API is much more efficient.
SELECT DISTINCT ?item ?label
WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "Generator";
                    mwapi:generator "search";
                    mwapi:gsrsearch "inlabel:Rapper";
                    mwapi:gsrlimit "max".
    ?item wikibase:apiOutputItem mwapi:title.
  }
  ?item wdt:P31 wd:Q5.
  MINUS { ?item wdt:P106 wd:Q2252262. }
  ?item rdfs:label ?label.
  FILTER(LANG(?label) = "en") # Comment out to search non-english labels as well
  FILTER(CONTAINS(LCASE(?label), "rapper"))
}
Try it!
Infrastruktur (talk) 19:25, 15 February 2022 (UTC)

User which created the most Items in a "type of" category

Is it possible to find the Wikidata users, that created the most Items in a category? Or is that out of the scope of what queries can do?

Users that created Items where instance of (P31) is nonprofit organization (Q163740), ordered by amount of created Items in that category. Thanks --Newt713 (talk) 21:28, 19 February 2022 (UTC)

@Newt713: You can get revision data in WDQS, including who created the first revision of any item, by using the MWAPI service. However to do that, you need a separate API call for each item, so it wont be realistic to do it for categories larger than up to around 1000 items. There are 27000+ items with instance of (P31) nonprofit organization (Q163740), that is far too many. You need to choose a smaller category, or find another tool. --Dipsacus fullonum (talk) 22:01, 19 February 2022 (UTC)
Thank you! I'll look into it. --Newt713 (talk) 11:48, 20 February 2022 (UTC)
@Newt713: A week ago I made a query which gets the creation date for select items by using MWAPI (see #Newer items above). It would not be hard to modify it to get the creator instead of the creation date. But it will time out if you run it with many items at once, like e.g. all nonprofit organizations. --Dipsacus fullonum (talk) 12:50, 20 February 2022 (UTC)
@Dipsacus fullonum: That looks good. Perhaps if we add an additional filter by country (P17)? --Newt713 (talk) 12:56, 20 February 2022 (UTC)

Optimization request: African coups and attempts

SELECT ?coup ?coupLabel ?success ?time ?countryLabel WHERE {
  ?coup wdt:P31 ?type .
  filter(?type in (wd:Q45382, wd:Q25906438, wd:Q4460509))
  bind (?type != wd:Q25906438 as ?success) 
  ?coup wdt:P17 ?country.
  ?country wdt:P30 wd:Q15.
  optional {?coup wdt:P585 ?time}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
} order by desc(?time)
Try it!

My query to find coups and attempts in Africa works but is very slow (right now 50 seconds if not in cache). Can someone rewrite this to be faster? Thanks.--37.201.182.206 22:41, 19 February 2022 (UTC)

Mainly, use VALUES, which provides a 'solution sequence' for the named variable, which is JOINed with the rest of the query - in other words, the variable is constrained to values in the VALUES list. Documentation: https://www.w3.org/TR/sparql11-query/#inline-data . FILTERs are always expensive & generally to be avoided.
SELECT ?coup ?coupLabel ?success ?time ?countryLabel WHERE {
  VALUES ?type {wd:Q45382 wd:Q25906438 wd:Q4460509}
  ?coup wdt:P31 ?type .
  bind (?type != wd:Q25906438 as ?success) 
  ?coup wdt:P17 ?country.
  ?country wdt:P30 wd:Q15.
  optional {?coup wdt:P585 ?time}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } 
} order by desc(?time)
Try it!
--Tagishsimon (talk) 23:06, 19 February 2022 (UTC)

Query to count the number of identifiers for taxa

which have the identifier Australian Faunal Directory ID (P6039) That is the query would count all the database identifiers for such an item, and for Enteles vigorsii (Q107221668) would get a count of 9 (counting both iNaturalist ids). MargaretRDonald (talk) 05:17, 16 February 2022 (UTC)

@MargaretRDonald: Here is a first attempt. As there are more than 130,000 objects in the result, an attempt to include the labels in the result will timeout.
SELECT ?item (COUNT(?id) AS ?count_of_external_identifiers) WHERE {
  ?item wdt:P6039 [] .
  OPTIONAL { ?item ?id [] .
  ?p wikibase:directClaim ?id .
  ?p wikibase:propertyType wikibase:ExternalId . }
}
GROUP BY ?item
Try it!
By dividing the result in several chunks as in the following query, experiment with OFFSET and LIMIT, you may be able to find the labels.
SELECT ?item ?itemLabel (COUNT(?id) AS ?count_of_external_identifiers) WITH {
SELECT ?item {
  ?item wdt:P6039 [] .
}
OFFSET 50000
LIMIT 10000
} AS %i WHERE {
  INCLUDE %i
  OPTIONAL { ?item ?id ?value .
  ?p wikibase:directClaim ?id .
  ?p wikibase:propertyType wikibase:ExternalId
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?item ?itemLabel
Try it!
--Larske (talk) 06:59, 16 February 2022 (UTC)
Using OFFSET and LIMIT clauses without also using ORDER BY makes no sense since the results will not be in any defined order. So I propose to add
ORDER BY ?item
in the named subquery %i above.
Also be aware that if some items have more than one best-rank statement with Australian Faunal Directory ID (P6039), then that item's external identifiers will be counted more than once (if the P6039 statements are found in the same chunk), and/or the item will in several chunks.
By the way Blazegraph (the SPARQL engine) comes with a service intended to return only a part (called slice) of the results matching a triple which you can also use instead of the subquery:
SELECT ?item ?itemLabel (COUNT(?id) AS ?count_of_external_identifiers)
WHERE
{
  SERVICE bd:slice
  {
    ?item wdt:P6039 [] .
    bd:serviceParam bd:slice.offset 50000 .
    bd:serviceParam bd:slice.limit 10000 .
  }
  OPTIONAL
  {
    ?item ?id ?value .
    ?p wikibase:directClaim ?id .
    ?p wikibase:propertyType wikibase:ExternalId
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?item ?itemLabel
Try it!
Finally you can speed up the query by only adding the labels after grouping:
SELECT ?item ?itemLabel ?count_of_external_identifiers
WITH
{
  SELECT ?item (COUNT(?id) AS ?count_of_external_identifiers)
  WHERE
  {
    SERVICE bd:slice
    {
      ?item wdt:P6039 [] .
      bd:serviceParam bd:slice.offset 50000 .
      bd:serviceParam bd:slice.limit 10000 .
    }
    OPTIONAL
    {
      ?item ?id ?value .
      ?p wikibase:directClaim ?id .
      ?p wikibase:propertyType wikibase:ExternalId
    }
  }
  GROUP BY ?item
}
AS %without_labels
WHERE
{
  INCLUDE %without_labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
--Dipsacus fullonum (talk) 10:57, 16 February 2022 (UTC)
  • It could be easier to use the triple that counts them automatically
?item wikibase:identifiers ?ids
Property_talk:P6039 has a few links to queries for those with the most or the least. --- Jura 15:05, 16 February 2022 (UTC)
@Jura1: Yes, it would be easier using wikibase:identifiers if that gives the count. Unfortunately I don't know that predicate. Where is it documented? There is no mention in the RDF Dump Format document. --Dipsacus fullonum (talk) 16:24, 16 February 2022 (UTC)
I found phab:T144476 "[feature request] add count of external-ids to rdf (Wikidata)" (Closed, Resolved). I wonder why the development people want to use their time to develop new features, and then don't update the RDF documentation so the new features can be used by all. Why make it, and then more or less keep it secret? It simply makes no sense to me. It also have used a lot of resources unnecesary that users for years have manually counted the external identiers because the documentation isn't updated. Aw! --Dipsacus fullonum (talk) 21:18, 16 February 2022 (UTC)
It was somewhat buggy at the beginning with delayed or missing updates of counts. Also, it took forever until it was available for all items. --- Jura 21:27, 16 February 2022 (UTC)
Thank you all. @Larske: @Dipsacus fullonum: @Jura: Amazing work. However, I managed to misspecify what I am wanting. Would it be possible to get not the item but a summary of the items, and break it down so that not all 130,000+ objects are interrogated but just birds (using say ?item wdt:P171* wd:Q5113.) To give a table like the following:

No of- external ids count
1 20000
2 1900
3 1750
....
40 4
Hopefully the full (sub)group can be summarised. Alternatively the table could consist of two lines: number of elements having more than one external id and those having just one. MargaretRDonald (talk) 20:14, 16 February 2022 (UTC)

Try the chart for sitelinks on property talk and change "sitelinks" to "identifiers". --- Jura 21:04, 16 February 2022 (UTC)
Like this:
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_bird_objects) WHERE {
  ?item wdt:P171* wd:Q5113 .
  ?item wdt:P6039 [] ; wikibase:identifiers ?number_of_external_identifiers .
}
GROUP BY ?number_of_external_identifiers
ORDER BY ?number_of_external_identifiers
Try it!
--Larske (talk) 01:35, 17 February 2022 (UTC)
I changed COUNT(?item) to COUNT(DISTINCT ?item in the query by Larske above because otherwise items with more than one best-rank P6039 statements will be overcounted. --Dipsacus fullonum (talk) 08:19, 17 February 2022 (UTC)
Thank you so much @Larske:. I have now almost got what I want. However my query on insect (Q1390) timed out (of course).... And I am hoping you might be able to find a solution. Thanks, again. MargaretRDonald (talk) 20:10, 19 February 2022 (UTC)
@MargaretRDonald: To find a solution, you need to tell what you want. What was the query that timed out? --Dipsacus fullonum (talk) 21:40, 19 February 2022 (UTC)
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_insect_objects) WHERE {
  ?item wdt:P171* wd:Q1390 .
  ?item wdt:P6039 [] ; wikibase:identifiers ?number_of_external_identifiers .
}
GROUP BY ?number_of_external_identifiers
ORDER BY ?number_of_external_identifiers
Try it!
Thanks, @Dipsacus fullonum:. MargaretRDonald (talk) 00:57, 20 February 2022 (UTC)
@MargaretRDonald: There are so many insect items that it is impossible to make a list of them and then join it with the items with P6039 values, as the SPARQL engine tried. I tried finding the items with P6039 values first and then checking which are insects. That also timed out, but I could split the query into two.
Items with 1 to 7 external identiers:
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_insect_objects)
WITH
{
  SELECT ?item ?number_of_external_identifiers
  WHERE
  {
    ?item wdt:P6039 [] .
    ?item wikibase:identifiers ?number_of_external_identifiers .
    hint:Prior hint:rangeSafe true .
    FILTER (?number_of_external_identifiers <= 7 )
  }
}
AS %P6039    
WHERE
{
  INCLUDE %P6039
  ?item wdt:P171* wd:Q1390 .
  hint:Prior hint:gearing "forward" .
}
GROUP BY ?number_of_external_identifiers
ORDER BY ?number_of_external_identifiers
Try it!
Items with more than 7 external identiers:
SELECT ?number_of_external_identifiers (COUNT(DISTINCT ?item) AS ?number_of_insect_objects)
WITH
{
  SELECT ?item ?number_of_external_identifiers
  WHERE
  {
    ?item wdt:P6039 [] .
    ?item wikibase:identifiers ?number_of_external_identifiers .
    hint:Prior hint:rangeSafe true .
    FILTER (?number_of_external_identifiers > 7 )
  }
}
AS %P6039    
WHERE
{
  INCLUDE %P6039
  ?item wdt:P171* wd:Q1390 .
  hint:Prior hint:gearing "forward" .
}
GROUP BY ?number_of_external_identifiers
ORDER BY ?number_of_external_identifiers
Try it!
--Dipsacus fullonum (talk) 09:02, 20 February 2022 (UTC)
Thank, you so much for this, I have now managed to query my data very adequately, thanks to your efforts (and everyone else's). @Dipsacus fullonum:. MargaretRDonald (talk) 19:17, 20 February 2022 (UTC)

URL search strings for all the instances of X missing property Y

Create URLs for a list of items of instance x that is missing property y

The URL is to be a combination of "search formatter URL" (P4354) and the labelname

The idea being to have something reusable that someone interested in filling in a particular id can find entries that currently do not have it.

For example "Instances of" (P31) surname (Q101352) without property "MyHeritage Surname ID" (P5452)

Would create a clickable list like

https://lastnames.myheritage.com/last-name/presley
https://lastnames.myheritage.com/last-name/springsteen
https://lastnames.myheritage.com/last-name/sinatra

Back ache (talk) 04:19, 21 February 2022 (UTC)

@Back ache: You're looking for something like this, I think:
SELECT ?item ?item_label ?check_url WHERE {
  ?item wdt:P31 wd:Q101352 .
  MINUS {?item wdt:P5452 [] } .
  ?item rdfs:label ?item_label FILTER(lang(?item_label) = 'en') .

  wd:P5452 wdt:P1630 ?fmt .
  BIND(IRI(REPLACE(str(?item_label), '(^.*)', ?fmt)) AS ?check_url) .
} LIMIT 50
Try it!

I've limited the number of results shown to 50, as wikidata has a lot of surnames.

I'm a little concerned to discover the number of MyHeritage records with family name = "Family-Name" : [2], but I guess people get called all sorts of things. :-) Jheald (talk) 12:09, 21 February 2022 (UTC)

@Jheald:

I agree hopefully using querys like this, alongside things like Wikidata:Tools/Wikidata for Web they'll create higher quality links, thanks for your help Back ache (talk) 12:44, 21 February 2022 (UTC)
@Back ache: Here's also a slightly tweaked version of the query, which looks at the first million family name (P734) statements it can find, and then lists which surnames are most frequently occurring in that set that don't have a P734 :
SELECT ?count ?name ?name_label ?check_url 

WITH {
  SELECT ?item ?name WHERE {
    ?item wdt:P734 ?name
  } LIMIT 1000000 
} AS %uses

WITH {
  SELECT (COUNT(*) AS ?count) ?name WHERE {
     INCLUDE %uses
  } GROUP BY ?name
  HAVING (?count > 4)
} AS %names

WHERE {
  INCLUDE %names
  MINUS {?name wdt:P5452 [] } .
  ?name rdfs:label ?name_label FILTER(lang(?name_label) = 'en') .

  wd:P5452 wdt:P1630 ?fmt .
  BIND(IRI(REPLACE(str(?name_label), '(^.*)', ?fmt)) AS ?check_url) .
} ORDER BY DESC(?count)
Try it!
-- Jheald (talk) 13:17, 21 February 2022 (UTC)
So if I was trying to a variation for humans without a surname property, how do I only include reuslts with a space (to avoid people with one name) and extract only that which follows the last space to be used in link.

So that the query would exclude the name "Prince" and for "Elvis Presley" only use "Presley" as the varibable in the link


SELECT ?item ?item_label ?check_url WHERE {
  ?item wdt:P31 wd:Q5 .
  MINUS {?item wdt:P734 [] } .
  ?item rdfs:label ?item_label FILTER(lang(?item_label) = 'en') .

  wd:P5452 wdt:P1630 ?fmt .
  BIND(IRI(REPLACE(str(?item_label), '(^.*)', ?fmt)) AS ?check_url) .
  FILTER regex(?item_label, " ", "i")
} LIMIT 50
Try it!

Back ache (talk) 11:27, 23 February 2022 (UTC)

selecting a subset of items linked to property:P7400

hi!LibraryThing page for L.L.Zamenhof has at the top right a link persondata.toolforge.org gnd 118643495 and of course also to wikidata. because the wd item has both an entry for property LibraryThing Author ID and a dewiki entry as a consequence at the persondata.toolserver link from above you can find a link labeled "Eintrag in Librarything".


note: for now (10:06, 23 February 2022 (UTC)) 8,950 P4700 entries are available at wikidata. i hope that the multilingual LibraryThing community will more by the thousends for many of the 2,000,000 LT authors. let's hope "a dream comes trough"


my SPARK skills are very limited. i managed to generate https://w.wiki/4sQJ .


please generate a query selecting

all wd items with property P7400

where

a dewiki is available
a eowiki is available
Property:P8557 OR Q12565 is a value for Property:P463

the output should Show

wd item labeled with Englisch wd item label
LT as label with the P4700 value
eo as label for the eowiki link to Esperanto Wikipedia
de as label for the dewiki link to German Wikipedia

if possible the output should be sorted by Englisch labels of the wd items thanks in advance for all your eforts gangleri aka lery raynhart 10:06, 23 February 2022 (UTC) 88.128.88.112

10:13, 23 February 2022 (UTC) 88.128.88.112

Esperanto wikidata kaj LibraryThing

saluton! https://w.wiki/4sVZ
la rezulto montras aron da pagcoj rilatantajn al nomojn kiujn vi konas, kies nomojn vi devintus scii.
cxiu pagxo entenas Property:P4700. se vi iras al LibraryThing povas esti ke vi ankaux trovas en la supra dekstra parto ligon nomatan persondata.toolforge . estas ligo al la secxilo de la germanlinga vikidio. bv. noti ke tie estas ligo nomata "Eintrag in LibraryThing".
bv. noti ke la persondata.toolforge pagxo listigas ligojn al aliaj personoj menciataj en la germanlingva artikolo en vikipedio.
PETO: bv. pliampleksigi la sercxon tiamsnire ke aliaj e-istoj, interlingvistoj aperu. eblas aldoni Membrojn de la Akademio de Esperanto, de la Lingva komitato, Honorajn membrojn de UEA, UEA membrojn ktp.
antauxdankon gangleri aka lery raynhart 10:59, 23 February 2022 (UTC) 88.128.88.112

Jen serĉo por esperantistoj kun valoro por LibraryThing author ID (P7400):
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q5 . # ?item eastas homo
  ?item wdt:P7400 [] . # ?item havas LibraryThing-valoron
  ?item wdt:P106 wd:Q860918 . # ?item estas esperantisto
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],eo,en". }
}
Try it!
Se vi volis ion alian, bonvolu diri la deziron pli klare. --Dipsacus fullonum (talk) 11:58, 23 February 2022 (UTC)
Question for all: Why is everything after "PETO:" bold? Jeg kan ikke se årsagen. (Demando por ĉiuj: Kial estas ĉio post "PETO:" graseta? Mi ne vidas la kialon.)

aldono: Alaudo · talk @Alaudo , Andy · talk @Andy , Blahma · talk @Blahma , Castelobranco · talk @Castelobranco , Chuck SMITH @Chuck_SMITH , Darkweasel94 · talk @Darkweasel94 , DidiWeidmann · talk @DidiWeidmann , KuboF · talk @KuboF , Marcos · talk @Marcos , Narvalo · talk @Narvalo , PaulP · talk @PaulP , Pino @Pino , Roboto de Marcos @Roboto_de_Marcos , Thomas Guibal @Thomas_Guibal , ThomasPusch · talk @ThomasPusch , Tlustulimu · talk @Tlustulimu , Umbert' · talk @Umbert' , Yekrats · talk @Yekrats , לערי ריינהארט · talk @לערי_ריינהארט , Maha · talk @Maha , Ziko · talk @Ziko
dankon Dipsacus fullonum pro la helpo! 14:20, 23 February 2022 (UTC) 62.54.176.88


grandega laboro! bonvolu krei kroman sercxon kie la rezulto entenas ankaux la LibraryThing ligojn kaj ligojn al enwiki kaj al dewiki se tiaj sitelinks ekzistas.
bv. plilargxigi la demandon sercxsnte memnrojm de UEA, Honorajn membrojn de UEA, de la Akademio de Esperanto, de la Lingva Komitato, pagxojn lihitsjn al Nia diligenta kolegaro, ligitajn al Originala literaturo, denaskajn esperantistpkn, lingvokapablo Esperanto verlintojn en Espersnto, Ido, Volapuko, aliaj planingvoj. antauxdankon! amike reinhardt aka lery raynhart 14:44, 23 February 2022 (UTC) 62.54.176.88

SELECT ?item ?itemLabel ?itemDescription (GROUP_CONCAT(?p31label;SEPARATOR=', ') AS ?type) (YEAR(?p577) AS ?year) ?sitelinks ?sitelink ?article WITH {

SELECT DISTINCT ?item ?p577 ?sitelinks WHERE {
  ?item wdt:P31 wd:Q7725634 .
  ?item wdt:P407 wd:Q652 .
  ?item wdt:P577 ?p577 . hint:Prior hint:rangeSafe true .
  FILTER ("1980-00-00"^^xsd:dateTime <= ?p577 && ?p577 < "2021-00-00"^^xsd:dateTime) # change the year limits if needed
  ?item wikibase:sitelinks ?sitelinks .

} } AS %i WHERE {

INCLUDE %i
?item wdt:P31 [rdfs:label ?p31label] . FILTER(LANG(?p31label)='en')   # change the language code if needed
 OPTIONAL { ?article schema:about ?item ;
 schema:isPartOf <https://en.wikipedia.org/> ; 
 schema:name ?sitelink .
 }
 
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }

} GROUP BY ?item ?itemLabel ?itemDescription ?type ?p577 ?sitelinks ?sitelink ?article ORDER BY DESC(?sitelinks) ?itemLabel

What are exactly sitelinks here? Are they the links of other pages on the same entity translated in different languages?

Well, the above query has syntax errors, so there nothing means anything. But it contains the triple ?item wikibase:sitelinks ?sitelinks. The predicate wikibase:sitelinks is used to record how many sitelinks to MediaWikia projects an item has. That may be links to Wikipedia in any language, links to Wikinews in any language, links to Wikisource in any language and several other projects. --Dipsacus fullonum (talk) 11:28, 23 February 2022 (UTC)
How can I get how many links a page is translated to? Like, each item translated into how many languages 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 12:58, 23 February 2022 (UTC)
You cannot. There are no records saying if pages in different languages are translations of each other or independent pages about the item. --Dipsacus fullonum (talk) 13:36, 23 February 2022 (UTC)
So if I have item "The Godfather" I can't see in how many languages that wikipedia page has been translated to? 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 13:56, 23 February 2022 (UTC)
That is correct. There are no records on translations. You can however see in how many languages there is a Wikipedia page about the item. --Dipsacus fullonum (talk) 15:08, 23 February 2022 (UTC)
How can I do it? 2001:B07:AE5:83D6:B08E:CDCF:9E4C:36AE 15:53, 23 February 2022 (UTC)
SELECT ?item (COUNT(?wikipedia_page) AS ?wikipedia_pages)
{
  VALUES ?item { wd:Q47703 }
  ?wikipedia_page schema:about ?item .
  ?wikipedia_page schema:isPartOf/wikibase:wikiGroup "wikipedia" .
}
GROUP BY ?item
Try it!
--Dipsacus fullonum (talk) 19:31, 23 February 2022 (UTC)

Item With Most References

How can I get a list of the items with the most total references? More specifically, the sum of references on every statement of an item. AntisocialRyan (Talk) 16:22, 23 February 2022 (UTC)

@AntisocialRyan: It would take far too long to count the number of references in all items by using SPARQL to avoid timeout. I think that you will have to analyze a database dump to do that. --Dipsacus fullonum (talk) 18:00, 23 February 2022 (UTC)
Could it be narrowed down to a specific instance of? Like movies with the most references? AntisocialRyan (Talk) 19:42, 23 February 2022 (UTC)

Help optimizing query

I need help optimizing this query to take less than timeout.

SELECT (COUNT(DISTINCT ?director_de_cine) AS ?count) WHERE {
  ?director_de_cine (wdt:P106*) wd:Q2526255;
    wdt:P21 wd:Q6581072.
    ?sitelink schema:about ?article;
    schema:isPartOf <https://es.wikipedia.org/>.
}
Try it!

thanks :-) —Ismael Olea (talk) 16:36, 23 February 2022 (UTC)

@Olea: Mainly, your formulation ?sitelink schema:about ?article; does not tie sitelinks to the ?director_de_cine variable, and so WDQS wanders off looking at all sitelinks for all items. I think, too, you might have wanted ?director_de_cine wdt:P106/wdt:P279* wd:Q2526255, (has an occupation of, or an occupation which is a subclass of).
SELECT (COUNT(DISTINCT ?director_de_cine) AS ?count) WHERE {
  ?director_de_cine wdt:P106/wdt:P279* wd:Q2526255;
    wdt:P21 wd:Q6581072.
  ?article schema:about ?director_de_cine ;
  schema:isPartOf <https://es.wikipedia.org/> .
}
Try it!
--Tagishsimon (talk) 17:07, 23 February 2022 (UTC)

How many languages there is a Wikipedia page about Italian writers' works 1980-2021

Hi, I have this query and I wanted to know if it's correct to get how many links (in the sense of translated pages) there are about each entity, which is a literary work of an Italian writer between 1980 and 2021.

SELECT ?item ?itemLabel ?itemDescription (GROUP_CONCAT(?p31label;SEPARATOR=', ') AS ?type) (YEAR(?p577) AS ?year) ?sitelinks ?sitelink ?article WITH { SELECT DISTINCT ?item ?p577 ?sitelinks WHERE {

 ?item wdt:P31 wd:Q7725634 .
 ?item wdt:P407 wd:Q652 .
 ?item wdt:P577 ?p577 . hint:Prior hint:rangeSafe true .
 FILTER ("1980-00-00"^^xsd:dateTime <= ?p577 && ?p577 < "2021-00-00"^^xsd:dateTime) # change the year limits if needed
 ?item wikibase:sitelinks ?sitelinks .

} } AS %i WHERE { INCLUDE %i ?item wdt:P31 [rdfs:label ?p31label] . FILTER(LANG(?p31label)='en') # change the language code if needed

OPTIONAL { ?article schema:about ?item ;
schema:isPartOf <https://en.wikipedia.org/> ; 
schema:name ?sitelink .
}

SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } } GROUP BY ?item ?itemLabel ?itemDescription ?type ?p577 ?sitelinks ?sitelink ?article ORDER BY DESC(?sitelinks) ?itemLabel

No, that is not correct. The query tells the number of sitelinks to Wikimedia pages that each item has, but that doesn't say anything about translated pages. There is no information about translations, so that cannot be queried. --Dipsacus fullonum (talk) 13:36, 24 February 2022 (UTC)
So this query:
SELECT ?item (COUNT(?wikipedia_page) AS ?wikipedia_pages)
{
VALUES ?item { wd:Q47703 }
?wikipedia_page schema:about ?item .
?wikipedia_page schema:isPartOf/wikibase:wikiGroup "wikipedia" .
}
GROUP BY ?item
Cannot be done for each entity in my query? 137.204.150.17 13:49, 24 February 2022 (UTC)

Video games with most "has part" statements.

AntisocialRyan (Talk) 16:15, 24 February 2022 (UTC)

@AntisocialRyan: Something like:
SELECT ?item ?itemLabel (COUNT(?part) as ?count)
WHERE 
{
  ?item wdt:P31/wdt:P279* wd:Q7889 . 
  ?item wdt:P527 ?part .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } 
} GROUP BY ?item ?itemLabel ORDER BY DESC(?count)
Try it!
--Tagishsimon (talk) 18:49, 24 February 2022 (UTC)

Reality check

I presume I'm doing something wrong with the SPARQL, but can't see it. I'm looking for normal rank statements that have a 'reason for preferred rank' qualifier. On the face of it, some are being found. But when I check the statement in the WD UI, it's set to preferred rank.

SELECT distinct ?item ?itemLabel ?statement ?property ?propertyLabel ?rank
WHERE 
{
  ?statement pq:P7452 []. hint:Prior hint:runFirst true. # statement has a qualifier "reason for preferred rank"
  ?statment wikibase:rank wikibase:NormalRank .          # statement is of normal rank
  ?statment wikibase:rank ?rank.                         # get the rank anyway
  ?item ?predicate ?statement .                          # link the statement to an item
  ?property wikibase:claim ?predicate .                  # ensure the predicate is p:
} limit 100
Try it!

--Tagishsimon (talk) 12:14, 25 February 2022 (UTC)

I've spent hours trying to track down this bug in the past, check the spelling of statement in lines 5 & 6 Piecesofuk (talk) 15:46, 25 February 2022 (UTC)
@Tagishsimon: It a spelling error. You have two different variables ?statement and ?statment.
Thank you both. Obligatory Doh! --Tagishsimon (talk) 17:22, 25 February 2022 (UTC)

Splitting pairs

SELECT ?a ?aLabel ?b ?bLabel
WHERE {
  ?a wdt:P1696 ?b .
  ?b wdt:P1696 ?a .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!

I'm having problems finding a solution to this problem: For every A that is an inverse and complementary property of B, split the pairs and discard half. For instance father/child is a pair so discard child/father. Infrastruktur (talk) 08:44, 26 February 2022 (UTC)

@Infrastruktur: All things considered, this?
SELECT ?a ?aLabel ?b ?bLabel
WHERE {
  ?a wdt:P1696 ?b .
  ?b wdt:P1696 ?a .
  filter (str(?a) < str(?b))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
--Tagishsimon (talk) 11:42, 26 February 2022 (UTC)
Thanks, clever solution. Infrastruktur (talk) 12:07, 26 February 2022 (UTC)

Each work that has won an Italian book prize

I would like to have all the works, authors and year of Italian literary awards/prize winners, divided by award. Is it possible?

It's certainly not all of them, but it's all that we have so far (550 results at the moment):
SELECT DISTINCT ?award ?awardLabel ?year ?winner ?winnerLabel ?work ?workLabel WHERE {
  ?award wdt:P31/wdt:P279* wd:Q378427 .
  ?award wdt:P17 wd:Q38 .
  ?award ^ps:P166 ?statement .
  ?statement ^p:P166 ?winner .
  ?winner wdt:P31 wd:Q5 .
  OPTIONAL{?statement pq:P1686 ?work .} 
  OPTIONAL{?statement pq:P585 ?point_in_time .
           BIND(YEAR(?point_in_time) AS ?year)}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],it,en" .}
  BIND(xsd:integer(SUBSTR(STR(?award),33,99)) AS ?qid)
} ORDER BY ASC(?qid) DESC(?year)
Try it!
As you can see, there is plenty more information to be added!
EDIT: I should note that not all of these prizes are awarded for a particular work, so some gaps will be expected.
--Quesotiotyo (talk) 19:50, 28 February 2022 (UTC)
Here is another version that requires that the literary work has the award received (P166) as property instead of being stated as a qualifier for the award received (P166) property of the author.
SELECT DISTINCT ?awardLabel (YEAR(?p585) AS ?year) ?literary_work ?literary_workLabel ?authorLabel  WITH {
SELECT ?literary_work ?award ?author ?p585 WHERE {
  ?award wdt:P31/wdt:P279* wd:Q378427 .
  ?award wdt:P17 wd:Q38 .
  ?literary_work p:P166 ?p166stm .
  ?p166stm ps:P166 ?award .
  OPTIONAL { ?p166stm pq:P585 ?p585 }
  OPTIONAL { ?literary_work wdt:P50 ?author }
}
} AS %i WHERE {
  INCLUDE %i
  ?literary_work wdt:P31/wdt:P279* wd:Q386724 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
ORDER BY ?awardLabel ?year ?STR(?literary_workLabel)
Try it!
--Larske (talk) 22:59, 28 February 2022 (UTC)

Use differents colors

Hi,

I want to have the ?en in one color and ?fr in another in this query. Can someone help me ? (this page really need love...)

#defaultView:Map
select ?item ?coord ?en ?fr where {?item wdt:P17 wd:Q16 ; wdt:P131* wd:Q1965 ; wdt:P625 ?coord . {{?en schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> .} UNION {?fr schema:about ?item ; schema:isPartOf <https://fr.wikipedia.org/> .}}}
Try it!

Simon Villeneuve (talk) 03:06, 28 February 2022 (UTC)

@Simon Villeneuve: Something along these lines, I think. Setting ?rgb is, I think, optional; it'll select colours for you if you don't specify them. #defaultView:Map{"hide":["?rgb","?layer"]} hides the ?rgb and ?layer variable from the mouse-click pop-up. Documentation, such as it is, here: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Wikidata_Query_Help/Result_Views#Map
#defaultView:Map{"hide":["?rgb","?layer"]}
SELECT ?item ?coord ?en ?fr ?layer ?rgb WHERE {
  ?item wdt:P17 wd:Q16;
    (wdt:P131*) wd:Q1965;
    wdt:P625 ?coord.
  {
    {
      ?en schema:about ?item;
        schema:isPartOf <https://en.wikipedia.org/>.
    }
    UNION
    {
      ?fr schema:about ?item;
        schema:isPartOf <https://fr.wikipedia.org/>.
    }
  }
  BIND (if(BOUND(?en) && BOUND(?fr),"ff0000",if(BOUND(?en),"00ff00",if(BOUND(?fr),"0000FF","000000" ))) as ?rgb) # set the dot colour
  BIND (if(BOUND(?en) && BOUND(?fr),"EN & FR",if(BOUND(?en),"EN",if(BOUND(?fr),"FR","NIL" ))) as ?layer)         # set the layer
}
Try it!
Slightly unclear to me why this is not a better query, but you may have your reasons:
#defaultView:Map{"hide":["?rgb","?layer"]}
SELECT ?item ?coord ?en ?fr ?layer ?rgb WHERE {
  ?item wdt:P17 wd:Q16;
    (wdt:P131*) wd:Q1965;
    wdt:P625 ?coord.
  
    OPTIONAL {
      ?en schema:about ?item;
        schema:isPartOf <https://en.wikipedia.org/>.
    }
  OPTIONAL
    {
      ?fr schema:about ?item;
        schema:isPartOf <https://fr.wikipedia.org/>.
    }
  FILTER(BOUND(?en) || BOUND(?fr))
  BIND (if(BOUND(?en) && BOUND(?fr),"ff0000",if(BOUND(?en),"00ff00",if(BOUND(?fr),"0000FF","000000" ))) as ?rgb)
  BIND (if(BOUND(?en) && BOUND(?fr),"EN & FR",if(BOUND(?en),"EN",if(BOUND(?fr),"FR","NIL" ))) as ?layer)
}
Try it!

--Tagishsimon (talk) 09:23, 28 February 2022 (UTC)

@Tagishsimon: Thank you very much !
Yes, the second one is much better !
I saw the "Result_View#Map" page and, as I said, it need "love". I mean by that that there's too less informations about how use the options like "layer", especially the BIND part.
Thank you again. As usual, you find a way to solve my problems and more ! Simon Villeneuve (talk) 12:37, 28 February 2022 (UTC)
Too little information; agreed. That & other pages (*cough* MWAPI documentation *cough*) could do with more worked examples. Still too much of a black art right now. On that second query, you'll also get items with no EN/FR site link by commenting out FILTER(BOUND(?en) || BOUND(?fr)), but you've probably spotted that. Many more of them than items that do have one or other or both sitelinks. --Tagishsimon (talk) 14:55, 28 February 2022 (UTC)