User Details
- User Since
- Mar 12 2015, 12:15 PM
- MediaWiki User
- Vladimir Alexiev
Oct 16 2023
I used queries like the following to compare counts between Wikidata and our wdtruthy service.
- For the first two queries we use COUNT(DISTINCT ?x): they have a property path but a small result population.
- For the other queries we use COUNT(*) because it is much faster.
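A sketch of the two query shapes (the properties and classes here are illustrative, not the actual queries used):

```sparql
# Shape 1: property path but a small result population,
# so COUNT(DISTINCT ?x) is affordable:
SELECT (COUNT(DISTINCT ?x) AS ?c)
WHERE { ?x wdt:P31/wdt:P279* wd:Q3305213 }

# Shape 2 (a separate query): large population, no path;
# COUNT(*) skips the deduplication step and is much faster:
SELECT (COUNT(*) AS ?c)
WHERE { ?x wdt:P31 wd:Q5 }
```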
Oct 5 2023
I see the count triples on recently modified entities:
Oct 3 2023
The full dump is 15B triples (you can see this at https://query.wikidata.org/bigdata/ldf).
WDtruthy is 6.5B triples (we have it in GraphDB, continuously updated).
Adding the counts would add 320M triples, i.e. about 5%.
Sep 27 2023
The following query (from https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#Distinct_term_scan,_and_group_by_and_count_optimization)
SELECT ?p (COUNT(?p) AS ?count)
WHERE { [] ?p [] . }
GROUP BY ?p
ORDER BY DESC(?count)
when run at the WD query service shows the interesting counts:
- wikibase:statements 107093728
- wikibase:identifiers 105929733
- wikibase:sitelinks 105928930
Sep 15 2023
I posted https://github.com/w3c/sparql-dev/issues/192 to collect info about what various SPARQL processors do in such cases.
Blazegraph allows it for certain situations:
- COUNT
- SAMPLE
- identity rebinding
- expression rebinding (not aggregate): but returns no rows
Reading through the spec:
- https://www.w3.org/TR/sparql11-query/#bind: "The variable introduced by the BIND clause must not have been used in the group graph pattern up to the point of use in BIND."
- https://www.w3.org/TR/sparql11-query/#selectExpressions: "The rules of assignment in SELECT expression are the same as for assignment in BIND. The expression combines variable bindings already in the query solution, or defined earlier in the SELECT clause. The variable may be used in an expression later in the same SELECT clause and may not be assigned again in the same SELECT clause."
- This says you can't "assign" the same var twice in SELECT and that vars are carried forward from BIND, but not explicitly that you can't "re-assign" a variable bound in BIND from a SELECT expression.
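A minimal example of the rebinding in question, which Blazegraph accepts for aggregates like COUNT but which a strict reading of the spec disallows (the triple pattern is illustrative):

```sparql
# Re-binds the grouped variable ?p to its own count:
SELECT (COUNT(?p) AS ?p)
WHERE { [] ?p [] }
GROUP BY ?p
```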
Jul 26 2022
Thanks everyone and especially @Jheald for the valuable info.
Feb 17 2022
https://github.com/Sophox/sophox/issues/17 is about GeoSPARQL in Sophox, the OpenStreetMap SPARQL endpoint.
It uses Blazegraph (and Wikibase for OSM tags.keys).
Dec 6 2021
Turns out that VIAF pages (from which I copied the values) do have such weird invisible Unicode chars. Printed in hex:
3337 3237 37e2 808f e280 8f
i.e. the digits "37277" followed by two UTF-8-encoded U+200F RIGHT-TO-LEFT MARK characters.
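Such characters can be hunted down in SPARQL with a codepoint escape; a sketch, assuming VIAF ID (P214) as the property to check:

```sparql
# Find identifier values containing an invisible RIGHT-TO-LEFT MARK (U+200F)
SELECT ?item ?value
WHERE {
  ?item wdt:P214 ?value .
  FILTER(CONTAINS(?value, "\u200F"))
}
```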
Dec 3 2021
Bart Hanssens tried to validate the WD dump with rdf4j:
https://github.com/barthanssens/rdf4j-bigfile-validator/blob/main/log.txt
'孟慶雲' was not recognised as a language literal, and could not be verified, with language zh-classical
Sep 20 2021
see updated report immediately after fixing several items
Sep 16 2021
@Lucas, congratulations on your command-line wizardry! I know "jq", but not nearly to that extent; and how did I not know about "units"?
@Tacsipacsi Labels are missing at https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P2088. Maybe because it has a much bigger number of violations?
Sep 14 2021
OK! Posted T290961 as a replacement.
Here's the goal: a SPARQL query should return all violations of a certain kind, with a possible data lag of a few hours.
So you need:
- a baseline of having processed all items (TODO)
- processing of changed items (DONE)
- periodic processing of every item because constraint definitions or implementations can change globally (TODO?)
Sep 13 2021
@Mohammed_Sadat_WMDE Can you explain why a QS batch that merely deletes (does not create statements with qualifiers) returns errors and has to be restarted 10 or 20 times until all deletes go through? Or is "throttling" the same as "failing"?
Sep 9 2021
@Addshore I think that's a fair description. To add:
- those pages are linked to the Discussion pages of each property, so are perceived as an integral part of WD, eg https://www.wikidata.org/wiki/Property_talk:P245:
- Database reports/Constraint violations/P245: KrBot
- Database reports/Complex constraint violations/P245: not sure who generates it
- Database reports/Humans with missing claims/P245: not sure who generates it
- And there's a link for each individual constraint
- It also generates useful Type Statistics (eg https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P245#Types_statistics) that let us evaluate whether the type constraints are right. Eg the first "No" is https://www.wikidata.org/wiki/Q15632617 "fictional human" (ULAN is only supposed to include real persons and organizations), so I can search for Q15632617 to decide whether to remove that type from WD items or to add it to the ULAN property.
(Yes, we mean WMDE.)
Can I make two related requests? I'm not sure how to post them as separate tasks related to this task; can someone from WMDE do that?
Sep 3 2021
@Gehel sorry, I don't see any investigation by WMDE.
Yes, I care because that's a very valid use case of extracting and reshaping WD data.
Why does the SELECT succeed but a CONSTRUCT over the same result set fail? CONSTRUCT should be a very cheap operation once the results are found.
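To illustrate, a hypothetical pair sharing one WHERE clause: once the solutions are computed, the CONSTRUCT only has to instantiate one template triple per result row:

```sparql
# SELECT form:
SELECT ?item ?label
WHERE { ?item wdt:P31 wd:Q5 ; rdfs:label ?label }

# CONSTRUCT form (a separate query), same WHERE clause:
CONSTRUCT { ?item rdfs:label ?label }
WHERE { ?item wdt:P31 wd:Q5 ; rdfs:label ?label }
```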
I will reopen.
Aug 31 2021
https://news.ycombinator.com/item?id=28283350 is a discussion between @Denny and someone else where he says "There is no difference between 7-7-2000 and 07-07-2000 in xsd".
Feb 11 2021
The following equivalent query works fine:
Jan 14 2021
Final location of discussion: https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Ships#Home/Registry_Port_vs_City/Country
Dec 23 2020
Here are some queries to add the counts: directly to the Item node, and using some namespace ontoRecon:. I show the above counts, plus 3 more.
They would need a lot of memory (grouping over 90M items) and a lot of time (especially for statements).
Not tested yet.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
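A sketch of one such query (untested, per the above; the URI behind the ontoRecon: prefix is an assumption):

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX ontoRecon: <http://example.org/ontoRecon/>
# Materialize a per-item statement count directly on the Item node:
CONSTRUCT { ?item ontoRecon:statements ?count }
WHERE {
  SELECT ?item (COUNT(?stmt) AS ?count) {
    ?item ?p ?stmt .
    ?stmt wikibase:rank ?rank   # every statement node carries exactly one rank
  } GROUP BY ?item
}
```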
Nov 25 2020
@ericP Wikidata doesn't use OWL axioms. It uses blank nodes only for the special values "unknown value" and "no value".
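For example, "unknown value" shows up as a blank node that can be filtered for explicitly (P570 "date of death" is just an illustrative property):

```sparql
# Items whose date of death is the special "unknown value"
SELECT ?item
WHERE {
  ?item wdt:P570 ?dod .
  FILTER(isBlank(?dod))
}
```

On newer WDQS deployments, which skolemize blank nodes, wikibase:isSomeValue(?dod) replaces isBlank(?dod).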
Oct 22 2020
Out of 100 results of this query, about a third are such broken "classes" (dump URLs):
SELECT DISTINCT ?c { [] wdt:P31|wdt:P279 ?c } LIMIT 100
Oct 21 2020
T10217 describes other zh langs that need to be renamed to conform to official IANA lang tags.
May 28 2020
Use FILTER(?dateOfDeath >= "2020-01-01"^^xsd:date), which is just as convenient, but is also a valid literal (note the datatype).
May 27 2020
The reason is overloading of the WD backend, and the resulting throttling of the update rate of any particular user or tool.
I had similar issues when adding 1.5M "WorldCat Identity" identifiers, together with "source: <VIAF ID>".
Mar 19 2020
Examining https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P7704#%22Type_human_(Q5),_musical_ensemble_(Q2088357),_fictional_character_(Q95074),_fictional_profession_(Q17305127),_duo_(Q10648343),_pseudonym_(Q61002),_collective_(Q13473501),_group_of_humans_(Q16334295)%22_violations, the first item, "plasma", did not look right to me (what does plasma have to do with culture?).
So I looked at https://www.wikidata.org/wiki/Q10251 and sure enough: Europeana entity: agent/base/33720 : broken link.
More details at https://www.wikidata.org/wiki/Property_talk:P7704#Remove_Disambiguation_entities_from_Europeana_Entities: 253+45 to remove.
Furthermore, https://www.wikidata.org/wiki/Property_talk:P7704#Europeana_Entity_class_constraint_test shows more classes to be analyzed.
I cannot post to https://europeana.atlassian.net/browse/EA.
So I made https://europeana.atlassian.net/browse/PRO-60: allow posting issues to more Europeana projects
Their issue trackers were at https://europeanadev.assembla.com/p/projects, in particular https://europeanadev.assembla.com/spaces/europeana-apis and https://europeanadev.assembla.com/spaces/europeana-infrastructure.
But they also have https://europeana.atlassian.net. https://europeana.atlassian.net/browse/RD-16 says at least one of the spaces (R&D) was migrated to Atlassian on 21/Oct/18.
So I looked at https://europeana.atlassian.net/secure/BrowseProjects.jspa and there are several relevant projects:
- https://europeana.atlassian.net/browse/EA: APIs. The project lead is Hugo Manguinhas who I think is also responsible for Europeana Entities
- https://europeana.atlassian.net/browse/IN
- https://europeana.atlassian.net/browse/RD
Mar 9 2020
As of today, a slightly modified query https://w.wiki/K36 works ok (I added "country is Bulgaria")
Mar 2 2020
I've done a lot of work with GLAM data, which often includes "unknown" for the creator.
Getty ULAN has a whole slew of "unknowns", see http://vocab.getty.edu/doc/#ULAN_Hierarchy_and_Classes (note: the counts there are several years old; I imagine there are a few thousand more of those now):
- 500355043 Unidentified Named People includes things like "the master of painting X"
- 500125081 Unknown People by Culture includes things like "unknown Egyptian" (to be used in situations like "unknown creator, but Egyptian culture"). We've modeled those as gvp:UnknownPersonConcept and as groups (schema:Organization), but users still think of them as "persons".
- Further, there are things like "unknown but from the circle of Rembrandt" or "unknown but copy after Rembrandt" etc., about 20 varieties of them; see https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts/Item_structure#Attribution_Qualifiers and https://www.wikidata.org/wiki/Wikidata:Property_proposal/Attribution_Qualifier
Feb 28 2020
With this query http://yasgui.org/short/ZI1xPxBvI :
Feb 27 2020
http://yasgui.org/short/fLTUDXqOy has 16k portraits of @smithsoniannpg in #RDF, collected by http://americanartcollaborative.org, ready to integrate into @wikidata through https://wikidata.org/wiki/Property:P6152 and https://wikidata.org/wiki/Property:P4692. Both NPG IDs and AAC IDs number 2.4k in WD; AAC IDs total 234k. So there are 14k more portraits to be harvested from AAC and posted to WD #SumOfAllPaintings.
Feb 6 2020
See https://www.wikidata.org/wiki/Property_talk:P7844#regex_and_formatterUrl_exceptions and https://www.wikidata.org/wiki/Property_talk:P7711#Format, where I'm trying to get culture.fr people into a discussion about the identifiers used in their 30 thesauri: their GINCO system cannot generate sequential IDs and mixes all thesauri into one namespace, so likely the result will be that we'll merge these thesauri on Wikidata.
Jan 11 2020
I think this is your message: "ShEx element already has 168 child elements, adding Wikidata annotations might break them", and it points to the reason? I guess you give up generating the comments in this case: but why do you think 168 is too many?
The extra elements probably come from an add-on (SingleFile comes to mind), but if you add a couple more to those 168, would that be ok?
Jan 10 2020
Same result after relaunch
@Jelabra I see the tooltips at http://wikishape.weso.es/ under Schema > Info.
The positioning, size and alignment of the tooltip need to be adjusted a bit.
Jan 9 2020
Color is nice but imho decoding the P and Q numbers is critical. See this comment for details https://phabricator.wikimedia.org/T224962#5789394
https://www.wikidata.org/wiki/User:Zvpunry/EntitySchemaHighlighter.js does that: it linkifies Pnnn and Qnnn and shows a tooltip on hover.
Jan 3 2020
@Salgo60 I could translate it to BG, but bg - Шаблон:Europeana ("Template:Europeana") is missing.
May 30 2019
@Peb will this work in a streaming fashion? I know WD has a 60s timeout, but it can still produce some 10-100 megabytes in that time. Will the JS approach be reliable enough?
It's also important to be able to get Turtle using all available prefixes, which is a lot more readable than N-Triples.
Use CONSTRUCT when you want to get an RDF graph out of the repo, especially when you want the data shaped differently or expressed in ontology terms different from the original.
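For example, a reshaping CONSTRUCT that maps Wikidata terms onto schema.org (the target vocabulary here is just for illustration):

```sparql
PREFIX schema: <http://schema.org/>
# Reshape humans and their birth dates into schema.org terms:
CONSTRUCT {
  ?person a schema:Person ;
          schema:birthDate ?dob .
}
WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P569 ?dob .
}
```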
Apr 28 2019
@GerardM The IANA language tag list has a Script subfield exactly because languages can be written in different scripts.
Neither you nor I could write a single word in the Phoenician script. Yet we can write a Phoenician word in Latin letters, read it, and pronounce it approximately correctly.
This is valuable, whether you acknowledge it or not.
Apr 9 2019
It's fixed!
Apr 4 2019
https://github.com/eclipse/rdf4j/issues/1291 got this answer:
Mar 29 2019
"Unless GerardM thinks it is not useful" ;-)
Mar 28 2019
It does not make an argument go away.
Mar 25 2019
There is no Phoenician in Latin script.
Mar 20 2019
That is false. Gades is a Phoenician word, even if used in an English or Spanish text. The corresponding English word is Cadiz and the Spanish word is Cádiz.
Mar 19 2019
"Gades" is in phn-Latn, that's for sure.