User Details
- User Since
- Apr 2 2019, 6:24 PM (296 w, 1 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- Igorkim78
Feb 7 2022
Nov 3 2020
If you consider changing the collator configuration, note that the collator type should NOT be changed from the default value, ICU:
com.bigdata.btree.keys.KeyBuilder.collator=ICU
The collator type options JDK and ASCII also exist, but neither is usable: JDK basically yields the same comparison as ICU but generates much larger keys, and ASCII simply assumes the source text is ASCII and drops Unicode support completely.
Jan 31 2020
@Aklapper , Thank you! Fixed the commit message.
The issue is caused by the Service node producing the variable ?coDescription, which is not explicitly defined in the main query, so the optimizers assume this variable is not bound and do not bother with the proper evaluation order for the lang function. A fix might require reordering the optimizers to make the variables produced by wikibase:label visible to the other optimizers, but that is tricky, because wikibase:label itself depends on the results of other optimizers being applied in the proper order (wikibase:label takes its list of variables to process from the main query).
Jan 16 2020
Performance measured on dump from 20191202: https://dumps.wikimedia.org/wikidatawiki/entities/20191202/
Baseline time to load: 4264m29.914s, 714218864640 bytes
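For easier comparison with later runs, the baseline figures can be converted into hours and GiB. This is only a convenience sketch: the variable names are made up here, and the numbers are copied from the line above.

```python
# Convert the baseline figures quoted above into friendlier units.
load_seconds = 4264 * 60 + 29.914   # "4264m29.914s"
size_bytes = 714218864640           # size reported next to the load time

load_hours = load_seconds / 3600
size_gib = size_bytes / 2**30

print(f"load time: {load_hours:.1f} h ({load_hours / 24:.2f} days)")
print(f"size: {size_gib:.1f} GiB")
```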
Dec 23 2019
The configuration changes for SDC data are as follows. Note that the namespace 'sdc' is used to store RDF data in the Blazegraph journal; it may be changed as needed. Keeping the same namespace as for Wikidata (wdq) is not recommended: it might cause conflicts when deploying the services on a shared server (if such a configuration is ever implemented), and queries might address the wrong namespace in the Blazegraph journal and return improper data.
- Blazegraph journal config (RWStore.properties)
Replace the corresponding WDQS configuration (search for the com.bigdata.namespace.wdq prefix to find the parameters to be replaced):
# Bump up the branching factor for the lexicon indices on the default kb.
com.bigdata.namespace.sdc.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.namespace.sdc.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=599
com.bigdata.namespace.sdc.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=300
# Bump up the branching factor for the statement indices on the default kb.
com.bigdata.namespace.sdc.spo.JUST.com.bigdata.btree.BTree.branchingFactor=1024
com.bigdata.namespace.sdc.spo.OSP.com.bigdata.btree.BTree.branchingFactor=866
com.bigdata.namespace.sdc.spo.POS.com.bigdata.btree.BTree.branchingFactor=954
com.bigdata.namespace.sdc.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934
Note that the final configuration should be adjusted for the real production data according to the instructions in T232768.
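The prefix replacement described above could be scripted roughly as follows. This is a minimal sketch, not a tested deployment tool: the function name and the sample input are made up for illustration, and the branching factor values still have to be tuned separately per T232768.

```python
WDQ_PREFIX = "com.bigdata.namespace.wdq."
SDC_PREFIX = "com.bigdata.namespace.sdc."

def retarget_namespace(properties_text: str) -> str:
    """Rewrite wdq-namespace property keys to the sdc namespace.

    Comments and properties without the wdq prefix are left untouched.
    """
    out = []
    for line in properties_text.splitlines():
        if line.startswith(WDQ_PREFIX):
            line = SDC_PREFIX + line[len(WDQ_PREFIX):]
        out.append(line)
    return "\n".join(out)

# Illustrative input only; the value 400 is a placeholder, not a recommendation.
sample = """\
# Bump up the branching factor for the lexicon indices on the default kb.
com.bigdata.namespace.wdq.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.btree.keys.KeyBuilder.collator=ICU
"""
converted = retarget_namespace(sample)
print(converted)
```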
Dec 6 2019
Which FS is used to store the wikidata.jnl file? What is the underlying physical disk? What is the exact OS version?
Blazegraph puts a heavy load on the disk, so the cause might be a combination of heavy write and read stress, resulting either in overheating of the physical disk (leading to errors) or in triggering OS-layer bugs in the FS or NVMe drivers.
Dec 3 2019
We need statistics on how many triples use a bnode as an object:
{code}
select ?p (count(*)as ?cnt) {
?s ?p ?o . filter (isBlank(?o))
}
group by ?p
{code}
and as a subject (if any)
{code}
select ?p (count(*)as ?cnt) {
?s ?p ?o . filter (isBlank(?s))
}
group by ?p
{code}
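Since the two queries differ only in the position being tested, they can be generated from one template. The sketch below only builds the query text and a GET URL; the endpoint URL is a placeholder assumption and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); point it at the actual Blazegraph namespace.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

def bnode_usage_query(position: str) -> str:
    """Build the per-predicate count query for bnodes in subject ('s') or object ('o') position."""
    if position not in ("s", "o"):
        raise ValueError("position must be 's' or 'o'")
    return (
        "SELECT ?p (COUNT(*) AS ?cnt) {\n"
        f"  ?s ?p ?o . FILTER (isBlank(?{position}))\n"
        "}\n"
        "GROUP BY ?p"
    )

def request_url(position: str) -> str:
    """URL-encode the query as the standard SPARQL protocol 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": bnode_usage_query(position)})

print(bnode_usage_query("o"))
```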
Nov 19 2019
Could you post the output of the following two commands?
iostat -x 1
sudo iotop
What about the new logger UPDATED_ENTITY_IDS: does it track updated entity IDs? How many per minute/hour?
Nov 18 2019
Thanks! Yes it is Wikidata-Query-Service
Thanks, yes it is Wikidata-Query-Service
Nov 13 2019
Wdqs1006 reports that 574.6 GiB are reserved for the journal and 544.3 GiB are actually used (~5% of the space unused),
while Wdqs1005 reports that 1037.7 GiB are reserved and only 543.5 GiB are actually used (~47% of the space unused).
Most of the %FileWaste is reserved for 8K allocators, but %SlotWaste is also higher than usual for the 4k (10 times higher than usual), 2k, 64 (3 times), 320 and 768 allocators (2 times).
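The unused-space percentages quoted above follow directly from the reserved and used sizes. A small sketch of the arithmetic (the function name is made up here):

```python
def unused_percent(reserved_gib: float, used_gib: float) -> float:
    """Share of the reserved journal space that is not actually used, in percent."""
    return 100.0 * (reserved_gib - used_gib) / reserved_gib

# Figures copied from the two hosts above.
for host, reserved, used in [("wdqs1006", 574.6, 544.3), ("wdqs1005", 1037.7, 543.5)]:
    print(f"{host}: {unused_percent(reserved, used):.1f}% unused")
```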
Oct 23 2019
Added link to the task T236251: Add header returning time millis to first solution similar to TTFB measured in Blazegraph.
The corresponding header X-FIRST-SOLUTION-MILLIS might be very useful when analyzing long-running queries and when comparing query performance. If the time reported by Blazegraph is significantly less than the total query execution time, possible causes are:
- The total result is very large, and much of the time was spent on serialization/deserialization (this is basically fine if the number of results is large).
- Connectivity issues, over the network and/or between processes. In this case the X-FIRST-SOLUTION-MILLIS metric will be the same for subsequent calls, but the total query time will vary over time.
- The query might be very unselective, with additional constraints filtering out many potential solutions, so the first solution is computed quickly but collecting all requested results takes a long time. Such queries are subject to analysis and might need fixing in the Blazegraph code or in the data layout.
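The three cases above could be told apart with a rough heuristic over repeated runs of the same query. The sketch below is only an illustration: the function, its thresholds (`large_result`, `jitter`) and the input shape are assumptions, not part of WDQS or Blazegraph.

```python
from statistics import mean, pstdev

def likely_cause(samples, result_count, large_result=100_000, jitter=0.25):
    """Guess why total time greatly exceeds X-FIRST-SOLUTION-MILLIS.

    samples: list of (first_solution_ms, total_ms) pairs from repeated runs
    of one query. Only meaningful when first_solution_ms is much smaller
    than total_ms. Thresholds are arbitrary assumptions.
    """
    totals = [total for _, total in samples]
    if result_count >= large_result:
        return "large result: time goes to serialization/deserialization"
    if pstdev(totals) > jitter * mean(totals):
        return "connectivity: total time varies between runs"
    return "unselective query: late filtering, candidate for analysis"

print(likely_cause([(50, 9000), (52, 30000)], result_count=10))
```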
Oct 16 2019
The LabelService optimizer was fixed this August (so that it no longer throws NPEs) by reusing the Blazegraph core utility com.bigdata.rdf.sparql.ast.StaticAnalysis.getVarsFromArguments(BOp) to introspect the variables used in filters and other clauses, so that the LabelService call placement can be adjusted properly. This introspection seems to enter an infinite loop over the AST. Reusing a variable for label aggregation after the original variable is common practice, so yes, it should be fixed. I am looking into a workaround that extracts the referenced variables without getting caught in the infinite loop.
Oct 9 2019
Oct 7 2019
There is a context param queryTimeout, set to 10 minutes in web.xml, which applies to all Blazegraph servlets. Stas prepared a patch extending it tenfold: https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/520948/. You might apply it locally (or just edit the web.xml file) to resolve your issue. The change has not been applied to WDQS master because this timeout is system-wide, and extending it might result in unexpected resource consumption (the timeout also applies to queries, including very heavy ones, allowing them to run much longer before they time out).
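Editing web.xml locally could be scripted along these lines. This is a sketch under stated assumptions: the sample fragment is simplified, the value is assumed to be in milliseconds (600000 for 10 minutes), and a real web.xml that declares an XML namespace would need namespace-aware tag names.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for web.xml (assumption).
SAMPLE = """<web-app>
  <context-param>
    <param-name>queryTimeout</param-name>
    <param-value>600000</param-value>
  </context-param>
</web-app>"""

def scale_query_timeout(xml_text: str, factor: int) -> str:
    """Multiply the queryTimeout context-param value by the given factor."""
    root = ET.fromstring(xml_text)
    for param in root.iter("context-param"):
        if param.findtext("param-name") == "queryTimeout":
            value = param.find("param-value")
            value.text = str(int(value.text) * factor)
    return ET.tostring(root, encoding="unicode")

print(scale_query_timeout(SAMPLE, 10))
```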
Sep 30 2019
These characters are indeed mapped to the same term in the DB.
Sep 12 2019
Aug 29 2019
Differences in bnodes might be tolerated with an additional replacement. The cleanup stage could be merged with the initial sed+sort.
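One crude way to tolerate bnode differences is to replace every bnode label with a fixed placeholder before sorting and comparing. The sketch below assumes N-Triples-like lines and plain alphanumeric bnode labels; it collapses all bnodes into one placeholder, so it cannot distinguish different bnode structures, and it would also rewrite a literal that happens to contain `_:label`.

```python
import re

# Assumed bnode label shape: "_:" followed by letters and digits.
BNODE = re.compile(r"_:[A-Za-z0-9]+")

def normalize(lines):
    """Replace bnode labels with a placeholder, drop blank lines, and sort."""
    return sorted(BNODE.sub("_:bnode", line.strip()) for line in lines if line.strip())

def same_modulo_bnodes(a, b):
    """True if both dumps contain the same statements once bnode labels are ignored."""
    return normalize(a) == normalize(b)

left = ['<s> <p> _:b1 .', '<s> <q> "x" .']
right = ['<s> <q> "x" .', '<s> <p> _:genid42 .']
print(same_modulo_bnodes(left, right))
```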
Aug 2 2019
Looking at the query execution plans, the ProjectionOp for the query with lang() on coDescription was arranged before the materialization of coDescription, so coDescription (along with its lang) did not make its way into the projection. The reason for this behavior needs more research. I will update on that.
Jul 1 2019
Fixed optional support and added a test case for that code path.
The service's projectedVars actually include both inbound and outbound variables (those which are parameters for the service and those which are produced by the label lookup). But to check whether the service node can be reordered before clauses placed at the bottom of the query, we need to consider only the inbound variables, so that they are available for the service call and all outbound variables are available for the later filters and other clauses.
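The check described above can be sketched with plain sets. This is a simplified model, not the optimizer code: clauses are represented as dicts with hypothetical `binds` and `uses` sets, and the service sits at index `i` in the clause list.

```python
def valid_position(clauses, i, inbound, outbound):
    """Can the service run at index i (i.e. after clauses[:i])?

    inbound vars must already be bound by earlier clauses, and no earlier
    clause may use a variable that only the service produces.
    """
    bound_before = set().union(*(c["binds"] for c in clauses[:i]))
    used_before = set().union(*(c["uses"] for c in clauses[:i]))
    return inbound <= bound_before and not (outbound & used_before)

clauses = [
    {"binds": {"item"}, "uses": set()},         # statement pattern binding ?item
    {"binds": set(), "uses": {"itemLabel"}},    # filter on ?itemLabel
]
print(valid_position(clauses, 1, {"item"}, {"itemLabel"}))
```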
Jun 25 2019
The idea of the change is to replace the runLast hint with more complicated logic. There are 3 steps:
- first, a 'most probably optimal' placement that allows EmptyLabelServiceOptimizer to see the variables to process;
- then EmptyLabelServiceOptimizer adds statement patterns for the resolutions;
- and then an additional optimizer step moves LabelService to the latest possible position before any clauses that might use the variables bound by LabelService.
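The last step above, finding the latest possible position, can be illustrated on a toy model where each clause is just the set of variables it uses. The function name and data shape are assumptions for illustration only, not the actual optimizer API.

```python
def latest_position(clauses_used_vars, outbound):
    """Index at which to insert the service: right before the first clause
    that uses a variable bound by the service, or at the end if none does."""
    for i, used in enumerate(clauses_used_vars):
        if outbound & used:
            return i
    return len(clauses_used_vars)

# ?item pattern, then a filter on ?itemLabel, then an unrelated clause.
query = [{"item"}, {"itemLabel"}, {"x"}]
pos = latest_position(query, {"itemLabel"})
print(pos)
```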
May 7 2019
EmptyLabelServiceOptimizer, when running optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode), currently takes the projection from StaticAnalysis.getQueryRoot(), because the parent of the JoinGroupNode wrapping the statement pattern of the LabelService clause is unavailable.
May 6 2019
Additionally tested the configuration option with only raw records disabled, compared to the original baseline:
Configuration options are assigned in RWStore.properties. Particular options are:
This seems to be an optimizer ordering problem.
CompareBOp executes several times to check whether "Ada"@en equals ?langLabel, but ?langLabel is not bound on all occasions:
while running ASTDeferredIVResolution
while running com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer
then while running ConditionalRoutingOp for ChunkedRunningQuery
Apr 29 2019
Complete test logs attached
Load performance for the tested configurations on isolated environment (i7-7700HQ, 8 cores 2.8GHz, 32GB RAM, SSD Samsung 960 PRO)
Attached are the results of loading 100 ttl.gz files with different configurations:
- original baseline (commit blazegraph 895a4f3bd003ddb4b1f31257f642ca3616bca79b, rdf 4245b2a5bc0c7d4b369a43ba512b5e537dac07a4)
- reference URIs inlining,
- reference URIs inlining, raw records disabled per T213210
- reference URIs inlining, raw records disabled, INLINE_TEXT_LITERALS for short strings per T213210
Apr 22 2019
Changeset created to support reference URIs inlining:
https://gerrit.wikimedia.org/r/#/c/wikidata/query/blazegraph/+/505642