
Igorkim78 (Igor Kim)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 2 2019, 6:24 PM (296 w, 1 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
Igorkim78 [ Global Accounts ]

Recent Activity

Feb 7 2022

Igorkim78 placed T229655: bad interaction of lang() with wikibase:label up for grabs.
Feb 7 2022, 3:39 PM · Wikidata-Query-Service, Wikidata

Nov 3 2020

Igorkim78 added a comment to T233204: Mixup of unicode characters in Query Service.

If you consider changing the collator configuration, note that the collator type should NOT be changed from the default value, ICU:
com.bigdata.btree.keys.KeyBuilder.collator=ICU
The collator type options JDK and ASCII also exist, but neither is usable: JDK produces essentially the same comparison as ICU but generates much larger keys, and ASCII simply assumes the source text is ASCII and drops Unicode support entirely.
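
As a rough illustration of why an ASCII-only key is unusable for multilingual data, the sketch below builds a sort key by forcing text into ASCII. It is an analogy in plain Python and not Blazegraph's key builder; the function name `ascii_only_key` and the sample strings are hypothetical.

```python
def ascii_only_key(text: str) -> bytes:
    """Hypothetical ASCII-only key: every non-ASCII character becomes '?'."""
    return text.encode("ascii", errors="replace")


# Two labels that differ only in one non-ASCII character
# (u-umlaut versus a-acute, written as escapes to avoid encoding issues).
key_a = ascii_only_key("Z\u00fcrich")
key_b = ascii_only_key("Z\u00e1rich")

# Both collapse to the same key, so an index built on such keys
# could not tell the two labels apart.
print(key_a, key_b, key_a == key_b)
```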

Nov 3 2020, 5:02 PM · Wikidata, Wikidata-Query-Service

Jan 31 2020

Igorkim78 added a comment to T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph.

@Aklapper , Thank you! Fixed the commit message.

Jan 31 2020, 8:17 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
Igorkim78 added a comment to T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph.

Changeset is https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/532373/

Jan 31 2020, 7:05 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
Igorkim78 added a comment to T229655: bad interaction of lang() with wikibase:label.

The issue is caused by the Service node producing the variable ?coDescription, which is not explicitly defined in the main query, so the optimizers assume this variable is not bound and do not bother with the proper order of evaluation for the lang function. Fixing it might require reordering the optimizers to make the variables produced by wikibase:label visible to other optimizers, but that is tricky because wikibase:label itself depends on the results of other optimizers applied in the proper order (wikibase:label takes the list of variables to process from the main query).

Jan 31 2020, 5:04 AM · Wikidata-Query-Service, Wikidata

Jan 16 2020

Igorkim78 added a comment to T236663: Create a parallel loader to improve load performance for WDQS / Blazegraph.

Performance measured on dump from 20191202: https://dumps.wikimedia.org/wikidatawiki/entities/20191202/
Baseline time to load: 4264m29.914s, 714218864640 bytes
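
For easier comparison with later runs, the baseline figures can be converted to hours and GiB. This is a minimal sketch; the helper name `parse_wall_clock` is hypothetical, and it assumes the duration is in the `<minutes>m<seconds>s` form shown above.

```python
import re


def parse_wall_clock(text: str) -> float:
    """Convert a duration such as '4264m29.914s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


baseline_seconds = parse_wall_clock("4264m29.914s")
baseline_bytes = 714218864640

print(f"{baseline_seconds / 3600:.1f} hours")
print(f"{baseline_bytes / 2**30:.1f} GiB")
```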

Jan 16 2020, 9:57 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Dec 23 2019

Igorkim78 added a comment to T237089: Create CQS puppet configs by applying query_service module.

The configuration changes for SDC data are as follows. Note that the namespace 'sdc' is used to store RDF data in the Blazegraph journal; it may be changed as needed. Keeping the same namespace as for Wikidata (wdq) is not recommended: it might cause conflicts when deploying the services on a shared server (if such a configuration is implemented), and it might lead to queries addressing the wrong namespace in the Blazegraph journal and returning improper data.

  • Blazegraph journal config (RWStore.properties)

Replace the similar configuration for WDQS (search for the com.bigdata.namespace.wdq prefix to find the parameters to be replaced):

# Bump up the branching factor for the lexicon indices on the default kb.
com.bigdata.namespace.sdc.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400
com.bigdata.namespace.sdc.lex.ID2TERM.com.bigdata.btree.BTree.branchingFactor=599
com.bigdata.namespace.sdc.lex.TERM2ID.com.bigdata.btree.BTree.branchingFactor=300
# Bump up the branching factor for the statement indices on the default kb.
com.bigdata.namespace.sdc.spo.JUST.com.bigdata.btree.BTree.branchingFactor=1024
com.bigdata.namespace.sdc.spo.OSP.com.bigdata.btree.BTree.branchingFactor=866
com.bigdata.namespace.sdc.spo.POS.com.bigdata.btree.BTree.branchingFactor=954
com.bigdata.namespace.sdc.spo.SPO.com.bigdata.btree.BTree.branchingFactor=934

Note that the final configuration should be adjusted for the real production data according to the instructions in T232768.
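
The prefix replacement described above could be scripted roughly as follows. This is only a sketch: the function name `retarget_namespace` is hypothetical, the sample input lines are illustrative rather than copied from the production RWStore.properties, and the branching factor values themselves still need to be tuned separately.

```python
WDQS_PREFIX = "com.bigdata.namespace.wdq."


def retarget_namespace(lines, new_namespace="sdc"):
    """Rewrite com.bigdata.namespace.wdq.* keys to another namespace.

    Comments, blank lines and unrelated keys pass through unchanged.
    """
    new_prefix = f"com.bigdata.namespace.{new_namespace}."
    result = []
    for line in lines:
        if line.startswith(WDQS_PREFIX):
            line = new_prefix + line[len(WDQS_PREFIX):]
        result.append(line)
    return result


# Illustrative input only.
sample = [
    "# Bump up the branching factor for the lexicon indices on the default kb.",
    "com.bigdata.namespace.wdq.lex.BLOBS.com.bigdata.btree.BTree.branchingFactor=400",
    "com.bigdata.btree.keys.KeyBuilder.collator=ICU",
]
for line in retarget_namespace(sample):
    print(line)
```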

Dec 23 2019, 6:40 PM · Discovery-Search (Current work), Patch-For-Review, Structured-Data-Backlog, Structured Data Engineering, SRE, SDC General, Wikidata

Dec 6 2019

Igorkim78 added a comment to P9750 blazegraph import failure (curl client stalled, all threads waiting).

Which file system is used to store the wikidata.jnl file? What is the underlying physical disk? What is the exact OS version?
Blazegraph puts a heavy load on the disk, so the cause might be a combination of heavy write and read stress, resulting either in overheating of the physical disk, which leads to errors, or in OS-layer bugs in the FS or NVMe drivers.

Dec 6 2019, 3:33 PM

Dec 3 2019

Igorkim78 added a comment to T239414: Investigate how blank nodes are used and synced between wikibase and wdqs.

We need statistics on how many triples use a bnode as an object:
{code}
SELECT ?p (COUNT(*) AS ?cnt) {
  ?s ?p ?o .
  FILTER (isBlank(?o))
}
GROUP BY ?p
{code}
and as a subject (if any):
{code}
SELECT ?p (COUNT(*) AS ?cnt) {
  ?s ?p ?o .
  FILTER (isBlank(?s))
}
GROUP BY ?p
{code}
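
The same counts could also be approximated offline on already parsed triples, for example from a dump. The sketch below shows what the two queries compute; the function names, the tuple representation, and the sample data are all made up for illustration.

```python
from collections import Counter


def is_blank(term: str) -> bool:
    """Blank nodes are written with a '_:' prefix in N-Triples and Turtle."""
    return term.startswith("_:")


def count_bnode_predicates(triples, position):
    """Count triples per predicate whose subject ('s') or object ('o') is a bnode."""
    index = {"s": 0, "o": 2}[position]
    return Counter(t[1] for t in triples if is_blank(t[index]))


# Made-up sample data, only to show the shape of the result.
triples = [
    ("wd:Q1", "p:P1", "_:b0"),
    ("wd:Q2", "p:P1", "_:b1"),
    ("wd:Q3", "p:P2", "wd:Q4"),
    ("_:b0", "p:P3", "wd:Q5"),
]
print(count_bnode_predicates(triples, "o"))
print(count_bnode_predicates(triples, "s"))
```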

Dec 3 2019, 3:52 PM · Wikidata-Query-Service, Wikidata

Nov 19 2019

Igorkim78 added a comment to T231411: Test new Updater service.

What is the output of
iostat -x 1
and
sudo iotop
?

Nov 19 2019, 11:35 AM · Patch-For-Review, Discovery-Search (Current work), Performance Issue, Wikidata-Query-Service, Wikidata
Igorkim78 added a comment to T231411: Test new Updater service.

What about the new logger UPDATED_ENTITY_IDS: does it track updated entity IDs? How many per minute/hour?

Nov 19 2019, 11:32 AM · Patch-For-Review, Discovery-Search (Current work), Performance Issue, Wikidata-Query-Service, Wikidata
Ghuron awarded T212826: Create dedicated Updater service in Blazegraph a Like token.
Nov 19 2019, 3:58 AM · Discovery-Search (Current work), Epic, Performance Issue, Wikidata-Query-Service, Wikidata

Nov 18 2019

Igorkim78 added a subtask for T231411: Test new Updater service: T238555: Create endpoint to extract low level data for a list of entity IDs..
Nov 18 2019, 3:55 PM · Patch-For-Review, Discovery-Search (Current work), Performance Issue, Wikidata-Query-Service, Wikidata
Igorkim78 added a parent task for T238555: Create endpoint to extract low level data for a list of entity IDs.: T231411: Test new Updater service.
Nov 18 2019, 3:55 PM · Wikidata, Wikidata-Query-Service
Igorkim78 added a subtask for T231411: Test new Updater service: T238557: Allow for logging recently updated entities.
Nov 18 2019, 3:54 PM · Patch-For-Review, Discovery-Search (Current work), Performance Issue, Wikidata-Query-Service, Wikidata
Igorkim78 added a parent task for T238557: Allow for logging recently updated entities: T231411: Test new Updater service.
Nov 18 2019, 3:54 PM · Wikidata-Query-Service, Wikidata
Igorkim78 added a project to T238557: Allow for logging recently updated entities: Wikidata-Query-Service.

Thanks! Yes it is Wikidata-Query-Service

Nov 18 2019, 3:53 PM · Wikidata-Query-Service, Wikidata
Igorkim78 added a project to T238555: Create endpoint to extract low level data for a list of entity IDs.: Wikidata-Query-Service.

Thanks, yes it is Wikidata-Query-Service

Nov 18 2019, 3:53 PM · Wikidata, Wikidata-Query-Service
Igorkim78 created T238557: Allow for logging recently updated entities.
Nov 18 2019, 2:36 PM · Wikidata-Query-Service, Wikidata
Igorkim78 created T238555: Create endpoint to extract low level data for a list of entity IDs..
Nov 18 2019, 2:27 PM · Wikidata, Wikidata-Query-Service

Nov 13 2019

Igorkim78 added a comment to T238232: blazegraph journal on wdqs1005 is oversized.

Wdqs1006 reports that 574.6GiB are reserved for the journal and 544.3GiB are actually used (~5% of the space unused),
while Wdqs1005 reports that 1037.7GiB are reserved and only 543.5GiB are actually used (~47% of the space unused).
Most of the %FileWaste is reserved for the 8K allocators, but %SlotWaste is also higher than usual for the 4k (10 times higher than usual), 2k, 64 (3 times), 320 and 768 allocators (2 times).
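
The unused percentages above are simply (reserved - used) / reserved. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def unused_percent(reserved_gib: float, used_gib: float) -> float:
    """Share of the reserved journal space that is not actually used."""
    return (reserved_gib - used_gib) / reserved_gib * 100


for host, reserved, used in [("wdqs1006", 574.6, 544.3), ("wdqs1005", 1037.7, 543.5)]:
    print(f"{host}: {unused_percent(reserved, used):.1f}% unused")
```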

Nov 13 2019, 6:43 PM · Wikidata-Query-Service, Wikidata, Discovery-Search (Current work)

Oct 23 2019

Igorkim78 updated the task description for T234968: Measure performance impact of code optimization and/or blazegraph settings on real traffic data.
Oct 23 2019, 4:34 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
Igorkim78 added a comment to T101013: Log Wikidata Query Service queries to the event gate infrastructure.

Added link to the task T236251: Add header returning time millis to first solution similar to TTFB measured in Blazegraph.
The corresponding header X-FIRST-SOLUTION-MILLIS might be very useful when analyzing long-running queries and when comparing query performance. If the time reported by Blazegraph is significantly less than the total query execution time, it might be caused by:

  1. The total result is very large, and much time was spent on serialization/deserialization (which is basically fine if the number of results is large).
  2. Connectivity issues, over the network and/or between processes. In this case the X-FIRST-SOLUTION-MILLIS metric will be the same for subsequent calls, but the total query time will vary over time.
  3. The query might be very unselective, with additional constraints filtering out many potential solutions, so the first solution is computed quickly but collecting all the requested results takes a long time. Such queries are subject to analysis and might need fixing in the Blazegraph code or data layout.
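
To tell these cases apart, repeated runs of the same query could be summarized roughly as below. This is only a sketch: the helper names and the timings are made up, and it assumes the first value of each pair is read from the X-FIRST-SOLUTION-MILLIS header while the second is the total time measured by the client.

```python
def spread(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


def summarize_calls(calls):
    """Summarize repeated runs of one query.

    `calls` is a list of (first_solution_ms, total_ms) pairs.
    """
    first = [f for f, _ in calls]
    total = [t for _, t in calls]
    return {
        "first_solution_spread_ms": spread(first),
        "total_spread_ms": spread(total),
        "max_tail_ms": max(t - f for f, t in calls),
    }


# Made-up timings for three runs of the same query.
calls = [(120, 900), (118, 4300), (121, 1500)]
print(summarize_calls(calls))
```

A stable first-solution time combined with a widely varying total, as in this made-up sample, would point towards case 2 rather than towards the query itself.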
Oct 23 2019, 12:49 PM · Discovery-Search (Current work), Analytics, Event-Platform, Wikidata, Wikidata-Query-Service, Discovery-ARCHIVED
Igorkim78 created T236251: Add header returning time millis to first solution similar to TTFB measured in Blazegraph.
Oct 23 2019, 12:31 PM · WDQS-Optimizer

Oct 16 2019

Igorkim78 added a comment to T235540: StackOverflowError when SPARQL query uses same variable name before and after aggregation.

The LabelService optimizer was fixed this August (so that it no longer throws NPEs) by reusing the Blazegraph core utility com.bigdata.rdf.sparql.ast.StaticAnalysis.getVarsFromArguments(BOp) to introspect the variables used in filters and other clauses, so that the placement of the LabelService call could be properly adjusted. This introspection seems to enter an infinite loop over the AST tree. Reusing a variable name for an aggregation after the original var is a common practice, so yes, it should be fixed. I am looking into a workaround to extract the referenced vars without getting caught in the infinite loop.

Oct 16 2019, 12:34 PM · Wikidata, Wikidata-Query-Service

Oct 9 2019

Igorkim78 claimed T231411: Test new Updater service.
Oct 9 2019, 8:22 AM · Patch-For-Review, Discovery-Search (Current work), Performance Issue, Wikidata-Query-Service, Wikidata

Oct 7 2019

Igorkim78 added a comment to T227365: WDQS/Blazegraph data loading has timeout.

There is a context param queryTimeout, set to 10 minutes in web.xml, which applies to all Blazegraph servlets. Stas prepared a patch extending it 10x, https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/520948/, which you might apply locally (or just edit the web.xml file) to resolve your issue. The change has not been applied to the WDQS master because this timeout is system-wide and extending it might result in unexpected consumption of resources (the timeout would also apply to queries, including very heavy ones, allowing them to run much longer before timing out).
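
For a local edit of web.xml, something along these lines might work. It is a sketch under assumptions: the sample document is a simplified stand-in for the real web.xml (which also declares an XML namespace that this code does not handle), the value is assumed to be in milliseconds, and `scale_query_timeout` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the relevant part of web.xml.
SAMPLE_WEB_XML = """<web-app>
  <context-param>
    <param-name>queryTimeout</param-name>
    <param-value>600000</param-value>
  </context-param>
</web-app>"""


def scale_query_timeout(web_xml: str, factor: int) -> str:
    """Multiply the queryTimeout context param by `factor`."""
    root = ET.fromstring(web_xml)
    for param in root.iter("context-param"):
        if param.findtext("param-name") == "queryTimeout":
            value = param.find("param-value")
            if value is not None and value.text:
                value.text = str(int(value.text) * factor)
    return ET.tostring(root, encoding="unicode")


patched = scale_query_timeout(SAMPLE_WEB_XML, 10)
print(patched)
```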

Oct 7 2019, 10:44 AM · Upstream, Wikidata, Wikidata-Query-Service

Sep 30 2019

Igorkim78 added a comment to T233204: Mixup of unicode characters in Query Service.

These characters are indeed mapped to the same term in the DB.

Sep 30 2019, 4:13 PM · Wikidata, Wikidata-Query-Service

Sep 12 2019

Igorkim78 created T232768: Branching factors configuration for Blazegraph instances.
Sep 12 2019, 6:24 PM · WDQS-Optimizer
Igorkim78 created T232739: Requesting access to wmcs beta cluster for igorkim78.
Sep 12 2019, 1:00 PM · Beta-Cluster-Infrastructure, Release-Engineering-Team

Aug 29 2019

Igorkim78 added a comment to T231411: Test new Updater service.

Differences in bnodes might be tolerated with an additional replacement. The cleanup stage could be merged with the initial sed+sort.

Aug 29 2019, 6:46 AM · Patch-For-Review, Discovery-Search (Current work), Performance Issue, Wikidata-Query-Service, Wikidata

Aug 2 2019

Igorkim78 added a comment to T229655: bad interaction of lang() with wikibase:label.

Looking at the query execution plans, the ProjectionOp for the query with lang() for coDescription was arranged prior to the materialization of coDescription, so it (along with its lang) did not make its way to the projection. The reason for this behavior needs more research. Will update on that.

Aug 2 2019, 7:20 PM · Wikidata-Query-Service, Wikidata

Jul 1 2019

Igorkim78 added a comment to T175840: Using label service twice in one query results in obscure error message.

Fixed optional support and added a test case for that code path.
Service projectedVars actually include both inbound and outbound variables (those which are params for the service and those which are produced by the labels lookup). But to check whether the service node can be reordered prior to any clauses placed at the bottom of the query, we need to consider only the inbound variables, so that they are available for the service call and all outbound vars are available for the later filters and other clauses.

Jul 1 2019, 3:46 PM · Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Discovery-ARCHIVED, Wikidata

Jun 25 2019

Igorkim78 added a comment to T175840: Using label service twice in one query results in obscure error message.

The idea for the change is to replace the runLast hint with more complicated logic. There are 3 steps:

  • first, a 'most probable optimal' placement that allows EmptyLabelServiceOptimizer to see the variables to process.
  • then EmptyLabelServiceOptimizer adds statement patterns for the resolutions.
  • and then an additional optimizer step moves LabelService to the latest possible position before any clauses which might use the variables bound by LabelService.
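
The third step can be pictured with a toy model. This is not the Blazegraph optimizer API, only a sketch of the placement rule, with made-up variable names and a hypothetical function `latest_service_position`.

```python
def latest_service_position(clauses, service_outputs):
    """Return the index at which to insert the service call.

    `clauses` is an ordered list of sets, each holding the variables a clause
    reads. The service goes immediately before the first clause that reads
    any variable the service binds, or at the end if no clause does.
    """
    for index, used_vars in enumerate(clauses):
        if used_vars & service_outputs:
            return index
    return len(clauses)


clauses = [
    {"?item"},                # statement pattern
    {"?item", "?creator"},    # statement pattern
    {"?creatorLabel"},        # FILTER on a label bound by the service
]
print(latest_service_position(clauses, {"?itemLabel", "?creatorLabel"}))
```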
Jun 25 2019, 9:05 PM · Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Discovery-ARCHIVED, Wikidata

May 7 2019

Igorkim78 added a comment to T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected.

The EmptyLabelServiceOptimizer running optimizeJoinGroup(AST2BOpContext, StaticAnalysis, IBindingSet[], JoinGroupNode) currently takes the projection from StaticAnalysis.getQueryRoot(), as the parent of the JoinGroupNode wrapping the statement pattern of the LabelService clause is unavailable.

May 7 2019, 9:35 PM · Discovery-Wikidata-Query-Service-Sprint, User-Smalyshev, Upstream, Discovery-ARCHIVED, Wikidata, Wikidata-Query-Service

May 6 2019

Igorkim78 added a comment to T213375: Inline value and reference URIs.

Additionally tested configuration option with only Raw records disabled, comparing to original baseline:

May 6 2019, 4:49 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Igorkim78 added a comment to T213375: Inline value and reference URIs.

Configuration options are assigned in RWStore.properties. Particular options are:

May 6 2019, 4:43 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Igorkim78 added a comment to T153353: Blazegraph not properly using labels from sub-queries for filtering (omitting rows), unless they're selected.

This seems to be a problem with the order of the optimizers.
CompareBOp executes several times to check whether "Ada"@en equals ?langLabel, but ?langLabel is not bound on all occasions:
while running ASTDeferredIVResolution
while running com.bigdata.rdf.sparql.ast.optimizers.ASTSetValueExpressionsOptimizer
then while running ConditionalRoutingOp for ChunkedRunningQuery

May 6 2019, 4:36 PM · Discovery-Wikidata-Query-Service-Sprint, User-Smalyshev, Upstream, Discovery-ARCHIVED, Wikidata, Wikidata-Query-Service

Apr 29 2019

Igorkim78 added a comment to T213375: Inline value and reference URIs.

Complete test logs attached

Apr 29 2019, 5:00 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Igorkim78 added a comment to T213375: Inline value and reference URIs.

Load performance for the tested configurations on an isolated environment (i7-7700HQ, 8 cores 2.8GHz, 32GB RAM, SSD Samsung 960 PRO)

Load performance.png (648×1 px, 38 KB)

Apr 29 2019, 4:50 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata
Igorkim78 added a comment to T213375: Inline value and reference URIs.

Attached the results of loading 100 ttl.gz files with different configurations

Apr 29 2019, 4:41 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata

Apr 22 2019

Igorkim78 claimed T213375: Inline value and reference URIs.

Changeset created to support reference URIs inlining:
https://gerrit.wikimedia.org/r/#/c/wikidata/query/blazegraph/+/505642

Apr 22 2019, 4:55 PM · Patch-For-Review, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service, Wikidata