
Search has outdated label for P12861 (“Shape Expression for class” rather than “EntitySchema for class”)
Closed, Resolved, Public

Description

The label for P12861 was changed from “Shape Expression for class” to “EntitySchema for class” yesterday evening (UTC), yet the old label still seems to be used wherever search is involved:

image.png (297×740 px, 37 KB)

image.png (443×943 px, 42 KB)

image.png (719×640 px, 76 KB)

action=cirrusdump link; version 2193947176 is indeed an old revision of the page.

I don’t see any P12861-related errors in Logstash that could explain this. Edit: See T369149#9949518 below for the Wikimedia-production-error.

Event Timeline

Could we force a re-index of this page?

mwscript CirrusSearch:ForceSearchIndex wikidatawiki --ids=120965176

And then we might see if it produces any error…

For future reference, the current cirrusDump contents are:

[
  {
    "_index": "wikidatawiki_content_1717167405",
    "_type": "_doc",
    "_id": "120965176",
    "_version": [],
    "_source": {
      "redirect": [],
      "template": [],
      "content_model": "wikibase-property",
      "source_text": "Shape Expression for class\nShape Expression de cette classe\nShape Expression de la clase\nShape Expression that members of a class should conform to\nShape Expressions à laquelle un élément de cette classe devrait se conformer\nShape Expresion de la clase cuyos miembros deben cumplir",
      "wiki": "wikidatawiki",
      "coordinates": [],
      "statement_keywords": [
        "P3254=https://www.wikidata.org/wiki/Wikidata:Property_proposal/Shape_Expression_for_class",
        "P1629=Q29377880",
        "P31=Q18616576",
        "P2302=Q52004125",
        "P2302=Q52004125[P2305=Q29934200]",
        "P2302=Q53869507",
        "P2302=Q53869507[P5314=Q54828448]"
      ],
      "title": "P12861",
      "descriptions": {
        "fr": [
          "Shape Expressions à laquelle un élément de cette classe devrait se conformer"
        ],
        "en": [
          "Shape Expression that members of a class should conform to"
        ],
        "es": [
          "Shape Expresion de la clase cuyos miembros deben cumplir"
        ]
      },
      "version": 2193947176,
      "external_link": [],
      "labels": {
        "fr": [
          "Shape Expression de cette classe"
        ],
        "en": [
          "Shape Expression for class"
        ],
        "es": [
          "Shape Expression de la clase"
        ]
      },
      "page_id": 120965176,
      "namespace_text": "Property",
      "create_timestamp": "2024-07-02T16:25:36Z",
      "statement_count": 5,
      "namespace": 120,
      "text_bytes": 3681,
      "label_count": 3,
      "text": "Shape Expression for class\nShape Expression de cette classe\nShape Expression de la clase\nShape Expression that members of a class should conform to\nShape Expressions à laquelle un élément de cette classe devrait se conformer\nShape Expresion de la clase cuyos miembros deben cumplir",
      "category": [],
      "outgoing_link": [
        "Q18616576",
        "Q29377880",
        "Q29934200",
        "Q52004125",
        "Q53869507",
        "Q54828448",
        "Property:P1629",
        "Property:P2302",
        "Property:P2305",
        "Property:P31",
        "Property:P3254",
        "Property:P5314"
      ],
      "timestamp": "2024-07-02T17:18:37Z"
    }
  }
]

I’ll run the above maintenance script later if nobody objects.
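The staleness check described in this task (comparing the indexed `version` against the page's latest revision ID) can be sketched in Python. This is only a minimal sketch: the function name `indexed_version_is_stale` and the "newer" revision ID in the usage lines are hypothetical, and it assumes revision IDs grow over time. Only the `_source.version` layout is taken from the dump above.

```python
import json


def indexed_version_is_stale(cirrusdump_json: str, latest_revid: int) -> bool:
    """Return True if the indexed revision is older than latest_revid.

    Assumes the cirrusdump layout shown above: a JSON list whose first
    element carries the indexed revision ID in _source.version.
    """
    docs = json.loads(cirrusdump_json)
    if not docs:
        # No document in the index at all: treat that as stale too.
        return True
    return docs[0]["_source"]["version"] < latest_revid


# Trimmed copy of the dump above; 2193947177 is a made-up newer revision ID.
dump = '[{"_id": "120965176", "_source": {"title": "P12861", "version": 2193947176}}]'
print(indexed_version_is_stale(dump, 2193947177))
```

A `True` result only says the index lags behind; it does not say why, which is what the rest of the task goes on to investigate.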

The process is unable to render this document: https://www.wikidata.org/w/api.php?action=query&cbbuilders=content|links&format=json&format=json&formatversion=2&pageids=120965176&prop=cirrusbuilddoc fails with Caught exception of type TypeError:

Argument 1 passed to Wikimedia\Services\ServiceContainer::{closure}() must be an instance of Wikibase\DataModel\Entity\EntityIdValue, instance of EntitySchema\Wikibase\DataValues\EntitySchemaValue given, called in /srv/mediawiki/php-1.43.0-wmf.12/extensions/WikibaseCirrusSearch/src/Fields/StatementsField.php on line 171
	
from /srv/mediawiki/php-1.43.0-wmf.12/extensions/Wikibase/repo/WikibaseRepo.datatypes.php(421)
#0 /srv/mediawiki/php-1.43.0-wmf.12/extensions/WikibaseCirrusSearch/src/Fields/StatementsField.php(171): Wikimedia\Services\ServiceContainer->{closure}(EntitySchema\Wikibase\DataValues\EntitySchemaValue)
#1 /srv/mediawiki/php-1.43.0-wmf.12/extensions/WikibaseCirrusSearch/src/Fields/StatementsField.php(188): Wikibase\Search\Elastic\Fields\StatementsField->getSnakAsPropertyIdAndValue(Wikibase\DataModel\Snak\PropertyValueSnak)
#2 /srv/mediawiki/php-1.43.0-wmf.12/extensions/WikibaseCirrusSearch/src/Fields/StatementsField.php(132): Wikibase\Search\Elastic\Fields\StatementsField->getSnakAsString(Wikibase\DataModel\Snak\PropertyValueSnak)
#3 /srv/mediawiki/php-1.43.0-wmf.12/extensions/Wikibase/repo/includes/Content/EntityHandler.php(715): Wikibase\Search\Elastic\Fields\StatementsField->getFieldData(Wikibase\DataModel\Entity\Property)
#4 /srv/mediawiki/php-1.43.0-wmf.12/extensions/Wikibase/repo/includes/Content/EntityHandler.php(696): Wikibase\Repo\Content\EntityHandler->getContentDataForSearchIndex(Wikibase\Repo\Content\PropertyContent)
#5 /srv/mediawiki/php-1.43.0-wmf.12/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(92): Wikibase\Repo\Content\EntityHandler->getDataForSearchIndex(WikiPage, MediaWiki\Parser\ParserOutput, CirrusSearch\CirrusSearch, MediaWiki\Revision\RevisionStoreRecord)
#6 /srv/mediawiki/php-1.43.0-wmf.12/includes/libs/objectcache/wancache/WANObjectCache.php(1726): CirrusSearch\BuildDocument\ParserOutputPageProperties->CirrusSearch\BuildDocument\{closure}(boolean, integer, array, NULL, array)
#7 /srv/mediawiki/php-1.43.0-wmf.12/includes/libs/objectcache/wancache/WANObjectCache.php(1556): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#8 /srv/mediawiki/php-1.43.0-wmf.12/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(95): WANObjectCache->getWithSetCallback(string, integer, Closure)
#9 /srv/mediawiki/php-1.43.0-wmf.12/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(51): CirrusSearch\BuildDocument\ParserOutputPageProperties->finalizeReal(Elastica\Document, WikiPage, CirrusSearch\CirrusSearch, MediaWiki\Revision\RevisionStoreRecord)
#10 /srv/mediawiki/php-1.43.0-wmf.12/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php(196): CirrusSearch\BuildDocument\ParserOutputPageProperties->finalize(Elastica\Document, MediaWiki\Title\Title, MediaWiki\Revision\RevisionStoreRecord)
#11 /srv/mediawiki/php-1.43.0-wmf.12/extensions/CirrusSearch/includes/Api/QueryBuildDocument.php(106): CirrusSearch\BuildDocument\BuildDocument->finalize(Elastica\Document, boolean, MediaWiki\Revision\RevisionStoreRecord)
#12 /srv/mediawiki/php-1.43.0-wmf.12/includes/api/ApiQuery.php(706): CirrusSearch\Api\QueryBuildDocument->execute()
#13 /srv/mediawiki/php-1.43.0-wmf.12/includes/api/ApiMain.php(1953): ApiQuery->execute()
#14 /srv/mediawiki/php-1.43.0-wmf.12/includes/api/ApiMain.php(929): ApiMain->executeAction()
#15 /srv/mediawiki/php-1.43.0-wmf.12/includes/api/ApiMain.php(900): ApiMain->executeActionWithErrorHandling()
#16 /srv/mediawiki/php-1.43.0-wmf.12/includes/api/ApiEntryPoint.php(158): ApiMain->execute()
#17 /srv/mediawiki/php-1.43.0-wmf.12/includes/MediaWikiEntryPoint.php(200): MediaWiki\Api\ApiEntryPoint->execute()
#18 /srv/mediawiki/php-1.43.0-wmf.12/api.php(44): MediaWiki\MediaWikiEntryPoint->run()
#19 /srv/mediawiki/w/api.php(3): require(string)
#20 {main}

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-deploy-1-7.0.0-1-2024.07.03?id=6uK0eJABAJJzGk1BtcHT

Seems like \EntitySchema\Wikibase\DataValues\EntitySchemaValue::getType() returns the same value as EntityIdValue::getType(), so some code treats it as an EntityIdValue ('VT:wikibase-entityid'). Here, WikibaseCirrusSearch is calling https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/8b3312396b4b8b91790d7b33c4703fb31bd290d8/repo/WikibaseRepo.datatypes.php#421 with an EntitySchemaValue.

I'm not sure what should be done here.
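The failure mode described here can be modelled outside of PHP. The following Python sketch is an illustration only, not the Wikibase code: both classes are simplified stand-ins, and `format_entity_id` and `format_for_index` are hypothetical names. It shows how a lookup keyed purely on the value type hands an EntitySchema value to a callback that only accepts entity ID values.

```python
from dataclasses import dataclass


@dataclass
class EntityIdValue:
    """Simplified stand-in for the Wikibase EntityIdValue class."""
    entity_id: str

    def get_type(self) -> str:
        return "wikibase-entityid"


@dataclass
class EntitySchemaValue:
    """Simplified stand-in; reports the same value type as EntityIdValue."""
    schema_id: str

    def get_type(self) -> str:
        return "wikibase-entityid"


def format_entity_id(value: object) -> str:
    # Plays the role of the PHP type hint: only EntityIdValue is accepted.
    if not isinstance(value, EntityIdValue):
        raise TypeError(f"expected EntityIdValue, got {type(value).__name__}")
    return value.entity_id


# Formatters keyed by value type only, as in the failing lookup.
formatters = {"VT:wikibase-entityid": format_entity_id}


def format_for_index(value) -> str:
    """Pick a formatter by value type alone and apply it."""
    return formatters["VT:" + value.get_type()](value)
```

With these definitions, `format_for_index(EntityIdValue("Q5"))` succeeds, while `format_for_index(EntitySchemaValue("E10"))` raises a `TypeError`, which mirrors the exception in the stack trace above.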

Sounds like our responsibility to fix, at least. Thanks for looking into it!

I don’t see any P12861-related errors in Logstash that could explain this.

(It turns out I would’ve needed to search for the page ID, not the title – the logstash document @dcausse shared above has the URL /w/api.php?action=query&cbbuilders=content|links&format=json&format=json&formatversion=2&pageids=120965176&prop=cirrusbuilddoc.)

Hm, are we just missing a search-index-data-formatter-callback data type definition?

Let me see if I can reproduce this locally…

Alright, if I load the CirrusSearch-related extensions and set all of the following settings, I can reproduce the issue on an item or property with EntitySchema statements:

$wgSearchType = 'CirrusSearch';
$wgWBRepoSettings['searchIndexTypes'] = [ 'entity-schema' ];
$wgWBCSUseCirrus = true;

Edit: I also randomized a cache key to stop the cache interfering:

diff --git a/includes/BuildDocument/ParserOutputPageProperties.php b/includes/BuildDocument/ParserOutputPageProperties.php
index 2773a23238..3a5ccf2eb8 100644
--- a/includes/BuildDocument/ParserOutputPageProperties.php
+++ b/includes/BuildDocument/ParserOutputPageProperties.php
@@ -74,3 +74,4 @@ public function finalizeReal(
 			$page->getTouched(),
-			'v2'
+			'v2',
+			mt_rand()
 		);

Okay, I think we need two things:

EntitySchema needs to register that data type definition, probably like this:

diff --git a/src/Wikibase/Hooks/WikibaseRepoDataTypesHandler.php b/src/Wikibase/Hooks/WikibaseRepoDataTypesHandler.php
index 586640947a..19dfcf8aca 100644
--- a/src/Wikibase/Hooks/WikibaseRepoDataTypesHandler.php
+++ b/src/Wikibase/Hooks/WikibaseRepoDataTypesHandler.php
@@ -109,6 +109,9 @@ public function onWikibaseRepoDataTypes( array &$dataTypeDefinitions ): void {
 			},
 			'parser-factory-callback' => fn () => new EntitySchemaValueParser(),
 			'deserializer-builder' => EntitySchemaValue::class,
+			'search-index-data-formatter-callback' => function ( EntitySchemaValue $value ): string {
+				return $value->getSchemaId();
+			},
 		];
 	}
 }

This will result in the $searchIndexDataFormatters seen by the StatementsField containing not only VT:string, VT:quantity and VT:wikibase-entityid, but also PT:entity-schema. In addition, StatementsField will then need to be updated to look for PT:$dataType first before falling back to VT:$dataValueType. I’m pretty sure we must have some code for this in Wikibase already, if I can find it…
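The proposed lookup order (property data type first, value type as fallback) can be sketched as follows. This is a Python model of the idea, not the patch that was eventually merged: `find_formatter` is a hypothetical name, and the `schema:` and `entity:` prefixes exist only to make visible which formatter was chosen.

```python
from typing import Callable, Dict, Optional

Formatter = Callable[[object], str]


def find_formatter(
    formatters: Dict[str, Formatter], data_type: str, value_type: str
) -> Optional[Formatter]:
    """Prefer 'PT:<property data type>', fall back to 'VT:<value type>'."""
    by_data_type = formatters.get("PT:" + data_type)
    if by_data_type is not None:
        return by_data_type
    return formatters.get("VT:" + value_type)


# Illustrative formatters; the prefixes are made up for this example.
formatters: Dict[str, Formatter] = {
    "VT:string": lambda v: str(v),
    "VT:wikibase-entityid": lambda v: "entity:" + str(v),
    "PT:entity-schema": lambda v: "schema:" + str(v),
}

# An entity-schema property now gets its own formatter ...
print(find_formatter(formatters, "entity-schema", "wikibase-entityid")("E10"))
# ... while other entity ID properties still use the value-type formatter.
print(find_formatter(formatters, "wikibase-item", "wikibase-entityid")("Q5"))
```

Keeping the value-type fallback means data types without a dedicated formatter keep their existing behaviour.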

Change #1051783 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseCirrusSearch@master] Try looking up search index data formatters by data type

https://gerrit.wikimedia.org/r/1051783

Change #1051784 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/EntitySchema@master] Define custom search-index-data-formatter-callback

https://gerrit.wikimedia.org/r/1051784

Change #1051790 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/Wikibase@master] DataTypeDefinitions: Support $mode in getSearchIndexDataFormatterCallbacks()

https://gerrit.wikimedia.org/r/1051790

Change #1051791 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Pass resolved data type definitions into StatementProviderFieldDefinitions

https://gerrit.wikimedia.org/r/1051791

Change #1051792 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseMediaInfo@master] Pass resolved data type definitions into StatementProviderFieldDefinitions

https://gerrit.wikimedia.org/r/1051792

Change #1051794 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseCirrusSearch@master] Remove compatibility code

https://gerrit.wikimedia.org/r/1051794

Change #1051791 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Pass resolved data type definitions into StatementProviderFieldDefinitions

Reason:

No longer needed with the new approach in Wikibase and WikibaseCirrusSearch.

https://gerrit.wikimedia.org/r/1051791

Change #1051792 abandoned by Lucas Werkmeister (WMDE):

[mediawiki/extensions/WikibaseMediaInfo@master] Pass resolved data type definitions into StatementProviderFieldDefinitions

Reason:

No longer needed with the new approach in Wikibase and WikibaseCirrusSearch.

https://gerrit.wikimedia.org/r/1051792

Change #1051784 merged by jenkins-bot:

[mediawiki/extensions/EntitySchema@master] Define custom search-index-data-formatter-callback

https://gerrit.wikimedia.org/r/1051784

Change #1051783 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Try looking up search index data formatters by data type

https://gerrit.wikimedia.org/r/1051783

Change #1051790 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Defs::getSearchIndexDataFormatterCallbacks(): Support $mode

https://gerrit.wikimedia.org/r/1051790

Change #1051794 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Remove compatibility code

https://gerrit.wikimedia.org/r/1051794

Change #1052269 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/EntitySchema@wmf/1.43.0-wmf.12] Define custom search-index-data-formatter-callback

https://gerrit.wikimedia.org/r/1052269

Change #1052270 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.43.0-wmf.12] Try looking up search index data formatters by data type

https://gerrit.wikimedia.org/r/1052270

Change #1052269 merged by jenkins-bot:

[mediawiki/extensions/EntitySchema@wmf/1.43.0-wmf.12] Define custom search-index-data-formatter-callback

https://gerrit.wikimedia.org/r/1052269

Change #1052270 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@wmf/1.43.0-wmf.12] Try looking up search index data formatters by data type

https://gerrit.wikimedia.org/r/1052270

Mentioned in SAL (#wikimedia-operations) [2024-07-05T10:20:17Z] <logmsgbot> lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [[gerrit:1052269|Define custom search-index-data-formatter-callback (T369149)]], [[gerrit:1052270|Try looking up search index data formatters by data type (T369149)]]

Mentioned in SAL (#wikimedia-operations) [2024-07-05T10:22:46Z] <logmsgbot> lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for [[gerrit:1052269|Define custom search-index-data-formatter-callback (T369149)]], [[gerrit:1052270|Try looking up search index data formatters by data type (T369149)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

While testing the deployment ^, I noticed that this doesn’t actually affect entities with EntitySchema-type statements (e.g. Q5); it only affects entities with other statements (e.g. Item-type statements, though probably some other types too) that had an EntitySchema qualifier. Which in practice was probably only P12861 (until I reproduced the situation on the sandbox item to verify the fix).

I think this is actually a hint that we ought to add entity-schema to the $wgWBRepoSettings['searchIndexTypes'] in the production config, so that EntitySchema statements are indexed? @Lydia_Pintscher or @Arian_Bozorg: I assume we want users to be able to search for haswbstatement:P12861=E10?
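For reference, a search like the one asked about could be issued through the MediaWiki action API roughly as follows. This is a sketch: the helper name `haswbstatement_search_url` is hypothetical, the code only builds the URL without sending a request, and whether the query returns anything depends on entity-schema being listed in searchIndexTypes, which this task leaves as an open question.

```python
from urllib.parse import urlencode


def haswbstatement_search_url(
    prop: str, value: str, api: str = "https://www.wikidata.org/w/api.php"
) -> str:
    """Build a list=search URL for a haswbstatement:<prop>=<value> query."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{prop}={value}",
        "format": "json",
        "formatversion": "2",
    }
    return api + "?" + urlencode(params)


print(haswbstatement_search_url("P12861", "E10"))
```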

Mentioned in SAL (#wikimedia-operations) [2024-07-05T10:41:39Z] <logmsgbot> lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [[gerrit:1052269|Define custom search-index-data-formatter-callback (T369149)]], [[gerrit:1052270|Try looking up search index data formatters by data type (T369149)]] (duration: 21m 22s)

The fix should be deployed now; @dcausse do we need to manually trigger a re-indexing of the affected pages (only this property, really) or is it going to happen automatically?

Mentioned in SAL (#wikimedia-operations) [2024-07-05T11:53:26Z] <dcausse> T369149: re-indexed wikidata P12861 (cirrus_rerender.rerender --wiki wikidatawiki allpages --namespace 120 --from-title P12861 --to-title P12861)

@Lucas_Werkmeister_WMDE thanks for the fix! I manually re-indexed this item with our new (WIP) tooling. It would have been fixed automatically by the cleanup process, but it could have taken up to 2 weeks to be discovered in the worst case.

Alright, thanks! Searching for the property seems to work now \o/

Arian_Bozorg claimed this task.

Looks like it's working!

@Arian_Bozorg: I assume we want users to be able to search for haswbstatement:P12861=E10?