
DatabaseTermIdsAcquirer fails on terms longer than 255 bytes
Closed, Resolved, Public

Description

When trying to store terms longer than 255 bytes in the new normalized term store (in real-world data, these are typically descriptions in non-Latin scripts), ReplicaMasterAwareRecordIdsAcquirer throws a fail-safe exception: the database implicitly truncates the term text (cf. T108255), so a subsequent select with wbx_text = 'untruncated term text' can’t find the row. (If ReplicaMasterAwareRecordIdsAcquirer didn’t detect this case and throw the fail-safe exception, it would keep trying to acquire an ID for the same text in an infinite loop.)
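The failure mode described above can be illustrated with a small simulation. This is only a sketch in Python, not Wikibase code: `TruncatingStore`, `acquire_id`, and `MAX_BYTES` are hypothetical names, and the store's truncation is a simplified stand-in for what the database does to an over-long value.

```python
MAX_BYTES = 255  # length limit mentioned in the task


class TruncatingStore:
    """Hypothetical stand-in for a table whose text column silently truncates."""

    def __init__(self):
        self.rows = {}  # stored text -> id

    def insert(self, text):
        # Simplification: cut at the byte limit and drop any partial trailing character.
        stored = text.encode("utf-8")[:MAX_BYTES].decode("utf-8", errors="ignore")
        self.rows.setdefault(stored, len(self.rows) + 1)

    def select_id(self, text):
        # Exact-match lookup, analogous to a select on wbx_text = '...'.
        return self.rows.get(text)


def acquire_id(store, text):
    """Select, insert if missing, then select again; fail loudly if still missing."""
    existing = store.select_id(text)
    if existing is not None:
        return existing
    store.insert(text)
    found = store.select_id(text)
    if found is None:
        # Without this check, a retry loop would insert and re-select forever.
        raise RuntimeError("fail-safe: inserted text not found on re-select")
    return found


store = TruncatingStore()
short_id = acquire_id(store, "short description")  # stored unchanged, so it is found again
long_term = "я" * 200  # 400 bytes in UTF-8, well over the limit
try:
    acquire_id(store, long_term)
    long_term_failed = False
except RuntimeError:
    long_term_failed = True
```

In this sketch the short term is found on the second select, while the 400-byte term is stored in truncated form and therefore never matches the untruncated lookup.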

We need to fix this somewhere between DatabaseTermIdsAcquirer and ReplicaMasterAwareRecordIdsAcquirer: callers of the TermIdsAcquirer interface shouldn’t be expected to truncate the terms to some store-specific length.

The equivalent problem in wb_terms was discussed in T142691; since the normalized term store is only supposed to be a replacement for wb_terms so far, we’ll continue to truncate the values for now even though that behavior is ultimately considered a bug.
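One way to truncate before the value reaches the database is to cut at the byte limit without splitting a multi-byte UTF-8 character. The following Python sketch shows that idea only; `truncate_utf8` is a hypothetical helper, and it is not taken from the Wikibase patch, which may handle character boundaries differently.

```python
def truncate_utf8(text, max_bytes=255):
    """Return text cut to at most max_bytes of UTF-8, never splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete character left at the cut point, if any.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


ascii_result = truncate_utf8("a" * 300)      # single-byte characters: cut at exactly 255
cyrillic_result = truncate_utf8("я" * 200)   # two-byte characters: cut on a character boundary
```

Applying the same truncation to both the inserted value and the lookup value keeps the two consistent, so the re-select after insert can match the stored row.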

Event Timeline

Change 517070 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Truncate terms if necessary before they reach the database

https://gerrit.wikimedia.org/r/517070

Change 517070 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Truncate terms if necessary before they reach the database

https://gerrit.wikimedia.org/r/517070