
[Wild west] PoC REST API Search
Open, Needs Triage · Public

Description

There have been multiple requests from the community to build search into the REST API. Search is also one of the top 3 most used endpoints in the Action API and is used in multiple internal tools as well.

The idea is for us to explore what this would entail: Would it be best to add search endpoints to the existing REST API or to build a new dedicated one? Would it require rethinking a lot of the architecture? Would it require work on other parts of the codebase? The PoC can then be used to get some initial feedback from the community.

Acceptance criteria:

  • We have a running PoC API within 2 sprints that has the following functionality:
    • search by labels and aliases
    • mimic roughly what wbsearchentities does
    • search results should include ID, labels, description, matched term ideally
    • no performance optimization needed
    • able to search in a language specified by the user
    • PoC does not only do prefix search, i.e., not just autocomplete search (if I look for Khan, it should show me Shahrukh Khan as well)
    • Has the ability to do what haswbstatement does as part of Special:Search, either as a separate endpoint or as part of a single endpoint (if this doesn't get too bloated and complex)
    • build endpoints for properties and EntitySchemas (if there's time)
  • PoC is deployed to beta.wikidata.org so that a few community members can test it
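
To make the "not just prefix search" criterion concrete, here is a minimal Python sketch of the difference, using a toy in-memory list rather than the real search backend. The `SearchResult` field names, the placeholder IDs, and the sample entries are assumptions chosen for illustration; the task does not specify a response schema.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Hypothetical shape covering the fields the acceptance criteria ask for.
    id: str
    label: str
    description: str
    matched_term: str


# Illustrative sample data only; the IDs are placeholders, not real items.
LABELS = {
    "Q-A": ("Shahrukh Khan", "actor"),
    "Q-B": ("Khan Academy", "educational organization"),
}


def prefix_search(term: str) -> list[SearchResult]:
    """Autocomplete-style match: the label must start with the term."""
    needle = term.casefold()
    return [
        SearchResult(entity_id, label, description, label)
        for entity_id, (label, description) in LABELS.items()
        if label.casefold().startswith(needle)
    ]


def word_search(term: str) -> list[SearchResult]:
    """Non-prefix match: any whole word of the label equals the term."""
    needle = term.casefold()
    return [
        SearchResult(entity_id, label, description, label)
        for entity_id, (label, description) in LABELS.items()
        if needle in label.casefold().split()
    ]
```

With this toy data, `prefix_search("Khan")` finds only the label that starts with "Khan", while `word_search("Khan")` also finds "Shahrukh Khan", which is the behaviour the criterion asks for.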

Event Timeline

Change #1092835 had a related patch set uploaded (by Dima koushha; author: Dima koushha):

[mediawiki/extensions/Wikibase@master] REST: Prood of concept REST API search

https://gerrit.wikimedia.org/r/1092835

Here are a few more things we'd want tested out, in order of priority:

  • Ensure that the PoC is able to search in a language specified by the user
  • PoC does not only do prefix search, i.e., not just autocomplete search (if I look for Khan, it should show me Shahrukh Khan as well)
  • The ability to do what haswbstatement does as part of Special:Search, either as a separate endpoint or as part of a single endpoint (if this doesn't get too bloated and complex)
    • Make the PoC work for properties and EntitySchemas (if there's time)

[Updated comment on 21.11 after talking to Jakob]

@Jakob_WMDE did y'all discuss anything else in the daily? If not, I'll move these under the acceptance criteria

@Ifrahkhanyaree_WMDE We mainly discussed whether the point about sitelinks in your comment above makes sense, considering how it would bloat the search results (see e.g. https://www.wikidata.org/w/rest.php/wikibase/v1/entities/items/Q42?_fields=sitelinks for what this would contain). If we want shipping it to beta to be part of this task, then that would be good to add, too.

We added the following 4 routes:

  1. GET /search/items and GET /search/properties
    • fulltext search (equivalent to action=query&list=search)
    • uselang query param affects both the language to search in, and the language in which the results are displayed
    • supports haswbstatement etc in the search term
  2. GET /suggest/items and GET /suggest/properties
    • prefix search (equivalent to wbsearchentities)
    • uselang query param affects both the language to search in, and the language in which the results are displayed
    • does not support haswbstatement
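
As a rough illustration of how a client might address these four routes, here is a small Python sketch that only builds request URLs. The base URL and the `q` parameter name for the search term are assumptions (the comment above names only `uselang`), and rejecting haswbstatement for the suggest routes on the client side is a design choice of this sketch, not something the PoC is known to do.

```python
from urllib.parse import urlencode

# Assumption: placeholder base URL; the real mount point of the PoC routes
# is not stated in this task.
BASE_URL = "https://example.org/w/rest.php"

MODES = ("search", "suggest")          # fulltext search vs. prefix search
ENTITY_TYPES = ("items", "properties")


def build_url(mode: str, entity_type: str, term: str, uselang: str = "en") -> str:
    """Build a GET URL for one of the four PoC routes."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    # The suggest routes are prefix search and do not support haswbstatement.
    if mode == "suggest" and "haswbstatement:" in term:
        raise ValueError("haswbstatement is only supported by the search routes")
    # Assumption: "q" is a hypothetical name for the search term parameter.
    # uselang sets both the search language and the display language.
    query = urlencode({"q": term, "uselang": uselang})
    return f"{BASE_URL}/{mode}/{entity_type}?{query}"
```

For example, `build_url("search", "items", "Khan haswbstatement:P31=Q5", "de")` would target the fulltext route with a statement filter embedded in the search term, while the same term passed to a suggest route raises a `ValueError`.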

Limitations/challenges we've encountered:

  • fulltext search doesn't tell us what was matched, which is very confusing if e.g. part of a statement or sitelink was matched and then does not show up in the search result at all
  • fulltext search doesn't support different values for result language and search language. That's probably ok for most use cases?
  • restricting the search to labels/aliases only is possible only with prefix search
  • strictlanguage is not as strict as we expected. It still finds entities with labels matching the search terms in languages other than the one specified, e.g. https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&search=%EA%B0%90%EC%9E%90&language=en&strictlanguage=1&type=item&formatversion=2
  • uselang is a "special" query parameter for MediaWiki - using a different query parameter name (or a different way to programmatically change the language) turned out to be surprisingly difficult without major effort, e.g. EntitySearchHelper::getRankedSearchResults has a language parameter adjusting the language to search *in*, but the results will always come back in a globally defined "user language"
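
To show what the strictlanguage example above actually asks for, the following Python sketch decodes the query parameters of that URL, and then outlines one possible client-side workaround: dropping results whose matched term is in a different language. The filter assumes each result carries a `match` object with a `language` key; treat that shape, and the workaround itself, as assumptions rather than as what the PoC does.

```python
from urllib.parse import parse_qs, urlsplit

# The wbsearchentities URL from the strictlanguage limitation above.
EXAMPLE_URL = (
    "https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json"
    "&search=%EA%B0%90%EC%9E%90&language=en&strictlanguage=1"
    "&type=item&formatversion=2"
)


def query_params(url: str) -> dict[str, str]:
    """Return the decoded query parameters of a URL (first value per key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


def keep_language(results: list[dict], language: str) -> list[dict]:
    """Keep only results whose matched term is in the requested language.

    Assumption: each result has a "match" dict with a "language" key.
    """
    return [r for r in results if r.get("match", {}).get("language") == language]
```

Decoding the URL shows a Korean search term combined with `language=en` and `strictlanguage=1`, which is why results matching on non-English labels are unexpected there.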