
Investigation: Display Search Suggestion for Lexemes, Entity Schemas and Properties
Closed, Resolved | Public | 1 Estimated Story Points

Description

Currently, the type-ahead search for Items in the Wikidata header produces results by searching:

  • QIDs
  • labels (in all languages)
  • aliases (in all languages)
  • descriptions

Currently, there is no type-ahead search available for Lexemes, Entity Schemas and Properties.

This investigation is to determine whether type-ahead search for Lexemes, Entity Schemas and Properties is possible and what would be needed to enable it.

We would like to have the following information displayed in the suggestions for the following entity types:

Lexemes

  • lemma
  • language (displayed in user-interface language)
  • lexical category (displayed in user-interface language)

EntitySchemas (displayed in user-interface language)

  • labels
  • aliases
  • descriptions

Properties (displayed in user-interface language)

  • labels
  • aliases
  • descriptions

Screenshots / Mock-up

Existing for Items

image.png (329×500 px, 42 KB)

Mock-up for Lexemes, following the same format as Items:
Lemma
Language, lexical category

Frame 1.png (443×460 px, 10 KB)

Notes
This investigation is in preparation for new search functionality that will include Lexemes, Entity Schemas and Properties (T321543).
This investigation would also be valuable as a way to make the Search Results page more legible for EntitySchemas: example

For Lexemes we are creating an automated description using "Language, lexical category"
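A minimal sketch of how such an automated description might be assembled, assuming the language and lexical category labels have already been resolved in the user-interface language. The function name and the handling of missing labels are hypothetical and not taken from Wikibase.

```python
def lexeme_description(language_label, lexical_category_label):
    """Build an automated description of the form "Language, lexical category".

    Hypothetical helper: both labels are assumed to be localized already.
    A missing or empty label is skipped rather than rendered as a gap.
    """
    parts = [
        label.strip()
        for label in (language_label, lexical_category_label)
        if label and label.strip()
    ]
    return ", ".join(parts)


print(lexeme_description("English", "noun"))
```

Skipping empty labels is only one possible choice; a placeholder such as "unknown language" might be preferable in the real UI.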

Event Timeline

karapayneWMDE renamed this task from Investigation: Search for Lexemes, Entity Schemas and Properties to Investigation: Display Search Suggestion for Lexemes, Entity Schemas and Properties. Feb 6 2023, 10:41 AM
karapayneWMDE set the point value for this task to 1. Feb 7 2023, 10:28 AM

That autocomplete uses the wbsearchentities API endpoint. Since it was recently decided* that EntitySchemas should not become Entities, they will not be available there without major changes.

* FTR: I favored a different decision that would have made this much simpler.

For Properties and Lexemes: Individually, these Entity types are already supported by the API:

However, the wbsearchentities API currently requires exactly one type parameter (defaulting to item). Also, we need to figure out if CirrusSearch/Elastic is meaningfully able to search across entity types.
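To illustrate the one-type-per-request constraint, here is a rough sketch that builds one wbsearchentities request URL per entity type. The helper names, the default limit and the choice of passing the same code for language and uselang are assumptions made for illustration; only the endpoint and its parameters (action, search, type, language, uselang, limit, format) come from the API itself.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term, entity_type="item", language="en", limit=7):
    """Build a wbsearchentities URL for exactly one entity type."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "type": entity_type,   # a single type per request
        "language": language,  # language to search in
        "uselang": language,   # language for the displayed labels
        "limit": limit,        # arbitrary default chosen for this sketch
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def build_search_urls(term, entity_types=("item", "property", "lexeme"), **kwargs):
    """Covering several entity types means one request per type."""
    return {t: build_search_url(term, t, **kwargs) for t in entity_types}


for entity_type, url in build_search_urls("colour").items():
    print(entity_type, url)
```

Issuing separate requests sidesteps the question of ranking across entity types, but the results then have to be merged or grouped on the client.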

One possible approach could be to enable autocomplete / typeahead for other entity types based on a prefix:

  • no prefix -> autocomplete for Items
  • prefix P: -> autocomplete for Properties
  • prefix L: -> autocomplete for Lexemes

With this approach, it would likely only be a moderate amount of work to also make it work for EntitySchema, probably with a different API endpoint.
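The prefix rules above could be sketched roughly as follows. The function name, the case-insensitive matching and the trimming of whitespace after the prefix are assumptions, not decisions made in this task.

```python
# Maps a typed prefix to the wbsearchentities "type" value.
PREFIX_TO_TYPE = {"P:": "property", "L:": "lexeme"}


def route_query(raw_input):
    """Split the typed text into (entity_type, search_term).

    No recognized prefix means the text is searched among Items.
    """
    text = raw_input.strip()
    prefix = text[:2].upper()  # assumption: "p:" behaves like "P:"
    if prefix in PREFIX_TO_TYPE:
        return PREFIX_TO_TYPE[prefix], text[2:].strip()
    return "item", text


print(route_query("P:instance of"))
print(route_query("L:colour"))
print(route_query("Berlin"))
```

One trade-off of this design is that an Item whose label really starts with "P:" or "L:" can no longer be found by typing that label directly.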

The underlying idea could also work when presenting the search results of the different Entity types / namespaces in separate lists below / next to each other. This would then be mainly a UX challenge.
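As a sketch of the separate-lists idea, the snippet below groups results per entity type instead of ranking them against each other. The input shape (a dict from entity type to an already ranked list), the group order and the per-group cap are all hypothetical.

```python
def group_suggestions(results_by_type, order=("item", "property", "lexeme"), per_group=3):
    """Return [(entity_type, hits), ...] in a fixed order, skipping empty groups.

    Ranking inside each group is left as delivered by the per-type search;
    nothing is compared across entity types.
    """
    groups = []
    for entity_type in order:
        hits = results_by_type.get(entity_type, [])
        if hits:
            groups.append((entity_type, hits[:per_group]))
    return groups


example = {"lexeme": ["L1", "L2", "L3", "L4"], "item": ["Q1"], "property": []}
for entity_type, hits in group_suggestions(example):
    print(entity_type, hits)
```

How many hits to show per group and in which order the groups appear would be part of the UX work mentioned above.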

That being said, if we want to show them in one single feed of suggestions, then this probably will need some dedicated work in CirrusSearch/Elastic. See for example the difference in results by just selecting the additional Lexeme namespace:

The CirrusSearch search callbacks for Items, Properties and Lexemes can be found at

However, these are all specific to their respective entity type. I have yet to understand why Special:Search shows very different results depending on which namespace is included.

(timebox used so far: ~3:15)

Yeah when I discussed this with Stas from the Search team ages ago he basically said we can forget any meaningful ranking across entity types. So the designs so far all revolve around specifying the entity type or grouping by entity type. I think this is also better from a user PoV because the fact that there are very different types of entities is more obvious.

> Yeah when I discussed this with Stas from the Search team ages ago he basically said we can forget any meaningful ranking across entity types. [...]

Well, we do have meaningful ranking across Properties and Items on Special:Search, as shown above. However, I agree that this is probably tricky to achieve, and, currently, it breaks down when adding the Lexeme namespace.

I spontaneously joined the Search Team Office Hour today and asked about that and got a few pointers. (Thanks!)

But, as said above, getting this right across a diverse domain of namespaces is probably a pretty intricate challenge and maybe not the best thing to do right now.

> [...] So the designs so far all revolve around specifying the entity type or grouping by entity type. I think this is also better from a user PoV because the fact that there are very different types of entities is more obvious.

Ok, so if we split this by type, then it should only need minor adjustments for Properties and Lexemes (I think a search with wbsearchentities for Lexemes currently only displays one Lemma, and we might want to see all of them).

For EntitySchema, I think, we would need to create a new API endpoint, and there is already a feature request for that: T304070: API Endpoint to search for Schemas

(total timebox used so far: ~5:00)

Thanks so much for looking into this Michael :)