
API:Presenting Wikidata knowledge

From mediawiki.org
Revision as of 13:53, 27 July 2024 by MathXplore (talk | contribs) (Reverted edits by 2603:6081:4102:CE00:33D8:E1A9:E0FA:A25E (talk) to last version by Clump: reverting vandalism)

This page shows how to retrieve and present relevant information from Wikidata by associating it with entities in your application.

You can use Wikidata items and properties to provide language-independent information about entities (real-world things) in your application, such as events, places, people, works of art, and concepts. This is more direct and consistent than presenting descriptions and snippets from Wikipedia articles about these things, the approach that API:Page info in search results explains.

Example


inventaire.io showing Wikidata information about the book Les Misérables

Inventaire lets you create an inventory of your books and share it with others. It displays certain properties from Wikidata about books, such as P407 "original language of work" and P50 "author". To do so, it uses Wikidata's 'Q' IDs internally to identify books. For example, its URL https://inventaire.io/entity/wd:Q180736 shows certain properties from the Wikidata entity http://www.wikidata.org/entity/Q180736 (the book "Les Misérables"). The Wikidata glossary explains entities and properties in more detail.

Recipe

  1. Find existing wiki pages in the domain of your application, e.g. creative works, places, events, people, species.
  2. View the Wikidata information for those pages and choose the properties that are relevant to your application.
  3. Associate Wikidata entity IDs with entities in your application.
  4. Display their Wikidata information in the user's language.
  5. Use the Wikidata "sitelinks" information about the item to provide links to the full Wikipedia article about the entity in the user's language.

Getting Wikidata entity IDs


To get an article's entity ID in Wikidata, you can use the following methods:

  • Copy the link "Wikidata item" ('wikibase-dataitem' message key) in the sidebar of most skins. It ends with the item's 'Q' ID, such as Q180736.
  • Access the wgWikibaseItemId variable in client-side JavaScript with mw.config.get( 'wgWikibaseItemId' );.
  • Use the API to query the page for the page property wikibase_item.
Here is an example of an API query for the page property wikibase_item:
Result
{
    "batchcomplete": true,
    "query": {
        "pages": [
            {
                "pageid": 61489,
                "ns": 0,
                "title": "Les Misérables",
                "pageprops": {
                    "wikibase_item": "Q180736"
                }
            }
        ]
    }
}
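As a sketch of the API approach, the Python below builds such a query URL with the standard library and pulls wikibase_item out of a response shaped like the result above. The endpoint (English Wikipedia), the use of formatversion=2 (which returns pages as a list, as in the result above), and the helper names pageprops_query_url and wikibase_items are assumptions for illustration, not part of the API.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia; any wiki that runs Wikibase Client should work.
API_URL = "https://en.wikipedia.org/w/api.php"


def pageprops_query_url(title: str) -> str:
    """Build an action=query URL that asks only for the wikibase_item page property."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": title,
        "format": "json",
        "formatversion": "2",  # pages come back as a list, as in the result above
    }
    return API_URL + "?" + urlencode(params)


def wikibase_items(response: dict) -> dict:
    """Map each page title in the response to its Wikidata item ID."""
    items = {}
    for page in response.get("query", {}).get("pages", []):
        item_id = page.get("pageprops", {}).get("wikibase_item")
        if item_id:  # pages without a Wikidata item have no such property
            items[page["title"]] = item_id
    return items


# The result shown above, as a Python dict.
sample = {
    "batchcomplete": True,
    "query": {"pages": [{"pageid": 61489, "ns": 0, "title": "Les Misérables",
                          "pageprops": {"wikibase_item": "Q180736"}}]},
}
print(pageprops_query_url("Les Misérables"))
print(wikibase_items(sample))  # {'Les Misérables': 'Q180736'}
```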

Choosing properties


If you view https://www.wikidata.org/wiki/Q180736, you can see the following information:

  • Some localized information:
    • "label"
    • "description"
    • "aliases" (displayed as "Also known as")
  • Many "Statements" about the item that give values for its properties such as "author" and "publication date"
  • Many "sitelinks" for the item, providing the titles of pages about the item in various Wikipedias and other Wikimedia projects
This diagram of a Wikidata item shows you the most important terms in Wikidata.

Clicking the title of a statement takes you to a page about that property. For example, the "author" property is Property:P50. Property pages in turn have labels, descriptions, aliases, and further statements, much like the Wikidata pages for real-world items.

The set of properties in Wikidata is steadily growing. Not every item has statements for every property, and not all properties and values have been translated into all languages. For example, Victor Hugo's occupation as an "author" has been translated into nearly all languages, but lesser-known occupations may have fewer translations. You need to decide how to fall back to an available language if a property or value isn't translated into a language you support, and you shouldn't build your application around a property that appears in only a few statements. The API performs language fallback for you if you request it. (Of course, you can help by contributing missing statements and translations to Wikidata.)

Querying Wikibase


The extensions Wikibase Repository and Wikibase Client, together with related components, power Wikidata. Most Wikimedia sites run Wikibase Client (check with Special:Version), while wikidata.org itself runs Wikibase Repository. Wikibase Repository implements several modules for MediaWiki's Action API, all prefixed with wb. The main API module in Wikibase Repository is wbgetentities (see its generated API help). It returns the data that Wikidata holds about items (QNNNN entities) or properties (PNNNN entities).

Retrieving and displaying Wikidata information


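Say you have associated Wikidata entity IDs with your application's entities, and you want to display information about them, such as each item's label, description, author, genre, publication date, and a link to its Wikipedia article.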

action=wbgetentities can return the same information that you see on an item's Wikidata page: labels, descriptions, aliases, "claims" (the statements), and sitelinks. Let's ask for this information about Les Misérables in a less widely used language, Azerbaijani, to see how languagefallback and sitelinks/urls work.

  • You can pass wbgetentities page titles on a wiki (the sites and titles parameters), but in this scenario we pass the entity IDs of the Wikidata items for entities in our application (the ids parameter).
  • You can specify the languages you want, and it will only return labels, descriptions, and aliases in those languages (if they are available).
    • You can also specify languagefallback so that labels and descriptions without a translation in your requested languages fall back to another language.
  • wbgetentities has no means to specify which properties you want; instead, you request all claims about the entity. So in this scenario, we request props=labels|descriptions|claims|sitelinks/urls.
  • You can specify a sitefilter to restrict which sitelinks are returned. In this scenario, we only want the Wikipedia page (if any) on the wiki for the same language.
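The parameters above can be assembled into a request URL. This is a minimal Python sketch, assuming the public endpoint https://www.wikidata.org/w/api.php and assuming that the site ID of a language's Wikipedia is the language code followed by "wiki" (true for azwiki, but not for every language code). The helper name wbgetentities_url is hypothetical.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wbgetentities_url(ids: list, lang: str) -> str:
    """Build the wbgetentities request described above (hypothetical helper)."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),  # entity IDs rather than sites/titles
        "props": "labels|descriptions|claims|sitelinks/urls",
        "languages": lang,
        "languagefallback": "1",  # boolean flag: fall back when a term is untranslated
        # Assumption: "<lang>wiki" is the site ID; it is not for every language.
        "sitefilter": lang + "wiki",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


print(wbgetentities_url(["Q180736"], "az"))
```

Multiple IDs are joined with "|", which urlencode escapes as %7C; the API accepts either form.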
Example: Request information about entity Q180736
Result
{
    "entities": {
        "Q180736": {
            "type": "item",
            "id": "Q180736",
            "labels": {
                "az": {
                    "language": "az",
                    "value": "Səfillər"
                }
            },
            "descriptions": {
                "az": {
                    "value": "1862 Victor Hugo novel",
                    "language": "en",
                    "for-language": "az"
                }
            },
            "claims": {
                "P840": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P840",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 90
                                },
                                "type": "wikibase-entityid"
                            },
                            "datatype": "wikibase-item"
                        },
                        "type": "statement",
                        "id": "Q180736$f42b7321-40a9-758f-3722-72a960687f60",
                        "rank": "normal"
                    },
                    ...
                ]
            },
            "sitelinks": {
                "azwiki": {
                    "site": "azwiki",
                    "title": "Səfillər (roman)",
                    "badges": [],
                    "url": "https://az.wikipedia.org/wiki/S%C9%99fill%C9%99r_(roman)"
                }
            }
        }
    },
    "success": 1
}

From the response, you can see the following information:

  • The label for Les Misérables is available in Azerbaijani ("Səfillər").
  • No description exists in Azerbaijani, so the description falls back to English: it is returned under "az" with "language": "en" and "for-language": "az".
  • There is a wiki page for it on Azerbaijani Wikipedia: az:Səfillər (roman).
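Extracting those three facts can be sketched in a few lines of Python. The sketch assumes the response has already been parsed into a dict (for example with json.loads) and, as before, that the same-language Wikipedia has the site ID "<lang>wiki". summarize_entity and its output keys are hypothetical names.

```python
def summarize_entity(response: dict, entity_id: str, lang: str) -> dict:
    """Pick out the display fields from a wbgetentities response."""
    entity = response["entities"][entity_id]
    label = entity.get("labels", {}).get(lang, {})
    description = entity.get("descriptions", {}).get(lang, {})
    # Assumption: the same-language Wikipedia has the site ID "<lang>wiki".
    sitelink = entity.get("sitelinks", {}).get(lang + "wiki", {})
    return {
        "label": label.get("value"),
        "description": description.get("value"),
        # A fallback term keeps the language it is really written in ("en" here),
        # so you can mark it as untranslated in your interface.
        "description_language": description.get("language"),
        "wikipedia_url": sitelink.get("url"),
    }


# Trimmed copy of the response above (claims omitted).
response = {
    "entities": {
        "Q180736": {
            "labels": {"az": {"language": "az", "value": "Səfillər"}},
            "descriptions": {"az": {"value": "1862 Victor Hugo novel",
                                    "language": "en", "for-language": "az"}},
            "sitelinks": {"azwiki": {
                "site": "azwiki",
                "title": "Səfillər (roman)",
                "badges": [],
                "url": "https://az.wikipedia.org/wiki/S%C9%99fill%C9%99r_(roman)",
            }},
        }
    },
    "success": 1,
}
print(summarize_entity(response, "Q180736", "az"))
```

Missing fields come back as None, so the caller decides what to show when, for example, there is no same-language Wikipedia article.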

The generated API help for action=wbgetentities lists all the possible values for sites and sitefilter. (Wikimedia encompasses a lot of wikis!) Visit Special:SiteMatrix for a table listing Wikimedia wikis. The wiki names are fairly standardized apart from some edge cases, so it's safe to use in sitefilter any wiki name from that table that exists and is not struck out (struck-out wikis are closed). If you want to, for example, ask for links to Wikiquote sites that may not exist yet, your code can also query the API module action=sitematrix (documentation) and look through its response to dynamically build a list of relevant sites for sitefilter.
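For the dynamic approach, here is a sketch that walks a sitematrix response and collects the site IDs (dbnames) of open wikis for one project. It assumes the layout of action=sitematrix&format=json: numbered language entries, each with a site list whose entries carry code and dbname; non-numeric keys such as count and specials, which are skipped here; and a closed key on closed wikis. The sample data is hand-made, xxwikiquote is a made-up closed wiki, and open_sites is a hypothetical name.

```python
def open_sites(sitematrix: dict, project_code: str) -> list:
    """Return dbnames of open wikis whose project code matches, e.g. "wikiquote"."""
    names = []
    for key, language in sitematrix["sitematrix"].items():
        if not key.isdigit():  # skip "count", "specials", and similar keys
            continue
        for site in language.get("site", []):
            # Assumption: closed wikis carry a "closed" key ("" in formatversion=1,
            # true in formatversion=2); open wikis omit it.
            closed = "closed" in site and site["closed"] is not False
            if site.get("code") == project_code and not closed:
                names.append(site["dbname"])
    return names


# Hand-made sample; "xxwikiquote" is a made-up closed wiki.
sample = {"sitematrix": {
    "count": 3,
    "0": {"code": "az", "site": [
        {"url": "https://az.wikipedia.org", "dbname": "azwiki", "code": "wiki"},
        {"url": "https://az.wikiquote.org", "dbname": "azwikiquote", "code": "wikiquote"},
    ]},
    "1": {"code": "xx", "site": [
        {"url": "https://xx.wikiquote.org", "dbname": "xxwikiquote",
         "code": "wikiquote", "closed": ""},
    ]},
    "specials": [],
}}
print("|".join(open_sites(sample, "wikiquote")))  # azwikiquote
```

The joined string can be passed directly as the sitefilter value.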

Parsing claims


The claims that give properties their values are unavoidably complex: a property can have more than one claim, and the claims may disagree; claims differ in rank; they are (ideally) backed up by references; and they may be qualified (for example, by the date range in which a claim applies).
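Rank is usually the first thing to deal with. One common policy, the same "best rank" rule that the Wikidata Query Service uses for its truthy statements, is to use the preferred-rank claims if there are any, otherwise the normal-rank ones, and never the deprecated ones. Below is a minimal sketch of that policy over one property's claim array. Whether it suits your application is a judgment call, and best_rank_claims is a hypothetical name.

```python
def best_rank_claims(claims: list) -> list:
    """Keep preferred-rank claims if any exist, else normal-rank ones.

    Deprecated claims are always dropped.
    """
    preferred = [c for c in claims if c.get("rank") == "preferred"]
    if preferred:
        return preferred
    return [c for c in claims if c.get("rank") == "normal"]


# Minimal stand-ins for claims; real ones also carry a mainsnak, references, etc.
claims = [
    {"id": "a", "rank": "normal"},
    {"id": "b", "rank": "deprecated"},
    {"id": "c", "rank": "preferred"},
]
print([c["id"] for c in best_rank_claims(claims)])  # ['c']
```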

As a result, for each property value you want, you must walk through the array of claims for that property. In this example, for the "author" and "genre" of Les Misérables, you would expect the value of each statement to be another item in Wikidata (rather than a simple number or date). To get the IDs of the genres (P136), we are looking for the following path (in jq's syntax for JSON elements):

.entities.Q180736.claims.P136[].mainsnak.datavalue.value."numeric-id"

In pseudocode, you would locate entities.Q180736.claims.P136 in the JSON response; then, for each element in the array, check that its mainsnak.datavalue exists (it is absent when the snaktype is "somevalue" or "novalue") and that mainsnak.datavalue.value['entity-type'] is "item"; then you can safely access numeric-id. The result is a set of item numbers, in this case 8261 and 192239.
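Those checks translate directly into Python. This sketch assumes entity is the value of entities.Q180736 from a parsed wbgetentities response. item_ids and item_claim are hypothetical helpers, and the sample claims are trimmed stand-ins (only the two genre numbers come from the text).

```python
def item_ids(entity: dict, prop: str) -> list:
    """Collect the Q IDs that are values of one property of an entity."""
    ids = []
    for claim in entity.get("claims", {}).get(prop, []):
        # "datavalue" is missing for "somevalue" and "novalue" snaks.
        datavalue = claim.get("mainsnak", {}).get("datavalue", {})
        value = datavalue.get("value")
        if isinstance(value, dict) and value.get("entity-type") == "item":
            ids.append("Q" + str(value["numeric-id"]))
    return ids


def item_claim(numeric_id: int) -> dict:
    """Build a minimal claim whose value is an item (sample data only)."""
    return {"mainsnak": {"snaktype": "value", "property": "P136",
                         "datavalue": {"value": {"entity-type": "item",
                                                 "numeric-id": numeric_id},
                                       "type": "wikibase-entityid"}},
            "rank": "normal"}


entity = {"claims": {"P136": [
    item_claim(8261),
    item_claim(192239),
    {"mainsnak": {"snaktype": "somevalue", "property": "P136"}, "rank": "normal"},
]}}
print(item_ids(entity, "P136"))  # ['Q8261', 'Q192239']
```

In practice you might first filter the claim array by rank before collecting IDs.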

You then need to request the labels of items Q8261 and Q192239 in the user's language, making a similar action=wbgetentities request but only requesting props=labels. For performance, you should batch up all these follow-on queries and build a local cache of item labels, so that you don't repeatedly query the Wikidata API to find that Q8261 is a "novel" ("Roman" in Azerbaijani).
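Here is a sketch of the batching and caching advice. It assumes the commonly documented limit of 50 IDs per wbgetentities request for ordinary clients (higher for accounts with the apihighlimits right). It leaves the HTTP call to a fetch function you supply: a hypothetical callable that sends one action=wbgetentities&props=labels&languages=<user language> request and returns a mapping of ID to label. LabelCache is likewise a hypothetical name.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: ordinary clients may pass at most 50 IDs per wbgetentities call.
MAX_IDS_PER_REQUEST = 50


class LabelCache:
    """Cache item labels for one language and fetch missing ones in batches."""

    def __init__(self, fetch: Callable[[List[str]], Dict[str, str]]):
        # `fetch` is supplied by you: one wbgetentities&props=labels request
        # for the given IDs, returning {ID: label}.
        self._fetch = fetch
        self._labels: Dict[str, str] = {}

    def labels(self, ids: Iterable[str]) -> Dict[str, str]:
        wanted = list(dict.fromkeys(ids))  # de-duplicate, keep order
        missing = [i for i in wanted if i not in self._labels]
        for start in range(0, len(missing), MAX_IDS_PER_REQUEST):
            batch = missing[start:start + MAX_IDS_PER_REQUEST]
            self._labels.update(self._fetch(batch))
        return {i: self._labels[i] for i in wanted if i in self._labels}


# Stand-in for a real API call, recording each batch it receives.
calls = []


def fake_fetch(batch):
    calls.append(batch)
    return {i: "label of " + i for i in batch}


cache = LabelCache(fake_fetch)
cache.labels(["Q8261", "Q192239"])  # one request
cache.labels(["Q8261"])             # answered from the cache
print(len(calls))  # 1
```

IDs for which fetch returns no label are requested again next time; a production cache might also remember misses and expire old entries.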

Getting the publication date (P577) is a little simpler since the value of a statement about it is a simple date rather than another item:

.entities.Q180736.claims.P577[].mainsnak.datavalue.value.time

In pseudocode, you would locate entities.Q180736.claims.P577 in the JSON response; then, for each element in the array, check that its mainsnak.datavalue exists and that its "type" is "time"; then you can use its value. The result is a set of times, in this case one value: "+1862-01-01T00:00:00Z". A time value's format resembles ISO 8601; the Wikibase DataModel gives the details, including datavalue.value.precision, which in this case is 9, meaning the publication date is precise only to the year (1862).
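Here is a sketch that turns a time value into a display string at its stated precision, assuming the precision codes 9 (year), 10 (month), and 11 (day) from the Wikibase DataModel. For brevity, it ignores the calendar model and shows any coarser precision (decade, century, and so on) as a bare year, which a real application would want to handle better. format_time is a hypothetical name.

```python
import re

# Wikibase time strings look like "+1862-01-01T00:00:00Z"; years may exceed 4 digits.
TIME_PATTERN = re.compile(r"^([+-])(\d+)-(\d{2})-(\d{2})T")


def format_time(value: dict) -> str:
    """Render datavalue.value of a time snak to its stated precision."""
    match = TIME_PATTERN.match(value["time"])
    if match is None:
        raise ValueError("unexpected time format: " + value["time"])
    sign, year, month, day = match.groups()
    year = ("-" if sign == "-" else "") + str(int(year))  # strip leading zeros
    precision = value["precision"]
    if precision >= 11:  # day or finer
        return f"{year}-{month}-{day}"
    if precision == 10:  # month
        return f"{year}-{month}"
    return year  # 9 means year; coarser precisions are simplified to the year too


print(format_time({"time": "+1862-01-01T00:00:00Z", "precision": 9}))  # 1862
```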

action=wbgetclaims for claims alone


If you want only the claims of an item (wbgetentities' props=claims), you can instead invoke the API module action=wbgetclaims. It returns similar information.

Example: Get claims about entity Q180736
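The wbgetclaims request takes the entity ID. Unlike wbgetentities, it also accepts an optional property parameter that limits the result to the claims for one property. This small URL-building sketch assumes the public endpoint, and wbgetclaims_url is a hypothetical name. The response's claims object is assumed to have the same shape as the claims in the wbgetentities result above.

```python
from typing import Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wbgetclaims_url(entity_id: str, prop: Optional[str] = None) -> str:
    """Build a wbgetclaims request, optionally for a single property."""
    params = {"action": "wbgetclaims", "entity": entity_id, "format": "json"}
    if prop is not None:
        params["property"] = prop  # e.g. "P577", publication date
    return WIKIDATA_API + "?" + urlencode(params)


print(wbgetclaims_url("Q180736", "P577"))
```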

Alternatives


You can associate an entity in your application with a page in a particular language's Wikipedia. Then, as API:Page info in search results shows, you can query for and display useful information from that article, such as a lead image thumbnail, opening text, and description (action=query&prop=pageimages|pageterms|extracts; try it for Les Misérables). One downside is that page titles change, so you may have to deal with redirects. Another downside is that this approach is not multilingual: you have to know the page's title in other wikis (for example, the article in Greek Wikipedia about Les Misérables is Οι Άθλιοι), or track down a "sitelink" to the page in another language. That is why that article discusses page info in the context of search: if your user is searching for articles from a wiki, you already know the language and wiki to query.
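To soften the redirect problem, you can ask action=query to resolve redirects for you. Here is a sketch of such a request for the page information mentioned above. The exintro and explaintext options belong to the TextExtracts extension. The helper name page_info_url and the choice of options are assumptions, not the exact query used in API:Page info in search results.

```python
from urllib.parse import urlencode


def page_info_url(wiki_host: str, title: str) -> str:
    """Build a query for a page's lead image, description, and intro text."""
    params = {
        "action": "query",
        "prop": "pageimages|pageterms|extracts",
        "titles": title,
        "redirects": "1",    # follow redirects left behind when a page is moved
        "exintro": "1",      # TextExtracts: only the text before the first section
        "explaintext": "1",  # TextExtracts: plain text rather than HTML
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{wiki_host}/w/api.php?" + urlencode(params)


print(page_info_url("el.wikipedia.org", "Οι Άθλιοι"))
```

When a redirect is followed, the response lists it under query.redirects with from and to titles, so you can update the title you have stored.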

See also

  • qLabel is a JavaScript library to help create multilingual web sites. You can mark up text elements with 'Q' IDs, and the library retrieves their Wikidata labels in the user's language and replaces the text.
  • Reasonator and Autodesc are tools that create machine-generated articles and short descriptions about Wikidata items.
  • Wikidata.org maintains a growing list of external tools.
  • Consult or reuse the code in existing tools to parse claims.
    • For example, inventaire.io uses wikidata-sdk to query Wikidata and handle its responses.