User:PAC2/Signpost Opinion1: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 21:57, 11 November 2024

Are Wikipedia articles representative of Western or world knowledge?

The Signpost



Signpost Opinion1

Wikipedia aims to represent the sum of all knowledge. It is not easy to define "the sum of all knowledge". We could expect it to mean knowledge from every region of the world (geographical distribution), from every era in history, from every culture, every ethnic group, every gender group, etc.

To measure the diversity of knowledge in Wikipedia, we can look at the diversity of contributors, the number of Wikipedia articles, the diversity of sources and references[1] or the diversity of the entities mentioned inside a given article.[2]

In this article, I look at the geographical distribution of people mentioned in an article (people mentioned with a blue link).

I apply my methodology to a selection of articles about general topics such as music, politics, culture or religion, in a selection of Wikipedia language versions, and I discuss the results.

Methodology

Given a Wikipedia article, I select all internal links (blue links) and call them "mentioned entities". This can be done with the "links" generator of the MediaWiki API. Conveniently, this API can be called from within a SPARQL query in the Wikidata Query Service, so I combine the API call with a Wikidata query: I keep the mentioned entities that are instances (P31) of human (Q5) and have a known place of birth (P19), and I collect the country (P17) of that birthplace.

SELECT DISTINCT ?item ?itemLabel ?country ?countryLabel ?birthplace ?birthplaceLabel
WHERE {
  # Fetch every page linked from the "Music" article on English Wikipedia
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "links";
                    mwapi:titles "Music".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  FILTER BOUND(?item)
  # Keep humans (Q5) with a known place of birth (P19)
  ?item wdt:P31 wd:Q5;
        wdt:P19 ?birthplace.
  # Country (P17) of the birthplace
  ?birthplace wdt:P17 ?country.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". }
}
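
Since the same query is applied to several articles and several Wikipedia versions, it can help to generate it from parameters. The following is a minimal sketch, not taken from the notebook: the helper names `buildMentionedPeopleQuery` and `buildQueryUrl` are hypothetical, and the only external assumption is the public SPARQL endpoint of the Wikidata Query Service at `https://query.wikidata.org/sparql`.

```javascript
// Hypothetical helper: rebuilds the query above for any wiki host and article title.
function buildMentionedPeopleQuery(wikiHost, title) {
  // Escape backslashes and double quotes so the title stays a valid SPARQL string literal.
  const safeTitle = title.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `SELECT DISTINCT ?item ?itemLabel ?country ?countryLabel ?birthplace ?birthplaceLabel
WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "${wikiHost}";
                    wikibase:api "Generator";
                    mwapi:generator "links";
                    mwapi:titles "${safeTitle}".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  FILTER BOUND(?item)
  ?item wdt:P31 wd:Q5;
        wdt:P19 ?birthplace.
  ?birthplace wdt:P17 ?country.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". }
}`;
}

// Hypothetical helper: builds a GET URL that asks the query service for JSON results.
function buildQueryUrl(query) {
  return "https://query.wikidata.org/sparql?format=json&query=" + encodeURIComponent(query);
}

// Example: the same question asked of the French-language article "Musique".
const query = buildMentionedPeopleQuery("fr.wikipedia.org", "Musique");
console.log(buildQueryUrl(query));
```

The resulting URL can be passed to `fetch` in a notebook or a script; only the wiki host and the article title change from one run to the next.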

I then collect a mapping between current countries and continents. The mapping comes from Wikidata but is consistent with the United Nations M49 classification.

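SELECT DISTINCT ?continent ?continentLabel ?country ?code WHERE {
  VALUES ?continent {
    wd:Q55643  # Oceania
    wd:Q48     # Asia
    wd:Q15     # Africa
    wd:Q18     # South America
    wd:Q49     # North America
    wd:Q46     # Europe
  }
  # Follow "has part(s)" (P527) links from each continent down to its countries
  ?continent (wdt:P527*) ?country.
  # Keep only entities that carry a UN M49 code (P2082)
  ?country wdt:P2082 ?code.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,mul". }
}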

I perform a left join of the two data frames using the Arquero JavaScript library[3] in an Observable notebook.[4][5]
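
To make the join step concrete, here is a minimal sketch in plain JavaScript. It does not use Arquero and is not the notebook's code: the function `leftJoin` and the sample rows are made up for illustration. It only shows what a left join does here: every mentioned person is kept, and people whose country has no continent in the mapping get a `null` continent.

```javascript
// Plain-JavaScript left join (not Arquero): every row of `left` is kept.
function leftJoin(left, right, key) {
  // Index the right-hand rows by the join key.
  const index = new Map();
  for (const row of right) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }
  return left.flatMap((row) => {
    const matches = index.get(row[key]);
    // No match: keep the left row and mark the continent as missing.
    if (!matches) return [{ ...row, continent: null }];
    // One output row per match (a country may be listed under several continents).
    return matches.map((match) => ({ ...row, continent: match.continent }));
  });
}

// Made-up sample rows; real rows would come from the two Wikidata queries.
const people = [
  { person: "person A", country: "country 1" },
  { person: "person B", country: "country 2" },
  { person: "person C", country: "country 3" }, // no continent on record
];
const continents = [
  { country: "country 1", continent: "Europe" },
  { country: "country 2", continent: "Asia" },
];

const joined = leftJoin(people, continents, "country");
console.log(joined);
```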

Finally, I group Europe and North America as "Western World" and the four other continents as "Rest of the world". This is an opinionated and radical approach, but it makes the numbers easier to read.
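
The grouping and the resulting shares can be sketched as follows. This is an illustration in plain JavaScript under stated assumptions, not the notebook's code: the names `regionOf` and `regionShares` are hypothetical, the sample rows are made up, and rows without a continent are skipped (the text does not say how such rows are treated).

```javascript
const WESTERN_CONTINENTS = new Set(["Europe", "North America"]);

// Maps a continent label to one of the two groups used in the article.
function regionOf(continent) {
  return WESTERN_CONTINENTS.has(continent) ? "Western World" : "Rest of the world";
}

// Returns the share of each group among rows that have a continent.
function regionShares(rows) {
  const counts = { "Western World": 0, "Rest of the world": 0 };
  let total = 0;
  for (const row of rows) {
    if (row.continent == null) continue; // assumption: skip rows with no continent
    counts[regionOf(row.continent)] += 1;
    total += 1;
  }
  if (total === 0) return counts; // nothing to divide by: both shares stay at 0
  return {
    "Western World": counts["Western World"] / total,
    "Rest of the world": counts["Rest of the world"] / total,
  };
}

// Made-up sample: three "Western" rows, one other, one without a continent.
const sample = [
  { continent: "Europe" },
  { continent: "Europe" },
  { continent: "North America" },
  { continent: "Asia" },
  { continent: null },
];
console.log(regionShares(sample));
// → { 'Western World': 0.75, 'Rest of the world': 0.25 }
```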


  1. ^ For instance, Piotr Konieczny and Włodzimierz Lewoniewski look at the number of articles related to the United States of America and the number of American sources in references. See their presentation at Wikimania 2024: https://prezi.com/view/C7snnAZFWqZz7vPD0kLu/
  2. ^ In a previous Signpost article, I looked at the gender distribution of people (human entities) mentioned in Wikipedia articles: Measuring gender diversity in Wikipedia articles, The Signpost, May 2022.
  3. ^ Arquero is a JavaScript library developed by Jeffrey Heer: https://idl.uw.edu/arquero/api/
  4. ^ Observable is a platform created by Melody Meckfessel and Mike Bostock for writing notebooks in JavaScript. It is widely used by the data visualization community.
  5. ^ All computations are performed in the appendix of the notebook: https://observablehq.com/@pac02/wwrw