Wikidata talk:WikiCite/Archive 6

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Wikisource integration project, requesting assistance for documentation + implementation

Last year, we (User:KCVelaga, on behalf of Open Heritage Foundation) were funded through the WikiCite program to improve the integration between Wikisource and Wikidata. The project has three development components: a module, a bot and a tool. You can read about each of them in detail in this document. I am requesting help regarding the documentation of the technical development.

To get started, we have completed the development and testing of two modules on beta-Wikisource (https://en.wikisource.beta.wmflabs.org/wiki/Module:Index_data and https://en.wikisource.beta.wmflabs.org/wiki/Module:Index_template). While we are supporting and overseeing the deployment of these on a couple of mainstream Wikisources, it would be good if other Wikisource communities could deploy these themselves. We can support with fixes in the long term, but not with step-by-step implementation. To deploy these modules, people will primarily need to translate strings and decide whether or not to display certain properties (by commenting out lines of code). We have attempted to document which lines are to be translated at https://github.com/tshrinivasan/wikisource_wikidata_integration/blob/main/translate-examples.txt, but it didn't turn out quite well, as none of us has experience in technical documentation.

So we are requesting guidance on how we can develop documentation in the best way possible, especially in a way that is friendly to non-technical folks. A meeting would be helpful to communicate the needs in more detail and to understand documentation practices.

Thank you, message posted by LWyatt (WMF) (talk) on behalf of User:KCVelaga. 11:22, 11 May 2021 (UTC)

@LWyatt (WMF), KCVelaga: While we are supporting and overseeing the deployment of these on a couple of mainstream Wikisources: does this include enWS, or do we need to do this ourselves? I'm happy to do so, but I don't want to get in the way. Inductiveload (talk) 15:49, 11 May 2021 (UTC)

@Inductiveload: Thanks for asking. We are currently working with Indic languages and would like to develop documentation for others to deploy themselves. The project is set to conclude in the next few weeks and unfortunately, we don't have enough human resources to provide active support to more than three wikis. However, we can help with queries and fixes even after the project ends. Regarding enWS, it should be fairly easy to deploy as there wouldn't be anything to translate, and we are happy to support the deployment. But the key step is to achieve community consensus; deploying this module, and a bot that will be coming along, will affect all Index pages on a Wikisource, so the community should reach an agreement. If someone can start a discussion as soon as possible, we would be happy to answer any queries that might come up during the discussion. KCVelaga (talk) 04:55, 12 May 2021 (UTC)

Maybe Help:Import BLKÖ from wikisource can help you. --- Jura 06:47, 12 May 2021 (UTC)

Which sources belong to Wikidata?

Hi, all. I've been developing Cita, an addon for Zotero that adds citations metadata support using cites work (P2860) information from Wikidata. In cases where users find a citation that is not available in Wikidata, Cita lets them upload this cites work (P2860) relationship. In cases where either the citing or the cited item do not exist in Wikidata, Cita lets them create a new Wikidata item for them.

I have several layers in place to make sure users do not create duplicates. On the other hand, regarding notability, I simply ask them to make sure they follow the Wikidata notability policy. However, I checked the notability policy, and I couldn't find anything that would help a user decide whether their article, book, etc belongs to Wikidata or not. For example, a Cita user has recently created some Wikidata items for newspaper articles that the items in her Zotero library seem to cite. I have the feeling that they don't belong to Wikidata, but I couldn't find a policy or guideline I can refer to.

Do you have any criteria in place to decide whether a source belongs to Wikidata or not?

I was wondering whether having a unique identifier (be it a DOI or ISBN, or a local library catalogue ID) could be one such criterion. Which led me to another question about mapping between Wikidata properties and Zotero fields, which I'll open a separate topic for :)

Thank you! --Diegodlh (talk) 15:12, 31 May 2021 (UTC)

Here's the topic about mapping between Wikidata properties and Zotero fields/item types. I'd appreciate your comments there as well! Thanks --Diegodlh (talk) 16:20, 31 May 2021 (UTC)

Do we have a way to tag articles on Wikidata that are meta-analyses?

I think it would be interesting to query for all the meta-analyses on a given topic. Do we currently have any way to mark the relevant articles on Wikidata? ChristianKl ❪✉❫ 09:33, 28 June 2021 (UTC)

@ChristianKl I've not seen it done previously, but perhaps describes a project that uses (P4510) of meta-analysis (Q815382) or even just instance of (P31) of Q815382. T.Shafee(evo&evo) (talk) 23:35, 28 June 2021 (UTC)
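A minimal sketch of the second suggestion, assuming meta-analyses are marked with instance of (P31) = meta-analysis (Q815382) and that the topic is recorded with main subject (P921). The function name and the placeholder topic QID are hypothetical; the query text is written for the Wikidata Query Service but is only built here, not sent.

```python
def meta_analysis_query(topic_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for meta-analyses about one topic.

    Assumes the modeling suggested above: P31 = Q815382 (meta-analysis)
    and P921 (main subject) pointing at the topic item.
    """
    if not (topic_qid.startswith("Q") and topic_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {topic_qid!r}")
    return (
        "SELECT ?article ?articleLabel WHERE {\n"
        "  ?article wdt:P31 wd:Q815382 ;\n"
        f"           wdt:P921 wd:{topic_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Q12345 is only a placeholder topic QID for illustration.
print(meta_analysis_query("Q12345"))
```

The query only finds articles that have already been tagged this way, so its usefulness depends on the tagging being applied consistently.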

modeling replies/responses to a publication

I'm aware of reply to (P2675), but is there a reciprocal property for "has reply"? In particular, the journal article Pathophysiological Basis and Rationale for Early Outpatient Treatment of SARS-CoV-2 (COVID-19) Infection (Q98282795) has replies from at least two subsequent letters: Unproven Therapy Algorithms for Early SARS-CoV-2 Infection Are Dangerous (Q107652690) and Concerning Pathophysiology and Justifying Clinical Trials (Q107652835) (and each of these has its own replies in turn). P2675 on the reply letters links them to the original article item (Q98282795), but I can't find a good way to note the existence of replies on Q98282795 itself. Existing properties like corrigendum / erratum (P2507) and followed by (P156) don't seem adequate. Ideas? -Animalparty (talk) 21:02, 26 July 2021 (UTC)

I agree that followed by (P156) will get confusing, since it's also used to link preprints to their published versions where such items exist. A work-around might be using a qualifier like has characteristic (P1552) = response message (Q57268247) to distinguish, but the added complexity leaves ample room for errors. However overall, I think that a reciprocal to reply to (P2675) would be a worthwhile new property. T.Shafee(evo&evo) (talk) 23:53, 26 July 2021 (UTC)
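In the absence of a reciprocal property, the replies can still be listed by reading reply to (P2675) in the reverse direction. The following is a minimal sketch under that assumption; the function name is hypothetical, and the query is only built as text for the Wikidata Query Service, not executed.

```python
def replies_query(article_qid: str) -> str:
    """Build a SPARQL query listing items whose reply to (P2675) is the article."""
    if not (article_qid.startswith("Q") and article_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {article_qid!r}")
    return (
        "SELECT ?reply ?replyLabel WHERE {\n"
        # Reverse lookup: the statement lives on the reply, not on the article.
        f"  ?reply wdt:P2675 wd:{article_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


# The article discussed in this thread.
print(replies_query("Q98282795"))
```

This does not make the replies visible on the article's own item page, which is the gap a new property would close.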

Call for Advisory Board members for Web2Cit

Hello everyone!

Web2Cit: Visual Editor for Citoid Web Translators project is moving!

With Diegodlh we are inviting people to apply to be an Advisory Board member. Is this you? Is this someone you know?

Check the Call for members and apply to be an Advisory Board member before August 6th!

If you are too busy this time around to apply, don't worry: we get it. You can also help us by spreading the word! We sincerely appreciate it. --Scann (talk) 19:18, 28 July 2021 (UTC)

Where is Source MetaData getting main subject (P921)?

What is the source of the subject for articles created with Source MD? I'm seeing quite a few with absolutely wrong subjects. Some articles are created with articles as their subject, articles from 1905 in the case of Observations of large-scale coherent structures in gravity currents: implications for flow dynamics (Q108151063). What source of data suggests this or causes this to be a subject for the articles created through Source MD? Trilotat (talk) 23:20, 19 August 2021 (UTC)

New sharks and other chondrichthyans from the latest Maastrichtian (Late Cretaceous) of North America (Q60781629) has the subject Q58944531 (an article titled "Palaeontology")

@Trilotat: You're listed as the creator of the article, via quickstatements - what tool did you use exactly? ArthurPSmith (talk) 16:57, 20 August 2021 (UTC)

@ArthurPSmith: I used https://sourcemd.toolforge.org/index_old.php. Trilotat (talk) 21:05, 20 August 2021 (UTC)

I confirm the old Source MD casually adds nonsensical subjects. I remember it starting some months ago. --SCIdude (talk) 06:06, 21 August 2021 (UTC)

That sounds like a bug. I'll ping @Magnus Manske: in telegram to see if he can either fix it or disable it. We can reimplement a better tool in Python+Flask if necessary, maybe with the help of @csisc:?.--So9q (talk) 19:52, 21 August 2021 (UTC)

Magnus said in telegram that he is looking into it soon.--So9q (talk) 08:26, 23 August 2021 (UTC)

@Charles_Matthews: did much work on MeSH subjects (of PMID articles) in the ScienceSource project. Why doesn't Magnus coordinate with him? --SCIdude (talk) 14:34, 24 August 2021 (UTC)

Well, normally Magnus and I have coffee in Cambridge together regularly. But these are not normal times! I collect errors for main subject (P921) statements, for example at Wikidata:ScienceSource project/Followups. I wasn't aware that people were using Source MD, actually. Charles Matthews (talk) 15:00, 24 August 2021 (UTC)

@Charles Matthews:, I thought Source MD was the best tool for adding scientific articles with DOIs. Please tell me if there's something better for a non-super user type. Trilotat (talk) 20:50, 24 August 2021 (UTC)

@Trilotat: I suggest this. At Wikidata:SourceMD/instructions#SourceMD stages information for review it says "The user can edit the text which SourceMD presents. Typically there is no reason to change anything." Well, these days I think unsourced P921 statements are not an asset. You can find the lines starting "LAST P921" and remove them, by text editing. Charles Matthews (talk) 07:18, 25 August 2021 (UTC)
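The manual text edit described above could also be done with a small filter. This is a sketch only: it assumes the staged text is one command per line with whitespace-separated fields, and the function name and sample batch are hypothetical.

```python
def drop_main_subject_lines(batch_text: str) -> str:
    """Remove every line whose first two fields are LAST and P921."""
    kept = []
    for line in batch_text.splitlines():
        fields = line.split()
        if fields[:2] == ["LAST", "P921"]:
            continue  # unsourced main subject statement: skip it
        kept.append(line)
    return "\n".join(kept)


# Illustrative batch (assumed tab-separated); the QIDs appear earlier on this page.
sample = "CREATE\nLAST\tP31\tQ13442814\nLAST\tP921\tQ58944531"
print(drop_main_subject_lines(sample))
```

All other lines are passed through unchanged, so the rest of the batch can still be reviewed and submitted as usual.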

@ArthurPSmith, So9q, Trilotat, Charles Matthews:: Thank you for bringing me here. Of course, there are several methods to add main subject statements for scholarly publications. The best solution we currently have is Bibliometric-Enhanced Information Retrieval. I have been working on the topic for years and I have published several research letters and opinion papers in highly referred scholarly journals. Please refer to https://www.jclinepi.com/article/S0895-4356(17)31073-9/fulltext, https://www.sciencedirect.com/science/article/pii/S1532046418300947?via%3Dihub and https://internal-journal.frontiersin.org/articles/10.3389/frma.2021.694307/full for further information. I think that I should do a presentation in Wikidata:Events/Data Quality Days 2021. --Csisc (talk) 09:37, 25 August 2021 (UTC)

@Csisc: That's surely a debate. The method I have been applying since 2019 is import from a PubMed API of the MeSH major terms, and those are added to PubMed by expert humans. The only issues I know of there are incorrect disambiguations of the terms. MeSH terms occur in other contexts, such as clinical trials. In my time at ContentMine there were certainly discussions of automated methods, but I would still regard that as a research field. A precision of 90% would be considered good; but on the other hand an error rate of 10% would be considered bad for additions here. Charles Matthews (talk) 09:49, 25 August 2021 (UTC)

@Charles Matthews: I discussed this in my work. The solution is to restrict your extraction to MeSH terms having qualifiers. Where you have the qualifier, you can predict the class of the heading. An example is Hepatitis C/Drug Therapy. Here, Hepatitis C is the heading and Drug Therapy is the qualifier for this term. The qualifier Drug Therapy can only be associated to diseases. So, even if you are not familiar with what Hepatitis C stands for, you can find that it is certainly a disease. --Csisc (talk) 09:57, 25 August 2021 (UTC)
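The qualifier-based inference described above can be sketched as follows. The lookup table is illustrative and contains only the single rule given in the comment (Drug Therapy implies a disease heading); the function names are hypothetical and this is not the code of any existing tool.

```python
from typing import Optional, Tuple

# Illustrative table only: it holds just the one rule stated in the discussion.
QUALIFIER_TO_CLASS = {"Drug Therapy": "disease"}


def split_mesh_term(term: str) -> Tuple[str, Optional[str]]:
    """Split 'Heading/Qualifier' into its parts; the qualifier may be absent."""
    heading, sep, qualifier = term.partition("/")
    return heading.strip(), (qualifier.strip() if sep else None)


def predict_heading_class(term: str) -> Optional[str]:
    """Guess the class of the heading from its qualifier, if one is known."""
    _, qualifier = split_mesh_term(term)
    if qualifier is None:
        return None  # no qualifier, so nothing can be inferred
    return QUALIFIER_TO_CLASS.get(qualifier)


print(predict_heading_class("Hepatitis C/Drug Therapy"))  # disease
print(predict_heading_class("Hepatitis C"))               # None
```

Terms without a qualifier, or with a qualifier missing from the table, yield no prediction, which matches the proposal to restrict extraction to qualified terms.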

@Csisc: Well, I think I know what you are talking about. NCBI2wikidata which was written as a custom tool for the ScienceSource project has a deductive step, though it is not the one you describe. But the scope of MeSH term import is broader than the project work done in 2018/9. Translation through MeSH descriptor ID (P486) of MeSH terms is possible, and further work is being done about it. There are practical steps about automation, and issues with the MeSH statement quality here. I do think you need to justify the statement "best solution we currently have". I work all the time on the main subject (P921) space here, and see various methods, good and not so good, that are going on. I'm primarily interested in the PubMed publication ID (P698) area of the problem: which is about 30M. Obviously this is not everything. Charles Matthews (talk) 10:13, 25 August 2021 (UTC)

Property proposal: reprinted in

I have made a proposal for a "reprinted in" property, for a "larger work in which a shorter work has been reprinted or anthologised".

Please make your comments on the proposal page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:58, 28 August 2021 (UTC)

Translations as works

I'm wondering about best practices for modeling translations of works. For example, I recently have worked on the following items.

Death's End (Q607511): 2010 science fiction novel by Liu Cixin
Death's End (Q108381184): English translation of Liu Cixin's novel
Death's End (Q54810512): first English edition of Liu Cixin's novel 9781784971625

(Please ignore the revision histories. It will only make this discussion more complicated.)

Currently, all of these statements exist:

Death's End (Q108381184)instance of (P31)version, edition or translation (Q3331189)
Death's End (Q108381184)instance of (P31)literary work (Q7725634) - because I figure a translation is a creative work in itself.
Death's End (Q108381184)edition or translation of (P629)Death's End (Q607511)
Death's End (Q108381184)has edition or translation (P747)Death's End (Q54810512)

This raises a constraint violation error because items aren't allowed to have both edition or translation of (P629) and has edition or translation (P747). What am I doing wrong? What should be the relationship between these items if not what I have done? Daask (talk) 13:44, 2 September 2021 (UTC)

@Daask: This is probably more for Wikidata:WikiProject Books - however, I believe Death's End (Q108381184) is unnecessary - are there really going to be many editions of the same translation? ArthurPSmith (talk) 16:53, 2 September 2021 (UTC)

I'd just have an item for each edition of interest, whether or not it's a translation, and link them all to the "work" item which is Death's End (Q607511). I don't think it's desirable to create a network of relationships between editions. Ghouston (talk) 04:20, 3 September 2021 (UTC)
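To make the constraint concrete, here is a small sketch that flags any item carrying both edition or translation of (P629) and has edition or translation (P747). The dictionaries are hand-written stand-ins for the statements listed in this thread, not data fetched from Wikidata, and the function name is hypothetical.

```python
def conflicting_edition_links(statements: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return the items that have both P629 and P747 statements."""
    return [
        qid
        for qid, props in statements.items()
        if props.get("P629") and props.get("P747")
    ]


# Modeling as described in the question: the translation item points both ways.
as_described = {
    "Q108381184": {"P629": ["Q607511"], "P747": ["Q54810512"]},
}

# Flat modeling as suggested in the replies: the edition links straight to the work.
as_suggested = {
    "Q54810512": {"P629": ["Q607511"]},
}

print(conflicting_edition_links(as_described))  # ['Q108381184']
print(conflicting_edition_links(as_suggested))  # []
```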

website vs journal

from Talk:Q180445... Hello. There is a lot of content on the nature.com website that is not officially included in the Nature journal per se (e.g. blogs.nature.com). Would it make sense to have two separate Wikidata items? -- one for the nature.com website and one for the journal? (I would like to cite the website, but not the journal. Example here). Thoughts? Thank you. -- Oa01 (talk) 10:39, 15 October 2021 (UTC)

@Oa01: That seems fine to me, we have plenty of other items for websites. ArthurPSmith (talk) 17:40, 15 October 2021 (UTC)

@ArthurPSmith: thanks! -- Oa01 (talk) 08:20, 16 October 2021 (UTC)

Author IDs

LensIDs currently offer a metarecord for publications, but they have author IDs and organization IDs as well under the hood. The author IDs will become publicly accessible in the next year.

If they end up both being called "LensID" but referring to different sorts of entities [works, people, organizations] would we want different records for the different facets of the identifier? Sj (talk) 18:55, 21 October 2021 (UTC)

@Sj: Lens ID (P7100) should probably be limited to works, and new properties created for other datatypes. However if the identifiers all fall within a single namespace and resolve the same way then perhaps it wouldn't hurt to just expand Lens ID (P7100) to cover all cases. But usually separate properties are the way to go in these cases. ArthurPSmith (talk) 21:16, 21 October 2021 (UTC)

I believe that even internally they have separate namespace-facets, this is a fine Q to discuss with the ID maintainers. Thanks. Sj (talk) 22:20, 21 October 2021 (UTC)

They have different IDs for authors, e.g. see https://www.lens.org/lens/profile/280312543/scholar. The data is copied from ORCID, and linked to Lens publications.

The problem is that in 3 years, only 20 IDs are added, and even one of the samples is wrong. Which is a shame because Lens is one of the main SciKG sources.

Please move this section to the Lens ID page, thanks! Vladimir Alexiev (talk) 05:27, 26 January 2022 (UTC)

Letters to the editor

Moved from User talk:Aluxosm#letter to the editor (Q651270) v. scholarly article (Q13442814)

I saw you replaced scholarly article (Q13442814) with letter to the editor (Q651270) on Reply to Boslough et al.: Decades of comet research counter their claims (Q28661563).

That creates type constraint for articles with the properties PMC publication ID (P932), PubMed publication ID (P698), and ResearchGate publication ID (P5875). Perhaps you could add letter to the editor (Q651270) as genre (P136) of scholarly article (Q13442814) instead? Trilotat (talk) 01:46, 14 January 2022 (UTC)

@Trilotat: This is a tricky one. I've actually done this for all of the YDIH related letters. My thinking was that they rarely go through much peer review so it's best to keep them separate. That, and it makes the Scholia profile graphs more useful (I know that's probably not a great way to look at it). Would a better option be to make a new entity for scholarly letters/replies and have them as a subclass of scientific publication (Q591041) or something similar? Really not too sure what to do here, any help would be much appreciated, cheers! Aluxosm (talk) 16:54, 15 January 2022 (UTC)

There are many - a very great many - scholarly articles that never went through much peer review. Please don't conflate "scholarly article" with "peer reviewed article". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:41, 24 January 2022 (UTC)

@Pigsonthewing: It's a good point; I've come across more than a few where it seemed as though the only person who read the article before publication was the original author! The en description for scholarly article (Q13442814) is, "article in an academic publication, usually peer reviewed", so all of the items I changed do still fit the description and are probably closer to that than they are to letter to the editor (Q651270). All in all, it wasn't my best reasoning but I'm still not sure what to do; I think there should be some kind of distinction between an extensive scholarly article with dozens to hundreds of references and a single page reply with only a handful. Do you have any thoughts on the idea of creating a new item (e.g. scholarly letter), or do you think that they should just have both scholarly article (Q13442814) and letter to the editor (Q651270) applied to them? Cheers! Aluxosm (talk) 15:00, 26 January 2022 (UTC)

The problem is that there is a continuum, it's not a binary issue, and has no clear point of delineation. Scholarly (or scientific) peer review is a relatively modern phenomenon; see en:Scholarly peer review which tells us, for example, that "Nature itself instituted formal peer review only in 1967.". The works of Linnaeus and Darwin were not peer reviewed. Many taxon names were first published in papers that we would not today consider peer reviewed. Clearly this needs more thought, and eventual consensus, and an improved model. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:48, 27 January 2022 (UTC)

@Pigsonthewing: Wow, I had no idea that the idea of peer review was so recent, thanks for the pointer! Sorry for not explaining myself particularly well again, I probably shouldn't have mentioned peer review. To clear up what I'm proposing, I created scholarly letter/reply (Q110716513) with the description: "article in an academic publication that focuses on another article, does not usually present any new evidence", and then changed all of the articles in question to that. Worst case, scholarly letter/reply (Q110716513) can just be merged back into scholarly article (Q13442814). Hope this makes a bit more sense! What do you reckon? Aluxosm (talk) 22:42, 27 January 2022 (UTC)

New wikibase for all scientific articles?

I recently found non-LOD metadata about 1.4M scientific papers from Swedish research institutions (Swepub). I talked to the universities, but they were not interested in leveraging their metadata to the LOD level.

I'm planning to disambiguate the authors and maybe the subjects in a proper Wikibase myself. When talking to @Harej: in telegram he asked if we should make a science.wikibase.cloud wikibase for all articles. The current issues with the WDQS backend for Wikidata are probably not going away soon.

If we create a proper Wikibase, then we need to decide whether to use federated properties. Unfortunately, they are not working with Blazegraph at the moment, so I suggest we have our own properties to avoid issues with queries. That will make federated queries somewhat harder, e.g. for Scholia, but not impossible.

WDYT? --So9q (talk) 20:19, 24 January 2022 (UTC)

Hm, could you elaborate on what the issues with the WDQS backend are? I am not sure why there is a need for another instance of Wikibase and why we could not just add data to Wikidata itself. Maybe I am missing something obvious? Mitar (talk) 22:38, 24 January 2022 (UTC)

@Mitar: Essentially, the backend is based on Blazegraph (Q20127748) which is no longer being actively maintained and the size of Wikidata is starting to become an issue. At the current rate the graph only has a couple of years left before becoming overwhelmed. One of the ideas to lessen the load was to split off all of the scholarly articles (see here). Hopefully this all works out for the best and doesn't cause too many headaches! Aluxosm (talk) 15:31, 26 January 2022 (UTC)

I would prefer if data is kept stored in Wikidata itself, and if necessary only the WDQS part is split out into a separate instance which allows querying the scholarly articles subgraph. So I would say: let's add all Swepub papers to Wikidata and if necessary we can move querying to another instance itself. Mitar (talk) 23:28, 26 January 2022 (UTC)

@Mitar: The data wouldn't be deleted from Wikidata, we'd just lose the ability to query some of it. The WDQS part is already separate, the problem is that it'd start crashing/just wouldn't work if the graph got too big. The tricky part is how to query all of that data, not how to store it. I somewhat agree though, it does all sound a bit worrying but I don't think it means that we should stop adding data. I just hope that the Wikimedia Foundation has this as a top priority! Aluxosm (talk) 17:51, 29 January 2022 (UTC)

This was a big question about a year ago and many people disliked importing millions of articles before you had the foundational data to ground them (e.g. subjects, institutions, journals). This was assuaged by demoting articles in WD search and autocompletion.

So right now I think people are happy to keep articles in WD, for the value of being able to link to all contextual items.

Your 1.2M won't add too much burden on top of the 40M existing :). And adding up to a million researchers is also ok, as long as you deduplicate them. Vladimir Alexiev (talk) 05:14, 26 January 2022 (UTC)

I think you might have missed the somewhat alarming status about the Wikidata Query Service at the end of 2021.

Even if we try to keep the existing corpus current, I don't think venturing into additional fields is something to do at the moment.

Maybe swepub could indeed be a good model to host them in a separate instance. --- Jura 15:43, 26 January 2022 (UTC)

Manually adding CiTO annotations

Moved to Wikidata talk:WikiProject Source MetaData/Citation Typing Ontology#Manually adding CiTO annotations

WikiProject redesign, cleanup, icon?

Please no one have high expectations or want quick reactions, but I am thinking about cleaning up this WikiProject with a redesign. Since there are lots of participants here I thought I would post to the talk page first.

Tabs

I am thinking of setting up this page with tabs. It will probably look like this

Icon

Lots of WikiProjects have some image which represents the WikiProject. I like images because I think they make people remember or recognize the project better, especially since on Wikidata we have long-term users who may only visit projects yearly or every few years. Seeing a familiar image helps them remember it. Also pictures are fun.

I like these "source" icons from nounproject.com. They all have Wikimedia compatible licenses. Check out any of the images there and propose a favorite.

https://thenounproject.com/search/icons/?iconspage=1&q=source

My favorites are

Thoughts from anyone? Bluerasberry (talk) 20:43, 15 February 2022 (UTC)

Support here. I like your second "favorite" icon suggestion. ArthurPSmith (talk) 21:10, 15 February 2022 (UTC)

Chaotic end to the activities of WikiCite and WikiProject Source MetaData

Conversations happen in lots of places and somehow some important ones missed this talk page. I do not know who has a brief and accurate explanation. Here is a brief oversimplified explanation which might be close enough: the Wikimedia Foundation says that Blazegraph (Q20127748) has reached its limit, Wikidata is full and people have to quit adding content and querying it, and the problem is contributors to WikiCite and WikiProject Source MetaData. By stopping WikiProject Source MetaData, the Wikimedia Foundation will gain 2 years to find and design an alternative solution. I am not sure what happens after that.

August 2021 paper - Wikitech:User:AKhatun/Wikidata Scholarly Articles Subgraph Analysis
2018 - Wikidata:WikiCite/Roadmap

Somewhere there is the proposal of what the Wikimedia Foundation is going to do next. They presented at Wikimania and also have a page somewhere. I forgot where. Does anyone have the link?

Of course I care a lot but for typical Wikidata editors, here is my own suggestion: if you are making fewer than 1 million edits these changes may not affect you much, but be aware that some bigger projects are paused. Many big projects were paused for years before this anyway. Also feel encouraged to continue to do data modeling, because we still need examples and best practice recommendations for all sorts of source metadata.

Bluerasberry (talk) 22:19, 15 February 2022 (UTC)

@Bluerasberry: There's a section above where this was discussed (#New wikibase for all scientific articles?) which includes a link to the Blazegraph failure playbook. Could you point to an official statement from the WMF calling for this project to be shut down? Aluxosm (talk) 06:59, 16 February 2022 (UTC)

@Aluxosm: There is a WMF statement but I forgot where it is. There is a YouTube video explaining the problem and then the published statement either in Wikidata or on Meta. Let me look more, or maybe someone else here knows. Bluerasberry (talk) 12:13, 16 February 2022 (UTC)

@Bluerasberry: There was a panel discussion at WikidataCon 2021 and lots of talk of plans but I haven't seen anything to suggest that they needed to be implemented immediately. Shutting Wikicite down would be a pretty big deal for a large number of contributors/users; not really the sort of news that would be passed on via the grapevine (no offense and apologies if this is actually the case). Aluxosm (talk) 13:36, 16 February 2022 (UTC)

Ah, it is that Blazegraph failure playbook. That panel discussion video is not the one with the WMF presentation on the failure playbook, but yes, that is the right issue and that is all the statement we have. I do not think there is an implementation statement saying when and exactly how all this will happen. Bluerasberry (talk) 22:02, 16 February 2022 (UTC)

Wikimedia Foundation has open event on this tomorrow Wikidata:SPARQL query service/Feb 2022 scaling community meetings Bluerasberry (talk) 22:03, 16 February 2022 (UTC)

If you think about numbers, 40 million articles versus a few thousand per week added now, I think that a closure of SourceMD would be barking up the wrong tree IMHO. --SCIdude (talk) 10:33, 17 February 2022 (UTC)

Importing from OpenAlex

Hi. I just posted this in the Wikidata chat in Telegram:

Related to the issues concerning BlazeGraph there is a new thread here https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Source_MetaData#Chaotic_end_to_the_activities_of_WikiCite_and_WikiProject_Source_MetaData
I participated in the WDQS scaling community call yesterday and I invite anyone interested to join the next call.
Meanwhile I'm continuing my work on my new bot with the goal of importing 20M+ articles into Wikidata from OpenAlex now that we have a disaster plan and don't have to make fear-based decisions. If BG breaks, WMF simply cuts out the scientific articles from WDQS according to the plan.
Anyone can set up a Wikibase, import a part of Wikidata and make it possible to run SPARQL queries on the scientific items, and I predict someone will do it within a month of the disaster plan being executed.
I will post the request for botflag here once it is ready.

The code is here https://github.com/dpriskorn/OpenAlexBot --So9q (talk) 08:09, 18 February 2022 (UTC)

Here is the request https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/OpenAlexBot--So9q (talk) 15:00, 23 February 2022 (UTC)

@So9q: It looks like you are lower-casing DOIs instead of upper-casing them? All DOIs in Wikidata right now are upper-case, and you will not find matches with WDQS (or, I think, haswbstatement) if you have the wrong case. ArthurPSmith (talk) 17:30, 24 February 2022 (UTC)

In the Wikicite group @Harej suggested we lowercase them all (in Wikidata). I use CirrusSearch which is based on Elasticsearch which has case-handling built in. Compare [1] and [2] :) So9q (talk) 08:48, 25 February 2022 (UTC)

I want to clarify that although that is my personal opinion, as I understand there is currently consensus to capitalize DOIs in Wikidata, and drift away from this has been accidental (and largely a product of inconsistent enforcement). Harej (talk) 17:06, 25 February 2022 (UTC)
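A minimal sketch of the normalization this thread converges on, assuming DOIs are stored upper-case in Wikidata as stated above. The property ID P356 (DOI) is not named in the thread and is supplied here as an assumption; the function names and the sample DOI are hypothetical. The search string is only built, not sent.

```python
def normalize_doi(doi: str) -> str:
    """Strip surrounding whitespace and upper-case the DOI before matching."""
    return doi.strip().upper()


def haswbstatement_search(doi: str, prop: str = "P356") -> str:
    """Build a haswbstatement search string for a DOI (P356 is an assumption)."""
    return f"haswbstatement:{prop}={normalize_doi(doi)}"


# Made-up DOI, for illustration only.
print(haswbstatement_search(" 10.1000/example.doi "))
# prints: haswbstatement:P356=10.1000/EXAMPLE.DOI
```

Normalizing on the client side before every lookup avoids depending on whether a given search backend happens to fold case.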

University adding portraits

This project collects a lot of academic publications, and because of that makes structured data for authors. Until now I do not think we have an example of an organization which has tried to give us an image collection of their researchers, but here is one -

Bluerasberry (talk) 20:19, 7 March 2022 (UTC)

Wikidata software profiling hackathon, June 6&8

Those interested in software + Wikidata are invited to the Scholia Hackathon 6&8 June 2022.

Wikidata:WikiProject Scholia/June 2022 hackathon

WD:Scholia is a Wikidata front end which does scholarly profiling, and is best known as a tool for browsing the WikiCite collection of WD:WikiProject Source Metadata.

An example Scholia profile for the software Stata (Q1204300) is

https://scholia.toolforge.org/software/Q1204300

Anyone interested in examining any part of Wikidata connecting to software is welcome. Bluerasberry (talk) 20:40, 19 May 2022 (UTC)

Suggestions on adding affiliation string to author names

Hi there. I used my automated tool to create a scholarly article item, adding an affiliation string for each author from the ADS database to Wikidata. Link is here: https://www.wikidata.org/wiki/Q113322652 Would adding affiliation strings to author or author name string statements be useful? I'd like to hear your advice and suggestions. Feliciss (talk) 12:29, 28 July 2022 (UTC)

@Feliciss: Yes, adding these would be useful, but I would prefer they be exactly as in the article, not parsed/edited. For your example, the Stanford affiliation in the article is listed as "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A.", so I would think that should be the string used here? ArthurPSmith (talk) 20:36, 28 July 2022 (UTC)

@ArthurPSmith Can you link to where the string "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A." comes from? In my case, it's exactly the same as what we see in the article. I get the whole affiliation strings for each author from https://ui.adsabs.harvard.edu/abs/1982CMAME..32..199B/abstract. You can view the affiliations of each author by clicking "Show affiliations". Feliciss (talk) 07:28, 29 July 2022 (UTC)

I see. It's from the DOI in the article. Since my bot only gets affiliation strings from ADS, it's not possible (or does not make sense) to get the affiliation strings a second time from the DOI in the article. Feliciss (talk) 07:37, 29 July 2022 (UTC)

Ok, you are adding the reference to the ADS bibcode there so I guess that's fine. Obviously ADS is doing some parsing of affiliations but I think they're pretty reliable about that so this is ok. ArthurPSmith (talk) 17:11, 29 July 2022 (UTC)

Another example: https://www.wikidata.org/wiki/Q113380669

I think ADS is parsing some but not all affiliation strings. Feliciss (talk) 08:27, 2 August 2022 (UTC)

Who is the author "JC Shakespeare"?

A cautionary tale: https://shkspr.mobi/blog/2022/08/who-is-the-author-jc-shakespeare/ - Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:28, 20 August 2022 (UTC)

Reports published by policy and research organisations, can they be considered generally reliable?

also posted at w:WP:Village pump (policy) since it’s relevant there too

I’m looking for opinions on institutional policy and research reports in general as reliable sources as part of the WikiProject Policy Reports project. The example source types on WP:RS (scholarship, news, vendor etc) don’t quite cover our area of interest: reports, conference papers, discussion and briefing papers, strategies, policies and other docs (sometimes called grey literature). These are generally self-published by organisations (e.g. the WHO publishes WHO reports) but it’s obviously not the same as someone’s self-published blog or book.

I realise that for specific citations in WP it’s case-by-case. However, we’re looking for some guidance on what principles or criteria we could use to prioritise/sort organisations into 1) Generally reliable / 2) unclear / 3) generally unreliable since these sorts of items are likely often useful as potential WP sources in addition to books/journals/newspapers. As part of the project we’re looking to prioritise which organisations’ reports are most useful to upload metadata to Wikidata about. If general principles aren’t really possible, it’d be helpful to have some examples to calibrate on e.g. these five organisations:

The Australia Institute is an independent public policy think tank based in Canberra, Australia that carries out research on a broad range of economic, social, and environmental issues (APO-listed reports)

Australian Institute of Health and Welfare (AIHW) is Australia's national agency for information and statistics on Australia's health and welfare (APO-listed reports).

Ministry of Business, Innovation and Employment is the New Zealand government department responsible for contributing to economic productivity and growth (APO-listed reports).

Lowitja Institute is a national research centre focusing on Aboriginal and Torres Strait Islander health and wellbeing (APO-listed reports).

Australian Council of Social Service (ACOSS) is the peak body for the community services sector in Australia and advocates for action to reduce poverty and inequality (APO-listed reports).

Thanks in advance for the feedback on these! We’ve >70 publishing organisations that we’re focusing on so these will help us calibrate which sorts of organisations are worth focusing on uploading metadata to Wikidata. If anyone has an interest in the full list, please let me know and I can loop you in on the full project. Brigid vW (talk) 07:06, 28 September 2022 (UTC)

Notability of the organization might be a useful guide, for example the number of sitelinks to language wikipedias for the organization? ArthurPSmith (talk) 21:19, 28 September 2022 (UTC)