User talk:Kriomet

Welcome to Wikidata, Kriomet!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here and become an active editor for Wikidata.

Best regards! ArthurPSmith (talk) 14:39, 23 April 2021 (UTC)

Stems in Swedish

Hi, I'm writing in English since you seem to be from fr.wiktionary.org. Please add a babelbox to your page here.

I noticed that you added stems to a lot of Swedish lexemes, thank you!

Unfortunately, these 2700 verbs now have a stem ending in -a, which to the best of my knowledge is wrong. Could you fix that? E.g. the stem of krishantera (L46483) should be "krishanter" so that it can take "-ing" to form a noun. As it is now, it would be "krishanteraing", which is incorrect. See https://sv.wikipedia.org/wiki/Ordstam. Thanks in advance :) --So9q (talk) 08:18, 9 August 2021 (UTC)

Thanks, I'll fix these! I based the stem on https://en.wikipedia.org/wiki/Swedish_grammar#Conjugating_verbs but, as you pointed out, this is only correct in the context of verb conjugation. Kriomet (talk) 09:26, 9 August 2021 (UTC)
Perfect, I see you already started a batch :). How do you prepare your batches? SPARQL in PetScan -> QS? --So9q (talk) 08:52, 10 August 2021 (UTC)
For changing the stems, I used a CSV from Query Service and a text editor with some regex. I've also written small Python scripts for more complex tasks but should probably look into bots in the future. Kriomet (talk) 12:11, 10 August 2021 (UTC)
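The stem fix discussed above (dropping the infinitive's final -a) can be sketched in a few lines of Python. This is only an illustration, not the regex or script that was actually used: the CSV column names, the placeholder row, and the function names are assumptions.

```python
import csv
import io
import re

# Hypothetical Query Service export: one row per lexeme (column names are assumed).
# The second row uses a placeholder ID; it only shows that other lemmas are left alone.
SAMPLE_CSV = """lexeme,lemma
L46483,krishantera
PLACEHOLDER,gå
"""

def stem_from_lemma(lemma: str) -> str:
    """Drop a single trailing -a from the infinitive; other lemmas are returned unchanged."""
    return re.sub(r"a$", "", lemma)

def stems_from_csv(text: str) -> dict[str, str]:
    """Map each lexeme ID in the CSV text to the stem derived from its lemma."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["lexeme"]: stem_from_lemma(row["lemma"]) for row in reader}

if __name__ == "__main__":
    for lexeme_id, stem in stems_from_csv(SAMPLE_CSV).items():
        print(lexeme_id, stem)
```

The resulting ID/stem pairs would still have to be turned into whatever batch format is used for the actual edits.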

Idea for new script :)

Hi, WDYT about adding P31:verbalsubstantiv to all nouns that are derived from verbs like this one sanering (L592856)? --So9q (talk) 14:56, 16 September 2021 (UTC)

Good idea! My bot has already been adding -ing/-ning/-are (see garnering (L584503), mätning (L33331) and visare (L242790)), so these kinds of verbal nouns can already be queried, and adding P31 shouldn't be too hard. Kriomet (talk) 15:37, 16 September 2021 (UTC)
Now the bot will mark new lexemes with these suffixes as verbal nouns. I've also backfilled lexemes with QuickStatements. In the future, it may be worth handling verbal nouns more generally. Kriomet (talk) 16:10, 19 September 2021 (UTC)
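As a rough illustration of the suffix-based idea in this thread, a noun can be treated as a verbal-noun candidate when it equals a known verb stem plus one of -ing, -ning or -are. The sketch below is not the bot's code; the function name and the in-memory stem set are assumptions made for the example.

```python
from typing import Optional

# Suffixes mentioned in the thread; the longer -ning is tried before -ing.
SUFFIXES = ("ning", "ing", "are")

def split_verbal_noun(noun: str, verb_stems: set[str]) -> Optional[tuple[str, str]]:
    """Return (stem, suffix) if the noun is a known verb stem plus a suffix, else None."""
    for suffix in SUFFIXES:
        if noun.endswith(suffix):
            stem = noun[: -len(suffix)]
            if stem in verb_stems:
                return stem, "-" + suffix
    return None

if __name__ == "__main__":
    # Stems of sanera, mäta, visa and garnera, assumed to be known already.
    stems = {"saner", "mät", "vis", "garner"}
    for noun in ("sanering", "mätning", "visare", "garnering", "ring"):
        print(noun, split_verbal_noun(noun, stems))
```

Requiring the stem to exist already keeps unrelated nouns that merely end in one of the suffixes, such as "ring", from being marked.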

Synonymer.se

Hi, I see you are matching with synonymer.se. I got a file from synonymer.se with all Swedish lexemes that matched at that time. See https://gist.github.com/dpriskorn/42ffa15ffeafedc54f3bc6245d9c5479 --So9q (talk) 09:20, 12 October 2021 (UTC)

Great! I only did a one-time import based on the Common Crawl index, since the lemma is found in the URL, but this matched only 1577 lexemes. Kriomet (talk) 12:06, 12 October 2021 (UTC)
I could ask them for a new file like the above if you would like? So9q (talk) 17:32, 23 December 2023 (UTC)

FHO

@Kriomet: Hello, I’m writing in English instead of Finnish/Swedish so that others can easily follow. I saw that your bot added paradigm class (P5911) for some Swedish words I added as lexemes, thank you! I’m wondering if you have tips on how to add more lexemes efficiently.

I’m working on putting in content from Förvaltningshistorisk Ordbok (Q98414308) into lexemes, and also adding the source to Wikidata objects with Förvaltningshistorisk Ordbok ID (P11123) and Wikipedia (svwp, reference template sv:Mall:FHO). I have put in 36 lexemes so far, you should see them with this query.

At the moment I’m adding lexemes by individually filling in the form for utrum and neuter, and then manually adding described by source (P1343), a gloss in Swedish and Finnish, and item for this sense (P5137). Would your script be able to add the forms automatically? Either from Wiktionary, or perhaps automatically in cases where a word has combines lexemes (P5238) and the latter word already has forms entered?

The source data also contains Swedish synonyms, Finnish descriptions and translations, which could be used to fill in synonym (P5973) and translation (P5972). Are there any scripts (Python?) or tools (OpenRefine?) that could automatically search and suggest entries? Adding them manually through the Wikidata GUI is so slow.

Thank you for any thoughts on this! Robertsilen (talk) 08:40, 28 October 2022 (UTC)

@Robertsilen Hello and sorry for the late reply. The new lexemes look very nice!
I agree that editing lexemes manually is quite cumbersome, but unfortunately I'm not aware of any better tool. I've only tried to improve existing lexemes with my scripts based on existing forms, so the scripts cannot be used to automatically add forms, for instance. But using combines lexemes (P5238) for new forms sounds like a good idea! Kriomet (talk) 19:19, 19 December 2022 (UTC)
I could help you with this. Lexeme Forms now supports Wikifunctions, but currently we are missing Swedish-language functions.
For synonyms and translations I could help you create tooling to semi-automate the work. Feel free to reach out to me. So9q (talk) 17:35, 23 December 2023 (UTC)

Question about prefix.py

Hi,

I've read the code, but I'm not sure I really understand how this script works, so I figured I'd ask you directly. I'm interested in adding French (and maybe Breton), but my main question is: how does the script work when there is an ambiguity (like in English, unlockable, which is both "un- + lockable" and "unlock + -able", or in French "ressortir", which is both "re- + sortir" for ressortir (L17373) and "ressort + ir" for ressortir (L691143))? Is there some check somewhere in the code or not? (I didn't see one... but my Python skills are not very high.)

Best regards, VIGNERON (talk) 08:35, 22 May 2023 (UTC)

It would be great to support more languages!
The script doesn't currently check for this kind of ambiguous word and would incorrectly split both of them the same way. Maybe the script could detect and skip these words and let humans handle them. However, for this to work properly, both lexemes "sortir" and "ressort" would have to exist already. Kriomet (talk) 15:05, 22 May 2023 (UTC)
For Breton, I don't think it's a good idea (too many similar suffixes, like -us and -adus or -eg and -adeg), and there are too few words anyway.
For French, I'm sure something is possible; I need to think about it. I had a look at the existing suffixes, https://w.wiki/6k9X: there are a few similar ones (e.g. -rice and -trice), but in most cases it should be OK. How do you want to proceed? Should I give a list of examples like simplifier (L16539) (verb) = simple (L7027) (adj) + -ifier (L478548) (in the exact same way -ify works in English)?
Cheers, VIGNERON (talk) 10:13, 23 May 2023 (UTC)
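The "detect and skip" idea from this thread could look roughly like the sketch below: collect every prefix and suffix split whose remaining part is an existing lemma, and leave the word to a human when more than one split is found. This is not taken from prefix.py; the function names and the small in-memory sets are assumptions. It also uses plain string concatenation, so spelling changes at the boundary (as in "ressortir") would need extra handling.

```python
def candidate_splits(word: str, lemmas: set[str], prefixes: set[str], suffixes: set[str]) -> list[tuple[str, str]]:
    """List every affix split of `word` whose other part is a known lemma."""
    splits = []
    for prefix in sorted(prefixes):
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in lemmas:
            splits.append((prefix + "-", rest))
    for suffix in sorted(suffixes):
        base = word[: -len(suffix)]
        if word.endswith(suffix) and base in lemmas:
            splits.append((base, "-" + suffix))
    return splits

def is_ambiguous(word: str, lemmas: set[str], prefixes: set[str], suffixes: set[str]) -> bool:
    """True when more than one split is possible, so the word should be skipped."""
    return len(candidate_splits(word, lemmas, prefixes, suffixes)) > 1

if __name__ == "__main__":
    lemmas = {"lockable", "unlock", "do"}
    prefixes = {"un", "re"}
    suffixes = {"able"}
    for word in ("unlockable", "redo"):
        print(word, candidate_splits(word, lemmas, prefixes, suffixes),
              "skip" if is_ambiguous(word, lemmas, prefixes, suffixes) else "ok")
```

As noted above, this only works when both competing base lexemes already exist; if only one of them does, the word looks unambiguous and would still be split.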

Odd combination

This edit seems odd since it combines a lexeme with a totally different sense. It pertains to klippa (L38115), not klipp (L38114). How should it be fixed so that the bot doesn't repeat it? Ainali (talk) 19:33, 14 July 2023 (UTC)

Thanks for raising this issue! Currently the bot doesn't handle this situation correctly and probably shouldn't do anything in ambiguous situations like this. I've disabled the bot for now until I have more time to work on it. Kriomet (talk) 20:30, 14 July 2023 (UTC)

Source for Swedish noun declensions is missing

Hi, I'm curious whether you used a source when making edits like this one: https://www.wikidata.org/w/index.php?title=Lexeme:L243358&diff=prev&oldid=1407643509

Do you know of a good source for this kind of information? So9q (talk) 18:14, 16 July 2024 (UTC)

It's based on the existing forms of the lexeme, so no external source was used. Kriomet (talk) 21:26, 16 July 2024 (UTC)