User:DerbethBot

From Wiktionary, the free dictionary

A bot operated by Derbeth that adds IPA and pronunciation files to entries. It also works on Commons, where it uploads new pronunciation files.

Source code: GitHub

Frequently Asked Questions

  1. The bot just added an audio file that is unacceptable because <reason here>. Can you please fix your bot?
    TL;DR: It's much better to fix the data (Commons).
    Commons should not contain useless or misleading files.
    If an audio file is broken, please nominate it for deletion on Commons. I'm very much willing to help: if you mention me on the deletion request page ([[User:Derbeth|Derbeth]]), I will get a notification about it and will vote in support (if there is a good explanation given).
    For languages with non-Latin scripts, my bot tries to read the pronounced word from the file description on Commons (an example). If the description is misleading, you can simply edit it and fix it right away.
  2. I reverted your bot's wrong edit. Does that end the issue?
    No.
    I regularly re-run my bot (sometimes every week), so it will repeat the same edit over and over again.
    If the problem is with the Commons file, please fix the Commons file.
    If the Commons file is correct but my bot does incorrect things, contact me; it may be a bug.
  3. Can you add the problematic file to a blacklist?
    I prefer not to.
    If I create such a blacklist, the wrong files will remain on Commons, which is a poor state of affairs. The best course of action is to delete or rename the files on Commons. Audio files are used not only on English Wiktionary but also on other Wiktionaries. There are also other bots that add audio files; I am not able to fix them, and I am not even aware of how many such bots exist.
  4. What if a file is not acceptable as pronunciation here but cannot be deleted from Commons?
    1. Renaming will help. The new name should indicate that the file is nonstandard, hypercorrective, synthesized, and so on. Use commons:Commons:File renaming.
    2. Add the file to at least one of the categories that my bot excludes (less optimal, as other bots probably don't use those). See below for those categories.
  5. The bot does not add audio files to a page, although it's clear that they match. Why?
    1. Perhaps the entry contains more than one definition section or more than one part of speech. In that case it is impossible to judge automatically whether the audio file matches the first, the second, or all of them. The audio file needs to be added by a human. To help with finding and fixing such cases, the generated reports contain a list of files that could not be added automatically. Help is appreciated: the bot owner does not have time to edit such a large number of pages.
    2. Lingua Libre files are rejected by default. This is because English Wiktionary users requested less trust in Lingua Libre data: there were cases of poor-quality Lingua Libre data uploaded to Commons. Contact me so that I can add the author to the bot's 'whitelist'.
    3. In other cases, please contact me - it might be a bug in the bot.
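
The multi-section case from question 5 can be illustrated with a small sketch. This is not the bot's actual code (the bot has its own source on GitHub); it only assumes that a language section is plain wikitext with `==`-style headings, and the heading list `POS_HEADINGS` and the function name `is_ambiguous` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical and deliberately incomplete list of part-of-speech headings.
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb", "Proper noun", "Interjection"}

# Matches a wikitext heading such as "===Noun===" or "=== Etymology 1 ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)


def is_ambiguous(section_wikitext: str) -> bool:
    """Return True if a language section offers several possible places
    for one audio file, so a human should decide where it belongs."""
    titles = [m.group(2) for m in HEADING_RE.finditer(section_wikitext)]
    pos_count = sum(1 for title in titles if title in POS_HEADINGS)
    etymology_count = sum(
        1 for title in titles if re.fullmatch(r"Etymology \d+", title)
    )
    return pos_count > 1 or etymology_count > 1


single = "==English==\n===Noun===\n# a small feline\n"
multiple = "==English==\n===Noun===\n# x\n===Verb===\n# y\n"
print(is_ambiguous(single), is_ambiguous(multiple))
```

A section flagged this way would be a candidate for the manual reports rather than for an automatic edit.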

Selection rules

Lingua Libre files

Rejected by default. Only files from authors on the bot's 'whitelist' are accepted. I maintain the list in the source code.
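
A minimal sketch of such a whitelist check follows. It is an illustration, not the bot's implementation: it assumes Lingua Libre files follow the name shape `LL-Q<id> (<lang>)-<author>-<word>.wav`, and both the author name `ExampleSpeaker` and the function `accept_lingua_libre` are made up for the example.

```python
import re

# Hypothetical whitelist; the real list is kept in the bot's source code.
TRUSTED_AUTHORS = {"ExampleSpeaker"}

# Assumed name shape: "LL-Q<id> (<lang>)-<author>-<word>.wav".
# Author names containing a hyphen would be split wrongly by this pattern.
LL_NAME_RE = re.compile(
    r"^LL-Q\d+ \((?P<lang>[^)]+)\)-(?P<author>.+?)-(?P<word>.+)\.wav$"
)


def accept_lingua_libre(file_name: str) -> bool:
    """Accept a Lingua Libre file only if its author is whitelisted.

    Returns False for names that do not look like Lingua Libre files,
    since this check does not apply to them.
    """
    match = LL_NAME_RE.match(file_name)
    if match is None:
        return False
    return match.group("author") in TRUSTED_AUTHORS


print(accept_lingua_libre("LL-Q1860 (eng)-ExampleSpeaker-cat.wav"))
print(accept_lingua_libre("LL-Q1860 (eng)-SomeoneElse-cat.wav"))
```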

Other files

Correctly-named files (like En-cat.ogg or De-Katze.oga) are accepted by default.

Files with non-standard names (like Cat.ogg) are mostly rejected, but I make some exceptions and have some rules to 'guess' the word from bad names (no guarantees for them).

Files from commons:Category:Deletion requests and commons:Category:Speech impediments are always rejected (source: https://github.com/Derbeth/perlwiki/blob/master/audio_fetcher.pl#L61).
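
The rules above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not a port of the linked Perl source: the name pattern (two or three letters, a hyphen, the word, then `.ogg` or `.oga`) is inferred from the two examples given, the function name `classify` is hypothetical, and the exceptions for badly named files are not modelled.

```python
import re

# Categories that the text above says are always excluded.
EXCLUDED_CATEGORIES = {"Deletion requests", "Speech impediments"}

# Assumed shape of a correctly named file: "<lang code>-<word>.ogg" or ".oga".
STANDARD_NAME_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})-(?P<word>.+)\.(?:ogg|oga)$"
)


def classify(file_name: str, categories: list[str]) -> str:
    """Return "accept" or "reject" for a Commons audio file."""
    # Excluded categories win over everything else.
    if EXCLUDED_CATEGORIES & set(categories):
        return "reject"
    if STANDARD_NAME_RE.match(file_name):
        return "accept"
    # Non-standard names are rejected here; the bot's hand-made
    # exceptions and guessing rules are left out of this sketch.
    return "reject"


print(classify("En-cat.ogg", []))
print(classify("Cat.ogg", []))
print(classify("En-cat.ogg", ["Speech impediments"]))
```

Checking the excluded categories first mirrors the "always rejected" wording: a well-formed name does not rescue a file that sits in one of those categories.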

How to prepare a good deletion request

Examples:

  1. To find more, search Deletion requests for "pronunciation".

Reports

Reports contain lists of files that need to be added manually. In some cases the bot cannot add audio files because the article structure is non-standard. In others it is explicitly prohibited from doing so, for example when there are multiple etymologies (only a human can decide what to do with the audio file).

If you can help, please examine the latest report and remove the entries you have checked from the list.