User:Rukhabot

I'm a bot created and controlled by Ruakh. I'm one of the more prolific bots on the English Wiktionary, performing some regular high-volume tasks; last I checked, I accounted for more than 7% of all edits ever made to this site.

My source code is in Perl, making use of the URI, LWP::UserAgent, and JSON modules, as well as home-grown modules. If you'd like to see my source code, ask, but don't expect me to be released under the GPL anytime soon.

My username is a romanization of a hypothetical Hebrew רוּחֲבּוֹט, which isn't perfectly grammatical Hebrew (as the "b" would be a "v" in really proper Hebrew), but is a quasi-plausible neologism meaning "a bot's wind" or "a bot's spirit". (Ruakh's username, by comparison, is a romanization of the Hebrew word for "wind" or "spirit".)

If you see me do something bad, please leave me a comment on User talk:Rukhabot; I'll notice immediately, and will refuse to make any more edits until Ruakh has seen the comment.

Interwikis

It used to be that adding/removing/sorting/organizing/formatting main-namespace interwiki-links was a task that required bots, instead of being handled automagically on the MediaWiki servers. Back then, that was my main task; it still accounts for the vast majority of my edits to date. Unlike other interwiki-bots, I worked solely from which pages existed (rather than using Wikipedia-derived code that searched for inconsistencies between different Wiktionaries' interwiki-links).
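
The existence-based approach can be illustrated with a minimal Python sketch. This is not my actual (Perl) code: the function name `interwiki_links`, the `titles_by_wiki` mapping of wiki prefix to set of page titles, and the alphabetical ordering of the links are all assumptions made for the illustration.

```python
def interwiki_links(title, titles_by_wiki, home="en"):
    """Return interwiki links for `title`, based only on which pages exist.

    `titles_by_wiki` is a hypothetical mapping from wiki prefix to the set
    of main-namespace titles on that wiki (e.g. loaded from title dumps).
    """
    return [
        f"[[{prefix}:{title}]]"
        for prefix in sorted(titles_by_wiki)  # alphabetical order is an assumption
        if prefix != home and title in titles_by_wiki[prefix]
    ]


# Tiny made-up data set, for illustration only.
titles_by_wiki = {
    "en": {"yes", "Leiter"},
    "fr": {"yes"},
    "no": {"yes"},
    "hr": {"Leiter"},
}

print(interwiki_links("yes", titles_by_wiki))
```

Because the links are recomputed from scratch, whatever interwiki-links a page already carries on other Wiktionaries never needs to be consulted.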

Translation-templates

Another significant task, but not accounting for nearly as many edits as the interwiki-links task, is conversion between {{t}} and {{t+}}, between {{tt}} and {{tt+}}, and between {{t-check}} and {{t+check}}. Various random facts about me in my guise as a translation-template-bot:

  • I only edit the English Wiktionary.
  • I only edit pages in the main namespace (regular entries) and the Appendix namespace.
  • I don't examine the page-creation, page-deletion, and page-move logs; rather, I operate based on database-dumps from https://dumps.wikimedia.org/ (specifically enwiktionary-YYYYMMDD-pages-articles.xml.bz2 and PREFIXwiktionary-YYYYMMDD-all-titles-in-ns0.gz). Since the dumps take a few days to be generated, this means that my information is typically about a week out of date.
  • I only convert between those pairs of templates, plus converting {{t-}} to {{t}} or {{t+}}. If a translation does not use any of those templates, it will not be touched.
  • I choose between {{t}}/{{tt}}/{{t-check}} and {{t+}}/{{tt+}}/{{t+check}} using the rules you'd expect ({{t+}}/{{tt+}}/{{t+check}} when the foreign-language wikt exists and has the entry; {{t}}/{{tt}}/{{t-check}} in all other cases), with a few special cases:
    • When the translation contains an explicit link, I use {{t}}/{{tt}}/{{t-check}} (since {{t+}}/{{tt+}}/{{t+check}} don't support that case).
    • I know that the language-codes nan, cmn, nb, rup, kmr, and nds-de/nds-nl/pdt correspond to zh-min-nan.wikt, zh.wikt, no.wikt, roa-rup.wikt, ku.wikt, and nds.wikt, so I use {{t+}} for them when appropriate. For example, no:yes exists, so I will convert {{t|nb|yes}} to {{t+|nb|yes}} and {{t|no|yes}} to {{t+|no|yes}}.
    • zh.wikt has a feature whereby, if a page doesn't exist, the software will try converting the pagename from Traditional Chinese to Simplified Chinese, and vice versa, to see if one of those exists; if so, it will issue an HTTP 301 redirection to the existing page. I'm fully aware of this feature, so I'll write {{t+|cmn|...}} if zh:... either is an entry or redirects to one. Likewise for ku.wikt, sr.wikt, and iu.wikt, which all have the same sort of feature as zh.wikt (but with different conversion rules). If you're interested in how I do this, see User:Rukhabot/Tbot and language variants.
  • I do not change any formatting outside of the template call.
  • I don't try very hard to understand the subtle complexities of MediaWiki template syntax. For example, I will be fooled by {{t+|fr|asfasefasefase|2=le}}, which looks like it links to fr:asfasefasefase, but which actually links to fr:le. However, even in such pathological cases, I won't cause any serious harm — I just might select the wrong template.
  • I don't examine context at all; I'm just as happy to update a {{t}} in a ====Synonyms==== section, or inside a comment, as a properly-used {{t}} in a ====Translations==== section.
  • I have no special behavior for B/C/S/M; for example, I will convert {{t|hr|Leiter}} to {{t+|hr|Leiter}} and will leave {{t|sh|Leiter}} alone.
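
The rules in the list above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not my actual Perl source: the names `choose_template`, `convert`, and `titles_by_wiki` are hypothetical, the regular expression is deliberately naive, and the script-variant redirects (zh.wikt and friends) and the namespace restriction are left out.

```python
import re

# Language codes whose entries live on a differently named Wiktionary
# (taken from the list above).
WIKT_PREFIX = {
    "nan": "zh-min-nan", "cmn": "zh", "nb": "no", "rup": "roa-rup",
    "kmr": "ku", "nds-de": "nds", "nds-nl": "nds", "pdt": "nds",
}

# (template without interwiki link, template with interwiki link)
PAIRS = [("t", "t+"), ("tt", "tt+"), ("t-check", "t+check")]
PLAIN = {name: plain for plain, plus in PAIRS for name in (plain, plus)}
PLUS = {name: plus for plain, plus in PAIRS for name in (plain, plus)}
PLAIN["t-"], PLUS["t-"] = "t", "t+"  # {{t-}} becomes {{t}} or {{t+}}

# Naive match of a template call with no nested braces.
CALL_RE = re.compile(r"\{\{([^|{}]+)\|([^{}]*)\}\}")


def choose_template(name, lang, term, titles_by_wiki):
    """Pick the template name for one translation."""
    if "[[" in term:  # explicit link: the "+" templates don't support it
        return PLAIN[name]
    prefix = WIKT_PREFIX.get(lang, lang)
    if term in titles_by_wiki.get(prefix, set()):
        return PLUS[name]
    return PLAIN[name]


def convert(wikitext, titles_by_wiki):
    """Rewrite only the template names; everything else is kept verbatim."""
    def fix(match):
        name, rest = match.group(1), match.group(2)
        args = rest.split("|")  # positional only; ignores "2=..." overrides
        if name not in PLAIN or len(args) < 2:
            return match.group(0)
        new_name = choose_template(name, args[0], args[1], titles_by_wiki)
        return "{{" + new_name + "|" + rest + "}}"
    return CALL_RE.sub(fix, wikitext)


# Made-up title sets, standing in for the all-titles dumps.
titles = {"no": {"yes"}, "hr": {"Leiter"}, "fr": {"le"}}
print(convert("{{t|nb|yes}} {{t|sh|Leiter}} {{t-|hr|Leiter}}", titles))
```

Like me, this sketch is fooled by the pathological case above: it reads asfasefasefase as the target of {{t+|fr|asfasefasefase|2=le}} and so picks {{t}}, even though fr:le exists.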

One-off tasks

I have performed a variety of one-off page-editing tasks over the years.

Normalization of entries

Since September 2020, I have been performing tasks to bring pages into compliance with several of the rules at Wiktionary:Normalization of entries.