Scripts fail if only family:wikidata is specified
Open, Lowest, Public

Description

python pwb.py listpages.py -family:wikidata -start:A

Traceback (most recent call last):
  File "pwb.py", line 166, in <module>
    run_python_file(fn, argv, argvu)
  File "pwb.py", line 67, in run_python_file
    exec(compile(source, filename, "exec"), main_mod.__dict__)
  File "scripts/listpages.py", line 58, in <module>
    main()
  File "scripts/listpages.py", line 35, in main
    local_args = pywikibot.handleArgs(*args)
  File "/home/user/python/core/pywikibot/bot.py", line 638, in handleArgs
    init_handlers()
  File "/home/user/python/core/pywikibot/bot.py", line 246, in init_handlers
    writelogheader()
  File "/home/user/python/core/pywikibot/bot.py", line 257, in writelogheader
    site = pywikibot.Site()
  File "/home/user/python/core/pywikibot/__init__.py", line 527, in Site
    _sites[key] = __Site(code=code, fam=fam, user=user, sysop=sysop)
  File "/home/user/python/core/pywikibot/site.py", line 636, in __init__
    BaseSite.__init__(self, code, fam, user, sysop)
  File "/home/user/python/core/pywikibot/site.py", line 167, in __init__
    % (self.__code, self.__family.name))
pywikibot.exceptions.NoSuchSite: Language en does not exist in family wikidata
<class 'pywikibot.exceptions.NoSuchSite'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

In site.py, the following fails:

    if (self.__family.name in list(self.__family.langs.keys()) and
            len(self.__family.langs) == 1):

len(self.__family.langs) is not 1:

{'test': 'test.wikidata.org', 'wikidata': 'www.wikidata.org'}

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 22 2014, 3:28 AM
bzimport set Reference to bz69255.

Do you really think this needs fixing? There is a test repo (two wikis instead of one), so it seems unreasonable to fix it.

There is an inconsistent status, so I would fix this.
It's up to you and the others how to move forward.

(In reply to Mpaa from comment #2)

There is an inconsistent status, so I would fix this.
It's up to you and the others how to move forward.

I agree this should be fixed.

Something like this should be OK; very little chance of causing problems with custom family files.

- len(self.__family.langs) == 1):
+ len(self.__family.langs - ('test')) == 1):
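
Taken literally, the replacement line would raise a TypeError with the dict shown in the description, because a dict does not support subtraction and ('test') is just the string 'test'. A minimal sketch of the same idea is below, assuming langs maps codes to hostnames; has_single_real_site is a hypothetical helper for illustration, not part of pywikibot.

```python
def has_single_real_site(family_name, langs):
    """Return True if the family name is a code and only one non-'test' code exists.

    Hypothetical helper; `langs` is assumed to map codes to hostnames.
    """
    # Ignore the 'test' code when counting the family's sites.
    real_codes = set(langs) - {'test'}
    return family_name in langs and len(real_codes) == 1


wikidata_langs = {'test': 'test.wikidata.org', 'wikidata': 'www.wikidata.org'}
commons_langs = {'commons': 'commons.wikimedia.org'}
wikipedia_langs = {'en': 'en.wikipedia.org', 'de': 'de.wikipedia.org'}

print(has_single_real_site('wikidata', wikidata_langs))
print(has_single_real_site('commons', commons_langs))
print(has_single_real_site('wikipedia', wikipedia_langs))
```

With this check, wikidata would be treated like commons, while a family with several real language codes would still require an explicit -lang.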

This has been proposed to be investigated as part of a Google-Code-in-2014 task.
https://www.google-melange.com/gci/task/view/google/gci2014/5826944515964928

Change 179586 had a related patch set uploaded (by M4tx):
Implement wbsearchentities

https://gerrit.wikimedia.org/r/179586

Patch-For-Review

Change 179599 had a related patch set uploaded (by M4tx):
Fix NoSuchSite error on multi-lang sites that have 'test' language family.

https://gerrit.wikimedia.org/r/179599

Patch-For-Review

Change 179586 had a related patch set uploaded (by M4tx):
Implement wbsearchentities

https://gerrit.wikimedia.org/r/179586

Patch-For-Review

Change 179586 is unrelated to this task.

If I understand the problem correctly, it is that when a family contains only one language (e.g. 'commons', as opposed to 'wikipedia'), the -lang parameter should be optional.

I'd suggest that the family itself declares whether it has a primary code. The primary code is still added to 'langs', and any other entries in 'langs' are mostly unused codes (usually only 'test'). When 'langs' contains more than one element, the check could then be whether the primary code is not empty.

If I understand the problem correctly, it is that when a family contains only one language (e.g. 'commons', as opposed to 'wikipedia'), the -lang parameter should be optional.

I'd suggest that the family itself declares whether it has a primary code. The primary code is still added to 'langs', and any other entries in 'langs' are mostly unused codes (usually only 'test'). When 'langs' contains more than one element, the check could then be whether the primary code is not empty.

Exactly what I proposed on https://gerrit.wikimedia.org/r/179599 :-)

that when the family does contain only one language (e.g. 'commons' in comparison to 'wikipedia') the -lang parameter _is_ optional! This bug is invalid imho.

But there is basically only one wikidata (and one minor test wikidata) like there is only one commons.

I think we should rather scrap the magic 'choose the single language if only a single language is available' instead of making that magic even more magic by ignoring 'test' wikis. If we want to keep the behavior, we should make it explicit, and not depend on 'we have only a single language' (which is basically what M4tx implemented)

Assigning to the patch uploader.

I've thought a bit about what a sensible user interface would look like.

First an assumption: I think our code internally never uses the 'en:wikipedia' -> set family to 'commons' -> 'en:commons' -> 'commons:commons' magic. After all, otherwise wikidata wouldn't work at all. If this is not the case, I think we should make this the case.

Then the only interface we have is the command line, where one can pass

  1. -family:XX -lang:YY, or
  2. -lang:YY, (implicit family), or
  3. -family:XX (implicit lang)

My suggested behavior would be:

  1. Always explicitly defines YY:XX. If that doesn't exist, we should raise an exception. So:
    • -family:wikidata -lang:wikidata gives wikidata:wikidata, but
    • -family:wikidata -lang:en raises an exception
  2. is (1) where the family is specified in the user-config.py
  3. a) if the family file does not specify a default: use the mylang specified in the user-config file, i.e.
    • mylang=en, -family:wiktionary --> en.wiktionary
    • mylang=ru, -family:myrandomwiki (where myrandomwiki does not specify a default, and does not have a ru site) --> error
     b) if the family file specifies a default, always use that default, so
    • mylang=test, -family:wikidata --> wikidata:wikidata [we can still reach test.wikidata with -family:wikidata -lang:test]

for specifying what the default is, I think m4tx's implementation makes sense.
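
The three cases above can be sketched roughly as follows. This is only an illustration under stated assumptions: FAMILIES, NoSuchSiteError and resolve_site are hypothetical names rather than pywikibot API, the family table is made up for the example, and the branch for "neither option given" is an assumption, since the comment does not cover it.

```python
class NoSuchSiteError(Exception):
    """Raised when the requested code does not exist in the family."""


# Hypothetical family table: known codes plus an optional default code.
FAMILIES = {
    'wikidata': {'codes': {'wikidata', 'test'}, 'default': 'wikidata'},
    'wiktionary': {'codes': {'en', 'ru'}, 'default': None},
    'myrandomwiki': {'codes': {'en'}, 'default': None},
}


def resolve_site(config_family, config_mylang, family=None, lang=None):
    """Return (code, family) for the given config values and CLI options."""
    if family is not None and lang is not None:
        # Case 1: -family:XX -lang:YY, both explicit.
        code, fam = lang, family
    elif lang is not None:
        # Case 2: -lang:YY only, the family comes from user-config.py.
        code, fam = lang, config_family
    elif family is not None:
        # Case 3: -family:XX only. Use the family default if there is one
        # (3b), otherwise fall back to mylang from user-config.py (3a).
        fam = family
        default = FAMILIES[fam]['default']
        code = default if default is not None else config_mylang
    else:
        # Assumption: with no options, use the configured site as-is.
        code, fam = config_mylang, config_family
    if code not in FAMILIES[fam]['codes']:
        raise NoSuchSiteError(
            'Language %s does not exist in family %s' % (code, fam))
    return code, fam


print(resolve_site('wiktionary', 'test', family='wikidata'))
print(resolve_site('wiktionary', 'en', family='wikidata', lang='test'))
```

The first call ignores mylang=test because the wikidata family declares a default, and the second reaches test.wikidata only because -lang is given explicitly.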

That makes more sense than the current implementation and would improve @m4tx's implementation. The problem with “what is default?” still remains.

About the first suggested behavior: it's currently not possible to determine where the language was defined, i.e. whether it came from the config or the command line. And 'mylang=test; -family:wikidata' resolving to 'wikidata:wikidata' could be confusing, so maybe this case should be highlighted: a family provides a default and the language is a valid language (but not the default).

Why not use a dict for default sites in user-config, like

default_sites = {
    'wikipedia': 'de',
    'wikisource': 'en',
    'wikidata': 'test',
    'myownproject': 'klingon',
}
family = 'wikipedia'
mylang = None  # maybe obsolete now

The -family option will use the default language code unless the -lang option is given or there is only one language in that project.
This means:

  1. -family overrides config.family
  2. if -lang is given, take it and raise an error if site does not exist, otherwise
  3. if default_sites[family] is given, take that language code and raise an error if site does not exist, otherwise
  4. take mylang (as fallback, maybe deprecated) and raise an error if site does not exist.

This means bot operators may have the choice of the default sites for each project and if not defined an error would be the right hint. But there would be no surprise which site is used anymore.
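
A rough sketch of that lookup order is below. All names (resolve, family_langs, NoSuchSiteError) are hypothetical and not pywikibot API. Checking the single-language case before default_sites is one reading of the sentence above, not something the proposal spells out.

```python
class NoSuchSiteError(Exception):
    """Raised when the chosen code does not exist in the family."""


def resolve(cli_family, cli_lang, config_family, default_sites, mylang,
            family_langs):
    """Return (code, family) using the proposed lookup order.

    `family_langs` is a hypothetical mapping of family name to its codes.
    """
    # Step 1: -family overrides config.family.
    family = cli_family if cli_family is not None else config_family
    langs = family_langs[family]
    if cli_lang is not None:
        # Step 2: an explicit -lang always wins.
        code = cli_lang
    elif len(langs) == 1:
        # Only one language in the project: nothing to choose from.
        code = next(iter(langs))
    elif family in default_sites:
        # Step 3: per-family default from user-config.
        code = default_sites[family]
    else:
        # Step 4: fall back to mylang (possibly deprecated).
        code = mylang
    if code not in langs:
        raise NoSuchSiteError(
            'Language %s does not exist in family %s' % (code, family))
    return code, family


default_sites = {'wikipedia': 'de', 'wikisource': 'en', 'wikidata': 'test'}
family_langs = {
    'wikipedia': {'de', 'en'},
    'wikidata': {'wikidata', 'test'},
    'commons': {'commons'},
    'wiktionary': {'en', 'ru'},
}
print(resolve('wikidata', None, 'wikipedia', default_sites, None, family_langs))
```

If a family has several codes and neither default_sites nor mylang names a valid one, the function raises instead of guessing, which matches the "no surprise" goal of the proposal.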

IMO BaseSite.__init__ shouldn't be where lang/code is auto-guessed. This should be done in the pywikibot.Site factory function, with some help from the command line arg parsing routines if required.

An approach I have been mulling over is: the default site (URL) for any family is the one which has the same code as the family name, i.e. 'wikidata:wikidata', 'commons:commons', etc. This is only *necessary* where the family has multiple codes, but it would be good to make that rule universal. That would mean changing the code of some sites: the wikitech family's only site would be changed from 'en' to 'wikitech', and the osm family needs the same change. lyricwiki could be changed too, but there are other languages of this family which are not in the family file, so I'd suggest not touching that one.

Then -family:wikidata (i.e. on the command line) would implicitly be -lang:wikidata also. To use test.wikidata via the command line, it needs to be explicitly mentioned: i.e. -family:wikidata -lang:test .
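
As a small illustration of that rule, the sketch below parses only the two options discussed here. site_from_args is a hypothetical function, not pywikibot's actual argument handling, and it leaves validation of the resulting site to the caller.

```python
def site_from_args(args):
    """Return (code, family) from -family:/-lang: style arguments.

    Hypothetical sketch: when only -family is given, the code defaults
    to the family name, e.g. 'wikidata:wikidata'.
    """
    family = None
    lang = None
    for arg in args:
        if arg.startswith('-family:'):
            family = arg[len('-family:'):]
        elif arg.startswith('-lang:'):
            lang = arg[len('-lang:'):]
    if family is not None and lang is None:
        # -family:XX implicitly means -lang:XX as well.
        lang = family
    return lang, family


print(site_from_args(['-family:wikidata', '-start:A']))
print(site_from_args(['-family:wikidata', '-lang:test']))
```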

I like @Xqt's idea of having different default codes for different sites and I agree with @jayvdb that the logic shouldn't be in the BaseSite.__init__.

Change 179599 abandoned by M4tx:
Fix error on multi-lang sites with invalid lang set.

https://gerrit.wikimedia.org/r/179599

Xqt removed m4tx as the assignee of this task. Nov 21 2017, 1:08 PM
Xqt claimed this task.

Does this issue occur in commons also?

I am getting this error

pywikibot.exceptions.UnknownSite: Language 'en' does not exist in family commons

Aklapper removed Xqt as the assignee of this task. Jun 19 2020, 4:31 PM
Aklapper subscribed.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)