Roundtable on Listeria enhancements
Closed, ResolvedPublic

Description

The Listeria tool (creating on-wiki tables and lists from Wikidata SPARQL queries) has become one of the most important community tools for creating work lists and managing content areas in the Wikimedia space.

A session or roundtable to discuss enhancements and new features would be useful during the hackathon. Some ideas include:

  • Ability to align content to the top of cells/rows, rather than having content center-aligned
  • Ability to designate a separator for fields that have multiple values, rather than having them run into each other consecutively
  • Support for creating lists based on Structured Data on Commons queries
  • Better documentation of best practices
  • Better support for rendering sources in Wikipedia (P248)

Event Timeline

Additionally, we are seeing more frequent "Killed by OS for overloading memory" errors with Listeria, which suggests we are overloading it.

See also @Multichill's comments here:
https://www.wikidata.org/wiki/Topic:W795pi3msyletjnh

Log of Listeriabot:
https://listeria.toolforge.org/botstatus.php

Courtesy ping: @Magnus

@Fuzheado Thanks for proposing! I have some questions: would you be able to moderate this roundtable? Do we need @Magnus ? And in the case he can't join, does it make sense to have this roundtable without him?

It’s probably a good idea to have a specific “round table”, if only to gather the various issues. Personally, I have found Listeria extremely useful both for quick checks and for long-term control of various projects, but its decreasing reliability (which I chalk up to exponential growth, more or less fueled by the ease of edits that Listeria enables) is an issue. Observing my own behavior over time, I admit that Listeria lists are slowly becoming a place to park my queries, and I prefer to run queries directly when I am working on a specific corner of Wikidata. It may therefore be useful to have two types of lists: one for more static, “parked” usage, and one for more active usage. I am thinking of a situation where fewer than 3 manual updates a month might trigger parked status.
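For illustration only, the "parked" rule suggested above could be sketched as follows. This is not how Listeria works today; the function name, the 30-day reading of "a month", and the idea that manual update timestamps are available per list are all assumptions made for the sketch.

```python
from datetime import datetime, timedelta

# From the comment above: fewer than 3 manual updates a month means "parked".
PARKED_THRESHOLD = 3
# Assumption: "a month" is treated as a rolling 30-day window.
WINDOW = timedelta(days=30)


def list_status(manual_updates, now):
    """Classify a list as 'parked' or 'active' (hypothetical helper).

    manual_updates: datetimes at which someone manually triggered an update.
    now: the reference time for the rolling window.
    """
    recent = [t for t in manual_updates if timedelta(0) <= now - t <= WINDOW]
    return "parked" if len(recent) < PARKED_THRESHOLD else "active"


now = datetime(2021, 5, 1)
rarely_used = [now - timedelta(days=d) for d in (1, 5, 40)]
busy = [now - timedelta(days=d) for d in (1, 5, 10)]
print(list_status(rarely_used, now))
print(list_status(busy, now))
```

A parked list could then be refreshed less often than an active one; how much less often is left open here, as the thread does not say.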

Thanks for your reply @Jane023. I guess it would be useful if some notes are taken during the session, so the feedback/suggestions for improvements can be brought to the tool maintainer in a structured form :)

We will then schedule this session in one of the open hacking rooms. We're looking at Sunday 23rd at 13:00 UTC, for 55 minutes. Let me know if this doesn't work for you; we still have room to find another slot.

I'm afraid that the Hackathon weekend is looking increasingly congested with another obligation, so I'm not sure how much I can "host" things, but I am happy to help in any way I can.

At the top of my mind are some of the things we discussed in Telegram: I have a sinking feeling we have a 90/10 problem with Listeria, mainly because there has been no controlled monitoring of, or pushback on, its growth and use. That is, 90% of the issues we have with load are caused by 10% of the users, who might be using it at scale and might be able to pull back on the frequency of the updates or even eliminate queries that are no longer interesting. To wit:

  • cywiki seems to be using Listeria on a scale 4-10x larger than any other site. This is not to blame them, but is this a possible area that is causing bottlenecks, and might it be time to split out Listeria resources for "wiki-wide" publishing use, versus Listeria being used for personal or maintenance queries?
  • @GerardM has a lot of user subpage Listeria lists on many different wikis. Again, these may be useful and popular pages, but we should probably have a way to audit or measure this. I know that I myself have some query pages I have completely forgotten about that could be outright eliminated, or at least moved to an update frequency of 30 days.
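As a rough sketch of the kind of audit suggested above, the following computes what share of all Listeria lists belongs to the most prolific 10% of owners. The input (one owner or wiki name per list) and the function name are hypothetical; nothing here is taken from the Listeria code or its bot status log, and list count is only a crude proxy for load.

```python
from collections import Counter


def top_share(list_owners, top_percent=10):
    """Share of lists held by the top `top_percent` percent of owners.

    list_owners: one entry per list, naming its owner (a user or a wiki).
    Returns a float between 0 and 1; 0.0 if there are no lists.
    """
    counts = Counter(list_owners)
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    # Integer ceiling of len(ranked) * top_percent / 100, but at least one owner.
    k = max(1, -(-len(ranked) * top_percent // 100))
    return sum(ranked[:k]) / sum(ranked)


# Made-up data: one heavy owner with 90 lists, nine owners with one list each.
owners = ["heavy"] * 90 + [f"user{i}" for i in range(9)]
print(f"{top_share(owners):.0%} of lists belong to the top 10% of owners")
```

A number like this would only confirm or refute the 90/10 hunch; deciding what to do about heavy users remains a community question.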

Thanks for your quick reply. Then I'll put this on hold until someone else volunteers to be the host (which would mean welcoming participants, facilitating the discussions, and making sure that notes are taken).

Magnus told me at one time that he had a function ready for deployment where the data of Listeria would NOT be on each and every Wikipedia but in a central place. That makes sense because the resulting data is basically the same for every language, the difference is in the labels available for a language. It would reduce the amount of processing a lot.

The Listeria lists I have are relevant because they represent, for instance, the geography of Africa, the politicians of Africa, and major awards. Because Listeria stopped being functional, I am waiting for things to resolve. It is not only Listeria that stopped functioning. For "office holders" there is a query on the talk page of the property. It also no longer updates.

Just an observation: I have noticed that we have lost important functionality because of "performance". As it is, I am reduced to adding one scientific paper at a time and to manually linking authors to papers. Never mind the best effort I put in; the quality, our quality, suffers.

A few notes:

  • happy to help with any new development, as time allows; prioritized list preferred
  • issue tracker at https://github.com/magnusmanske/listeria_rs but it's a bit messy, partially because no one reads the existing issues...
  • the "killed" errors come from Toolforge killing the processes even though there is enough memory; the processes allocate memory in large blocks, which apparently looks suspicious. I have an idea how to get around that, but it will require a bit of time
  • the "central location" is the Commons Data: namespace. This works ~80-90%, and requires a final push, both in Rust and Lua. See http://magnusmanske.de/wordpress/?p=650

Hello all,
We're still looking for someone who could facilitate this session: make sure that everyone has a chance to speak, take notes, etc.
If no one volunteers for this task in the upcoming days, we will not schedule it during this hackathon. Maybe it can be run another time, for example during Wikimania's hackathon :)

I'll just put this bug here, because I'm not 100% sure I can attend the session: https://github.com/magnusmanske/listeria_rs/issues/67

This is a rather urgent bug we noticed a couple of months ago, and might at this point be related to the 90/10 issue Andrew was discussing in a previous comment.

@Lea_Lacroix_WMDE I could volunteer to facilitate the session, but only if it is scheduled on Friday: unfortunately I will not be able to participate on either Saturday or Sunday :-(.

If we do have this session, it would be great to know from @Magnus whether there is some kind of output that would be useful for him.

Update: I removed the session from the schedule (it was so far proposed on Sunday).
If anyone is willing to take ownership of this session, feel free to reschedule it in one of the Hacking Rooms here: https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2021/Schedule -> @Laurentius the floor is yours if you want to schedule it on Friday.

FWIW, the bug reported above (https://github.com/magnusmanske/listeria_rs/issues/67) is fixed now.

Thanks for participating in the Wikimedia Hackathon 2021! We hope you had a great time.

  • If this task was being worked on and resolved at the Hackathon: Please change the task status to "resolved" via the Add Action...Change Status dropdown.
  • If this task is still valid and should stay open: Please add another active project tag to this task, so others can find this task (as likely nobody in the future will look back at Wikimedia-Hackathon-2021 tasks when trying to find something they are interested in).
  • In case there is nothing else to do for this task, or nobody plans to work on this task anymore: Please set the task status to "declined".

Thank you,
your Hackathon venue housekeeping service

No reply to previous comment; assuming there are no follow-up actions and closing this task.