Page MenuHomePhabricator

Create a Special:GadgetUsage QueryPage to show Gadget usage stats per wiki
Closed, ResolvedPublic5 Estimated Story Points

Description

The Gadgets extension should include a Special page that shows how many users have enabled each gadget on that wiki.

MVP acceptance criteria:

  • A special page is available when the Gadgets extension is installed called Special:GadgetUsage
  • The special page shows a list of all the Gadgets that have been enabled by at least one user, as well as the number of users that have enabled it.
  • Add a feature flag (i.e. global config variable) for enabling or disabling the special page. It should be enabled by default.

For example:

+-------------------------+
| Gadget       | Users    |
+-------------------------+
| UTCLiveClock | 254      |
+-------------------------+

This could be implemented as a QueryPage that looks for entries in the user_properties table that start with 'gadget-'. It would probably be an expensive query (at least on larger wikis), so it should have isExpensive() return true. See /includes/specials/SpecialMediaStatistics.php for example.

In the future, more features could be added, like showing how many recently active users have each gadget enabled. T116894: Add recently active users count to Special:GadgetUsage

Event Timeline

kaldari raised the priority of this task from to Needs Triage.
kaldari updated the task description. (Show Details)
kaldari subscribed.

Keep in mind that some third-party wikis might want to keep the user_properties table fully private.

QueryPage? Isn't that supposed to be used for articles etc? Is there no better way to cache expensive queries?

Looks like Special:MediaStatistics also uses this and works... so maybe it is okay though there should probably be a better implementation for special pages like this.

@Glaisher: Yeah, it does seem like an awkward fit for a Query page, but I'm not aware of any other infrastructure for supporting report-type pages that are expensive to generate (other than writing something from scratch with heavy caching). Any other ideas are welcome.

kaldari triaged this task as Medium priority.Oct 13 2015, 8:03 PM

@NiharikaKohli: Keep in mind that QueryPage is designed for queries that output lists of page titles. You might have to hack around that a bit for this (See SpecialMediaStatistics.php). My best guess is that QueryPage is still going to be the easiest thing to use for generating and caching this. Maybe @Reedy or @aaron would know if there's a better approach though.

Change 247559 had a related patch set uploaded (by Niharika29):
Add a special page to show number of users for each gadget on the wiki

https://gerrit.wikimedia.org/r/247559

@NiharikaKohli: Keep in mind that QueryPage is designed for queries that output lists of page titles. You might have to hack around that a bit for this (See SpecialMediaStatistics.php). My best guess is that QueryPage is still going to be the easiest thing to use for generating and caching this. Maybe @Reedy or @aaron would know if there's a better approach though.

The alternative is to do something closer to what Special:ActiveUsers does. I think query page is easier.

The third alternative is to use the subquery with a limit trick (dbr->selectRowCount()) to count at most 10000 people, and then cache that in memcached. That would make sense if you wanted to put the results on Special:Gadgets, and didn't care about the exact number so much as lots of people vs very few people.

(edit) Fourth alternative is to do something like Beta Features does (Which is?? Seems to have something to do with the job queue. I haven't looked at the code recently)

The query page approach that was chosen sounds fine to me.

@Bawolff: It looks like there are 3 possible values for Gadget preferences: 1, 0, and empty string. I assume 1 means opt-in and 0 means opt-out (of a default gadget), but what does empty string mean? Is this the legacy opt-in value from before there were default gadgets?

When I was testing locally, empty string seemed to mean opt-out of default. Disabling a gadget that isn't default on seemed to result in that key being removed from user_properties table entirely.

Probably these values come from casting a boolean to a string. Maybe older versions of MW cast to integers instead. /me doesn't know. MediaWiki's preference code itself probably works by checking the truthiness of the preference key, so I imagine that both 0 and empty string are probably considered false in terms of should that preference be enabled.

Actually it looks like 1 means opt-in and empty string means opt-out. I'm not sure where 0 comes from, but it appears to be functionally equivalent to empty string from my local tests.

Looks like we need to add WHERE up_value = 1 to our query and also add a note to the page description that says something like "It does not include default gadgets."

Wrote my comments before seeing Bawolff's. Looks like we reached the same conclusions though :)

Ah, looks like it should actually be WHERE up_value = '1' then (1 as a string instead of an integer). I didn't notice the difference from the command line.

WHERE up_value = '1'

SQL would coerce up_value = 1 to a string comparison anyways, so it doesn't make a difference

Doing a where on a (unindexed) blob seems like a big no-no to me especially for a huge table like this. It seems to me that what we could do here is to select the up_value as well and then filter out unneeded values on the PHP side by using that. This would allow us to have another separate table on this page for users who have disabled default gadgets as well. I don't think we can do this with QueryPage so we might have to do something similar to what ActiveUsers does like using a job which is run monthly to update these stats.

To sum it up:
For non-default gadgets, a value of 1/"1" means enabled and 0/"0"/"" means disabled. (Although in my local install, disabling a non-default gadget just removed its entry from the table).
For default gadgets, "0"/0/" " means disabled and there is no entry to indicate if it is enabled. If it was initially enabled and then disabled, the entry is removed from the table, as tested on my local install. But I am not sure if this functionality varies across different wiki versions.

I am not sure how to determine if a gadget is default in the first place. From IRC:

19:16 <Niharika> Hi. How can I tell if a gadget is 'default on' from the database?
19:33 <Krenair> Niharika, I think you either need memcached access or to parse the definitions page?

Aside, I updated the patch to SUM( up_value ) instead of doing a COUNT(*) to account for the entries with up_value = 0.

Okay, I now know how to figure out if a gadget is on by default, thanks to Krenair. Basically, https://phabricator.wikimedia.org/diffusion/EGAD/browse/master/GadgetHooks.php;HEAD$52-66

So does it make sense to add a column to the table stating default on/off status? Will that help with this conundrum?

Change 247559 merged by jenkins-bot:
Add a special page to show number of users for each gadget on the wiki

https://gerrit.wikimedia.org/r/247559

So does it make sense to add a column to the table stating default on/off status? Will that help with this conundrum?

Sounds like a good idea. I'll create a separate task for that. The page is still useful as it is, though, so I'll go ahead and +2 it.

kaldari moved this task from In Development to Q1 2018-19 on the Community-Tech-Sprint board.

Doing a where on a (unindexed) blob seems like a big no-no to me especially for a huge table like this. It seems to me that what we could do here is to select the up_value as well and then filter out unneeded values on the PHP side by using that. This would allow us to have another separate table on this page for users who have disabled default gadgets as well. I don't think we can do this with QueryPage so we might have to do something similar to what ActiveUsers does like using a job which is run monthly to update these stats.

Its fine in the context of a cached query page. Either way the database has to look at the same number of items (The no longer using a covering index probably makes a bit of difference, but probably not huge).

In any case, this would still probably be one of the cheapest "expensive" query pages. Many of them are many orders of magnitude worse.

Should be mentioned in Tech News once the update script is run.

I'm going to re-open for now. As noted by @Glaisher on the patchset, the extension is missing the wgQueryPage hook, so the page won't get updated by updateSpecialPages.php (During testing, its a good idea to set $wgMiserMode = true; so your local wiki more closely mirrors production)

I've also asked some other questions at https://gerrit.wikimedia.org/r/#/c/248814/

Following that line of reasoning, we've thought about abandoning that patch (as it is now) and dropping all default gadgets off the table. Instead it would make more sense to have a separate table for default gadgets and having a column for number of people who have disabled it. Since COUNT() would work better when dealing with default gadgets, it means we would have to run two queries from the same QueryPage. Or we modify the same query to keep COUNT() and SUM() both.

Thoughts?

As I previously stated in T115152#1747837, I think we could have a job for updating these stats. Basically, it would be something like this: Firstly, it would check whether the stats are outdated and if they are outdated, add a job to run the queries (probably 15 or 30 days or sth). One query would retrieve the number of users who have enabled the non-default gadgets. Another one would get the number of users who have disabled the gadget. Then both these would save the results into querycache tables. This would allow us to have two different tables for the statistics. I don't think it would be easy to do this with QueryPage so you could just have a normal special page. ActiveUsers also probably does something similar to this.

Since implementing proper support for default gadgets is going to require some refactoring, I'm going to split that off into a separate card. For now, we should just remove default gadgets from Special:GadgetUsage and add a note that default gadgets are not included. Hopefully that will minimize confusion in the meantime.

In T115152#1757017, @NiharikaKohli wrote:

I've also asked some other questions at https://gerrit.wikimedia.org/r/#/c/248814/

Following that line of reasoning, we've thought about abandoning that patch (as it is now) and dropping all default gadgets off the table. Instead it would make more sense to have a separate table for default gadgets and having a column for number of people who have disabled it. Since COUNT() would work better when dealing with default gadgets, it means we would have to run two queries from the same QueryPage. Or we modify the same query to keep COUNT() and SUM() both.

Thoughts?

Depending on how hacky you want to make it, its definitely possible to do that with one query. Option 1 is to use the sql UNION keyword to combine two queries into 1. Option 2 would be a hacky group by. (Include up_value in the group by, encoding the value in either the fake namespace or the fake page title [also you'd have to coerce alternative ways of displaying false into a canoncial representation])

Original ticket goals met. Page live at: https://en.wikipedia.org/wiki/Special:GadgetUsage (query hasn't finished yet :o).

In T115152#1766717, @NiharikaKohli wrote:

Original ticket goals met. Page live at: https://en.wikipedia.org/wiki/Special:GadgetUsage (query hasn't finished yet :o).

Or started. Its supposed to run once every three days, looks like the hook change wasn't deployed fast enough for it to be hit on enwiki in the last run. Next update (by the time it reaches enwiki) will probably be early morning nov 1. (next special page update will start at aa.wikipedia.org at 5:00 utc oct 31, and probably take about a day to make its way down to enwiki)

I just started running foreachwiki updateSpecialPages.php --only GadgetUsage on terbium. You can see it in action for example at https://az.wikipedia.org/wiki/X%C3%BCsusi:GadgetUsage