Page MenuHomePhabricator

Major overhaul for "throttle" action in AbuseFilter
Open, MediumPublic

Description

The "throttle" action has been coded with some poor choices, which are now slowly coming up and producing errors. While all subtasks provide some information about what is wrong, the goal of this task is to coordinate such a delicate overhaul. Below is how I'd envision fixing this up, but any other thought will be highly appreciated.

Proposed workflow

  1. Merge this patch to avoid some tricky problems in new filters
  2. Merge the patch for T203336 to start reducing errors. The included maintenance script (which also covers T203585) has to be checked carefully, since it could potentially change many rows.
  3. Execute the above script in dry mode.
  4. Merge this patch to add it to update.php
  5. Use a new, safer syntax for throttle parameters (T203554) to avoid problems like the one with commas, the one resolved at point 1, and T203535.
  6. As in point 3., apply the devised deprecation process to the code related to old parameters.
  7. Wait a few MW releases (or less, to be decided during point 5), then remove all back-compat code.

Any proposed workflow (included the one above) should be carefully examined to avoid any error at any cost. The code already has enough problems without adding others. All but point 8 should really be addressed ASAP.

As for other subtasks not mentioned above:

  1. T196995 is a minor proposal for improvement, mostly for better clarity
  2. T24623 is a proposed improvement to avoid throttling too easily
  3. T195699 is another throttle problem of which causes haven't been determined, and which could get fixed during the process of this task

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Daimona triaged this task as Unbreak Now! priority.Sep 5 2018, 5:36 PM

Setting the priority to UBN because this action has lots of problems, ranging from no-harm to "work in unpredictable ways".

Daimona added subscribers: matej_suchanek, Huji.

Adding people actively working on AbuseFilter, and Platform Team per maintainers on mediawiki.org.

Legoktm lowered the priority of this task from Unbreak Now! to High.Sep 6 2018, 5:13 AM
Legoktm subscribed.

Setting the priority to UBN because this action has lots of problems, ranging from no-harm to "work in unpredictable ways".

Unless I misunderstood the problem, I don't think this is UBN. UBN means to drop everything else and take care of this first (https://www.mediawiki.org/wiki/Phabricator/Project_management#Priority_levels) - if that's actually the case then please reset the priority.

@Legoktm Well, I read that page on mw.org some time ago, however I can't easily say if the definition applies here. Summing up, this is what we have:

  • When creating new filters, they're stored with ID = 'new'. This could potentially create identical throttle keys, thus randomly throttling people (and thus executing other actions for that filter, which can range from 'nothing' to 'block') (point 1)
  • Some filters where comma was used in throttle groups will lose part of their settings if saved again, without the user noticing. Unless a DBQueryError is thrown. (point 2)
  • Unassessed risks from point 4
  • 700 times per week, a PHP notice is thrown due to badly formatted throttle parameters. This may lead (depends on the single filter) to all the problems from the first two bullets here.

IMHO this is something that we cannot stand any more, and I'd like to have it fixed (i.e. points 1-7) before 1.33. I don't know if that's enough to set everything else aside (for me, it is), but I can say that this is surely the biggest problem we currently have in the AbuseFilter workboard.

@Daimona: "I'd like to have it fixed" is very different from "This must be fixed immediately" (UBN)... :)

@Aklapper of course :-) That's just my assessment, and after all one of the point of discussing here on phab is hearing other thoughts. Again, I can't say if this must be fixed immediately, but if I were to choose a task from AbuseFilter workboard to work on, it would be this one; it may have a strong impact on user experience. Maybe priority is not UBN, but personally I'd really suggest setting other AbuseFilter problems aside, for the moment.

While checking the patch for T203336, I realized something which will force us to rethink this whole process. As you may know, before switching to OOUI, throttle groups could be separated either by commas or linebreaks. The placeholder said

Group throttle by:\n:''(one per line, combine with commas)''

where this "combine" means the following: elements on different lines get joined with ORs, elements on the same line split by a comma get instead joined with ANDs. So, if you specified the following:

user
ip,page
range,page

it would translate in "xxx edits in xxx seconds by: the same user OR (the same IP on the same page) OR (the same IP range on the same page)".
And here comes the problem. As a user, I only use AbuseFilter on itwiki, in italian; as a hacker, I use it on my wiki, where I check the UI with uselang=en, but the main language is once again italian. The message quoted above, however, was poorly translated, so it apparently had a completely different meaning. If I were to translate it back to english, it'd read something like

Grouping criteria for throttle:\n:''(one per line, or joined with commas)''

Having seen the italian message for a long time, I always read it as "whatever you split the groups with, it's the same", and never noticed that the english message tells a completely different story. Also, this feature was intended to be like this, looking at the docs on mediawiki.org, which I erroneously changed after the OOUI switch. This introduces a different kind of problems:

  1. The switch to OOUI changed the textarea to be a checkboxMultiselect, where every group can only be specified once, and where it's not possible to create AND conditions. This translates in having removed a feature, and made several filters behave differently.
  2. The maintenance script for T203336 is now completely wrong and useless, since it would completely remove the old syntax from filters, specifically by converting commas to linebreaks and removing duplicates. The rest of the corrections are still correct.
  3. Methods from T203359 cannot be simplified, because they're intended to work like that.

While some things in this task remain valid (points 1 and 6), we need to address the other differently. Now, the first thing to do is replace the checkboxMultiselect with something allowing joins between groups; at a quick glance, it seems that the best one would be a TagMultiselect, like the one used for change tags. We should ensure that, this time, it'll be backward compatible. Then, the script should be reworked to only perform needed fixes, and the validation for throttle groups should be changed accordingly. Then we may go on with point 6.

Change 459368 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] [WIP] Change throttle selector to restore lost functionality

https://gerrit.wikimedia.org/r/459368

Change 468007 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@REL1_32] Change throttle selector to restore old functionality, overall improvement

https://gerrit.wikimedia.org/r/468007

Change 459368 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Change throttle selector to restore old functionality, overall improvement

https://gerrit.wikimedia.org/r/459368

Change 459245 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add normalizeThrottleParameters script to update.php

https://gerrit.wikimedia.org/r/459245

Change 468007 merged by Daimona Eaytoy:
[mediawiki/extensions/AbuseFilter@REL1_32] Change throttle selector to restore old functionality, overall improvement

https://gerrit.wikimedia.org/r/468007

@matmarex Not yet, being a parent task I guess we want to wait for all subtasks to be resolved.

This seems to be blocked on T209565: Dry run for normalizeThrottleParameters.php being complete. Or rather, the review necessary for merging the patch that is still open is being provided in the form of doing the dry run.

daniel lowered the priority of this task from High to Medium.Jan 22 2019, 3:41 PM

It seems like the immediate problem is fixed, so lowering prio to normal.

@daniel yes... And then blocked on normalizeThrottleParameters.php being executed (without --dry-run). That alone should resolve most of the subtasks.

daniel renamed this task from Major overhaul for "throttle" action to Major overhaul for "throttle" action in AbuseFilter.Mar 5 2019, 3:22 PM

Change 459245 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add normalizeThrottleParameters script to update.php

https://gerrit.wikimedia.org/r/459245

Change 459245 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add normalizeThrottleParameters script to update.php

https://gerrit.wikimedia.org/r/459245

Daimona moved this task from Backlog to Stretch on the AbuseFilter (Overhaul-2020) board.

To be reconsidered after the architecture review.