Page MenuHomePhabricator

Log accessing private information by those with 'abusefilter-private' permission
Closed, ResolvedPublic15 Estimated Story Points

Description

We have put in place abuse filters that catch users when they create sleeper accounts. But we cannot conduct CU on those IPs used for actions that trigger these filters, because the filter actually prevent account creation (i.e. the log generated is not replicated in the CheckUser tables). The only way for our CUs to get the private details of the user who triggered these filters is if the local CUs hold the abusefilter-private right.

But that is currently not allowed in WMF, because accessing private information through AbuseFilter is not logged. Therefore, we should modify AbuseFilter to keep a log every time the private information is accessed.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Hi,

I have some comments about the table structure you are proposing on the gerrit changeset, where should we discuss them? Here or on the changeset itself?

Thanks!

On the changeset is probably best. Just pick one of the SQL files and comment on it.

Change 335960 had a related patch set uploaded (by Huji):
Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/AbuseFilter into review/huji/T152934

https://gerrit.wikimedia.org/r/335960

Change 335960 abandoned by Huji:
Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/AbuseFilter into review/huji/T152934

https://gerrit.wikimedia.org/r/335960

Removing DBA, as the latest patch doesn't introduce any schema changes.

MarcoAurelio renamed this task from Log accessing private information by those with `abusefilter-private` permission to Log accessing private information by those with 'abusefilter-private' permission.May 30 2017, 10:20 AM
Huji moved this task from Incoming to Ready on the Security-Team board.
Huji set the point value for this task to 2.
Huji raised the priority of this task from Medium to High.Oct 22 2017, 2:51 AM

@aaron I am escalating the task to High priority, based on the support it just received on CU-l mailing list. Would you mind reviewing the patch one more time please?

Preferably if this affects the main code ideally a site admin who uses it on another site (i.e. not wikipedia or not wanting this log enabled) should be able to toggle a variable that disables the log. However, on the wikimedia network this should ideally be enabled (as per above). Thanks.

@Sau226 fair point, a global variable is now added in PS69 and defaults to true.

What about to separate rights for logs and for view information as in the CU extension? This will make it more configurable.
For example. A trusted user group can be able to see only the log without any risk.

@Melos: Taken care of in PS70.

While I'm testing the patch above I was thinking to create another configuration variable to require a valid reason. Since are private data site admins could decide to require from users a valid reason to see them. So setting a "AbuseFilterLogAccessingPrivateRequestReason": false in extension.json and checking for it in the code should be a good point.

Another point: In Wikimedia sites we could decide to view data only from a X period (for ex 3 months). Since IP are stored in afl_ip field and we can't saefly drop it from the database we should add another time variable for it to compare with afl_timestamp. Then, if filter log timestamp is older than our max time limit, user will be warning with a "too old" message.

While I'm testing the patch above I was thinking to create another configuration variable to require a valid reason. Since are private data site admins could decide to require from users a valid reason to see them. So setting a "AbuseFilterLogAccessingPrivateRequestReason": false in extension.json and checking for it in the code should be a good point.

Done in PS72.

Another point: In Wikimedia sites we could decide to view data only from a X period (for ex 3 months). Since IP are stored in afl_ip field and we can't saefly drop it from the database we should add another time variable for it to compare with afl_timestamp. Then, if filter log timestamp is older than our max time limit, user will be warning with a "too old" message.

Excellent point. Will work on it next. What do you recommend by the way? Just reusing $wgCUDMaxAge from CheckUser when available, or creating another variable specific to AbuseFilter ($wgAbuseFilterPrivateDetailsMaxAge)?

@Melos actually, there already exists $wgAbuseFilterLogIPMaxAge and it is used maintenance/purgeOldLogIPData.php

More importantly, CheckUser's version of this variable, called $wgCUDMaxAge is never used in the code to restrict the output of the query that is shown to the user; it is only used in the maintenance script maintenance/purgeOldData.php and also in CheckUser.hooks.php inside maybePruneIPData() where 1 out of every 50 queries is used to also prune old CU data.

So I think for AbuseFilter, we should just use the same strategy as well: keep the large purges for purgeOldLogIPData.php and not use $wgAbuseFilterLogIPMaxAge in the WHERE clause of the queries. If the user tries an old log entry, they will see a blank instead of the IP, not an error message. Maybe we can also add a separate function similar to maybePruneIPData() in AbuseFilter, but that I think is a separate task.

In short: we already have both a variable for max IP age, and a mechanism for purging it. So I won't work on any of it for this patch/task.

We could check for global $wgCUDMaxAge but if wgAbuseFilterPrivateDetailsMaxAge is setted it override $wgCUDMaxAge so site admins can have more flexibility in configuring AF.

@Melos actually, there already exists $wgAbuseFilterLogIPMaxAge and it is used maintenance/purgeOldLogIPData.php

More importantly, CheckUser's version of this variable, called $wgCUDMaxAge is never used in the code to restrict the output of the query that is shown to the user; it is only used in the maintenance script maintenance/purgeOldData.php and also in CheckUser.hooks.php inside maybePruneIPData() where 1 out of every 50 queries is used to also prune old CU data.

So I think for AbuseFilter, we should just use the same strategy as well: keep the large purges for purgeOldLogIPData.php and not use $wgAbuseFilterLogIPMaxAge in the WHERE clause of the queries. If the user tries an old log entry, they will see a blank instead of the IP, not an error message. Maybe we can also add a separate function similar to maybePruneIPData() in AbuseFilter, but that I think is a separate task.

In short: we already have both a variable for max IP age, and a mechanism for purging it. So I won't work on any of it for this patch/task.

Sure, this is another task.

This comment was removed by Huji.

When 'afl_ip' => '' should we add a message to inform user that Originating IP address was dropped/too old or leave the user to see an empty "Originating IP address" in the results? In every case the try to access private data must be logged if $wgAbuseFilterLogAccessingPrivateDetails = true.

Excellent point. Will work on it next. What do you recommend by the way? Just reusing $wgCUDMaxAge from CheckUser when available, or creating another variable specific to AbuseFilter ($wgAbuseFilterPrivateDetailsMaxAge)?

I do not think that AbuseFilter should have dependencies on CheckUser config variables, since they are separate extensions which may not be installed together.

Excellent point. Will work on it next. What do you recommend by the way? Just reusing $wgCUDMaxAge from CheckUser when available, or creating another variable specific to AbuseFilter ($wgAbuseFilterPrivateDetailsMaxAge)?

I do not think that AbuseFilter should have dependencies on CheckUser config variables, since they are separate extensions which may not be installed together.

@Bawolff see my next comment above. The variable already exists!

When 'afl_ip' => '' should we add a message to inform user that Originating IP address was dropped/too old or leave the user to see an empty "Originating IP address" in the results? In every case the try to access private data must be logged if $wgAbuseFilterLogAccessingPrivateDetails = true.

In PS73, it will show "Not Available" when afl_ip is blank. Submitting PS73 in a moment.

Apologies for the bump, but where-abouts are we on this task? Just so us mere onlookers understand if things are moving forwards, or if there is something blocking 😺

Huji raised the priority of this task from High to Unbreak Now!.Dec 30 2017, 8:49 PM

We have an LTA actively harassing the fawiki community, who could be curbed more easily if this feature was enabled. Elevating the priority for that reason.

We have an LTA actively harassing the fawiki community, who could be curbed more easily if this feature was enabled. Elevating the priority for that reason.

Realistically speaking this is not going to happen as soon as "unbreak now" would apply. At best we are probably talking at least a month if not more. Its possible there are other things we could do to help with the lta.

I will try to take another look at the patch when i get back from winter holidays.

Several other measures are actively taken against this LTA; but this would make things much easier, as stated above. I appreciate your re-review

Aklapper lowered the priority of this task from Unbreak Now! to High.Dec 31 2017, 12:09 PM

We have an LTA actively harassing the fawiki community, who could be curbed more easily if this feature was enabled. Elevating the priority for that reason.

No, as "if this got fixed then we could do XYZ" could be applied to any and every task. See definitions. No major existing functionality got entirely broken here.

Any updates from anyone about this?

@Bawolff any chance you could do another review here? I resolved the merge conflicts.

Change 326465 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Log accessing private information in abuse filter logs

https://gerrit.wikimedia.org/r/326465

Huji removed a project: Patch-For-Review.

Sorry about the slow review...this extension has a bit of an ownership problem, with random people stepping in for CR. I was thinking someone else would have merged this by now.

MarcoAurelio added a subscriber: Jalexander.

@Huji Sorry to reopen. I've been testing this new feature on beta with @Jalexander and while accessing the private data is indeed by request (fill out the form, redirects you to a page with the data), when I go to Special:Log/abusefilterprivatedetails none of the testings I did were logged there. Is this because of some beta limitation or a bug in the logging system? Thanks.

@MarcoAurelio In my local wiki on a VM, I tested this code every time I submitted a new patch and it did log the data appropriately. I have no way to tell if there is a beta limitation involved or not. Do you think I could be temporarily given access to beta to test things out there?

@Huji I'm sure we can arrange that if @Jalexander is okay :)

@Huji I'm sure we can arrange that if @Jalexander is okay :)

👍

@MarcoAurelio In my local wiki on a VM, I tested this code every time I submitted a new patch and it did log the data appropriately. I have no way to tell if there is a beta limitation involved or not. Do you think I could be temporarily given access to beta to test things out there?

You should be able to create an account on https://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page (I may have missed it but don't see one there yet) and Marco or I can get you the rights

You should be able to create an account on https://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page (I may have missed it but don't see one there yet) and Marco or I can get you the rights

Done. Let the testing begin!

@Huji I've created a group for you to test the feature on deploymentwiki. It includes all abusefilter-* rights plus a few others. Let me know if you need more permissions and I'll see what I can do. I hope this is okay @Jalexander and @Legoktm. Regards.

@Huji I've created a group for you to test the feature on deploymentwiki. It includes all abusefilter-* rights plus a few others. Let me know if you need more permissions and I'll see what I can do. I hope this is okay @Jalexander and @Legoktm. Regards.

Sorry, but I think this will not work. The extension CheckUser is not active at the whole beta-cluster for several reasons (since it is not really possible to control who would have access then), so I guess this will affect the private log data as well, and that this info does not get collected?

Sorry, but I think this will not work. The extension CheckUser is not active at the whole beta-cluster for several reasons (since it is not really possible to control who would have access then), so I guess this will affect the private log data as well, and that this info does not get collected?

Not relevant. This patch and what it deals with, has nothing to do with CheckUser. The IP address is stored in the abuse_filter_log table's afl_ip column. And the fact that you accessed such an IP address is stored in the logging table. Both of these are independent of the CheckUser extension.

@MarcoAurelio I think I found the problem. I retrieved the private logs for Special:AbuseLog/23882 (I chose this since it is an edit by an anonymous reason, so fetching the private details won't give me any non-public information). Then I went here here and there was no log. The reason is that, by default, the value for the global variable $wgAbuseFilterPrivateLog is false.

So first, you or @Jalexander or whoever has access to the InitializeSettings.php for the beta cluster should go ahead and flip that switch to true so that accessing private info is logged (I am updating T160357 accordingly). Only then you will see the logs coming in.

Side note: I also pulled the private details for Special:AbuseLog/50 which is from 2012 (again, the log entry is from an anonymous user, so no non-public data was accessed). I would expect *not* to see the IP in the private details, assuming the purge script was run already to remove it; but I did get the IP, indicating that the purge script is not run on beta clusters. We need to make sure it is, both on beta and the prod clusters. Again, I am updating T160357.

I can run the purge script on the beta cluster. I'll get in touch with releng before doing so. Note: after the testing is complete, abusefilter-private and abusefilter-private-log should be unset from the Beta Cluster.

@MarcoAurelio can you please verify that the log works as expected, once https://gerrit.wikimedia.org/r/#/c/409445/ is merged and deployed to the beta cluster? (And if so, close this task at that time)

Also, please revoke my special access on the beta cluster, as it is no longer needed.

@MarcoAurelio can you please verify that the log works as expected, once https://gerrit.wikimedia.org/r/#/c/409445/ is merged and deployed to the beta cluster? (And if so, close this task at that time)

I'll test once merged.

Also, please revoke my special access on the beta cluster, as it is no longer needed.

Done.

Okay so this patch is now merged and it's live on WMF Production. Once @Jalexander or @jrbs confirms that they can access private abusefilter log details and that such checks are logged in the appropriate log, we can move on the parent task.

Huji changed the point value for this task from 2 to 15.

In all fairness, it took a lot of effort from multiple people.

It is my understanding that the log is working both on production and beta cluster.