
Check if filtering is needed in labs for new term store data
Closed, Resolved (Public)

Description

From parent task:

As far as I can see, wb_terms is listed in https://github.com/wikimedia/puppet/blob/production/modules/profile/templates/labs/db/views/maintain-views.yaml#L138, and there does not seem to be any column filtering set up for it in https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/role/files/mariadb/filtered_tables.txt.

This means the tables of our new schema will also be configured to be copied without any column filtering, at least initially.

Is that claim true, or do we need to filter anything?
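
The check described above (a table is exposed as a full view and has no column rules) could be sketched roughly as follows. This is only an illustration: the "table,column,action" line format, the function names, and the sample data are all assumptions made for the sketch, not the actual layout of maintain-views.yaml or filtered_tables.txt.

```python
def tables_with_column_filters(filter_lines):
    """Return the set of table names that have at least one filter rule.

    Assumes (hypothetically) one rule per line in the form
    "table,column,action"; blank lines and "#" comments are skipped.
    """
    tables = set()
    for raw in filter_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Only the table name (first field) matters for this check.
        tables.add(line.split(",", 1)[0].strip())
    return tables


def replicated_without_filtering(table, full_views, filter_lines):
    """True if `table` is listed as a full view and has no column rules."""
    filtered = tables_with_column_filters(filter_lines)
    return table in full_views and table not in filtered


# Made-up sample data, not the real contents of either file.
full_views = ["wb_terms", "example_new_term_table"]
filter_lines = [
    "# table,column,action (assumed format)",
    "example_user_table,secret_column,F",
]

print(replicated_without_filtering("wb_terms", full_views, filter_lines))
```

With the sample data, a table that appears in the full-view list and has no rule line is reported as replicated without filtering; a table with any rule line is not.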

Event Timeline

Given that wb_terms is also fully replicated, I don't see any reason to change it now. We remove terms from the term store when the page gets deleted, suppressed, etc.

I agree we don't need to filter anything. Terms that are no longer used in the latest revision are cleaned up almost immediately (currently as part of the edit transaction, though this will probably move to a post-request handler or a job before production deployment). So if someone puts sensitive information in a term, removing it from the entity should be enough to remove it from the replicas as well, even before the old revision is deleted.
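
The cleanup behaviour described in this comment can be illustrated with a minimal sketch. It is not the Wikibase implementation: the in-memory `store` dict, the `update_term_store` function, and the (type, language, text) tuples are hypothetical stand-ins for the term store tables.

```python
def update_term_store(store, entity_id, latest_terms):
    """Make the stored terms for `entity_id` match its latest revision.

    `store` maps an entity ID to a set of (type, language, text) tuples.
    Returns the set of terms that were dropped because the latest
    revision no longer uses them.
    """
    latest = set(latest_terms)
    previous = store.get(entity_id, set())
    removed = previous - latest
    if latest:
        store[entity_id] = latest
    else:
        # Deleted or suppressed page: no terms remain for this entity.
        store.pop(entity_id, None)
    return removed


store = {}
update_term_store(store, "Q1", [
    ("label", "en", "Example"),
    ("description", "en", "sensitive text"),
])
# A later edit removes the sensitive description from the entity.
removed = update_term_store(store, "Q1", [("label", "en", "Example")])
print(removed)
```

Because the store only ever mirrors the latest revision, a term removed from the entity disappears from the store (and so from anything replicating it) without waiting for the old revision to be deleted.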

I also don't see any reason to filter this; the current design is mostly equivalent to wb_terms, which is also fully available (only public data is present).