Page MenuHomePhabricator

Drop index rev_page_id (rev_page, rev_id)
Closed, ResolvedPublic

Description

This index doesn't make sense, and almost everything that uses it is a bug.

The correct ordering of revisions within a page is by timestamp, except where there are duplicate timestamps, in which case rev_id is used to break the tie.

rev_id does not increase monotonically with rev_timestamp, so it's not appropriate to use it for revision ordering within a page. It can be assumed to reflect time only when rev_timestamp matches.

Let's consider the callers.

ApiQueryRevisions has startid/endid parameters. These can be mapped to a (rev_timestamp,rev_id) tuple in the proposed total order before performing the query. If an unknown rev_id is requested as an endpoint, it can fail. Currently startid/endid is broken, since it partitions by rev_id but sorts by rev_timestamp, and so causes queries to take tens of seconds (T163495).

Title::getNextRevisionID() gets the next revision, ordering by ID, getPreviousRevisionID() is the same in the opposite direction. The original use case for these functions were the prev/next links on the old revision page views. The user certainly expects a time-based ordering in that case. The other callers:

  • WatchedItemStore::getNotificationTimestamp() uses it to check if a given rev_id is the latest, which is totally broken. It should use page_latest. This is duplicated in ApiSetNotificationTimestamp and User::clearNotification().
  • Article::view() has a direction parameter, next or prev, which was introduced to avoid the need for an expensive getNextRevisionID() call on old revision page views. As mentioned above, this should use time-based ordering.
  • Article::setOldSubtitle() calls getPreviousRevisionID() to check if the "previous" link needs to be displayed on an old revision view, which rather defeats the efficiency purpose of the direction parameter. rev_parent_id could be used, avoiding the need for a DB query, assuming the column is fully populated.
  • RawAction has a direction parameter analogous to the action=view one, which should function in the same way as action=view.
  • The "undo" feature in EditPage and duplicated in ApiEditPage has a range of revisions specified by the rev_id of the start and end of the range. It uses Revision::getNext() to determine whether the start and end are consecutive revisions. This should use the time-based total order or rev_parent_id.
  • DifferenceEngine has a diff=prev, diff=next parameters and uses getNextRevisionID()/getPreviousRevisionID() to interpret these parameters. Like direction parameters, this would benefit from using a time-based order. HistoryAction::feedItem() uses getPreviousRevisionID() similarly.
  • CategoryMembershipChange uses getPreviousRevisionID() to get the timestamp for sending $lastTimestamp to RecentChange. It should ideally use the pre-update page_latest value like we do on page save.

{T159319} would be fixed by making this proposed change.

Searching for SQL with rev_id inequalities:

  • dumpBackup.php has --revrange, possibly a legitimate use case, but could be answered by scanning the table.
  • compressOld.php uses rev_id<page_latest, it should use rev_id<>page_latest instead.
  • Already using the proposed total order: populateParentId.php, MysqlUpdater::doSchemaRestructuring(), CategoryMembershipChangeJob.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

The original use case for these functions were the prev/next links on the old revision page views. The user certainly expects a time-based ordering in that case.

This behavior not matching expectations is a bug filed as T4930: Edit history is based on timestamp while diff prev/next links are based on revision id (yup, that's old).

Shall we hold: T132416 and T162611 until we decide what to do with rev_page_id index? Before starting to alter the pending hosts in eqiad?

I think you should go ahead with those consistency changes, since it will take a while to get rid of all uses of rev_page_id and to verify that we have done so.

Shall we hold: T132416 and T162611 until we decide what to do with rev_page_id index? Before starting to alter the pending hosts in eqiad?

I think you should go ahead with those consistency changes, since it will take a while to get rid of all uses of rev_page_id and to verify that we have done so.

Ok! Thanks! I will continue then :-)

Change 349501 merged by jenkins-bot:
[mediawiki/core@master] Have Title::get(Next|Previous)RevisionID sort by timestamp

https://gerrit.wikimedia.org/r/349501

Change 349448 merged by jenkins-bot:
[mediawiki/core@master] API: Convert rvstartid/rvendid to timestamps for query

https://gerrit.wikimedia.org/r/349448

Sadly the last change seems to change the behavior of the API in some cases where the timestamps of two revisions are identical and messes with the RevisionSlider :-/. e.g.:

We have the id of a revision, in our example it is the newest revision. We use the API to see if there are any newer revisions so we use action=query&prop=revisions&rvstartid=123&rvdir=newer .

Since there are no newer revisions the API is expected to return only the current revision and this should also work when the revisions share the same timestamp. That worked like a charm before that patch. With the patch we now get two ( ore even more ) revisions as a result when the newest revision ( 123 ) has the same timestamp as the second newest revision ( e.g. 122 ).

I guess this is not intended and it would be really great if it would be fixed before this gets deployed.

Change 352632 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] ApiQueryRevisions: Restore use of rvstartid/rvendid as a tiebreaker

https://gerrit.wikimedia.org/r/352632

Change 352632 merged by jenkins-bot:
[mediawiki/core@master] ApiQueryRevisions: Restore use of rvstartid/rvendid as a tiebreaker

https://gerrit.wikimedia.org/r/352632

Sadly the last change seems to change the behavior of the API in some cases where the timestamps of two revisions are identical and messes with the RevisionSlider :-/. e.g.:

That should be fixed for you now, it will use the specified rev_id to break the tie when there are multiple revisions with the same rev_timestamp.

Note that if revisions somehow get out of order so revision 122 has a later timestamp than 123 it will still return both 123 and 122, but if you see that somehow then something is really screwed up.

That should be fixed for you now, it will use the specified rev_id to break the tie when there are multiple revisions with the same rev_timestamp.

Thanks for addressing the issue so quickly it works as expected again!

Note that if revisions somehow get out of order so revision 122 has a later timestamp than 123 it will still return both 123 and 122, but if you see that somehow then something is really screwed up.

Jepp, and thats how we would want it so everything is fine with that ;-).

Is this index still used? I don't have access to the tables schema_unused_indexes and schema_index_statistics to check it myself.

It does look used:

root@cumin1001:/home/marostegui# ./section s1 | grep eqiad | egrep -v "labsdb|dbstore|db1124|db1154|cloudb*|db1140|db1139|db1133" | while read host port; do echo "$host:$port"; mysql.py -h$host:$port sys -e "select * from schema_unused_indexes where object_name='revision';";done
db1169.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1163.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1135.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1134.eqiad.wmnet:3306
ERROR 2002 (HY000): Can't connect to MySQL server on 'db1134.eqiad.wmnet' (115)
db1119.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1118.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1106.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1105.eqiad.wmnet:3311
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1099.eqiad.wmnet:3311
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1084.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
db1083.eqiad.wmnet:3306
object_schema   object_name     index_name
enwiki  revision        rev_actor_timestamp
[email protected][sys]> select * from schema_index_statistics where index_name='rev_page_id';
+--------------+------------+-------------+---------------+----------------+---------------+----------------+--------------+----------------+--------------+----------------+
| table_schema | table_name | index_name  | rows_selected | select_latency | rows_inserted | insert_latency | rows_updated | update_latency | rows_deleted | delete_latency |
+--------------+------------+-------------+---------------+----------------+---------------+----------------+--------------+----------------+--------------+----------------+
| enwiki       | revision   | rev_page_id |     140183155 | 7.22 m         |             0 | 0 ps           |            0 | 0 ps           |            0 | 0 ps           |
+--------------+------------+-------------+---------------+----------------+---------------+----------------+--------------+----------------+--------------+----------------+
1 row in set (0.024 sec)

Macro facepalm:

I check what we can do about it...

One update on this. @Marostegui found that the planner basically picks up the index at random and when ignoring the index, the queries are still fast. We are slowly removing them to see if any issue shows up.

Good reminder that I have to pick up a host and do it!

Mentioned in SAL (#wikimedia-operations) [2021-06-09T04:44:28Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1135 to remove rev_page_id index T163532', diff saved to https://phabricator.wikimedia.org/P16330 and previous config saved to /var/cache/conftool/dbconfig/20210609-044428-marostegui.json

Index dropped on db1135 - let's monitor it for a few days on Tendril to see if we see any increase on latency on queries for revision table:

mysql:root@localhost [enwiki]> alter table revision drop key rev_page_id;
Query OK, 0 rows affected (0.126 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql:root@localhost [enwiki]> show create table revision;
+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| revision | CREATE TABLE `revision` (
  `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `rev_page` int(8) unsigned NOT NULL DEFAULT 0,
  `rev_comment_id` bigint(20) unsigned NOT NULL DEFAULT 0,
  `rev_actor` bigint(20) unsigned NOT NULL DEFAULT 0,
  `rev_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `rev_minor_edit` tinyint(1) unsigned NOT NULL DEFAULT 0,
  `rev_deleted` tinyint(1) unsigned NOT NULL DEFAULT 0,
  `rev_len` int(8) unsigned DEFAULT NULL,
  `rev_parent_id` int(8) unsigned DEFAULT NULL,
  `rev_sha1` varbinary(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`rev_id`),
  KEY `rev_timestamp` (`rev_timestamp`),
  KEY `page_timestamp` (`rev_page`,`rev_timestamp`),
  KEY `rev_actor_timestamp` (`rev_actor`,`rev_timestamp`,`rev_id`),
  KEY `rev_page_actor_timestamp` (`rev_page`,`rev_actor`,`rev_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=1027645497 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED
Marostegui moved this task from Triage to In progress on the DBA board.

So far, I haven't seen any different query plans between db1135 and the rest of enwiki hosts. Let's give till Monday to decide if we want to deploy this everywhere.
@Ladsgroup if this looks good by Monday, can you create a specific schema change task for this? It is easier for us to track it that way.

Thanks. I will and I will also create the patch for removing it in codebase too.

Dropped also on db1099:3311 so we have a host serving on rc,watchlist,logpager, contributions... with the index dropped, just to check if we have some queries there that might use this.

Removing the DBA tag as we have the above ticket created by @Ladsgroup

Actually I just saw this query:

db1099:3311	WikiExporter::dumpPages	Key 'rev_page_id' doesn't exist in table 'revision' (db1099:3311)	SELECT  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,temp_rev_user.revactor_actor AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model,page_restrictions  FROM `revision` FORCE INDEX (rev_page_id) JOIN `page` ON ((rev_page=page_id)) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) JOIN `revision_actor_temp` `temp_rev_user` ON ((temp_rev_user.revactor_rev = rev_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 2942255 AND page_id < 2942817) AND (rev_page>0 OR (rev_page=0 AND rev_id>0))  ORDER BY rev_page ASC,rev_id ASC LIMIT 50000

It is using a FORCE INDEX, so it needs to be removed from code before we can proceed.

Mentioned in SAL (#wikimedia-operations) [2021-06-21T04:39:41Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1099:3311 T163532', diff saved to https://phabricator.wikimedia.org/P16645 and previous config saved to /var/cache/conftool/dbconfig/20210621-043941-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-06-21T04:40:36Z] <marostegui> Re-add rev_page_id to db1099:3311 T163532 T285149

Thanks for adding that back, stub dumps rely on this.

mysql:root@localhost [enwiki]> alter table revision add key `rev_page_id` (`rev_page`,`rev_id`);
Query OK, 0 rows affected (48 min 39.318 sec)       ne
Records: 0  Duplicates: 0  Warnings: 0

mysql:root@localhost [enwiki]>

Mentioned in SAL (#wikimedia-operations) [2021-06-21T06:20:15Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db1135 T163532', diff saved to https://phabricator.wikimedia.org/P16653 and previous config saved to /var/cache/conftool/dbconfig/20210621-062014-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2021-06-21T06:20:43Z] <marostegui> Re-add rev_page_id to db1135 T163532 T285149

db1135 got the index readded:

root@db1135:~# mysql -A enwiki
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 2245966711
Server version: 10.4.18-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:root@localhost [enwiki]>  alter table revision add key `rev_page_id` (`rev_page`,`rev_id`);
Query OK, 0 rows affected (45 min 38.348 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql:root@localhost [enwiki]>

Change 752619 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] export: Remove ignoring rev_page_id index

https://gerrit.wikimedia.org/r/752619

Change 752619 merged by jenkins-bot:

[mediawiki/core@master] export: Remove ignoring rev_page_id index

https://gerrit.wikimedia.org/r/752619

Change 753069 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.17] export: Remove ignoring rev_page_id index

https://gerrit.wikimedia.org/r/753069

Change 753069 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.17] export: Remove ignoring rev_page_id index

https://gerrit.wikimedia.org/r/753069

Change 753501 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.16] export: Remove ignoring rev_page_id index

https://gerrit.wikimedia.org/r/753501

Change 753501 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.16] export: Remove ignoring rev_page_id index

https://gerrit.wikimedia.org/r/753501

Mentioned in SAL (#wikimedia-operations) [2022-01-13T06:41:24Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.16/includes/export/WikiExporter.php: Backport: [[gerrit:753501|export: Remove ignoring rev_page_id index (T163532)]] (duration: 01m 28s)

Change 757611 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Drop rev_page_id index on revision

https://gerrit.wikimedia.org/r/757611

Change 757611 merged by jenkins-bot:

[mediawiki/core@master] Drop rev_page_id index on revision

https://gerrit.wikimedia.org/r/757611

Ladsgroup moved this task from In progress to Done on the DBA board.

The schema change in itself is now removed from code, the deploying it in production is almost done as well: T285149

This comment was removed by awight.