Page MenuHomePhabricator

Store expiry date and status of temporary account in database
Closed, DeclinedPublic

Description

Context

What is the issue that we're seeing that is motivating this decision or change?

When we expire temporary accounts, we do this with a maintenance script, ExpireTemporaryAccounts, which checks if now - account creation date > number of configured days for temp account expiry. Then, the script calls AuthManager::revokeAccessForUser and SessionManager::invalidateSessionsForUser.

What we don't do is store in the database when exactly an account was expired, or an indicator that allows us to know that it is expired.

We have a few tasks, however, where it would be useful to have this information:

Proposal

What is the change that we're proposing and/or doing?

Option 1

@Dreamy_Jazz and I discussed reusing user_email_token_expires or user_password_expires to store the timestamp of when a temporary account was expired. (Neither of those fields could ever be used by a temporary account.) Coupled with some helper methods, one would then be able to easily see if a temp account is expired, and the time that it expired. The downside is that we would then be storing data that is unrelated to the original purpose of those fields, potentially resulting in confusion and adding technical debt.

Option 2

Use the logging table and record when an account is expired. Downside is that we are adding another row to the DB for every temporary account. Benefit is that this is a much more logical place to record the information, compared to repurposing an existing column on the user table.

Additional notes:

  • if a temporary account is created via an edit on a wiki, then it visits all our other projects, we'd potentially be creating a log entry on every wiki where this user has an account
    • we could consider only creating a log entry on wikis where the user has made a loggable event (beyond autocreation).

Option 3

Add a new column to the user table to store a timestamp of the actual expiration of a temporary account.

We'd need to do a schema change for this. And we'd also end up adding a column that isn't used by named accounts.

Option 4

Add a column to a CentralAuth global table and use that to denote the expiry of the temp

Option 5

Create a new table specifically for holding data related to temporary accounts (user ID, expiry timestamp).

Consequences

What becomes easier or more difficult to do because of this change?

Option 1

  • We introduce some amount of technical debt / confusion by repurposing an unused column (for temp accounts) on the user table to indicate the expiry date of a temporary account
  • We make it easy to know if and when a temp account is expired, without any joins

Option 2

  • We add a row in the logging table for every temporary account
  • We make it easy to know if and when a temp account is expired, requiring a query to the logging table
  • The same row is written to all wikis where the temporary account exists

Option 3

  • We need to do a schema change
  • We add a column for data that is only going to be used by some user accounts
  • We make it easy to know if and when a temp account is expired, without any joins

Option 4

  • We need to do a schema change
  • We add a column for data that is only going to be used by some user accounts
  • We make it easy to know if and when a temp account is expired, but will require access to a global, shared database when looking up information for temporary account users

Option 5

  • We need to create a new table (globally or local?)
  • We'll need to add a row for each temporary account (globally or local?)
  • We'll need to join with this table when loading user data to be able to say if/when an account is expired

Event Timeline

One point to consider is user_email_token_expires and user_password_expires may be at some point moved to another table, database or even cluster: T352823/T183420. So it may be impossible to join it with other tables. Furthermore, the time temporary account is expired should be considered public, where these two fields are not.

An alternative solution is to store such information in logging table.

One point to consider is user_email_token_expires and user_password_expires may be at some point moved to another table, database or even cluster: T352823/T183420. So it may be impossible to join it with other tables. Furthermore, the time temporary account is expired should be considered public, where these two fields are not.

An alternative solution is to store such information in logging table.

Sure, the logging table is an alternative.

Hi DBA -- requesting input from your team on which of these storage options you'd prefer, for storing the expiry date of a temporary account in the database. The options are in the task description.

Add a column to a CentralAuth global table and use that to denote the expiry of the temp

(1) For wikis not using CentralAuth, we still need an alternative. So we may at some point need to add a column to both local (for consistency of non-CentralAuth wikis) and CentralAuth user tables, where in SUL wiki the local column is never used.
(2) Alternatively (as an Option 5), we can create a new dedicated table for expiry date and status of temporary account. A schema change is still needed but it is easier and table may be a bit smaller.
(3) If we want to lookup expiry date and status of temporary account in any wiki, in Option 2 we can just select one wiki to store expiration information. The more logical one is loginwiki or metawiki. loginwiki obviously has a smaller logging table so querying it may be quicker, but I am not sure the long-term future of loginwiki (it may be completely replaced after T120484: Create password-authentication service for use by CentralAuth). I still prefer Option 2 (in my opinion 2>3,4,5>>>1).

Add a column to a CentralAuth global table and use that to denote the expiry of the temp

(1) For wikis not using CentralAuth, we still need an alternative. So we may at some point need to add a column to both local (for consistency of non-CentralAuth wikis) and CentralAuth user tables, where in SUL wiki the local column is never used.
(2) Alternatively (as an Option 5), we can create a new dedicated table for expiry date and status of temporary account. A schema change is still needed but it is easier and table may be a bit smaller.
(3) If we want to lookup expiry date and status of temporary account in any wiki, in Option 2 we can just select one wiki to store expiration information. The more logical one is loginwiki or metawiki. loginwiki obviously has a smaller logging table so querying it may be quicker, but I am not sure the long-term future of loginwiki (it may be completely replaced after T120484: Create password-authentication service for use by CentralAuth). I still prefer Option 2 (in my opinion 2>3,4,5>>>1).

Thank you, I've updated the description with option 5.

Hi DBA -- requesting input from your team on which of these storage options you'd prefer, for storing the expiry date of a temporary account in the database. The options are in the task description.

I might sound a bit weird but what about not storing anything at all? Just compute if expired using registration timestamp and expiry config on the fly all the time. The registration timestamp should be always already queried and loaded in the user object. That's exactly how autoconfirm right works, it's never stored in the database. I know there might some downsides but given the tickets I saw, none are too hard to implement and regardless you have to query the db anyway (which with this, you wouldn't as it already loads registration date for other reasons).

I tend to agree with Bugreporter that the whole user table needs rethinking and we probably should pick it up with temp accounts potentially increasing its size (even though it's not that big and that's why it's not on my roadmap but it doesn't mean it can't be in some other teams roadmap).

FWIW, user is around 9.4GB in enwiki (which is roughly 0.9% of it). That's quite small but it will grow probably (with temp users).

I suggest just doing it on the fly and if you want to store it for any reason, completely rethink who we are doing user data storage.

I think we would also want to mark a temp account as expired if the user logs out (cc @Niharika). In that case, we need to know the time of when the account expired (or was logged-out from).

You can probably store the logout time in a separate table and still keep the expiry only based on the registration date. It wouldn't cover all cases though. If someone just close a private window, they cookies are flushed but not logged out.

I think we would also want to mark a temp account as expired if the user logs out (cc @Niharika). In that case, we need to know the time of when the account expired (or was logged-out from).

I am okay with us not storing the logout-expiry for now. I have some reservations about the helpfulness of the 'Logout' option for users. I hope to get some answers to that after launch. We could decide whether we need to account for that use case later.

If we disable temporary accounts (setting enabled to false and known to true), and also expire all temporary accounts via the maintenance script, then using account creation time + number of days configured will not accurately show a temp account's expiry. That's probably niche enough that we don't need to care about it. So, I am going to decline this task.

I think we would also want to mark a temp account as expired if the user logs out (cc @Niharika). In that case, we need to know the time of when the account expired (or was logged-out from).

I am okay with us not storing the logout-expiry for now. I have some reservations about the helpfulness of the 'Logout' option for users. I hope to get some answers to that after launch.

@Niharika I believe we'll need some instrumentation for this; I don't think the buttons in the temp accounts toolbar are currently instrumented.

If we disable temporary accounts (setting enabled to false and known to true), and also expire all temporary accounts via the maintenance script, then using account creation time + number of days configured will not accurately show a temp account's expiry. That's probably niche enough that we don't need to care about it. So, I am going to decline this task.

I think we would also want to mark a temp account as expired if the user logs out (cc @Niharika). In that case, we need to know the time of when the account expired (or was logged-out from).

I am okay with us not storing the logout-expiry for now. I have some reservations about the helpfulness of the 'Logout' option for users. I hope to get some answers to that after launch.

@Niharika I believe we'll need some instrumentation for this; I don't think the buttons in the temp accounts toolbar are currently instrumented.

As a note, in this case we can also set lifetime of temporary accounts to zero so all temporary accounts will become expired.