
2FA for SSH access to the production cluster
Open, MediumPublic

Description

Experiment with two-factor-authentication for SSH access to the production cluster. Initially with a dedicated bastion host on which 2fa is enabled.

Yubikey tokens (produced by Yubico) are the most suitable selection for an authentication hardware token; they provide an open source-friendly software stack and are a proven solution.

Event Timeline

MoritzMuehlenhoff raised the priority of this task from to Needs Triage.
MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff added a project: SRE.
MoritzMuehlenhoff subscribed.
chasemp set Security to None.

Summarising some bits already mentioned at the offsite along with further tests done and a plan how to move forward (see below).

Objective
The important security property to gain is protection against compromised notebooks; endpoint security is one of the biggest risks for the WMF cluster (especially on Mac OS X). If someone compromises the notebook of a user with cluster access, the attacker can steal the SSH key/passphrase and access the cluster. The hardware design of the authentication tokens ensures that keys cannot be extracted, so in case of a compromised notebook the attacker does not have access to the second factor.

Hardware choice for an authentication token:
Yubikey, as produced by Yubico, is the most sensible hardware option:

  • Widely deployed
  • FOSS-friendly, entire software stack is open source and included in Debian (all enduser tools are also available on Mac OS X)
  • Developed by people deeply involved in the FOSS and security communities

Yubikey hardware
Yubikeys are simple, robust devices in a USB form factor. From the operating system's point of view they present themselves as a USB keyboard: the Yubikeys have a small touchpad, and when it is touched the generated password is emitted via "keyboard input". As a result, all OSes more recent than MS-DOS support them out of the box.

They come in three different variants (prices are single-unit prices; volume orders are cheaper). All Yubikeys also come in nano variants which are only the size of the actual USB connector, but that doesn't seem desirable in terms of robustness.

  • Yubikey Standard: Only supports the Yubikey OTP protocol (25 dollars)
  • Yubikey Edge: Supports U2F in addition (for Google Apps) (30 dollars)
  • Yubikey Neo: Additional smartcard features and NFC (50 dollars)

Generation of one-time-passwords
Each Yubikey ships with a unique 128-bit AES key. By pressing a pad on the key, a one-time password is generated; it consists of the key ID, counters and timers, a random nonce and a checksum, and is encrypted with that AES key (see Section 2 of https://www.yubico.com/wp-content/uploads/2015/03/YubiKeyManual_v3.4.pdf )

The emitted string consists of a static 12-character identifier followed by the 32-character OTP.
Representation is done in "modhex" (a reduced set of characters to avoid key scan code ambiguities: This avoids characters which are at different positions on internationalised keyboards)
(see Section 2.1.3 of https://www.yubico.com/wp-content/uploads/2015/03/YubiKeyManual_v3.4.pdf )
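To make the string layout concrete, here is a minimal Python sketch of modhex decoding and of splitting an emitted string into its 12-character identifier and the 16 encrypted bytes. The function names are made up for illustration; the modhex alphabet is the one documented by Yubico. The sketch does not decrypt anything, since that requires the AES key.

```python
# Modhex alphabet: the character at index i stands for hex digit i.
MODHEX = "cbdefghijklnrtuv"
HEX = "0123456789abcdef"
_TO_HEX = str.maketrans(MODHEX, HEX)
_TO_MODHEX = str.maketrans(HEX, MODHEX)


def modhex_decode(text):
    """Decode a modhex string into raw bytes."""
    if len(text) % 2 or any(ch not in MODHEX for ch in text):
        raise ValueError("not a valid modhex string")
    return bytes.fromhex(text.translate(_TO_HEX))


def modhex_encode(data):
    """Encode raw bytes as a modhex string."""
    return data.hex().translate(_TO_MODHEX)


def split_otp(otp):
    """Split a 44-character string into (identifier, encrypted 16 bytes)."""
    if len(otp) != 44:
        raise ValueError("expected 12 + 32 modhex characters")
    return otp[:12], modhex_decode(otp[12:])
```

The identifier stays in modhex because that is the form in which it is typically registered on the server side; only the encrypted part is turned into bytes here.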

Validation of keys

A validation server asserts whether the OTP is valid or not: Possible choices are:

  • Use the YubiCloud service operated by Yubico (which has knowledge of all shipped keys, in factory settings each key has a pre-shipped AES key)
  • Create a custom key and run your own validation server (keys need to be personalised in that procedure). Keys can be managed/stored securely by using the YubiHSM hardware security module: It's a USB device which stores the keys and performs all cryptographic operations / OTP validation in hardware, i.e. its hardware design protects against root compromise of the authentication server. A YubiHSM device costs 500 dollars and also provides a hardware random number generator available for enhancing the system's randomness.

For securing access to the production cluster, the YubiCloud has a number of inherent drawbacks: Availability of the authentication service is not in our hands (both for general availability and also attackers inhibiting network level access to the YubiCloud). Running a local validation server also provides more fine-grained control over which keys are in use.

For system-level authentication, Yubico provides a PAM module (pam_yubico), which is available in jessie and trusty. Users could be mapped to Yubikey token IDs via puppet.

Yubikey Neo PIV mode

The Neo model variant provides smartcard capabilities via the CCID USB interface. One notable interface is the "Personal Identity Verification" (PIV) mode which allows using secure on-smartcard storage of a private key. The key and the cryptographic operations on the keys are exposed via a PKCS11 interface.

As a second authentication factor, access to the PIV key is secured via a PIN (if the PIN is entered incorrectly thrice, access is denied and can only be unlocked with a PUK).

I bought a Yubikey Neo and tried that out; it works very well on Linux using opensc as the interface to the Yubikey CCID and accessing it from the SSH client via PKCS11. Mac OS X is known to be fully supported as well (but I haven't tried it myself).
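As a rough illustration of the client side of such a setup, the following Python sketch renders an ssh_config stanza that points the OpenSSH client at the OpenSC PKCS#11 module, plus the ssh-keygen call that lists the public keys held on the token. PKCS11Provider and ssh-keygen -D are standard OpenSSH features; the module path is an assumption (a typical Debian amd64 location) and the host pattern is a placeholder.

```python
import shlex

# Assumption: typical Debian amd64 location of the OpenSC PKCS#11 module;
# the path differs on other platforms.
OPENSC_MODULE = "/usr/lib/x86_64-linux-gnu/opensc-pkcs11.so"


def ssh_config_stanza(host_pattern, module=OPENSC_MODULE):
    """Render an ssh_config stanza that uses the token via PKCS#11."""
    return f"Host {host_pattern}\n    PKCS11Provider {module}\n"


def export_pubkey_command(module=OPENSC_MODULE):
    """Return the command that prints the public keys stored on the token."""
    return shlex.join(["ssh-keygen", "-D", module])


if __name__ == "__main__":
    # "bastion.example.org" is a placeholder host name.
    print(ssh_config_stanza("bastion.example.org"), end="")
    print(export_pubkey_command())
```

The public key printed by the second command is what would end up as the authorised key on the server.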

One notable difference is the level of server-side control: When using PIV/SSH login, the storage method of the key cannot be validated on the server side. From the view of the SSH server this is just another authorised key (only securely stored).

Token management procedures

Procurement: Every Yubikey would need to be personalised before being shipped out. Yubikeys could be ordered in quantity by OIT and a batch personalised in advance. Access to the PAM authfile would be managed through ops/puppet.

The Yubikeys have two slots, the first one contains the factory setting YubiCloud credentials. The personalised key would be written to the second slot, i.e. it would still be possible for OIT to use that credential for YubiCloud authentication for external services like Google Apps (since the number of users of these services is far bigger than people with cluster access and these services have external dependencies already).

First time cluster access (e.g.new employees): OIT hands out/ships the Yubikey to the person with cluster access. The token ID becomes part of the initial access request and is registered in puppet by ops.

Replacement of Yubikeys: If a Yubikey is broken or lost, a replacement key would be handed out by OIT for SF employees and shipped to remoties (AFAICT express shipping should reach every employee within 24h). OIT would notify TechOps of the new token ID and they would change the ID in puppet.

Proposed implementation

  1. Procure Yubikey Neo devices for cluster users with cluster-wide root access and set up PIV SSH access for those:

These users are the most critical targets and moving SSH access for these to a secure second factor reduces the attack footprint for a compromised notebook significantly. Also, people with full cluster access already have high responsibility and while the use of PIV storage for that key cannot be enforced, these users are already trusted to handle cluster access carefully.

(The procurement of the Yubikey tokens can also serve as an opportunity to review full cluster root access privileges.)

The Yubikey NEO smartcard mode also provides the means to store PGP keys. This is limited to 2048-bit RSA keys (larger keys are not feasible due to hardware limitations of the Yubikey chip; ECC might be available in the future, but is currently not available due to IP reasons).

Hardware budget needed: 24 * 50 dollars if all members of the "ops" group receive a Yubikey Neo -> 1200 dollars. (Plus possible shipping costs)

That can be resolved within a few weeks/this quarter and brings significant security benefits to the most sensitive logins.

  2. Procure a YubiHSM and set up a validation server (low hardware requirements, but cannot be a Ganeti VM due to the YubiHSM USB interface).

Procure an initial set of Yubikey Edge tokens and sort out personalisation / handling procedures for token personalisation with OIT.

Ops users would switch to the pam_yubico authentication scheme employed by non-ops users, the PIV/SSH keys of ops users could be left in place for emergency logins (e.g. in case of problems with the validation server).

Personalise an initial set of Yubikeys from the ops group and an initial batch of other users with cluster access and distribute the tokens to them.

Convert one of the bastion hosts to only allow SSH access using pam_yubico and run this for a few weeks to check for operational problems (handling problems, hardware problems like the occasional USB power issues observed by Jeff Green in the FR environment).

Hardware budget needed: 500 dollars for a YubiHSM and 10 * 30 dollars for an initial set of test users with a Yubikey Edge -> 800 dollars, plus a server from spares. (Plus possible shipping costs)

  3. Full rollout: distribute and enable Yubikeys for all users with cluster access and enable it on all the bastion hosts.

Hardware budget needed: approx. another 110 Yubikey Edge -> 3300 dollars (Plus possible shipping costs)
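The hardware figures quoted for the three steps can be added up as follows. This is only a restatement of the numbers above in Python; the dictionary keys are descriptive labels chosen here, and shipping costs and the validation server itself are not included.

```python
# Unit prices and quantities as quoted in the plan above (US dollars).
NEO_PRICE = 50
EDGE_PRICE = 30
HSM_PRICE = 500

budget = {
    "Yubikey Neo for the ops group": 24 * NEO_PRICE,
    "YubiHSM plus 10 Yubikey Edge for test users": HSM_PRICE + 10 * EDGE_PRICE,
    "approx. 110 further Yubikey Edge": 110 * EDGE_PRICE,
}

total = sum(budget.values())

if __name__ == "__main__":
    for item, cost in budget.items():
        print(f"{item}: {cost} dollars")
    print(f"Total: {total} dollars (excluding shipping and the server)")
```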

Hardware budget needed: 24 * 50 dollars if all members of the "ops" group receive a Yubikey Neo -> 1200 dollars. (Plus possible shipping costs)

Approved to buy 25 Yubikey NEOs.

Buying a tray of 50 might make sense, depending on practical details.

Hardware budget needed: 500 dollars for YubiHSM and 10 * 30 dollars for an initial set test users with a Yubikey Edge -> 800 dollars Plus a server from spares. (Plus possible shipping costs)

No, spare ("in stock") servers cost the same. :) Spec a server according to need, and I'll approve this.

It seems like https://shop.nitrokey.com/shop is a viable fully free hardware+software alternative to Yubikey NEO with a similar price tag. (Their business address is 5 underground stations away from the WMDE office. I own one key from the generation previous to the Nitrokey Pro.) According to documentation the Nitrokey Pro is limited to 4k RSA key sizes as opposed to the 2k of the Yubikey NEO and the Nitrokey Start. I didn't compare the U2F and HSM products from Nitrokey with Yubikeys.

When using PIV/SSH login, the storage method of the key cannot be validated on the server side.

I think it can, in the same sense as with the OTP hardware: if one observed the generation of the private key on the hardware fob and recorded the associated public key on the server. (If one neither observed the generation of the secret on the OTP hardware nor provisioned it oneself, then the server cannot validate that either.)

Updated implementation details:

Objective

The important security property to gain is protection against compromised notebooks; endpoint security is one of the biggest risks for the WMF cluster (especially on Mac OS X). If someone compromises the notebook of a user with cluster access, the attacker can steal the SSH key/passphrase and access the cluster. The hardware design of the authentication tokens ensures that keys cannot be extracted, so in case of a compromised notebook the attacker does not have access to the second factor.

Hardware choice for an authentication token:
Yubikey, as produced by Yubico, is the most sensible hardware option:

  • Widely deployed
  • FOSS-friendly, entire software stack is open source and included in Debian (all enduser tools are also available on Mac OS X)
  • Developed by people deeply involved in the FOSS and security communities

Yubikey hardware

Yubikeys are simple, robust devices in a USB form factor. From the operating system's point of view they present themselves as a USB keyboard: the Yubikeys have a small touchpad, and when it is touched the generated password is emitted via "keyboard input". As a result, all OSes more recent than MS-DOS support them out of the box.

The latest generation is the Yubikey 4. It's also available in nano variants which are only the size of the actual USB connector, but that doesn't seem desirable in terms of robustness.

Generation of one-time-passwords

Each Yubikey ships with a unique 128-bit AES key. By pressing a pad on the key, a one-time password is generated; it consists of the key ID, counters and timers, a random nonce and a checksum, and is encrypted with that AES key (see Section 2 of https://www.yubico.com/wp-content/uploads/2015/03/YubiKeyManual_v3.4.pdf )

The emitted string consists of a static 12-character identifier followed by the 32-character OTP.

Representation is done in "modhex" (a reduced set of characters to avoid key scan code ambiguities: This avoids characters which are at different positions on internationalised keyboards)
(see Section 2.1.3 of https://www.yubico.com/wp-content/uploads/2015/03/YubiKeyManual_v3.4.pdf )

Validation of keys

A validation server asserts whether the OTP is valid or not: Possible choices are:

  • Use the YubiCloud service operated by Yubico (which has knowledge of all shipped keys, in factory settings each key has a pre-shipped AES key)
  • Create a custom key and run your own validation server (keys need to be personalised in that procedure). Keys can be managed/stored securely by using the YubiHSM hardware security module: It's a USB device which stores the keys and performs all cryptographic operations / OTP validation in hardware, i.e. its hardware design protects against root compromise of the authentication server. A YubiHSM device costs 500 dollars and also provides a hardware random number generator available for enhancing the system's randomness.

For securing access to the production cluster, the YubiCloud has a number of inherent drawbacks: Availability of the authentication service is not in our hands (both for general availability and also attackers inhibiting network level access to the YubiCloud). Running a local validation server also provides more fine-grained control over which keys are in use.

We'll be using the HSM approach, see below.

For system-level authentication, Yubico provides a PAM module (pam_yubico), which is available in jessie and trusty. Users could be mapped to Yubikey token IDs via puppet.
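As a sketch of what such a mapping could produce, the following Python function renders a pam_yubico authfile from a dictionary of users and token IDs. The one-line-per-user format (user name followed by colon-separated token IDs) is the one pam_yubico documents for its authfile; the function name, user names and token IDs below are invented for illustration, and the actual puppet implementation may look quite different.

```python
MODHEX = "cbdefghijklnrtuv"


def render_authfile(tokens_by_user):
    """Render authfile lines of the form 'user:tokenid[:tokenid...]'."""
    lines = []
    for user in sorted(tokens_by_user):
        token_ids = tokens_by_user[user]
        for token_id in token_ids:
            # Token IDs are the static 12-character modhex prefix of the OTP.
            if len(token_id) != 12 or any(ch not in MODHEX for ch in token_id):
                raise ValueError(f"invalid token ID for {user}: {token_id!r}")
        lines.append(":".join([user, *token_ids]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical users and token IDs.
    print(render_authfile({
        "alice": ["cccccccccccb", "ccccccccccce"],
        "bob": ["cccccccccccd"],
    }), end="")
```

Allowing several IDs per user leaves room for a replacement token to be registered before the old one is removed.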

Token management procedures

Procurement: Every Yubikey would need to be personalised before being shipped out. Yubikeys can be ordered in quantity by OIT and a batch personalised in advance. Access to the PAM authfile would be managed through ops/puppet.

The Yubikeys have two slots, the first one contains the factory setting YubiCloud credentials. The personalised key would be written to the second slot, i.e. it would still be possible for OIT to use that credential for YubiCloud authentication for external services like Google Apps (since the number of users of these services is far bigger than people with cluster access and these services have external dependencies already).

First time cluster access (e.g.new employees): OIT hands out/ships the Yubikey to the person with cluster access. The token ID becomes part of the initial access request and is registered in puppet by ops.

Replacement of Yubikeys: If a Yubikey is broken or lost, a replacement key would be handed out by OIT for SF employees and shipped to remoties (AFAICT express shipping should reach every employee within 24h). OIT would notify TechOps of the new token ID and they would change the ID in puppet.

Implementation of key generation / Yubikey personalisation

Keys are generated on a locked-down Debian notebook without internet access (aka the personalisation site).

With YubiHSMs, the keys are generated on the HSM (which also ensures that quality randomness is used, by using the hardware RNG on the chip). Each HSM supports different mode flags (e.g. creating, verifying etc.); by default an HSM supports the full range of operations. To enhance the security of the setup, the HSMs used in the auth servers in codfw and eqiad will be configured to only allow OTP validation, but not key creation. Key generation would only be done on the personalisation notebook/HSM.

There are two methods to store the keys: internally on the HSM or in the form of AEAD blocks (authenticated encryption with associated data). We're using the AEAD storage since we want to use two auth servers/HSMs for redundancy.

To verify and decrypt an AEAD block, a symmetric key and a random nonce are required. These are stored securely on the YubiHSM devices, i.e. they cannot be read by an attacker who has compromised the authentication servers. The encrypted AEAD blocks can thus be stored on the local filesystem of the authentication servers (and replicated between the auth servers).

An application validating an OTP passes the OTP and the AEAD to the YubiHSM, which then decrypts the key, validates it and returns the result along with a timestamp to counter replay attacks. These counters are stored on the validation servers (and also need to be replicated among them).

The generated keys are then used to personalise the Yubikeys (using ykpersonalize).

The keystore on the HSMs is encrypted by a master key (unique per HSM). The keystore needs to be unlocked after each boot using the yhsm-keystore-unlock command.

To validate an OTP, the YubiHSM offers the YSM_AEAD_YUBIKEY_OTP_DECODE command, requiring the following arguments:

  • key handle (an internal identifier for a key and its associated flags)
  • the AEAD block
  • the public ID
  • the OTP

This runs the following steps and returns OK or an error code:

  • The AEAD block is decrypted and the AES key retrieved.
  • The OTP is decrypted using that AES key
  • The private ID is compared to the value from the AEAD block

Changes to the HSM can only be made in configuration mode; it can be enabled by re-inserting the YubiHSM into the USB socket and pressing the button on the HSM for three seconds (i.e. only through physical access). If someone enters the datacentre and steals the HSM (or tries to enter config mode on her own), that doesn't do much harm since the keys are still encrypted using the master key.

Once config mode is enabled, the light on the HSM will switch to a slow blinking pace.

Access to the config interface is provided by a USB serial console, which can be accessed with minicom.

To run the initial minicom setup, use "minicom -s" and

  • Go to "serial port setup"
  • Set "Serial Device" to /dev/ttyACM0 (double-check the correct device name with dmesg)
  • Select "Save setup as dfl"

An HSM can always be reset using the "zap" command (but that loses all stored keys).

Setup of the authentication servers:

There will be two authentication servers

  • auth1001 in eqiad
  • auth2001 in codfw

All AEADs are replicated to/between auth1001 and auth2001 using rsync.

Each authentication server runs two low-level services:

  • yhsm-daemon mediates the concurrent access to the USB serial interface of the YubiHSM
  • yhsm-yubikey-ksm is the YubiHSM backend for the validation service (running the actual validation of the OTP)

In addition, each authentication server runs yubikey-val, the higher-level validation server (it also takes care of replicating counter values between the auth servers to prevent replay attacks).
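The idea behind the counter check can be sketched in a few lines of Python. This is not the validation server's actual code or storage format; the class and method names are invented, and the sketch only assumes that each decoded OTP carries a use counter and a session counter that must increase for a given token.

```python
class CounterStore:
    """In-memory sketch of per-token replay protection."""

    def __init__(self):
        # public ID -> (use counter, session counter) of the last accepted OTP
        self._last_seen = {}

    def accept(self, public_id, use_counter, session_counter):
        """Accept an OTP only if its counters are newer than the stored ones."""
        current = (use_counter, session_counter)
        previous = self._last_seen.get(public_id)
        if previous is not None and current <= previous:
            return False  # replayed or out-of-order OTP
        self._last_seen[public_id] = current
        return True

    def merge(self, public_id, use_counter, session_counter):
        """Apply counters received from the other server; never move backwards."""
        current = (use_counter, session_counter)
        previous = self._last_seen.get(public_id)
        if previous is None or current > previous:
            self._last_seen[public_id] = current


if __name__ == "__main__":
    store = CounterStore()
    print(store.accept("cccccccccccb", 5, 0))  # first use is accepted
    print(store.accept("cccccccccccb", 5, 0))  # the same OTP again is rejected
```

Without the merge step, an OTP captured at one site could still be replayed against the other server, which is why the counters have to be replicated.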

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

MoritzMuehlenhoff closed subtask Restricted Task as Declined.Oct 27 2020, 3:45 PM