
[Session] Managing Wikimedia servers with Puppet
Closed, Resolved, Public

Description

  • Title of session: Managing Wikimedia servers with Puppet
  • Session description: Brief introduction to Puppet, the main configuration management tool used on the Wikimedia servers, our way of using it and what you need to know about it if you're working on something that uses it.
  • Username for contact: @taavi
  • Session duration (25 or 50 min):
  • Session type (presentation, workshop, discussion, etc.): presentation
  • Language of session (English, Arabic, etc.): English
  • Prerequisites (some Python, etc.): A basic understanding of Linux is useful but not strictly required.
  • Any other details to share?:
  • Interested? Add your username below:

Notes from session:

Managing Wikimedia servers with Puppet

Date and time: May 4, 2024 14:00

Relevant links

Presenter

[[User:Taavi|Taavi Väänänen]]

Participants

Notes

  • Puppet: tool to manage ~1000 Wikimedia servers across several different environments
  • Agenda: Puppet in general; how it’s used at Wikimedia; and how to make Puppet changes when needed
  • Puppet is managed by the WMF SRE team (50+ people)
  • all running on our own hardware, across 7 data centers

What is Puppet?
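a declarative configuration management tool
the agent gathers facts about a node and sends them to the Puppet server; the server compiles manifests written in Puppet's domain-specific language into a catalog, which the agent then applies to make the necessary changes
Puppet's DSL is inspired by Ruby
a Puppet class can declare some parameters and then declare the resources (packages, files, services, etc.) it manages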
the parameters allow the same class to be used in different environments (e.g. main and testing environments)
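To illustrate the declarative model described above, here is a minimal Python sketch. It is not Puppet code and not how Puppet works internally: it collapses the server (compile) and agent (apply) steps into two plain functions, and the `ntp_class` "class", its `servers` parameter and the example.org hostnames are all hypothetical. The class only describes the desired state; `apply` compares it with the node's current state and changes only what differs, so a second run finds nothing to do.

```python
from typing import Dict, List

# A "resource" is modelled as a name mapped to its desired attributes.
State = Dict[str, Dict[str, str]]


def ntp_class(servers: str, ensure: str = "running") -> State:
    """Hypothetical parameterised 'class': describes desired state, changes nothing."""
    return {
        "package[ntp]": {"ensure": "installed"},
        "file[/etc/ntp.conf]": {"content": f"server {servers}\n"},
        "service[ntp]": {"ensure": ensure},
    }


def apply(desired: State, current: State) -> List[str]:
    """Converge `current` towards `desired` in place and return the changes made."""
    changes: List[str] = []
    for name, attrs in desired.items():
        if current.get(name) != attrs:
            changes.append(f"{name}: {current.get(name)} -> {attrs}")
            current[name] = dict(attrs)
    return changes


# One class definition, different parameters per environment.
production = ntp_class(servers="ntp.example.org")
testing = ntp_class(servers="ntp.test.example.org")

node: State = {"package[ntp]": {"ensure": "installed"}}
print(apply(production, node))  # two changes: the config file and the service
print(apply(production, node))  # []: already converged, nothing to do
```

Passing different parameters to the same class is what lets one definition serve both a main and a testing environment.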

Advanced features (not covered here):

  • PuppetDB (allows sharing facts between nodes)
  • External Node Classifier (ENC)

Puppet at Wikimedia
two main environments: “production” (2000+ physical servers, 250+ VMs) and Cloud VPS (~900 VMs?), both use the same Puppet repo
using Puppet to fully manage Cloud VPS machines is possible, but not recommended
Puppet configuration lives in operations/puppet.git, a big monorepo for everything (very powerful, so merge rights are very restricted)
puppet.git has historically been a mix of licenses and is slowly being migrated to Apache-2.0
puppet.git structure:

  • the base unit in Puppet is the module, which manages an individual technology
  • profiles use resources and modules to manage a specific tech stack
  • roles are the highest level – each node should have exactly one role, which can use one or more profiles
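The layering above (modules, then profiles, then roles) can be sketched as plain Python functions. This only illustrates how the pieces compose; it is not Puppet syntax, and every module, profile, role and host name below is hypothetical. Modules are reusable and each covers one technology, profiles combine them into a stack, and a role is just a list of profiles. Mapping each node to a single role in a dict enforces the "exactly one role per node" rule by construction.

```python
from typing import Callable, Dict, List

Resources = List[str]


# Modules: reusable building blocks, one technology each (hypothetical names).
def module_nginx(port: int) -> Resources:
    return ["package[nginx]", f"file[/etc/nginx/sites-enabled/site-{port}]", "service[nginx]"]


def module_prometheus_exporter(name: str) -> Resources:
    return [f"package[prometheus-{name}-exporter]", f"service[prometheus-{name}-exporter]"]


# Profiles: combine modules, with specific settings, into one tech stack.
def profile_base() -> Resources:
    return ["package[openssh-server]", "service[ssh]"]


def profile_webserver() -> Resources:
    return module_nginx(port=443) + module_prometheus_exporter("nginx")


# Roles: the top level; a role is a list of profiles.
ROLES: Dict[str, List[Callable[[], Resources]]] = {
    "webserver": [profile_base, profile_webserver],
}

# Each node maps to exactly one role (a dict key cannot hold two values).
NODES: Dict[str, str] = {"web1001.example.org": "webserver"}


def catalog_for(node: str) -> Resources:
    """Collect the resources of every profile included by the node's role."""
    resources: Resources = []
    for profile in ROLES[NODES[node]]:
        resources.extend(profile())
    return resources


print(catalog_for("web1001.example.org"))
```

If a second role ever needed the nginx setup, it could reuse `module_nginx` from a new profile instead of copying it, which is the reuse argument for modules given in the Q&A.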

How to make sure Puppet changes work / do the right thing
main tool: pcc (puppet catalog compiler) – compiles the Puppet configuration and shows effective changes
two ways to use pcc: add a "Hosts:" trailer to the commit message and comment "check experimental" on the Gerrit change, or run ./utils/pcc locally in puppet.git
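The core idea behind pcc, comparing the catalog compiled before a change with the one compiled after it, can be sketched in Python. This is not pcc's implementation: catalogs are modelled here as plain dicts mapping resource names to attributes, and the HAProxy resources are hypothetical. Each resource is reported as added (+), removed (-) or changed (~), which is the kind of effective change a reviewer wants to see before merging.

```python
from typing import Dict, List

Catalog = Dict[str, Dict[str, str]]


def diff_catalogs(old: Catalog, new: Catalog) -> List[str]:
    """Summarise how `new` differs from `old`, resource by resource."""
    lines: List[str] = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            lines.append(f"- {name}")
        elif name not in old:
            lines.append(f"+ {name}")
        elif old[name] != new[name]:
            # Report only the attributes that actually changed.
            for attr in sorted(old[name].keys() | new[name].keys()):
                before, after = old[name].get(attr), new[name].get(attr)
                if before != after:
                    lines.append(f"~ {name} {attr}: {before!r} -> {after!r}")
    return lines


before = {
    "file[/etc/haproxy/haproxy.cfg]": {"content": "maxconn 1000"},
    "service[haproxy]": {"ensure": "running"},
}
after = {
    "file[/etc/haproxy/haproxy.cfg]": {"content": "maxconn 2000"},
    "service[haproxy]": {"ensure": "running"},
    "package[socat]": {"ensure": "installed"},
}
for line in diff_catalogs(before, after):
    print(line)
```

Unchanged resources (here the service) produce no output, so an empty diff is a quick signal that a refactoring change is a no-op for that host.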
other testing tools: RSpec, Pontoon, dcl [I missed the details]
RSpec: allows unit testing Puppet code (RSpec is a Ruby testing framework, and Puppet itself is written in Ruby)
dcl can create a best-effort copy of a single production machine (e.g. mw1234)
in production there is also a lot of secret data (e.g. credentials, passwords). It is stored using one of two mechanisms: simple strings from Hiera via secret(), or a private repository. In the latter case you also need to add a copy to labs/private.git for Beta. Note that labs/private.git isn't actually private: it contains fake data that looks like the real private data.
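Since anything added to the private repository also needs a fake copy in labs/private.git for Beta, a forgotten copy is easy to spot with a set difference. The Python sketch below is hypothetical (not an existing Wikimedia tool): it models both repositories as dicts and uses made-up key names.

```python
from typing import Dict, List


def missing_fake_secrets(private: Dict[str, str], fake: Dict[str, str]) -> List[str]:
    """Return secret names present in the real private data but lacking a dummy copy."""
    # Compare key names only; secret values are never inspected or printed.
    return sorted(private.keys() - fake.keys())


# Hypothetical key names and values, for illustration only.
private_data = {
    "profile::example::db_password": "real-secret",
    "profile::example::api_token": "real-token",
}
fake_data = {
    "profile::example::db_password": "dummy",
}

print(missing_fake_secrets(private_data, fake_data))
```

Checking names rather than values keeps such a check safe to run and log, because it never needs to read the real secrets themselves.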

Questions

Q: Urbanecm: What is the difference between profiles and modules?
A: modules are used when you want to reuse something (e.g. HAProxy, which is used for both TLS termination and load balancing); if you have specific software that only has one use case, you can put it all in a profile
as soon as you use it in multiple places, put it in a module; otherwise it doesn’t make much difference if it’s a profile or module

Q: Tuukka: Are there simple tutorials on how to use puppet?
A: General recommendation: unless you’re working with SREs, don’t use Wikimedia Puppet, as you would have to bug SREs all the time to merge your changes; either run your own separate Puppet setup, or don’t use Puppet at all (for small projects it’s very much overkill)

Q: Urbanecm: Also about Cloud VPS: we need to replace nodes from time to time (e.g. for OS upgrades); what’s the best way to avoid having to rebuild VMs manually?
A: It depends on how complicated the setup is. You can work with the WMF SREs, but that isn’t recommended for smaller projects; for smaller setups the recommendation is Ansible, or just simple shell scripts.

Q: Urbanecm: Do we have in-house docs on how to start with Ansible?
A: I don’t think so, but Ansible is common in the wider industry

Q: Urbanecm: wondering how to integrate Ansible with VMs that are already managed by Puppet
A: [I didn’t catch that answer]

Q: Brennen: We have a standalone puppetserver (project puppet server) in the dev tools project, mainly because we’re mimicking production; is that a recommended pattern in general, or should it be avoided?
A: In your case, mimicking production and testing changes to Puppet, a standalone Puppet server makes sense (also because you’re already working with SREs) – but not recommended for most projects

Event Timeline


Hello! 👋 The 2024 Hackathon Program is now open for scheduling! If you are still interested in organizing a session, you can claim a slot on a first-come, first-served basis by adding your session to the daily program, following these instructions. We look forward to hearing your presentation!

I scheduled this for Saturday at 14:00.

taavi renamed this task from [Session] Puppet to [Session] Managing Wikimedia servers with Puppet. Apr 26 2024, 7:51 PM
taavi updated the task description.
debt triaged this task as Medium priority. Apr 26 2024, 7:54 PM