The Scrubber component in Gitpod is a Go library that provides functionality for removing or masking sensitive information from data. It's designed to protect personally identifiable information (PII) and other sensitive data from being exposed in logs, error messages, and other outputs. The component offers various methods for scrubbing different types of data structures, including strings, key-value pairs, JSON, and Go structs.
The primary purposes of the Scrubber component are:
- Remove or mask personally identifiable information (PII) from data
- Protect sensitive information such as passwords, tokens, and secrets
- Provide consistent data sanitization across the Gitpod platform
- Support various data formats and structures
- Enable customizable scrubbing rules
- Reduce the risk of sensitive data exposure
- Comply with privacy regulations and best practices
- Facilitate safe logging and error reporting
The Scrubber component is structured as a Go library with several key parts:
- Core Scrubber Interface: Defines the methods for scrubbing different types of data
- Scrubber Implementation: Provides the actual scrubbing functionality
- Sanitization Functions: Implements different sanitization strategies (redaction, hashing)
- Configuration: Defines what fields and patterns should be scrubbed
- Struct Walking: Uses reflection to traverse and scrub complex data structures
The component is designed to be used by other Gitpod components that need to sanitize data before logging, storing, or transmitting it.
The Scrubber interface provides several methods for scrubbing different types of data:
- Value: Scrubs a single string value using heuristics to detect sensitive data
- KeyValue: Scrubs a key-value pair, using the key as a hint for how to sanitize the value
- JSON: Scrubs a JSON structure, handling nested objects and arrays
- Struct: Scrubs a Go struct in-place, respecting struct tags for customization
- DeepCopyStruct: Creates a scrubbed deep copy of a Go struct
The component implements different sanitization strategies:
- Redaction: Replaces sensitive values with `[redacted]` or `[redacted:keyname]`
- Hashing: Replaces sensitive values with an MD5 hash (`[redacted:md5:hash:keyname]`)
- URL Path Hashing: Specially handles URLs by preserving the structure but hashing path segments
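The three strategies can be illustrated with a minimal sketch using only the standard library. The function names are hypothetical, and the exact output of URL path hashing (keeping scheme and host, dropping query and fragment) is an assumption, not the library's documented behavior.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"net/url"
	"strings"
)

// md5Hex returns the hex-encoded MD5 digest of s.
func md5Hex(s string) string {
	sum := md5.Sum([]byte(s))
	return hex.EncodeToString(sum[:])
}

// redact discards the value and keeps only the key name, if one is given.
func redact(key string) string {
	if key == "" {
		return "[redacted]"
	}
	return "[redacted:" + key + "]"
}

// hashValue replaces the value with its MD5 digest so that equal inputs
// can still be correlated without revealing the original value.
func hashValue(key, value string) string {
	if key == "" {
		return "[redacted:md5:" + md5Hex(value) + "]"
	}
	return "[redacted:md5:" + md5Hex(value) + ":" + key + "]"
}

// hashURLPath keeps scheme and host and hashes each non-empty path segment.
// Query and fragment are dropped (an assumption of this sketch). Input that
// does not parse as an absolute URL is hashed as a whole.
func hashURLPath(raw string) string {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return hashValue("", raw)
	}
	segments := strings.Split(u.Path, "/")
	for i, seg := range segments {
		if seg != "" {
			segments[i] = md5Hex(seg)
		}
	}
	return u.Scheme + "://" + u.Host + strings.Join(segments, "/")
}

func main() {
	fmt.Println(redact("password"))
	fmt.Println(hashValue("username", "johndoe"))
	fmt.Println(hashURLPath("https://example.com/org/repo"))
}
```

Hashing keeps values comparable across log lines, while redaction removes them entirely. MD5 serves here as a fingerprint, not as protection against someone guessing low-entropy inputs.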
The scrubber is configured with several lists and patterns:
- RedactedFieldNames: Field names whose values should be completely redacted
- HashedFieldNames: Field names whose values should be hashed
- HashedURLPathsFieldNames: Field names containing URLs whose paths should be hashed
- HashedValues: Regular expressions that, when matched, cause values to be hashed
- RedactedValues: Regular expressions that, when matched, cause values to be redacted
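To show how these lists might interact, here is a hedged sketch of a configuration struct and a decision function. The field names follow the list above, but the `Config` type, the `decide` method, the case-insensitive substring matching, and the precedence order are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Config groups the five lists described above (hypothetical type).
type Config struct {
	RedactedFieldNames       []string
	HashedFieldNames         []string
	HashedURLPathsFieldNames []string
	HashedValues             []*regexp.Regexp
	RedactedValues           []*regexp.Regexp
}

type action int

const (
	keep action = iota
	hash
	hashURLPath
	redact
)

// nameMatches reports whether key contains any of the names, ignoring case.
func nameMatches(names []string, key string) bool {
	key = strings.ToLower(key)
	for _, n := range names {
		if strings.Contains(key, strings.ToLower(n)) {
			return true
		}
	}
	return false
}

// valueMatches reports whether any pattern matches the value.
func valueMatches(patterns []*regexp.Regexp, value string) bool {
	for _, p := range patterns {
		if p.MatchString(value) {
			return true
		}
	}
	return false
}

// decide checks field names first, then value patterns. Redaction wins
// over hashing within each group (assumed precedence).
func (c Config) decide(key, value string) action {
	switch {
	case nameMatches(c.RedactedFieldNames, key):
		return redact
	case nameMatches(c.HashedURLPathsFieldNames, key):
		return hashURLPath
	case nameMatches(c.HashedFieldNames, key):
		return hash
	case valueMatches(c.RedactedValues, value):
		return redact
	case valueMatches(c.HashedValues, value):
		return hash
	}
	return keep
}

// exampleConfig returns illustrative rules, not the library's defaults.
func exampleConfig() Config {
	return Config{
		RedactedFieldNames:       []string{"password", "token", "secret"},
		HashedFieldNames:         []string{"username"},
		HashedURLPathsFieldNames: []string{"contextURL"},
		HashedValues:             []*regexp.Regexp{regexp.MustCompile(`[^@\s]+@[^@\s]+\.[^@\s]+`)},
	}
}

func main() {
	cfg := exampleConfig()
	fmt.Println(cfg.decide("userPassword", "secret123") == redact)
	fmt.Println(cfg.decide("note", "mail john@example.com") == hash)
	fmt.Println(cfg.decide("note", "hello") == keep)
}
```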
When scrubbing structs, the component respects the `scrub` struct tag:
- `scrub:"ignore"`: Skip scrubbing this field
- `scrub:"hash"`: Hash this field's value
- `scrub:"redact"`: Redact this field's value
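The tag handling can be sketched with the standard `reflect` package. This is a simplified illustration, not the component's implementation: the hypothetical `scrubTagged` helper handles only top-level string fields and leaves untagged fields unchanged, whereas the real component also applies its name and value rules and walks nested structures.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"reflect"
)

// User is an example type carrying all three tag values.
type User struct {
	ID       string `scrub:"ignore"`
	Username string `scrub:"hash"`
	Email    string `scrub:"redact"`
	Note     string // untagged: left unchanged by this sketch
}

// scrubTagged applies the scrub tag to the settable string fields of the
// struct that ptr points to, modifying it in place.
func scrubTagged(ptr any) error {
	v := reflect.ValueOf(ptr)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("scrubTagged: want non-nil pointer to struct, got %T", ptr)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		field := v.Field(i)
		if field.Kind() != reflect.String || !field.CanSet() {
			continue
		}
		switch t.Field(i).Tag.Get("scrub") {
		case "ignore":
			// Explicitly exempted: leave the value untouched.
		case "hash":
			sum := md5.Sum([]byte(field.String()))
			field.SetString("[redacted:md5:" + hex.EncodeToString(sum[:]) + "]")
		case "redact":
			field.SetString("[redacted]")
		}
	}
	return nil
}

func main() {
	u := User{ID: "u-1", Username: "johndoe", Email: "john@example.com", Note: "hi"}
	if err := scrubTagged(&u); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", u)
}
```

A pointer is required because reflection can only set fields of an addressable value, which matches the in-place behavior described for `Struct`.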
The component supports a `TrustedValue` interface that allows marking specific values to be exempted from scrubbing:

```go
type TrustedValue interface {
	IsTrustedValue()
}
```
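A marker interface like this is typically satisfied by a small wrapper type. The sketch below assumes a hypothetical `trustedString` wrapper and a hypothetical `scrubUnlessTrusted` helper to show how a type assertion can exempt such values. It is not the library's own code.

```go
package main

import "fmt"

// TrustedValue is the marker interface shown above.
type TrustedValue interface {
	IsTrustedValue()
}

// trustedString is a hypothetical wrapper marking a string as safe to log.
type trustedString string

// IsTrustedValue carries no behavior; it only marks the type.
func (trustedString) IsTrustedValue() {}

// scrubUnlessTrusted returns trusted values unchanged and redacts plain
// strings. Other types pass through untouched in this sketch.
func scrubUnlessTrusted(v any) any {
	switch v.(type) {
	case TrustedValue:
		return v
	case string:
		return "[redacted]"
	default:
		return v
	}
}

func main() {
	fmt.Println(scrubUnlessTrusted("secret123"))
	fmt.Println(scrubUnlessTrusted(trustedString("workspace-count")))
}
```

Because the exemption is tied to the type, the decision to trust a value is made where the value is created and remains visible in code review.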
Usage examples:

```go
// Scrub a single value
scrubbedValue := scrubber.Default.Value("user@example.com")
// Result: "[redacted:md5:hash]" or similar

// Scrub a value with key context
scrubbedPassword := scrubber.Default.KeyValue("password", "secret123")
// Result: "[redacted]"

// Scrub a JSON structure
jsonData := []byte(`{"username": "johndoe", "email": "john@example.com"}`)
scrubbedJSON, err := scrubber.Default.JSON(jsonData)
// Result: {"username": "[redacted:md5:hash]", "email": "[redacted]"}

// Scrub a struct in-place
type User struct {
	Username string
	Email    string `scrub:"redact"`
	Password string
}
user := User{Username: "johndoe", Email: "john@example.com", Password: "secret123"}
err = scrubber.Default.Struct(&user)
// Result: user.Username is hashed, user.Email is redacted, user.Password is redacted

// Create a scrubbed copy of a struct (same User type as above)
original := User{Username: "johndoe", Email: "john@example.com", Password: "secret123"}
scrubbedUser := scrubber.Default.DeepCopyStruct(original).(User)
// original is unchanged, scrubbedUser has sanitized values
```
The Scrubber component integrates with:
- Logging Systems: To sanitize log messages
- Error Handling: To sanitize error messages
- API Responses: To sanitize sensitive data in responses
- Monitoring Systems: To sanitize metrics and traces
- Other Gitpod Components: To provide consistent data sanitization
Internal dependencies: none specified in the component's build configuration.

External dependencies:
- `github.com/hashicorp/golang-lru`: For caching sanitization decisions
- `github.com/mitchellh/reflectwalk`: For traversing complex data structures
The component implements several security measures:
- Default Deny: Fields are scrubbed by default if they match sensitive patterns
- Multiple Strategies: Different sanitization strategies for different types of data
- Caching: Caches sanitization decisions for performance
- Customization: Allows customization of scrubbing rules
- Trusted Values: Supports marking values as trusted to exempt them from scrubbing

Related components:
- Common-Go: Uses the Scrubber for logging
- Server: Uses the Scrubber for API request/response sanitization
- Workspace Services: Use the Scrubber to protect workspace data
- Monitoring Components: Use the Scrubber to sanitize metrics and traces