Tools For Config
[Figure residue; recoverable labels: repository, sysadmin input, translation agent, operator]
2. Instance distribution rules: Instance distribution rules specify the distribution of instances in the network. We define an instance as a unit of configuration specification that can be decomposed into a set of parameters. Examples of instances are mail servers, DNS clients, firewalls and web servers. A web server, for example, has parameters for expressing its port, virtual hosts and supported scripting languages. In Figure 2, the instance distribution rule prescribes the number of mail servers that need to be activated in an infrastructure. The need for such a language is made explicit in [3] and [2].

3. Instance configurations: At the level of instance configurations, each instance is an implementation-independent representation of a configuration. An example of a tool at this level is Firmato [6]. Firmato allows modeling firewall configurations independently of the implementation software used.

4. Implementation dependent instances: The level of implementation dependent instances specifies the required configuration in more detail. It describes the configuration specification in terms of the contents of software configuration files. In the example in Figure 2, a sendmail.cf file is used to describe the configuration of mail server instances.

5. Configuration files: At the level of configuration files, complete configuration files are mapped on a device or set of devices. In contrast with the previous level, this level has no knowledge of the contents of a configuration file.

6. Bit-configurations: At the level of bit-configurations, disk images or diffs between disk images are mapped to a device or set of devices. This is the lowest level of configuration specification. Bit-level specifications have no knowledge of the contents of configuration files or of the files themselves. Examples of tools that operate on this level are imaging systems like Partimage [21], g4u [9] and Norton Ghost [24].

Figure 2 shows the six abstraction levels for system configuration, illustrated with an email setup. The illustration in Figure 2 is derived from an example discussed in [3]. The different abstraction levels are tied to the context of system configuration. In the context of policy languages, the classification of policy languages at different levels of abstraction is often done by distinguishing between high-level and low-level policies [16, 25]. The distinction of what exactly is a high-level and low-level policy language is rather vague. In many cases, high-level policies are associated with the level that we call end-to-end requirements, while low-level policies are associated with the implementation dependent instances level. We believe that a classification tied to the context of system configuration gives a better insight into the different abstraction levels used by system configuration tools.

In conclusion, a system configuration tool automates the deployment of configuration specifications. At the level of bit-configurations, deployment is simply copying bit-sequences to disks, while deploying configurations specified as end-to-end requirements is a much more complex process.

2.1.3 Modularization mechanisms

One of the main reasons system administrators want to automate the configuration of their devices is to avoid repetitive tasks. Repetitive tasks are not cost efficient. Moreover, they raise the chances of introducing errors. Repetitive tasks exist in a computer infrastructure because there are large parts of the configuration that are shared between a subset (or multiple overlapping subsets) of devices [3]. For example, devices need the same DNS client configuration, authentication mechanism, shared file systems, ... A system configuration tool
1. End-to-end requirements
Configure enough mail servers to guarantee an SMTP response time of X seconds
2. Instance distribution rules
Configure N suitable machines as a mail server for this cluster
3. Instance configurations
Configure machines X, Y, Z as a mail server
4. Implementation dependent instances
Put these lines in sendmail.cf on machines X, Y, Z
5. Configuration files
Put configuration files on machines
6. Bit-configurations
Copy disk images onto machines
Figure 2: An example of different abstraction levels of configuration specification for an email setup.
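The refinement between the levels of Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Machine` type, the RAM-based suitability test, the file path and the generated file line are all hypothetical and are not taken from any of the evaluated tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    ram_gb: int  # hypothetical criterion for "suitable"


def distribute(machines, count, min_ram_gb=4):
    """Level 2 -> 3: choose `count` suitable machines as mail servers."""
    suitable = sorted(
        (m for m in machines if m.ram_gb >= min_ram_gb),
        key=lambda m: m.name,
    )
    if len(suitable) < count:
        raise ValueError("not enough suitable machines")
    return suitable[:count]


def render_lines(machine, domain):
    """Level 3 -> 4: express an instance as lines of a configuration file.

    The line below is a placeholder, not real sendmail.cf syntax.
    """
    return [f"# managed mail server instance {machine.name}.{domain}"]


def file_mapping(servers, domain, path="/etc/mail/sendmail.cf"):
    """Level 4 -> 5: map complete files onto machines (path is assumed)."""
    return {
        m.name: {path: "\n".join(render_lines(m, domain)) + "\n"}
        for m in servers
    }


if __name__ == "__main__":
    pool = [Machine("x", 8), Machine("y", 2), Machine("z", 16)]
    chosen = distribute(pool, 2)
    print(file_mapping(chosen, "example.org"))
```

Each function consumes the output of the level above it, which is why a tool that accepts input at a higher level has more translation work to do before anything can be deployed.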
that supports the modularization of configuration chunks reduces repetition in the configuration specification.

In its most basic form, modularization is achieved through a grouping mechanism: a device A is declared to be a member of group X and as a consequence inherits all system configuration chunks associated with X. More advanced mechanisms include query based groups, automatic definition of groups based on environmental data of the target device and hierarchical groups.

An additional property of a modularization mechanism is whether it enables third parties to contribute partial configuration specifications. Third parties can be hardware and software vendors or consultancy firms. System administrators can then model their infrastructure in terms of the abstractions provided by the third-party modules and reuse the expertise or rely on support that a third party provides on their configuration modules.

2.1.4 Modeling of relations

One of the largest contributors to errors and downtime in infrastructures is wrong configurations [19, 20, 22] due to human error. An error in a configuration is commonly caused by an inconsistent configuration. Typical causes are a DNS service that has been moved to another server, or an entire infrastructure that is moved to a new IP range. Explicitly modeling relations that exist in the network helps keep a configuration model consistent.

Modeling relations is, like the modularization property of Section 2.1.3, a mechanism for minimizing redundancy in the configuration specification. When relations are made explicit, a tool can automatically change configurations that depend on each other. For example, when the location of a DNS server changes and the relation between the DNS server and clients is modeled in the configuration specification, a system configuration tool can automatically adapt the client configurations to use the new server. Again, modeling relations reduces the possibility of introducing errors in the configuration specification.

To evaluate how well a tool supports modeling of relations, we describe two orthogonal properties of relations: their granularity and their arity.

1. granularity: In Section 2.1.2, we defined an instance as a unit of configuration specification that can be decomposed into a set of parameters. Examples of instances are mail servers, DNS clients, firewalls and web servers. A web server, for example, has parameters for expressing its port, virtual hosts and supported scripting languages. Based on this definition, we can classify relations in three categories: (1) relations between instances, (2) relations between parameters and (3) relations between a parameter and an instance.

(a) Instance relations represent a coarse grained dependency between instances. Instance dependencies can exist between instances on the same device, or between instances on different devices. An example of the former is the dependency between a DNS server instance and the startup system instance on a device: if a startup system instance is not present on a device (for example: /etc/init.d), the DNS server instance will not work. An example of dependencies between instances on different devices
is the dependency between DNS servers and their clients.

(b) Parameter relations represent a dependency between parameters of instances. An example of this is a CNAME record in the DNS system: every CNAME record also needs an A record.

(c) Parameter - instance relations are used to express a relation between an individual parameter and an instance. For example, a mail server depends on the existence of an MX record in the DNS server.

Note that it depends on the abstraction level of a tool which dependencies it can support. The two lowest abstraction layers in Figure 2, configuration files and bit-configurations, have no knowledge of parameters and as a consequence, they can only model instance dependencies.

2. arity: Relations can range from one-to-one to many-to-many relationships. A simple one-to-one relationship is a middleware platform depending on a language runtime. A many-to-many relationship is for example the relation between all DNS clients and DNS servers in a network. A system configuration tool can also provide support facilities to query and navigate relations in the system configuration specification. An example that motivates such facilities for navigating and querying relations involves an Internet service. For example, a webservice runs on a machine in the DMZ. This DMZ has a dedicated firewall that connects to the Internet through an edge router in the network. The webservice configuration has a relation to the host it is running on and a relation to the Internet. The model also contains relations that represent all physical network connections. Using these relations, a firewall specification should be able to derive firewall rules for the webservice host, the DMZ router and the edge router [6].

An extra feature is the tool's ability to support the modeling of constraints on relations. We distinguish two types of constraints: validation constraints and generative constraints.

1. validation constraints are expressions that need to hold true for your configuration. Because of policy or technical factors, the set of allowable values for a relation can be limited. Constraints make it possible to express these limitations. Examples of such limitations are:

   A server can only serve 100 clients.
   Clients can only use the DNS server that is available in their own subnet.
   Every server needs to be configured redundantly with a master and a slave server.

2. generative constraints are expressions that leave a degree of freedom between a chunk of configuration specification and the device on which this chunk needs to be applied. Languages without support for generative constraints need a 1-1 link between a chunk of configuration specification and the device on which it needs to be applied. Languages with support for generative constraints leave more degrees of freedom for the tool. An example of a generative constraint is: One of the machines in this set of machines needs to be a mail server.

2.2 Deployment properties

2.2.1 Scalability

Large infrastructures are subject to constant change in their configuration. System configuration tools must deal with these changes and be able to quickly enforce the configuration specification, even for large infrastructures with thousands of nodes, tens of thousands of relations and millions of parameters.

Large infrastructures typically benefit more from using a higher level specification (see Figure 2). However, the higher-level the specification, the more processing power is needed to translate this high level specification to enforceable specifications on all managed devices. System configuration tools must find efficient algorithms to deal with this problem or restrict the expressiveness of the system configuration tool.

2.2.2 Workflow

Workflow management deals with planning and execution of (composite) changes in a configuration specification. Changes can affect services distributed over multiple machines and with dependencies on other services [3, 18].

One aspect of workflow management is state transfer. The behavior of a service is not only driven by its configuration specification, but also by the data it uses. In the case of a mail server, the data are the mail spool and mailboxes, while web pages serve as data for a web server. When upgrading a service or transferring a service to another device, one has to take care that the state (collection of data) remains consistent in the face of changes.

Another aspect of workflow management is the coordination of distributed changes. This has to be done very carefully so as not to disrupt operations of the computing infrastructure. A change affecting multiple machines and services has to be executed as a single transaction. For example, when moving a DNS server from one device to
another, one has to first activate the new server and make sure that all clients use the new server before deactivating the old server. For some services, characteristics of the managed protocol can be taken into account to make this process easier. For example, the SMTP protocol retries for a finite span of time to deliver a mail when the first attempt fails. A workflow management protocol can take advantage of this characteristic by allowing the mail server to be unreachable during the change.

A last aspect of workflow management is non-technical: if the organizational policy is to use maintenance windows for critical devices, the tool must understand that changes to these critical devices can influence the planning and execution of changes on other devices.

2.2.3 Deployment architecture

The typical setup of a system configuration tool is illustrated in Figure 1. A system configuration tool starts from a central specification for all managed devices. Next, it (optionally) processes this specification to device profiles and distributes these profiles (or the full specification) to every managed device. An agent running on the device then enforces the device's profile. For the rest of this section, we define the processing step from a central specification to device profiles as the translation agent. The agent running on every device is defined as the deployment agent.

System configuration tools differentiate their deployment architecture along two axes: 1. the architecture of the translation agent and 2. whether they use pull or push technology to distribute specifications.

1. architecture of translation agent: Possible approaches for the architecture of the translation agent can be classified in three categories, based on the number of translation agents compared to the number of managed devices: centralized management, weakly distributed management and strongly distributed management [15].

(a) centralized management is the central server approach with only one translation agent. When dealing with huge networks, the central server quickly becomes a bottleneck. This is certainly the case when a system configuration tool uses a high-level abstraction, as the algorithm for computing a device's configuration will become complex.

(b) weakly distributed management is an approach where multiple translation agents are present in the network. This approach can be realized for many centralized management tools by replicating the server and providing a shared policy repository for all servers. Another possible realization of this approach is organizing translation agents hierarchically.

(c) strongly distributed management systems use a separate translation agent for each managed device. The difficulty with this approach is enforcing inter-device relations because each device is responsible for translating its own configuration specification. As a consequence, devices need to cooperate with each other to ensure consistency.

2. push or pull: In all approaches, each managed device contains a deployment agent that can be push or pull based. In the case of a pull based mechanism, the deployment agent needs to contact the translation agent to fetch the translated configurations. In a push based mechanism, the translation agent contacts the deployment agent. Deployment agents also have to be authenticated and their capabilities for fetching policies or configurations have to be limited. Configurations often contain sensitive information like passwords or keys and exposing this information to all deployment agents introduces a security risk.

2.2.4 Platform support

Modern infrastructures contain a variety of computing platforms: Windows/Unix/Mac OS X servers, but also desktop machines, laptops, handhelds, smartphones and network equipment. Even in relatively homogeneous environments, we cannot assume that all devices run the same operating system: operating systems running on network equipment are fundamentally different from those running on servers/desktops, and smartphones are yet another category of operating systems.

Good platform support or interaction with other tools is essential for reducing duplication in the configuration specification. Indeed, many relations exist between devices running different operating systems, for example between a server running Unix and a router/firewall running Cisco IOS. If different tools are used to manage the server and router, relations between the router and server need to be duplicated in both tools, which in turn introduces consistency problems if one of the relations changes. An example of such a relation is the one between the firewall rule on a Cisco router that opens port 25 and the SMTP service on a Unix server.
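The cross-platform relation between the SMTP service and the firewall rule can be sketched as follows. This is a minimal illustration, assuming a single model shared by both devices; the `Service` and `Firewall` classes and the rule syntax are hypothetical and are neither Cisco IOS syntax nor the data model of any evaluated tool.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    host: str
    port: int


@dataclass
class Firewall:
    name: str
    protects: list  # services that must be reachable through this firewall


def firewall_rules(firewall):
    """Derive rules from the relation instead of writing them by hand."""
    return [
        f"allow tcp any -> {s.host}:{s.port}  # {s.name}"
        for s in firewall.protects
    ]


def service_settings(service):
    """Derive the server-side setting from the same service object."""
    return {"listen_port": service.port}


if __name__ == "__main__":
    smtp = Service("smtp", "mail1", 25)
    edge = Firewall("edge", [smtp])
    print(firewall_rules(edge), service_settings(smtp))
    smtp.port = 587  # one change, both derived configurations follow
    print(firewall_rules(edge), service_settings(smtp))
```

Because the port is stored once and both outputs are derived from it, the router and the server cannot drift apart, which is the duplication problem that arises when two separate tools each hold their own copy of the relation.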
2.3 Specification management properties

2.3.1 Usability

We identify three features concerning usability of a system configuration tool: 1. ease of use of the language, 2. support for testing specifications and 3. monitoring the infrastructure.

1. ease of use of the language: The target audience of a system configuration tool is system administrators. The language of the system configuration tool should be powerful enough to replace their existing tools, which are mostly custom tools. But it should also be easy enough to use, so the average system administrator is able to use it. Good system administrators with a good education [13] are already scarce, so a system configuration tool should not require even higher education.

2. support for testing specifications: To understand the impact of a change in the specification, the system configuration tool can provide support for testing specifications through something as trivial as a dry-run mode or more complex mechanisms like the possibility to replicate parts of the production infrastructure in a (virtualized) testing infrastructure and testing the changes in that testing infrastructure first [5].

3. monitoring the infrastructure: A system configuration tool can provide an integrated (graphical) monitoring system and/or define a (language-based) interface for other tools to check the state of an infrastructure. A language-based interface has the advantage that multiple monitoring systems can be connected with the system configuration tool. A monitoring system enables the user to check the current state of the infrastructure and the delta with the configuration specification.

2.3.2 Versioning support

Some system configuration tools store their specification in text files. For those tools, a system configuration specification is essentially code. As a consequence, the same reasoning to use a version control system for source code applies. It enables developers and system administrators to document their changes and track them through history. In a configuration model this configuration history can also be used to roll back configuration changes and it makes sure an audit trail of changes exists.

The system configuration tool can opt to implement versioning of configuration specification using a custom mechanism or, when the specification is in text files, reuse an external version control system and make use of the hooks most generic version control systems provide.

2.3.3 Specification documentation

Usability studies [4, 12] show that a lot of a system administrator's time is spent on communication with other system administrators. These studies also show that a lot of time is lost because of miscommunication, where discussions and solutions are based on wrong assumptions. A system configuration tool that supports structured documentation can generate documentation from the system configuration specification itself and thus remove the need to keep the documentation in sync with the real specification.

2.3.4 Integration with environment

The infrastructure that is managed by the system configuration tool is not an island: it is connected to other networks, is in constant use and requires data from other sources than the system configuration specification to operate correctly. As a consequence, a system administrator may need information from external databases in the configuration specification (think LDAP for users/groups) or information about the run-time characteristics of the managed nodes. A system configuration tool that leverages these existing sources of information integrates better with the environment in which it is operating because it does not require all existing information to be duplicated in the tool.

2.3.5 Conflict management

A configuration specification can contain conflicting definitions, so a system configuration tool should have a mechanism to deal with conflicts. Despite the presence of modularization mechanisms and relations modeling, a configuration specification can still contain errors, because it is written by a human. In case of such an error, a conflict is generated. We distinguish two types of conflicts: application specific conflicts and contradictions in the configuration specification, also called modality conflicts [14].

1. application specific conflicts: An example of an application specific conflict is the specification of two Internet services that use the same TCP port. In general, application specific conflicts cannot be detected in the configuration specification. Examples of research on application specific protocols can be found in [10] and [7], where conflict management for IPSec and QoS policies is described.

2. modality conflicts: An example of a modality conflict is the prohibition and obligation to enable an
instance (for example a mail server) on a device. In general, modality conflicts can be detected in the configuration specifications.

When a configuration specification contains rules that cause a conflict, this conflict should be detected and acted upon.

that users of the logging group should only set parameters of objects of types in the logging namespace. With path-based access control this becomes: users of group logging should only access files in the /config/logging directory. The latter assumes that every system administrator uses the correct files to store configuration specifications.
Tool                                                  Version
BCFG2                                                 1.0.1
Cfengine 3                                            3.0.4
Opscode Chef                                          0.8.8
Puppet                                                0.25
LCFG                                                  20100503
BMC Bladelogic Server Automation Suite                8
CA Network and Systems Management (NSM)               R11.x
IBM Tivoli System Automation for Multiplatforms       4.3.1
Microsoft System Center Configuration Manager (SCCM)  2007 R2
HP Server Automation System                           2010/08/12
Netomata Config Generator                             0.9.1

Table 1: Version numbers of the set of evaluated tools.

on market research reports [8, 11] and consists of BMC Bladelogic Server Automation Suite, Computer Associates Network and Systems Management, IBM Tivoli System Automation for Multiplatforms, Microsoft System Center Configuration Manager and HP Server Automation System. For the open-source tools we selected a set of tools that were most prominently present in discussions at the previous LISA edition and referenced in publications. This set of tools consists of BCFG2, Cfengine3, Chef, Netomata, Puppet and LCFG.

Due to space constraints we limit the results of our evaluation to a summary of our findings for each property. The full evaluation of each tool is available on our website at http://distrinet.cs.kuleuven.be/software/sysconfigtools. We intend to keep the evaluations on this website in sync with major updates of each tool. For this paper we based our evaluation on the versions of each tool listed in Table 1.

3.1 Specification properties

3.1.1 Specification paradigm

Language type: Cfengine, Puppet, Tivoli, Netomata and Bladelogic use a declarative DSL for their input specification. BCFG2 uses a declarative XML specification. Chef on the other hand uses an imperative Ruby DSL. LCFG uses a DSL that instantiates components and sets parameters on them. CA NSM, HP Server Automation and MS SCCM are, like LCFG, limited to setting parameters on their primitives.

User interface: As with the language type, the tools can be grouped in open-source and commercial tools. The open-source tools focus on a command-line interface while the commercial tools also provide graphical interfaces. Tools such as Cfengine, Chef and Puppet provide a web interface that allows managing some aspects with a graphical interface. In the commercial tools all management is done through command-line and graphical interfaces.

3.1.2 Abstraction mechanisms

3.1.3 Modularization mechanisms

Type of grouping: All tools provide a grouping mechanism for managed devices or resources. HP Server Automation, Tivoli and Netomata only provide static grouping. CA NSM and BCFG2 allow static grouping and hierarchies of groups. LCFG supports limited static, hierarchical and query based grouping through the C preprocessor. Bladelogic supports static, hierarchical and query based groups. Cfengine and Puppet use the concept of classes to group configuration. Classes can include other classes to create hierarchies. Cfengine can assign classes statically or conditionally using expressions. Puppet can assign classes dynamically using external tools. Chef and MS SCCM can define static groups and groups based on queries.

Configuration modules: BCFG2, HP Server Automation, MS SCCM and Netomata have no support for configuration modules. Bladelogic can parametrise resources based on node characteristics to enable reuse. Tivoli includes sets of predefined policies that can be used to manage IBM products and SAP. LCFG can use third party components that offer a key-value interface to other policies; CA NSM provides a similar approach for third party agents that manage a device or subsystem. Cfengine uses bundles, Chef uses cookbooks and Puppet uses modules to distribute a reusable configuration specification for managing certain subsystems or devices.

3.1.4 Modeling of relations

BCFG2, CA NSM, HP Server Automation and MS SCCM have no support for modeling relations in a configuration specification. Bladelogic can model one-to-one dependencies between scripts that need to be executed as a prerequisite; these are instance relations. Cfengine supports one-to-one, one-to-many and many-to-many relations between instances, between parameters and between parameters and instances. On these relations generative constraints can be expressed. Chef can express many-to-many dependency relations between instances. Tivoli can also express relations of all arities between instances and parameters and just like Cfengine express generative
constraints. LCFG can express one-to-one and many-to-many relations using spanning maps and references between instances and parameters. Netomata can model one-to-one network links and relations between devices. Finally, Puppet can define one-to-many dependency relations between instances. The virtual resource functionality can also be used to define one-to-many relations between all instances.

3.2 Deployment properties

3.2.1 Scalability

The only method to evaluate how well a tool scales is to test each tool in a deployment and scale the number of managed nodes. In this evaluation we did not do this. To have an indication of the scalability we searched for cases of real-life deployments and divided the tools in three groups based on the number of managed devices and a group of tools for which no deployment information was available.

less than 1000: BCFG2
between 1000 and 10k: LCFG and Puppet
more than 10k: Bladelogic and Cfengine
unknown: CA NSM, Chef, HP Server Automation, Tivoli, MS SCCM and Netomata

3.2.2 Workflow

BMC Bladelogic and HP Server Automation integrate with an orchestration tool to support coordination of distributed changes. Cfengine and Tivoli can coordinate distributed changes as well. MS SCCM and CA NSM support maintenance windows. Distributed changes in Puppet can be sequenced by exporting and collecting resources between managed devices. BCFG2, LCFG, Chef and Netomata have no support for workflow.

3.2.3 Deployment architecture

Translation agent: Cfengine uses a strongly distributed architecture where the emphasis is on the agents that run on each managed device. The central server is only used for coordination and for policy distribution. Bladelogic, CA NSM and MS SCCM use one or more central servers. BCFG2, Chef, HP Server Automation, Tivoli, Netomata and Puppet use a central server. Chef and Puppet can also work in a standalone mode without central server to deploy a local specification.

Distribution mechanism: The deployment agents of BCFG2, Cfengine, Chef, LCFG, MS SCCM and Puppet pull their specification from the central server. Bladelogic, CA NSM, HP Server Automation and Tivoli push the specification to the deployment agents. The central servers of Chef, MS SCCM and Puppet can notify the deployment agents that a new specification can be pulled. Netomata relies on external tools for distribution.

3.2.4 Platform support

The platforms that each tool supports are listed in Table 2.

Tool                                                  Platform support
BCFG2                                                 *BSD, AIX, Linux, Mac OS X and Solaris
Cfengine 3                                            *BSD, AIX, HP-UX, Linux, Mac OS X, Solaris and Windows
Opscode Chef                                          *BSD, Linux, Mac OS X, Solaris and Windows
Puppet                                                *BSD, AIX, Linux, Mac OS X, Solaris
LCFG                                                  Linux (Scientific Linux)
BMC Bladelogic Server Automation Suite                AIX, HP-UX, Linux, Network equipment, Solaris and Windows
CA Network and Systems Management (NSM)               AIX, HP-UX, Linux, Mac OS X, Network equipment, Solaris and Windows
IBM Tivoli System Automation for Multiplatforms       AIX, Linux, Solaris and Windows
Microsoft System Center Configuration Manager (SCCM)  Windows
HP Server Automation System                           AIX, HP-UX, Linux, Network equipment, Solaris and Windows
Netomata Config Generator                             Network equipment

Table 2: Platform support for the set of evaluated tools.

3.3 Specification management properties

3.3.1 Usability

Usability: Usability is a very hard property to quantify. We categorised the tools in easy, medium and hard. We determined this by assessing how easy a new user would be able to use and learn a tool. We tried to be as objective as possible to determine this but this part of the
evaluation is subjective. We found Bladelogic, CA NSM, HP Server Automation, Tivoli and MS SCCM easy to start using. The usability of Cfengine, LCFG and Puppet is medium, partially because of the custom syntax. Puppet also has a lot of confusing terminology, but tools such as puppetdoc and puppetca make up for it, so we did not classify it as hard to use. We found BCFG2 hard to use because of the XML input and because its plugin system distributes the specification over a lot of different directories. Chef is also hard to use because of its syntax and the use of a lot of custom terminology. Netomata is also hard to use because its language, although powerful, has a very concise syntax.

Support for testing specifications. BCFG2, Cfengine, LCFG and Puppet have a dry run mode. Netomata is inherently dry-run because it has no deployment part. Chef and Puppet support multiple environments such as testing, staging and production.

Monitoring the infrastructure. BCFG2, Bladelogic, HP Server Automation, CA NSM, Tivoli, LCFG, Puppet and MS SCCM have various degrees of support for reporting about the deployment and collecting metrics from the managed devices. The commercial tools have more extensive support for this. Chef, LCFG, Puppet and Netomata can automatically generate the configuration for monitoring systems such as Nagios.

3.3.2 Versioning support

BCFG2, Bladelogic, Cfengine, Chef, Tivoli, LCFG, Netomata and Puppet use a textual input to create their configuration specification. This textual input can be managed in an external repository such as Subversion or Git. CA NSM and MS SCCM have internal support for policy versions. The central Chef server also maintains cookbook version information. For HP Server Automation it is unclear what is supported.

3.3.3 Specification documentation

BCFG2, Bladelogic, Chef, HP Server Automation, Tivoli, LCFG, Netomata and Puppet specifications can include free form comments. Cfengine can include structured comments that are used to generate documentation. Because Chef uses a Ruby DSL, RDoc can also be used to generate documentation from structured comments. Puppet can generate reference documentation for built-in types from the comments included in the source code. No documentation support is available in CA NSM and MS SCCM.

3.3.4 Integration with environment

BCFG2, Cfengine, Chef, Tivoli, LCFG, MS SCCM and Puppet can discover runtime characteristics of managed devices, which can be used when the profiles of each device are generated. Bladelogic can interact with external data sources like Active Directory.

3.3.5 Conflict management

BCFG2 and Puppet can detect modality conflicts such as a file managed twice in a specification. Cfengine3 also detects modality conflicts such as an unstable configuration that does not converge. Bladelogic and CA NSM have no conflict management support. Puppet also supports modality conflicts by allowing certain parameters of resources to be unique within a device, for example the filename of file resources.

3.3.6 Workflow enforcement

None of the evaluated tools has integrated support for enforcing workflows on specification updates. Bladelogic can tie in a change management system that defines workflows.

3.3.7 Access control

The tools that support external version repositories can reuse the path-based access control of that repository. BMC, CA NSM, HP Server Automation, Tivoli, MS SCCM and the commercial version of Chef allow fine-grained access control on resources in the specification.

3.4 Support

3.4.1 Available documentation

Bladelogic, CA NSM and HP Server Automation provide no public documentation. IBM Tivoli provides extensive documentation in their evaluation download. BCFG2, Cfengine, Chef, LCFG, MS SCCM and Puppet all provide extensive reference documentation, tutorials and examples on their websites. Netomata provides limited examples and documentation on their website and wiki.

3.4.2 Commercial support

Not surprisingly, the commercial tools all provide commercial support. But most open-source tools also have a company behind them that develops the tool and provides commercial support. LCFG and BCFG2 have both been developed in academic institutes and have no commercial support.
3.4.3 Community

Cfengine, Chef, Tivoli, MS SCCM and Puppet have large and active communities. BCFG2 has a small but active community. CA NSM has a community but it is very scattered. BMC, Netomata and LCFG have small and not very active communities. For HP Server Automation we were unable to determine if a community exists.

3.4.4 Maturity

Some of the evaluated tools such as Tivoli and CA NSM are based on tools that have existed for more than ten years, while other tools such as Chef and Netomata are as young as two years. However, there seems to be no relation between the maturity of a tool and its feature set.

4 Putting the framework to use

4.1 How do I choose a tool for my environment?

Our framework and tool evaluations can help you to quickly trim down the list of tools to those that match your requirements. You list your required features, see which tools support these features, and you have a limited list of tools to continue evaluating. In fact, our website at http://distrinet.cs.kuleuven.be/software/sysconfigtools provides a handy wizard to help you with this process.

The limitation of our framework is that it cannot capture all factors that influence the process of choosing a system configuration tool: 1. we limit our evaluation to system configuration and do not include adjacent processes like provisioning, 2. politics often play an important role when deciding on a tool, 3. your ideal solution might be too pricey, or 4. other, more subjective, factors come into play.

For all these reasons, we see our framework more as an aid that can quickly give you a high-level overview of the features of the most popular tools. Based on our framework, you can decide which tools deserve more time investment in your selection process.

4.2 How do I evaluate another tool using this framework?

We welcome clarifications to our existing evaluations and are happy to add other tool evaluations on the website. Internally, the website defines our framework as a taxonomy and every property is a term in this taxonomy. We associated a description with every term which should allow you to assess whether the property is supported by the tool you want to evaluate. Feel free to contact us for an account on the website so that you can add your evaluated tool.

5 Areas for improvement

Based on our evaluations in Section 3, we identify six areas for improvement in the current generation of tools. We believe that tools that address these areas will have a significant competitive advantage over other tools. The areas are:

1. Create better abstractions: Very few tools support creating higher-level abstractions like those mentioned in Figure 2 on page 4. If they do, those capabilities are hidden deep in the tools' documentation and not used often. We believe this is a missed opportunity. Creating higher-level abstractions would enable reuse of configuration specifications and lower the TCO of a computer infrastructure. To realize this, the language needs to (a) support primitives that promote reuse of configuration specifications, like parametrization and modularization primitives, (b) support constraints modeling and enforcement, (c) deal with conflicts in the configuration specification and (d) model and enforce relations.

2. Adapt to the target audience's processes: A tool that adapts to the processes for system administration that exist in an organization is much more intuitive to work with than a tool that imposes its own processes on system administrators. A few examples of how tools could support the existing processes better:

   • structured documentation and knowledge management: Cfengine3 is the only tool in our study that supports structured documentation in the input specification and has a knowledge management system that uses this structured documentation. Yet, almost all system administrators document their configurations. Some do it in comments in the configuration specification, some do it in separate files or in a fully-fledged content management system. In all cases, documentation needs to be kept in sync with the specification. If you add structured documentation to the configuration specification, the tool can generate the documentation automatically.

   • integrate with version control systems: A lot of system administrator teams use a version control system to manage their input specification. It allows them to quickly roll back a configuration and to see who made what changes.
     Yet, very few tools provide real integration with those version control systems. A tool could quickly set up a virtualized test infrastructure for a branch that I created in my configuration. I would be able to test my configuration changes before I merge them with the main branch in the version control system that gets deployed on my real infrastructure.

   • semantic access controls: In a team of system administrators, every admin has his own expertise: some are experts in managing networking equipment, others know everything about the desktop environment the company supports, others about the web application platform, ... As a consequence, responsibilities are assigned based on expertise and this expertise does not always align with machine boundaries. The ability to specify and enforce these domains of responsibility will prevent, for example, a system administrator responsible for the web application platform from modifying the mail infrastructure setup.

   • flexible workflow support: Web content management systems like Drupal have support for customized workflows: if a junior editor submits an article, it needs to be reviewed by two senior editors, all articles need to be reviewed by one of the senior editors, ... The same type of workflows exists in computer infrastructures: junior system administrators need the approval of a senior to roll out a change, all changes in the DMZ need to be approved by one of the managers and a senior system administrator, ... Enforcing such workflows would lower the number of accidental errors that are introduced in the configuration and align the tool's operation with the existing processes in the organization.

3. [...]tops and servers, dependencies between your firewall and your DMZ servers, ... The current generation of tools either focuses on a single platform (Windows or Unix), focuses on one type of device (servers) or needs different products with different interfaces for your devices (one product for network equipment, one for servers and one for desktops).

4. Become more declarative: The commercial tools in our study all start from scripting functionality: the system administrator can create or reuse a set of scripts and the tool provides a script-management layer. Research and experience with many open-source tools have shown that declarative specifications are far more robust than the traditional paradigm of imperative scripting. Imperative scripts have to deal with all possible states to become robust, which results in a lot of if-else statements and spaghetti code.

5. Take the CIO's agenda into account: Most open-source tools in our study have their origin in academia. As a result, they lag behind on the features that are on the CIO's checklist when deciding on a system configuration tool: (a) easy to use (graphical) user interface, reporting, (b) auditing, compliance, reporting capabilities in nice graphs and (c) access control support.

6. Know that a system is software + configuration + data: No tool has support for the data that is on the managed machines. Take a web server as an example: the web server is software that needs configuration files and serves data. System configuration tools can manage the software and configuration but have no support for state transfer: if my tool moves the web server to another node, I need to move the data manually.
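The contrast drawn in item 4 between declarative specifications and imperative scripts can be illustrated with a minimal Python sketch. It is not taken from any of the evaluated tools: the names FileResource, check_conflicts and converge, the example paths and the in-memory dictionary that stands in for a managed device are all assumptions made for illustration. The specification only states the desired end state; one generic routine compares it with the current state, so no per-state if-else logic is needed. The sketch also rejects a file that is managed twice (the kind of modality conflict mentioned in Section 3.3.5) and offers a dry run mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileResource:
    """Desired state of one file (hypothetical resource type)."""
    path: str
    content: str


def check_conflicts(spec):
    """Reject a specification that manages the same path twice."""
    seen = set()
    for res in spec:
        if res.path in seen:
            raise ValueError(f"file managed twice: {res.path}")
        seen.add(res.path)


def converge(spec, state, dry_run=False):
    """Bring `state` (a dict mapping path -> content) in line with `spec`.

    Returns the list of paths that differ from the desired state.
    With dry_run=True the differences are reported but nothing is modified.
    """
    check_conflicts(spec)
    changes = []
    for res in spec:
        if state.get(res.path) != res.content:
            changes.append(res.path)
            if not dry_run:
                state[res.path] = res.content
    return changes


# Declarative specification: only the desired end state is described.
spec = [
    FileResource("/etc/motd", "Welcome\n"),
    FileResource("/etc/ntp.conf", "server pool.example.org\n"),
]
state = {"/etc/motd": "old banner\n"}  # in-memory stand-in for a device

planned = converge(spec, state, dry_run=True)  # reports, changes nothing
first = converge(spec, state)                  # applies the changes
second = converge(spec, state)                 # nothing left to do
print(planned, first, second)
```

Because converge only acts on differences between the desired and the current state, running it a second time changes nothing, whatever state the device started in. An imperative script would need explicit branches to reach the same robustness.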
versions of tools are released and are open for adding new tool evaluations to our website.

7 Acknowledgements

We would like to thank our shepherd, Eser Kandogan, for his comments on the draft versions of the paper and Mark Burgess for sharing his insights on the constraints part of our framework. We would also like to thank our anonymous reviewers and all people who commented on the tool evaluations on the website.

This research is partially funded by the Agency for Innovation by Science and Technology in Flanders (IWT), by the Interuniversity Attraction Poles Programme Belgian State, Belgian Science Policy, and by the Research Fund K.U.Leuven.