Cassandra 1.2 Documentation
May 02, 2013
Contents

What's new in Apache Cassandra 1.2.2 and 1.2.3
Key Features . . . 1
Other enhancements and changes . . . 2

Understanding the Cassandra Architecture
Architecture in brief . . . 4
About internode communications (gossip) . . . 4
About data distribution and replication . . . 6
Partitioners . . . 10
Types of snitches . . . 11
About client requests . . . 14
Planning a Cassandra cluster deployment . . . 16
Anti-patterns in Cassandra . . . 20

Installing a Cassandra Cluster . . . 24
Upgrading Cassandra . . . 42

Security . . . 45
Client-to-node encryption . . . 45
Node-to-node encryption . . . 46
Preparing server certificates . . . 46
Configuring and using internal authentication . . . 47
Managing object permissions using internal authorization . . . 48
Configuration . . . 48
Configuring system_auth keyspace replication . . . 49
Configuring firewall port access . . . 50

Initializing a Cassandra cluster . . . 51

Anatomy of a table . . . 57
Working with pre-CQL 3 applications . . . 63
About indexes . . . 65

CQL 3 Reference . . . 86
ALTER KEYSPACE . . . 95
ALTER TABLE . . . 96
ALTER USER . . . 98
BATCH . . . 99
ASSUME . . . 124
CAPTURE . . . 125
CONSISTENCY . . . 126
COPY . . . 126
DESCRIBE . . . 129
EXIT . . . 130
SHOW . . . 130
SOURCE . . . 131
TRACING . . . 131

Operations . . . 172

Troubleshooting Guide . . . 200
Reads are getting slower while writes are still fast . . . 200
Nodes seem to freeze after some period of time . . . 200
Nodes are dying with OOM errors . . . 200
Nodetool or JMX connections failing on remote nodes . . . 201
Key Features
Cassandra 1.2 introduced a number of major improvements:
Virtual nodes: Prior to this release, Cassandra assigned one token per node, and each node owned exactly one
contiguous range within the cluster. Virtual nodes change this paradigm from one token and range per node to
many tokens per node. This allows each node to own a large number of small ranges distributed throughout the
ring. Virtual nodes provide a number of advantages:
- You no longer have to calculate and assign tokens to each node.
- Rebalancing a cluster is no longer necessary when adding or removing nodes. When a node joins the cluster, it assumes responsibility for an even portion of data from the other nodes in the cluster. If a node fails, the load is spread evenly across the remaining nodes.
- Rebuilding a dead node is faster because every other node in the cluster participates, and because data is sent to the replacement node incrementally instead of waiting until the end of the validation phase.
- Heterogeneous machines are easier to use in a cluster: you can assign proportionally more virtual nodes to the larger machines and fewer to the smaller ones.
Murmur3Partitioner: This new default partitioner provides faster hashing and improved performance.
Faster startup times: The release provides faster startup/bootup times for each node in a cluster, with internal tests performed at DataStax showing up to 80% less time needed to load primary indexes. The startup reductions were realized through more efficient sampling and loading of indexes into memory caches, which eliminates the need to scan the entire primary index.
Improved handling of disk failures: In previous versions, a single unavailable disk had the potential to make the whole node unresponsive (while still technically alive and part of the cluster). In this scenario, memtables could not flush and the node eventually ran out of memory. Additionally, if the disk contained the commitlog, data could no longer be appended to the commitlog. Thus, the recommended configuration was to deploy Cassandra on top of RAID 10, at the cost of half of the raw disk capacity. Starting with version 1.2, instead of erroring out indefinitely, Cassandra properly reacts to a disk failure, either by stopping the affected node or by blacklisting the failed drive, depending on your availability and consistency requirements. This improvement allows you to deploy Cassandra nodes with large disk arrays without the overhead of RAID 10.
Multiple independent leveled compactions in parallel: Increases the performance of leveled compaction.
Cassandra's leveled compaction strategy creates data files of a fixed, relatively small size that are grouped into
levels. Each level (L0, L1, L2 and so on) is 10 times as large as the previous. Parallel level compaction allows
concurrent compactions to be performed between and within different levels, which allows better utilization of all
available I/O. For detailed information, see Leveled Compaction in Apache Cassandra.
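As a sketch, leveled compaction is selected per table through CQL 3; the table name and SSTable size shown here are illustrative, not defaults:

```cql
-- Illustrative: switch an existing table to leveled compaction.
-- sstable_size_in_mb controls the fixed data file size used per level.
ALTER TABLE users
  WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                      'sstable_size_in_mb' : 160 };
```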
Configurable and more frequent tombstone removal: Tombstones are removed more often in Cassandra 1.2 and
are easier to manage. Cassandra now tracks and removes tombstones automatically. Configuring tombstone
removal instead of manually performing compaction can save users time, effort, and disk space.
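As a sketch of what "configurable" means here, tombstone removal can be tuned per table through compaction subproperties in CQL 3 (the table name and threshold value are illustrative):

```cql
-- Illustrative: adjust automatic tombstone removal.
-- tombstone_threshold is the ratio of garbage-collectable tombstones
-- in an SSTable that triggers a single-SSTable compaction to drop them.
ALTER TABLE users
  WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
                      'tombstone_threshold' : 0.2 };
```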
CQL improvements
CQL 3, which was previewed in Beta form in Cassandra 1.1, has been released in Cassandra 1.2.
Note
DataStax Enterprise 3.0.x supports CQL 3 in Beta form. Users need to refer to Cassandra 1.1 documentation for CQL
information.
CQL 3 is now the default mode for cqlsh. CQL 3 supports schemas that map Cassandra storage engine cells to a more powerful and natural row-column representation than earlier CQL versions and the Thrift API. CQL 3 transposes data partitions (sometimes called "wide rows") into familiar row-based result sets, dramatically simplifying data modeling. New features in Cassandra 1.2 include:
Collections: Collections provide easier methods for inserting and manipulating data that consists of multiple items you want to store in a single column; for example, multiple email addresses for a single employee. There are three types of collections: set, list, and map. Common tasks that previously required creating multiple columns or a separate table can now be accomplished intuitively using a single collection.
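A minimal sketch of the email-address example using a set collection (table and column names are illustrative):

```cql
CREATE TABLE users (
  user_id text PRIMARY KEY,
  emails set<text>            -- multiple items stored in a single column
);

-- Add and remove individual items without rewriting the whole set.
UPDATE users SET emails = emails + {'alice@example.com'} WHERE user_id = 'alice';
UPDATE users SET emails = emails - {'old@example.com'} WHERE user_id = 'alice';
```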
CQL native/binary protocol: Although Cassandra continues to support the Thrift RPC indefinitely, the CQL binary
protocol is a flexible and higher-performance alternative.
Query profiling/request tracing: This enhancement to cqlsh adds performance diagnostic utilities aimed at helping you understand, diagnose, and troubleshoot CQL statements that are sent to a Cassandra cluster. You can interrogate individual CQL statements in an ad-hoc manner using cqlsh tracing of read/write requests, or perform a system-wide collection of the queries and commands sent to a cluster. For collecting statements cluster-wide in order to isolate and tune the most resource-intensive ones, probabilistic tracing has been added to the nodetool utility.
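For example, an individual statement can be traced interactively in cqlsh (the table and key are illustrative):

```cql
TRACING ON;
SELECT * FROM users WHERE user_id = 'alice';   -- prints a per-step trace
TRACING OFF;
```

System-wide sampling is enabled with `nodetool settraceprobability`, passing the fraction of requests to trace.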
System information: You can easily retrieve details about your cluster configuration and database objects by
querying tables in the system keyspace using CQL.
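A few illustrative queries against the system keyspace:

```cql
-- Cluster-level details about the local node.
SELECT cluster_name, partitioner FROM system.local;

-- Topology information about the other nodes.
SELECT peer, data_center, rack FROM system.peers;

-- Tables defined in each keyspace.
SELECT keyspace_name, columnfamily_name FROM system.schema_columnfamilies;
```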
Atomic batches: Prior versions of Cassandra allowed batch operations for grouping related updates into a single statement. If some of the replicas for the batch failed mid-operation, the coordinator would hint those rows automatically. However, if the coordinator itself failed mid-operation, you could end up with partially applied batches. In Cassandra 1.2, batch operations are guaranteed by default to be atomic, and are handled differently than in earlier versions of the database.
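A minimal sketch of an atomic batch in CQL 3 (table names are illustrative):

```cql
-- Both writes are applied as a unit: if the coordinator fails
-- mid-operation, the batch is replayed rather than left half-applied.
BEGIN BATCH
  INSERT INTO users (user_id, name) VALUES ('alice', 'Alice');
  INSERT INTO users_by_name (name, user_id) VALUES ('Alice', 'alice');
APPLY BATCH;
```

`BEGIN UNLOGGED BATCH` opts out of the atomicity guarantee where performance matters more.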
Flat file loader/export utility: A new cqlsh utility facilitates importing and exporting flat-file data to and from Cassandra tables. Although it was initially introduced in Cassandra 1.1.3, the new load utility wasn't formally announced with that version, so an explanation of it is warranted in this document. The utility mirrors the COPY command from the PostgreSQL RDBMS. A variety of file formats are supported, including comma-separated value (CSV) and tab-delimited, with CSV being the default.
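A minimal sketch of the utility (table, columns, and file name are illustrative):

```cql
-- Export a table to CSV (the default format), then load it back.
COPY users (user_id, name) TO 'users.csv';
COPY users (user_id, name) FROM 'users.csv';
```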
Architecture in brief
Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure. Its architecture is based on the understanding that system and hardware failures can and do occur. Cassandra addresses the problem of failures by employing a peer-to-peer distributed system in which all nodes are the same and data is distributed among all nodes in the cluster. Each node exchanges information across the cluster every second. A commit log on each node captures write activity to ensure data durability. Data is also written to an in-memory structure, called a memtable, and then written to a data file on disk, called an SSTable, once the memory structure is full. All writes are automatically partitioned and replicated throughout the cluster.
Cassandra is a row-oriented database. Cassandra's architecture allows any authorized user to connect to any node in
any data center and access data using the CQL language. For ease of use, CQL uses a similar syntax to SQL. From the
CQL perspective the database consists of tables. Typically, a cluster has one keyspace per application. Developers can
access CQL through cqlsh as well as via drivers for application languages.
Client read or write requests can go to any node in the cluster. When a client connects to a node with a request, that
node serves as the coordinator for that particular client operation. The coordinator acts as a proxy between the client
application and the nodes that own the data being requested. The coordinator determines which nodes in the ring
should get the request based on how the cluster is configured. For more information, see About client requests.
The key components for configuring Cassandra are:
- Internode communications (gossip): A peer-to-peer communication protocol to discover and share location and state information about the other nodes in a Cassandra cluster.
- Partitioner: A partitioner determines how to distribute the data across the nodes in the cluster. Choosing a partitioner determines which node to place the first copy of data on.
- Replica placement strategy: Cassandra stores copies (replicas) of data on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines which nodes to place replicas on. The first replica of data is simply the first copy; it is not unique in any sense.
- Snitch: A snitch defines the topology information that the replication strategy uses to place replicas and route requests efficiently.
The key properties to set in the cassandra.yaml configuration file are:

cluster_name: Name of the cluster that this node is joining. Should be the same for every node in the cluster.
listen_address: The IP address or hostname that other Cassandra nodes use to connect to this node. Should be changed from localhost to the public address for the host.
seed_provider: The -seeds list is a comma-delimited list of hosts (IP addresses) that gossip uses to learn the topology of the ring. Every node should have the same list of seeds. In multiple data-center clusters, the seed list should include a node from each data center.
storage_port: The internode communication port (default is 7000). Must be the same for every node in the cluster.
initial_token: Determines the range of data the node is responsible for in version 1.1 and earlier.
num_tokens: Determines the ranges of data the node is responsible for in version 1.2 and later.
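The properties above can be sketched as a cassandra.yaml excerpt; the cluster name and addresses here are illustrative:

```yaml
cluster_name: 'MyCluster'        # same on every node
listen_address: 10.0.0.1         # address other nodes use to connect
storage_port: 7000               # internode communication port
num_tokens: 256                  # version 1.2 and later (virtual nodes)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.1.0.1"   # include a node from each data center
```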
Note
The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the
cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations
beyond the bootstrapping of nodes.
Consistent hashing
This section provides more detail about how the consistent hashing mechanism distributes data across a cluster in
Cassandra. Consistent hashing partitions data based on the primary key. For example, if you have the following data:
  jim:    age: 36, car: camaro, gender: M
  carol:  age: 37, car: bmw, gender: F
  johnny: age: 12, gender: M
  suzy:   age: 10, gender: F
Cassandra assigns a hash value to each primary key:

  Primary key   Hash value
  jim           -2245462676723223822
  carol          7723358927203680754
  johnny        -6723372854036780875
  suzy           1168604627387940318
Each node in the cluster is responsible for a range of data based on the hash value:
  Node   Start range              End range                Primary key   Hash value
  A      -9223372036854775808     -4611686018427387903     johnny        -6723372854036780875
  B      -4611686018427387904     -1                       jim           -2245462676723223822
  C      0                        4611686018427387903      suzy          1168604627387940318
  D      4611686018427387904      9223372036854775807      carol         7723358927203680754
The top portion of the graphic shows a cluster without virtual nodes. In this paradigm, each node is assigned a single
token that represents a location in the ring. Each node stores data determined by mapping the row key to a token value
within a range from the previous node to its assigned value. Each node also contains copies of each row from other
nodes in the cluster. For example, range E replicates to nodes 5, 6, and 1. Notice that a node owns exactly one
contiguous range in the ring space.
The bottom portion of the graphic shows a ring with virtual nodes. Within a cluster, virtual nodes are randomly selected
and non-contiguous. The placement of a row is determined by the hash of the row key within many smaller ranges
belonging to each node.
- Rebalancing a cluster is no longer necessary when adding or removing nodes. When a node joins the cluster, it assumes responsibility for an even portion of data from the other nodes in the cluster. If a node fails, the load is spread evenly across other nodes in the cluster.
- Rebuilding a dead node is faster because it involves every other node in the cluster, and because data is sent to the replacement node incrementally instead of waiting until the end of the validation phase.
- Improves the use of heterogeneous machines in a cluster. You can assign a proportional number of virtual nodes to smaller and larger machines.
For more information, see the article Virtual nodes in Cassandra 1.2.
To set up virtual nodes:
Set the number of tokens on each node in your cluster with the num_tokens parameter in the cassandra.yaml file. The
recommended value is 256. Do not set the initial_token parameter.
Generally, when all nodes have equal hardware capability, they should have the same number of virtual nodes. If the hardware capabilities vary among the nodes in your cluster, assign a proportional number of virtual nodes to the larger machines. For example, you could designate your older machines to use 128 virtual nodes and your newer machines (that are twice as powerful) to use 256 virtual nodes.
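The setup step above can be sketched as a cassandra.yaml excerpt:

```yaml
# Enable virtual nodes; leave initial_token unset.
num_tokens: 256
```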
SimpleStrategy
Use only for a single data center. SimpleStrategy places the first replica on a node determined by the partitioner.
Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or data center
location).
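A minimal sketch in CQL 3 (the keyspace name is illustrative):

```cql
CREATE KEYSPACE demo
  WITH replication = { 'class' : 'SimpleStrategy',
                       'replication_factor' : 3 };
```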
NetworkTopologyStrategy
Use NetworkTopologyStrategy when you have (or plan to have) your cluster deployed across multiple data centers. This strategy lets you specify how many replicas you want in each data center.
NetworkTopologyStrategy places replicas in the same data center by walking the ring clockwise until reaching the first
node in another rack. NetworkTopologyStrategy attempts to place replicas on distinct racks because nodes in the same
rack (or similar physical grouping) often fail at the same time due to power, cooling, or network issues.
When deciding how many replicas to configure in each data center, the two primary considerations are (1) being able to
satisfy reads locally, without incurring cross data-center latency, and (2) failure scenarios. The two most common ways
to configure multiple data center clusters are:
- Two replicas in each data center: This configuration tolerates the failure of a single node per replication group and still allows local reads at a consistency level of ONE.
- Three replicas in each data center: This configuration tolerates either the failure of a single node per replication group at a strong consistency level of LOCAL_QUORUM, or multiple node failures per data center using consistency level ONE.
Asymmetrical replication groupings are also possible. For example, you can have three replicas per data center to serve
real-time application requests and use a single replica for running analytics.
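The asymmetrical example above can be sketched in CQL 3; the keyspace and data center names are illustrative and must match what your snitch reports:

```cql
-- Three replicas in each real-time data center, one for analytics.
CREATE KEYSPACE demo
  WITH replication = { 'class' : 'NetworkTopologyStrategy',
                       'DC1' : 3, 'DC2' : 3, 'DC3' : 1 };
```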
Partitioners
A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a hash function for computing the token (its hash) of a row key. Each row of data is uniquely identified by a row key and distributed across the cluster by the value of its token.
Both the Murmur3Partitioner and RandomPartitioner use tokens to help assign equal portions of data to each node and
evenly distribute data from all the tables throughout the ring or other grouping, such as a keyspace. This is true even if
the tables use different row keys, such as usernames or timestamps. Moreover, the read and write requests to the
cluster are also evenly distributed and load balancing is simplified because each part of the hash range receives an
equal number of rows on average. For more detailed information, see Consistent hashing.
Cassandra offers the following partitioners:
- Murmur3Partitioner (default): Uniformly distributes data across the cluster based on MurmurHash hash values.
- RandomPartitioner: Uniformly distributes data across the cluster based on MD5 hash values.
- ByteOrderedPartitioner: Keeps an ordered distribution of data lexically by key bytes.
The Murmur3Partitioner is the default partitioning strategy for new Cassandra clusters and the right choice for new
clusters in almost all cases.
Note
You can only use Murmur3Partitioner for new clusters; you cannot change the partitioner in existing clusters. If you are
switching to the 1.2 cassandra.yaml, be sure to change the partitioner setting to match the previous partitioner.
The Murmur3Partitioner uses the MurmurHash function. This hashing function creates a 64-bit hash value of the row key. The possible range of hash values is from -2^63 to +2^63 - 1.
When using the Murmur3Partitioner, you can page through all rows using the token function in a CQL 3 query.
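For example (table, column, and key names are illustrative):

```cql
-- Fetch the next page of rows in token order.
SELECT user_id FROM users
  WHERE token(user_id) > token('alice')
  LIMIT 100;
```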
Types of snitches
A snitch has two functions:
- It determines which data centers and racks are written to and read from, and informs Cassandra about the network topology so that requests are routed efficiently.
- It allows Cassandra to distribute replicas by grouping machines into data centers and racks. Cassandra does its best not to have more than one replica on the same rack (which is not necessarily a physical location).
Note
If you change the snitch after data is inserted into the cluster, you must run a full repair, since the snitch affects where
replicas are placed.
The following snitches are available:
SimpleSnitch
The SimpleSnitch (the default) does not recognize data center or rack information. Use it for single-data center
deployments (or single-zone in public clouds).
Using a SimpleSnitch, the only keyspace strategy option you specify is a replication factor.
RackInferringSnitch
The RackInferringSnitch determines the location of nodes by rack and data center, which are assumed to correspond to
the 3rd and 2nd octet of the node's IP address, respectively. Use this snitch as an example of writing a custom Snitch
class.
PropertyFileSnitch
The PropertyFileSnitch determines the location of nodes by rack and data center. This snitch uses a user-defined
description of the network details located in the cassandra-topology.properties file. Use this snitch when your node IPs
are not uniform or if you have complex replication grouping requirements as shown in Configuring the
PropertyFileSnitch.
When using this snitch, you can define your data center names to be whatever you want. Make sure that the data center names you define in the cassandra-topology.properties file correlate to the names of your data centers in your keyspace strategy_options. Every node in the cluster should be described in the cassandra-topology.properties file, and this file should be exactly the same on every node in the cluster.
The location of the cassandra-topology.properties file depends on the type of installation; see Cassandra
Configuration Files Locations or DataStax Enterprise Configuration Files Locations.
110.56.12.120=DC2:RAC1
110.50.13.201=DC2:RAC1
110.54.35.184=DC2:RAC1
50.33.23.120=DC2:RAC2
50.45.14.220=DC2:RAC2
50.17.10.203=DC2:RAC2
# Analytics Replication Group
172.106.12.120=DC3:RAC1
172.106.12.121=DC3:RAC1
172.106.12.122=DC3:RAC1
# default for unknown nodes
default=DC3:RAC1
GossipingPropertyFileSnitch
The GossipingPropertyFileSnitch defines a local node's data center and rack; it uses gossip for propagating this
information to other nodes. The conf/cassandra-rackdc.properties file defines the default data center and rack used by
this snitch:
dc=DC1
rack=RAC1
The location of the conf directory depends on the type of installation; see Cassandra Configuration Files Locations or
DataStax Enterprise Configuration Files Locations
To migrate from the PropertyFileSnitch to the GossipingPropertyFileSnitch, update one node at a time to allow gossip time to propagate. The PropertyFileSnitch is used as a fallback when the cassandra-topology.properties file is present.
EC2Snitch
Use the EC2Snitch for simple cluster deployments on Amazon EC2 where all nodes in the cluster are within a single region. The region is treated as the data center and the availability zones are treated as racks within the data center. For example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location. Because private IPs are used, this snitch does not work across multiple regions.
When defining your keyspace strategy_options, use the EC2 region name (for example, us-east) as your data center name.
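For example (the keyspace name and replica count are illustrative):

```cql
CREATE KEYSPACE demo
  WITH replication = { 'class' : 'NetworkTopologyStrategy',
                       'us-east' : 3 };
```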
EC2MultiRegionSnitch
Use the EC2MultiRegionSnitch for deployments on Amazon EC2 where the cluster spans multiple regions. As with the
EC2Snitch, regions are treated as data centers and availability zones are treated as racks within a data center. For
example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location.
This snitch uses public IPs as broadcast_address to allow cross-region connectivity. This means that you must
configure each Cassandra node so that the listen_address is set to the private IP address of the node, and the
broadcast_address is set to the public IP address of the node. This allows Cassandra nodes in one EC2 region to bind
to nodes in another region, thus enabling multiple data center support. (For intra-region traffic, Cassandra switches to
the private IP after establishing a connection.)
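A sketch of the corresponding cassandra.yaml settings; all IP addresses here are illustrative:

```yaml
listen_address: 10.0.0.5         # private IP of this node
broadcast_address: 54.10.20.30   # public IP of this node
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "54.10.20.30,54.11.40.50"   # public IPs, reachable across regions
```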
Memory
CPU
Insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound. Cassandra is highly concurrent and uses as many CPU cores as are available:
- For dedicated hardware, 8-core processors are the current price-performance sweet spot.
- For virtual environments, consider using a provider that allows CPU bursting, such as Rackspace Cloud Servers.
Disk
Disk space depends a lot on usage, so it's important to understand the mechanism. Cassandra writes data to disk when
appending data to the commit log for durability and when flushing memtables to SSTable data files for persistent
storage. SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and
discarding old data. However, depending on the type of compaction_strategy and size of the compactions, compaction
can substantially increase disk utilization and data directory volume. For this reason, you should leave an adequate
amount of free disk space available on a node: 50% (worst case) for SizeTieredCompactionStrategy and large
compactions, and 10% for LeveledCompactionStrategy. The following links provide information about compaction:
Configuring compaction and compression
The Apache Cassandra storage engine
Leveled Compaction in Apache Cassandra
When to Use Leveled Compaction
For information on calculating disk size, see Calculating usable disk capacity and Choosing node configuration options.
Recommendations:
- Capacity and I/O: When choosing disks, consider both capacity (how much data you plan to store) and I/O (the write/read throughput rate). Some workloads are best served by using less expensive SATA disks and scaling disk capacity and I/O by adding more nodes (with more RAM).
- Solid-state drives: SSDs are the recommended choice for Cassandra. Cassandra's sequential, streaming write patterns minimize the undesirable effects of write amplification associated with SSDs. This means that Cassandra deployments can take advantage of inexpensive consumer-grade SSDs. Enterprise-level SSDs are not necessary, because under Cassandra's access patterns consumer-grade SSDs wear out in roughly the same time frame as the more expensive enterprise drives.
- Number of disks - SATA: Ideally, Cassandra needs at least two disks: one for the commit log and the other for the data directories. At a minimum, the commit log should be on its own partition.
- Commit log disk - SATA: The disk need not be large, but it should be fast enough to receive all of your writes as appends (sequential I/O).
- Data disks: Use one or more disks and make sure they are large enough for the data volume and fast enough both to satisfy reads that are not cached in memory and to keep up with compaction.
- RAID on data disks: It is generally not necessary to use RAID for the following reasons:
  - Data is replicated across the cluster based on the replication factor you've chosen.
  - Starting in version 1.2, Cassandra takes care of disk management with its JBOD (just a bunch of disks) support. Because Cassandra reacts properly to a disk failure, either by stopping the affected node or by blacklisting the failed drive based on your availability/consistency requirements, you can deploy Cassandra nodes with large disk arrays without the overhead of RAID 10.
- RAID on the commit log disk: Generally, RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need the extra redundancy, use RAID 1.
- Extended file systems: DataStax recommends deploying Cassandra on XFS. On ext2 or ext3, the maximum file size is 2TB even using a 64-bit kernel; on ext4 it is 16TB. Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.
Number of nodes
Prior to version 1.2, the recommended amount of disk space per node was 300 to 500GB. Improvements in Cassandra 1.2, such as JBOD support, virtual nodes, off-heap Bloom filters, and parallel leveled compaction (SSD nodes only), allow you to use fewer machines with multiple terabytes of disk space.
Network
Since Cassandra is a distributed data store, it puts load on the network to handle read/write requests and replication of
data across nodes. Be sure to choose reliable, redundant network interfaces and make sure that your network can
handle traffic between nodes without bottlenecks.
Recommended bandwidth is 1000 Mbit/s (Gigabit) or greater.
Bind the internode communication interface (listen_address) to a specific NIC (Network Interface Card).
Bind the Thrift RPC server interface (rpc_address) to another NIC.
Cassandra efficiently routes requests to replicas that are geographically closest to the coordinator node and chooses a
replica in the same rack if possible; it always chooses replicas located in the same data center over replicas in a remote
data center.
Firewall
If using a firewall, make sure that nodes within a cluster can reach each other. See Configuring firewall port access.
Generally, when you have firewalls between machines, it is difficult to run JMX across a network and maintain security.
This is because JMX connects on port 7199, handshakes, and then uses any port within the 1024+ range. Instead, use
SSH to execute commands remotely and connect to JMX locally, or use DataStax OpsCenter.
For production Cassandra clusters on EC2, use Large or Extra Large instances with local storage.
Amazon Web Service recently reduced the number of default ephemeral disks attached to the image from four to
two. Performance will be slower for new nodes unless you manually attach the additional two disks; see Amazon
EC2 Instance Store.
RAID 0 the ephemeral disks, and put both the data directory and the commit log on that volume. This has proved
to be better in practice than putting the commit log on the root volume (which is also a shared resource). For more
data redundancy, consider deploying your Cassandra cluster across multiple availability zones or using EBS
volumes to store your Cassandra backup files.
Cassandra JBOD support allows you to use standard disks, but you may get better throughput with RAID 0. RAID 0
stripes each block across devices so that writes proceed in parallel instead of serially to a single
disk.
EBS volumes are not recommended for Cassandra data volumes for the following reasons:
EBS volumes contend directly with standard network traffic for throughput. This means that EBS
throughput is likely to fail if you saturate a network link.
EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system
to back up reads and writes until the entire cluster becomes unresponsive.
Adding capacity by increasing the number of EBS volumes per host does not scale. You can easily surpass
the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it
is responsible for managing.
For more information and graphs related to ephemeral versus EBS performance, see the blog article at
http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/.
Calculating user data size
Column Overhead: Every column in Cassandra incurs 15 bytes of overhead. Since each row in a table can have
different column names as well as differing numbers of columns, metadata is stored for each column. For counter
columns and expiring columns, add an additional 8 bytes (23 bytes total). So the total size of a regular column is:
regular_total_column_size = column_name_size + column_value_size + 15
counter-expiring_total_column_size = column_name_size + column_value_size + 23
Row Overhead: Every row in Cassandra incurs 23 bytes of overhead.
Primary Key Index: Every table also maintains a primary index of its row keys. Sizing of the primary row key
index can be estimated as follows (in bytes):
primary_key_index = number_of_rows * (32 + average_key_size)
Replication Overhead: The replication factor plays a role in how much disk capacity is used. For a replication
factor of 1, there is no overhead for replicas (as only one copy of data is stored in the cluster). If replication factor
is greater than 1, then your total data storage requirement will include replication overhead.
replication_overhead = total_data_size * (replication_factor - 1)
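As a rough sketch, the formulas above can be combined into a small calculation. The workload numbers in the example are hypothetical; only the overhead constants come from the text above.

```python
# Disk-capacity sketch using the overhead figures above.
# All sizes are in bytes; the workload numbers are hypothetical.
REGULAR_COLUMN_OVERHEAD = 15      # per regular column
ROW_OVERHEAD = 23                 # per row
PRIMARY_INDEX_OVERHEAD = 32       # per row, plus the average key size

def total_data_size(rows, cols_per_row, name_size, value_size,
                    avg_key_size, replication_factor):
    column_size = name_size + value_size + REGULAR_COLUMN_OVERHEAD
    raw_data = rows * (cols_per_row * column_size + ROW_OVERHEAD)
    primary_key_index = rows * (PRIMARY_INDEX_OVERHEAD + avg_key_size)
    one_copy = raw_data + primary_key_index
    # replication_overhead = one_copy * (replication_factor - 1),
    # so total stored data is one_copy * replication_factor.
    return one_copy * replication_factor

# Example: 1M rows, 10 columns each (10-byte names, 100-byte values),
# 16-byte keys, replication factor 3.
size = total_data_size(1_000_000, 10, 10, 100, 16, 3)
```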
Anti-patterns in Cassandra
The anti-patterns described here are implementation or design patterns that are ineffective and/or counterproductive in
Cassandra production installations. Correct patterns are suggested in most cases.
Data size   CPU utilization   Requests served   Latency
40 GB       50%               750               1 second
8 GB        5%                8500 [1]          10 ms
Multiple-gets
Multiple-gets may cause problems. One sure way to kill a node is to buffer 300MB of data, timeout, and then try again
from 50 different clients.
You should architect your application to use many single requests for different rows. This ensures that if a read
fails on a node, due to a backlog of pending requests, an unmet consistency level, or another error, only the failed
request needs to be retried.
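The single-request-with-retry pattern described above can be sketched as follows. `client.get_row` is a hypothetical stand-in for your driver's single-row read call, not a real API.

```python
# Sketch: many single-row requests, each retried independently, instead
# of one huge multiget. `client.get_row` is a hypothetical client call.
def fetch_rows(client, keys, retries=3):
    results = {}
    for key in keys:                       # one request per row
        for attempt in range(retries):
            try:
                results[key] = client.get_row(key)
                break                      # success: move to the next row
            except TimeoutError:
                if attempt == retries - 1:
                    raise                  # only this row's request failed
    return results
```

If one row's read times out, only that request is retried; the other rows are unaffected.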
Ideally, read an entire row or slices of a row with a single request per key. Be sure to keep row sizes in mind to prevent
out-of-memory (OOM) errors caused by reading too many entire ultra-wide rows in parallel.
Super columns
Do not use super columns. They are a legacy design from a pre-open-source release, structured for a
specific use case that does not fit most use cases. A read of a super column pulls the entire super column and all of its
sub-columns into memory, which results in severe performance issues. Additionally, super columns are not
supported in CQL 3.
Use composite columns instead. Composite columns provide most of the same benefits as super columns without the
performance issues.
Load balancers
Cassandra was designed to avoid the need for load balancers. Putting load balancers between Cassandra and
Cassandra clients is harmful to performance, cost, availability, debugging, testing, and scaling. All high-level clients,
such as Astyanax and pycassa, implement load balancing directly.
Insufficient testing
Be sure to test at scale and with production loads. This is the best way to ensure your system will function properly when your
application goes live. The information you gather from testing is the best indicator of the throughput per node
needed for future expansion calculations.
To properly test, set up a small cluster and drive it with production loads. Each cluster size has a maximum throughput
beyond which performance no longer increases. Take the maximum throughput at the test cluster size and extrapolate
(graph) it linearly to other cluster sizes to predict the cluster size required for the throughput your production cluster
needs, now and in the future. The Netflix case study is an excellent example of this kind of testing.
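The linear extrapolation described above amounts to a ceiling division. A minimal sketch, assuming a hypothetical per-node throughput figure measured in your own tests:

```python
# Sketch: extrapolating small-cluster test results linearly. The per-node
# throughput figure would come from your own load test (hypothetical here).
def nodes_needed(required_throughput, max_throughput_per_node):
    # Ceiling division: throughput scales roughly linearly with node count.
    return -(-required_throughput // max_throughput_per_node)

# A test cluster topped out at 8,500 requests/s per node; production
# must sustain 100,000 requests/s:
print(nodes_needed(100_000, 8_500))  # 12
```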
Parallel SSH and Cluster SSH: The pssh and cssh tools allow SSH access to multiple nodes. This is useful for
inspections and cluster wide changes.
Passwordless SSH: SSH authentication is carried out using public and private keys, allowing SSH
connections to hop from node to node without password prompts. In cases where more security is required,
you can implement a password-protected jump box and/or a VPN.
Useful common command-line tools include:
top: Provides an ongoing look at processor activity in real time.
System performance tools: Tools such as iostat, mpstat, iftop, sar, lsof, netstat, htop, vmstat, and similar
can collect and report a variety of metrics about the operation of the system.
vmstat: Reports information about processes, memory, paging, block I/O, traps, and CPU activity.
iftop: Shows a list of network connections. Connections are ordered by bandwidth usage, with the pair of
hosts responsible for the most traffic at the top of list. This tool makes it easier to identify the hosts causing
network congestion.
More anti-patterns
For more about anti-patterns, visit the Matt Dennis slideshare.
Note
For information on installing for evaluation or installing on Windows, see the Quick Start Documentation.
Note
By downloading community software from DataStax you agree to the terms of the DataStax Community EULA (End
User License Agreement) posted on the DataStax web site.
Prerequisites
Before installing Cassandra make sure the following prerequisites are met:
Yum Package Management application installed.
Root or sudo access to the install machine.
The latest version of Oracle Java SE Runtime Environment (JRE) 6 is installed. Java 7 is not recommended.
Java Native Access (JNA) is required for production installations. See Installing JNA.
Also see Recommended settings for production installations.
4. In this file add the following lines for the DataStax repository:
[datastax]
name= DataStax Repo for Apache Cassandra
baseurl=http://rpm.datastax.com/community
enabled=1
gpgcheck=0
5. Install the package using yum.
$ sudo yum install dsc12
This installs the DataStax Community distribution of Cassandra and the OpsCenter Community Edition.
Next steps
Initializing a multiple node cluster
Install locations
Note
By downloading community software from DataStax you agree to the terms of the DataStax Community EULA (End
User License Agreement) posted on the DataStax web site.
Prerequisites
Before installing Cassandra make sure the following prerequisites are met:
Aptitude Package Manager installed.
Root or sudo access to the install machine.
The latest version of Oracle Java SE Runtime Environment (JRE) 6 is installed. Java 7 is not recommended.
Java Native Access (JNA) is required for production installations. See Installing JNA.
Also see Recommended settings for production installations.
Note
If you are using Ubuntu 10.04 LTS, you need to update to JNA 3.4, as described in Install JNA on Ubuntu 10.04.
1. Check which version of Java is installed by running the following command in a terminal window:
java -version
Use the latest version of Java 6 on all nodes. Java 7 is not recommended. If you need help installing Java, see
Installing the JRE on Debian or Ubuntu Systems.
2. Add the DataStax Community repository to the /etc/apt/sources.list.d/cassandra.sources.list.
deb http://debian.datastax.com/community stable main
3. (Debian Systems Only) In /etc/apt/sources.list, find the line that describes your source repository for
Debian and add contrib non-free to the end of the line. This allows installation of the Oracle JVM instead of
the OpenJDK JVM. For example:
deb http://some.debian.mirror/debian/ $distro main contrib non-free
Save and close the file when you are done adding/editing your sources.
4. Add the DataStax repository key to your aptitude trusted keys.
$ curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -
5. Install the package.
$ sudo apt-get update
$ sudo apt-get install dsc12
This installs the DataStax Community distribution of Cassandra and the OpsCenter Community Edition. By
default, the Debian packages start the Cassandra service automatically.
6. To stop the service and clear the initial gossip history that gets populated by this initial start:
$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data/system/*
Next steps
Initializing a multiple node cluster
Install locations
Note
By downloading community software from DataStax you agree to the terms of the DataStax Community EULA (End
User License Agreement) posted on the DataStax web site.
Prerequisites
Before installing Cassandra make sure the following prerequisites are met:
The latest version of Oracle Java SE Runtime Environment (JRE) 6 is installed. Java 7 is not recommended.
Java Native Access (JNA) is required for production installations. See Installing JNA.
Also see Recommended settings for production installations.
Note
If you are using Ubuntu 10.04 LTS, you need to update to JNA 3.4, as described in Install JNA on Ubuntu 10.04.
sudo mkdir /var/lib/cassandra
sudo mkdir /var/log/cassandra
sudo chown -R $USER:$GROUP /var/lib/cassandra
sudo chown -R $USER:$GROUP /var/log/cassandra
Next steps
Initializing a multiple node cluster
Install locations
root soft memlock unlimited
root hard memlock unlimited
* soft as unlimited
* hard as unlimited
root soft as unlimited
root hard as unlimited

Also raise the nproc limit in /etc/security/limits.d/90-nproc.conf:
* soft nproc 10240
Disable swap
Disable swap entirely. This prevents the Java Virtual Machine (JVM) from responding poorly because it is buried in
swap and ensures that the OS OutOfMemory (OOM) killer does not kill Cassandra.
sudo swapoff --all
For more information, see Nodes seem to freeze after some period of time.
Synchronize clocks
The clocks on all nodes should be synchronized. You can use NTP (Network Time Protocol) or other methods.
This is required because columns are only overwritten if the timestamp in the new version of the column is more recent
than the existing column.
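A minimal illustration (not Cassandra code) of why synchronized clocks matter: the column version with the higher timestamp wins, so a write stamped by a node with a slow clock can lose to an older write.

```python
# Last-write-wins reconciliation, as described above: Cassandra keeps,
# for each column, the value with the more recent timestamp.
def reconcile(existing, incoming):
    """Return the winning (value, timestamp) pair."""
    return incoming if incoming[1] > existing[1] else existing

# A write stamped by a node whose clock runs behind loses, even though
# it happened later in real time:
v1 = ("old", 1000)   # stamped by a node with a fast clock
v2 = ("new", 999)    # written later, but stamped by a slow clock
assert reconcile(v1, v2) == ("old", 1000)
```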
Note
After installing the JRE, you may need to set JAVA_HOME:
export JAVA_HOME=<path_to_java>
6. Make sure your system is now using the correct JRE. For example:
$ java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)
7. If the OpenJDK JRE is still being used, use the alternatives command to switch it. For example:
$ sudo alternatives --config java
There are 2 programs which provide 'java'.
Selection    Command
-----------------------------------------------------------
   1         /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
*+ 2         /usr/java/jre1.6.0_43/bin/java
Enter to keep the current selection[+], or type selection number: 2
Note
If updating from a previous version that was removed manually, execute the above command twice, because
you'll get an error message the first time.
7. Set the new JRE as the default:
sudo update-alternatives --set java /usr/java/latest/jre1.6.0_43/bin/java
8. Make sure your system is now using the correct JRE:
$ java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)
Installing JNA
Java Native Access (JNA) is required for production installations. Installing JNA can improve Cassandra memory usage.
When installed and configured, Linux does not swap out the JVM, and thus avoids related performance issues.
SUSE Systems
Install the jna package using the system package manager.
Tarball Installations
Install with the following commands:
1. Download jna.jar from https://github.com/twall/jna.
2. Add jna.jar to <install_location>/lib/ (or place it in the CLASSPATH).
3. Add the following lines in the /etc/security/limits.conf file for the user/group that runs Cassandra:
$USER soft memlock unlimited
$USER hard memlock unlimited
Production considerations
For production Cassandra clusters on EC2, use Large or Extra Large instances with local storage. RAID0 the ephemeral
disks, and put both the data directory and the commit log on that volume. This has proved to be better in practice than
putting the commit log on the root volume (which is also a shared resource). For more data redundancy, consider
deploying your Cassandra cluster across multiple availability zones or using EBS volumes to store your Cassandra
backup files.
3. Click the Inbound tab and add rules for the ports listed in the table below:
Create a new rule: Custom TCP rule.
Port range: See table.
Source: See table. To create rules that open a port to other nodes in the same security group, use the
Group ID listed in the Group Details tab.
Port    Description
22      SSH port.
8888    OpsCenter website port.
7000    Cassandra internode cluster communication port.
7199    Cassandra JMX monitoring port. After the initial handshake, the JMX protocol requires that
        the client reconnects on a randomly chosen port (1024+).
9160    Cassandra client (Thrift) port.
OpsCenter ports
61620   OpsCenter monitoring port. The opscenterd daemon listens on this port for TCP traffic
        coming from the agent.
61621   OpsCenter agent port. The agents listen on this port for SSL traffic initiated by OpsCenter.
Note
Generally, when you have firewalls between machines, it is difficult to run JMX across a network and maintain
security. This is because JMX connects on port 7199, handshakes, and then uses any port within the 1024+
range. Instead, use SSH to execute commands remotely and connect to JMX locally, or use DataStax OpsCenter.
4. After you are done adding the above port rules, click Apply Rule Changes. Your completed port rules should look
similar to this:
Warning
The security configuration shown in the above example opens all externally accessible ports to incoming traffic
from any IP address (0.0.0.0/0). The risk of data loss is high. If you want a more secure configuration, see the
Amazon EC2 help on Security Groups.
3. On the Request Instances Wizard page, verify the settings and then click Continue.
4. On the Instance Details page, enter the total number of nodes that you want in your cluster, select the Instance
Type, and then click Continue.
Use the following guidelines when selecting the type of instance:
Extra large for production.
Large for development and light production.
Small and Medium not supported.
Note
EBS volumes are not recommended for Cassandra data volumes: EBS throughput may fail when a network link is
saturated, I/O may be exceptionally slow, and adding capacity by increasing the number of EBS volumes per host does
not scale. For more information and graphs related to ephemeral versus EBS performance, see the blog article at
http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/.
5. On the next page, under Advanced Instance Options, add the following options to the User Data section
according to the type of cluster you want, and then click Continue.
For new clusters the available options are:
Option                   Description
--clustername <name>     Required. The name of the cluster.
--totalnodes <#_nodes>   Required. The total number of nodes in the cluster.
--version community      Required. The version of the cluster (community edition).
--opscenter [no]         Optional. Set to no to omit OpsCenter.
--reflector <url>        Optional. The URL of a custom reflector.
6. On the Storage Device Configuration page, you can add ephemeral drives if needed.
Note
Amazon Web Service recently reduced the number of default ephemeral disks attached to the image from four
to two. Performance will be slower for new nodes unless you manually attach the additional two disks; see
Amazon EC2 Instance Store.
7. On the Tags page, give a name to your DataStax Community instance, such as cassandra-node, and then click
Continue.
8. On the Create Key Pair page, create a new key pair or select an existing key pair, and then click Continue. Save
this key (.pem file) to your local machine; you will need it to log in to your DataStax Community instance.
9. On the Configure Firewall page, select the security group that you created earlier and click Continue.
10. On the Review page, review your cluster configuration and then click Launch.
11. Close the Launch Install Wizard and go to the My Instances page to see the status of your Cassandra instance.
Once a node has a status of running, you can connect to it.
2. To get the public DNS name of a node, select Instance Actions > Connect.
3. In the Connect Help - Secure Shell (SSH) page, copy the command line and change the connection user from
root to ubuntu, then paste it into your SSH client.
4. The AMI image configures your cluster and starts the Cassandra services. After you have logged into a node, run
the nodetool status command to make sure your cluster is running. For more information, see the nodetool
utility.
5. If you installed the OpsCenter with your Cassandra cluster, allow about 60 to 90 seconds after the cluster has
finished initializing for OpsCenter to start. You can launch OpsCenter using the URL:
http://<public-dns-of-first-instance>:8888.
6. After the OpsCenter loads, you must install the OpsCenter agents to see the cluster performance data.
a. Click the Fix link located near the top of the Dashboard in the left navigation pane to install the agents.
b. When prompted for credentials for the agent nodes, use the username ubuntu and copy and paste the
entire contents from your private key (.pem) file that you downloaded earlier.
Next steps
Note
For adding nodes to clusters created prior to Cassandra 1.2, follow the instructions in the 1.1 topic Expanding a
Cassandra AMI cluster.
Note
You must clear the data because new nodes have existing data from the initial start with the temporary cluster
name and settings.
4. Set the following properties in the cassandra.yaml configuration file. For example:
cluster_name: 'NameOfExistingCluster'
...
num_tokens: 256
...
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "110.82.155.0,110.82.155.3"
Do not set the initial_token.
5. Start each node at two-minute intervals. You can monitor the startup and data streaming process using nodetool
netstats.
$ sudo service cassandra start
6. Verify that each node has finished joining the ring:
$ nodetool status
Upgrading Cassandra
This section describes how to upgrade an earlier version of Cassandra or DataStax Community Edition to DataStax
Community 1.2.3. This section contains the following topics:
Best Practices for upgrading Cassandra
Pre-requisite steps
To upgrade a binary tarball installation
Pre-requisite steps
1. Date strings (and timestamps) are no longer accepted as valid timeuuid values. This change requires modifying
queries that use these values. New methods have been added for working with timeuuid values.
2. Cassandra 1.2.3 is not network-compatible with versions older than 1.0. If you want to perform a rolling restart,
first upgrade the cluster to 1.0.x or 1.1.x, and then to 1.2.3, as described in the Cassandra 1.1 documentation.
Data files from Cassandra 0.6 and later are compatible with Cassandra 1.2 and later. If it is practical to shut down
the cluster instead of performing a rolling restart, you can skip upgrading to an interim release and upgrade directly from
Cassandra 0.6 or later to 1.2.3.
3. Do not upgrade if nodes in the cluster are down. The hints schema changed from 1.1 to 1.2.3. Cassandra
automatically snapshots and then truncates the hints column family as part of starting up 1.2.3 for the first time.
Additionally, upgraded nodes will not store new hints destined for older (pre-1.2) nodes. Use the nodetool
removenode command, which was called nodetool removetoken in earlier releases, to remove dead nodes.
Note
Do not use the default partitioner setting because it has changed in this release to the Murmur3Partitioner. The
Murmur3Partitioner can be used only for new clusters. After data has been added to the cluster, you cannot
change the partitioner without reworking tables, which is not practical. Use your old partitioner setting in the new
cassandra.yaml file.
7. Follow steps for completing the upgrade.
2. Open the old and new cassandra.yaml files and diff them.
3. Merge the diffs by hand, including the partitioner setting, from the old file into the new one. For details, see the
note in step 6 of the tarball upgrade procedure. Save the file as cassandra.yaml.
4. Follow steps for completing the upgrade.
3. Run nodetool drain before shutting down the existing Cassandra service. This prevents overcounts of counter
data, and will also speed up restart post-upgrade.
4. Stop the old Cassandra process, then start the new binary process.
5. Monitor the log files for any issues.
6. After upgrading and restarting all Cassandra processes, restart client applications.
Security
Cassandra 1.2.2 and later includes a number of features for securing data.
Client-to-node encryption
Client-to-node encryption protects data in flight from client machines to a database cluster. It establishes a secure
channel between the client and the coordinator node. For information about generating SSL certificates, see Preparing
server certificates.
Note
You cannot use cqlsh when client certificate authentication is enabled (require_client_auth=true).
Sample files are available in the following directories:
Packaged installs: /etc/cassandra/conf
Binary installs: <install_location>/conf
For example:
[authentication]
username = fred
password = !!bang!!$
[connection]
hostname = 127.0.0.1
port = 9160
factory = cqlshlib.ssl.ssl_transport_factory
Node-to-node encryption
[ssl]
certfile = ~/keys/cassandra.cert
validate = true ## Optional, true by default.
[certfiles] ## Optional section, overrides the default certfile in the [ssl] section.
192.168.1.3 = ~/keys/cassandra01.cert
192.168.1.4 = ~/keys/cassandra02.cert
When validate is enabled, the host in the certificate is compared to the host of the machine that it is connected to. The
SSL certificate must be provided either in the configuration file or as an environment variable. The environment
variables (SSL_CERTFILE and SSL_VALIDATE) override any options set in this file.
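The precedence rule above can be sketched as follows; `resolve_ssl_settings` is a hypothetical helper for illustration, not part of cqlsh.

```python
# Sketch of the precedence described above: the SSL_CERTFILE and
# SSL_VALIDATE environment variables override the [ssl] section values.
import os

def resolve_ssl_settings(ssl_section):
    """ssl_section: dict of string options parsed from the [ssl] section."""
    certfile = os.environ.get("SSL_CERTFILE", ssl_section.get("certfile"))
    validate = os.environ.get("SSL_VALIDATE",
                              ssl_section.get("validate", "true"))
    return certfile, validate.lower() == "true"

os.environ["SSL_CERTFILE"] = "/tmp/override.cert"
cert, validate = resolve_ssl_settings({"certfile": "~/keys/cassandra.cert"})
assert cert == "/tmp/override.cert"   # the environment wins over the file
```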
Node-to-node encryption
Node-to-node encryption protects data transferred between nodes in a cluster using SSL (Secure Sockets Layer). For
information about generating SSL certificates, see Preparing server certificates.
Configuration
CassandraAuthorizer is one of many possible IAuthorizer implementations, and the one that stores permissions in the
system_auth.permissions column family to support all authorization-related CQL 3 statements. Configuration consists
mainly of changing the authorizer option in the cassandra.yaml to use the CassandraAuthorizer.
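A sketch of the corresponding cassandra.yaml change; the fully qualified class name is an assumption based on this release's package naming.

```yaml
# cassandra.yaml (sketch): store permissions in system_auth.permissions
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
```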
Port    Description
22      SSH port.
8888    OpsCenter website port.
7000    Cassandra internode cluster communication port.
7199    Cassandra JMX monitoring port. After the initial handshake, the JMX protocol requires that the
        client reconnects on a randomly chosen port (1024+).
9160    Cassandra client (Thrift) port.
OpsCenter ports
61620   OpsCenter monitoring port. The opscenterd daemon listens on this port for TCP traffic coming
        from the agent.
61621   OpsCenter agent port. The agents listen on this port for SSL traffic initiated by OpsCenter.
Note
In Cassandra, the term data center is a grouping of nodes. Data center is synonymous with replication group, that is, a
grouping of nodes configured together for replication purposes.
Prerequisites
Each node must be correctly configured before starting the cluster. You must determine or perform the following before
starting the cluster:
Install Cassandra on each node.
Choose a name for the cluster.
Get the IP address of each node.
Determine which nodes will be seed nodes. (Cassandra nodes use the seed node list for finding each other and
learning the topology of the ring.)
Determine the snitch.
If the nodes are behind a firewall, open the required ports for internal/external communication. See Configuring
firewall port access.
If using multiple data centers, determine a naming convention for each data center and rack, for example: DC1,
DC2 or 100, 200 and RAC1, RAC2 or R101, R102.
Other possible configuration settings are described in Choosing node configuration options and Node and cluster
configuration (cassandra.yaml).
The following examples demonstrate initializing Cassandra:
Configuration example for single data center
Configuration example for multiple data centers
Note
After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.
1. Suppose you install Cassandra on these nodes with one node per rack serving as a seed:
node0 110.82.155.0 (seed1)
node1 110.82.155.1
node2 110.82.155.2
node3 110.82.156.3 (seed2)
node4 110.82.156.4
node5 110.82.156.5
It is a best practice to have more than one seed node per data center.
2. If you have a firewall running on the nodes in your cluster, you must open certain ports to allow communication
between the nodes. See Configuring firewall port access.
3. If Cassandra is running, stop the node and clear the data.
For packaged installs, run the following commands:
$ sudo service cassandra stop (stops the service)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
For binary installs, run the following commands from the install directory:
$ ps auwx | grep cassandra (finds the Cassandra Java process ID [PID])
$ sudo kill <pid> (stops the process)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
4. Modify the following property settings in the cassandra.yaml file for each node:
num_tokens: 256
-seeds: <internal IP address of each seed node>
listen_address: <localhost IP address>
endpoint_snitch: <name of snitch> - See endpoint_snitch.
node0
cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "110.82.155.0,110.82.155.3"
listen_address: 110.82.155.0
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch
node1 to node5
The properties for these nodes are the same as node0 except for the listen_address.
5. After you have installed and configured Cassandra on all nodes, start the seed nodes one at a time, and then start
the rest of the nodes.
Note
If the node has restarted because of automatic restart, you must stop the node and clear the data directories, as
described above.
Packaged installs: sudo service cassandra start
Binary installs, run one of the following commands from the install directory:
bin/cassandra (starts in the background)
bin/cassandra -f (starts in the foreground)
6. To check that the ring is up and running, run the nodetool status command.
Note
After changing properties in these files, you must restart the node for the changes to take effect.
1. Suppose you install Cassandra on these nodes:
node0 10.168.66.41 (seed1)
node1 10.176.43.66
node2 10.168.247.41
node3 10.176.170.59 (seed2)
node4 10.169.61.170
node5 10.169.30.138
2. If you have a firewall running on the nodes in your cluster, you must open certain ports to allow communication
between the nodes. See Configuring firewall port access.
3. If Cassandra is running, stop the node and clear the data.
For packaged installs, run the following commands:
$ sudo service cassandra stop (stops the service)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
For binary installs, run the following commands from the install directory:
$ ps auwx | grep cassandra (finds the Cassandra Java process ID [PID])
$ sudo kill <pid> (stops the process)
$ sudo rm -rf /var/lib/cassandra/* (clears the data from the default directories)
4. Modify the following property settings in the cassandra.yaml file for each node:
num_tokens: 256
-seeds: <internal IP address of each seed node>
listen_address: <localhost IP address>
endpoint_snitch: <name of snitch> - See endpoint_snitch.
node0:
cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
parameters:
- seeds: "10.168.66.41,10.176.170.59"
listen_address: 10.168.66.41
endpoint_snitch: PropertyFileSnitch
Note
Include at least one node from each data center.
node1 to node5
The properties for these nodes are the same as node0 except for the listen_address.
5. In the cassandra-topology.properties file, assign the data center and rack names you determined in the
Prerequisites to the IP addresses of each node. For example:
# Cassandra Node IP=Data Center:Rack
10.168.66.41=DC1:RAC1
10.176.43.66=DC2:RAC1
10.168.247.41=DC1:RAC1
10.176.170.59=DC2:RAC1
10.169.61.170=DC1:RAC1
10.169.30.138=DC2:RAC1
6. Also, in the cassandra-topology.properties file, assign a default data center name and rack name for
unknown nodes.
# default for unknown nodes
default=DC1:RAC1
Generating tokens
7. After you have installed and configured Cassandra on all nodes, start the seed nodes one at a time, and then start
the rest of the nodes.
Note
If the node has restarted because of automatic restart, you must stop the node and clear the data directories, as
described above.
Packaged installs: sudo service cassandra start
Binary installs, run one of the following commands from the install directory:
bin/cassandra (starts in the background)
bin/cassandra -f (starts in the foreground)
8. To check that the ring is up and running, run the nodetool status command.
Generating tokens
You do not need to generate tokens when using virtual nodes in Cassandra 1.2 and later clusters.
If you are not using virtual nodes, you still need to calculate tokens for your cluster. The following topics in the
Cassandra 1.1 documentation provide conceptual information about tokens:
Data Distribution in the Ring
Replication Strategy
Generating tokens
To calculate tokens when using the RandomPartitioner in Cassandra 1.2 clusters, use the Cassandra 1.1 Token
Generating Tool.
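As a sketch of what the token generating tool computes for the RandomPartitioner, tokens are evenly spaced points on the partitioner's 0 to 2**127 ring. The helper name below is illustrative, not a Cassandra API:

```python
# Sketch (assumption): evenly spaced initial tokens for RandomPartitioner,
# whose ring spans 0 to 2**127 - 1. Node i of N gets i * (2**127 / N).
# The function name generate_tokens is illustrative, not a Cassandra API.

def generate_tokens(node_count):
    """Return one initial_token value per node, evenly spaced on the ring."""
    ring_size = 2 ** 127
    return [i * (ring_size // node_count) for i in range(node_count)]

# Paste each value into the corresponding node's cassandra.yaml
# as the initial_token setting.
for node, token in enumerate(generate_tokens(4)):
    print("node %d: initial_token: %d" % (node, token))
```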
Anatomy of a table
CQL 3 tables, rows, and columns can be viewed much the same way as SQL, which is different from the internal
implementation in Cassandra. In SQL you define tables, which have defined columns. The table defines the column
names and their data types, and the client application then supplies rows conforming to that schema. In Cassandra, you
also define tables and metadata about the columns, but the actual columns that make up a row are determined by the
client application.
CQL 3 improves upon CQL 2 in a number of ways:
Clients no longer need to manually decode the CompositeType packing when processing values from a table like
playlists in the music service example. Non-Java clients, such as Python cqlsh, no longer need to
reverse-engineer the serialization format, and Java clients need not perform cumbersome unpacking.
The row-oriented parts of SQL are akin to the CQL 3 abstraction of the Cassandra database, which does away
with the CQL 2 syntax using FIRST to limit the number of columns, distinct from LIMIT for the number of rows.
CQL 3 addresses a number of other CQL 2 problems with handling wide rows, such as indexing their data and
performing row-oriented functions, such as count. CQL 3 is easier to use and more intuitive for SQL users.
The COMPACT STORAGE directive provides backward compatibility for tables you create in CQL 3 and can also be
used to direct Cassandra to use a more space-efficient storage format for the table at the cost of making all primary key
columns non-updateable.
Note
UUIDs are handy for sequencing data and for avoiding collisions when data is inserted from multiple
machines.
After inserting the example data into playlists, the output of selecting all the data looks like this:
SELECT * FROM playlists;
 id          | song_id     | album        | artist         | title
-------------+-------------+--------------+----------------+---------------------
 62c36092... | 2b09185b... | Roll Away    | Back Door Slam | Outside Woman Blues
 62c36092... | 8a172618... | We Must Obey | Fu Manchu      | Moving in Stereo
 62c36092... | a3e64f8f... | Tres Hombres | ZZ Top         | La Grange
The CQL 3 data maps to the storage engine's representation of the data as follows:
The solid lines show how the cell values get unpacked into three CQL 3 columns. The dotted lines show how the
clustering key portion of the compound primary key becomes the song_id column. Presenting a storage engine row as a
partition of two or more object rows is a more natural row-column representation than earlier CQL versions and the Thrift
API.
The next example illustrates how you can create a query that uses the artist as a filter. First, add a little more data to the
playlist table to make things interesting for the collections examples later:
Collection columns
CQL 3 introduces these collection types:
set
list
map
Handling some tasks in earlier releases of Cassandra was not as elegant as in a relational database. For example, in a
relational database, to allow users to have multiple email addresses, you create an email_addresses table having a
many-to-one (joined) relationship to a users table. In earlier releases of Cassandra, you denormalized the data, and
stored it in multiple columns: email1, email2, and so on. This early Cassandra approach involved doing a read before
adding a new email address to know which column name to use, but otherwise, involved no performance hit because
adding new columns is virtually free in Cassandra.
CQL 3 includes the capability to handle the classic multiple email addresses use case, and other use cases, by defining
columns as collections. Using the set collection type to solve the multiple email addresses problem is convenient and
intuitive using CQL 3.
Another use of a collection type can be demonstrated using the music service example.
From a relational standpoint, you can think of storage engine rows as partitions, within which (object) rows are clustered.
After tagging songs, each row in the table includes a tags set.
Updating a collection
Update the songs table to insert the tags data:
UPDATE songs SET tags = tags + {'2007'}
  WHERE id = 8a172618-b121-4136-bb10-f665cfc469eb;
UPDATE songs SET tags = tags + {'covers'}
  WHERE id = 8a172618-b121-4136-bb10-f665cfc469eb;
UPDATE songs SET tags = tags + {'1973'}
  WHERE id = a3e64f8f-bd44-4f28-b8d9-6938726e34d4;
UPDATE songs SET tags = tags + {'blues'}
  WHERE id = a3e64f8f-bd44-4f28-b8d9-6938726e34d4;
UPDATE songs SET tags = tags + {'rock'}
  WHERE id = 7db1a490-5878-11e2-bcfd-0800200c9a66;
A music reviews list and a schedule (map collection) of live appearances can be added to the table:
ALTER TABLE songs ADD reviews list<text>;
ALTER TABLE songs ADD venue map<timestamp, text>;
Each element of a set, list, or map is internally stored as one Cassandra column. To update a set, use the UPDATE
command and the addition (+) operator to add an element or the subtraction (-) operator to remove an element. For
example, to update a set:
UPDATE songs
SET tags = tags + {'rock'}
WHERE id = 7db1a490-5878-11e2-bcfd-0800200c9a66;
To update a list, a similar syntax using square brackets instead of curly brackets is used.
UPDATE songs
SET reviews = reviews + [ 'hot dance music' ]
WHERE id = 7db1a490-5878-11e2-bcfd-0800200c9a66;
To update a map, use INSERT to specify the data in a map collection.
INSERT INTO songs (id, venue)
VALUES (7db1a490-5878-11e2-bcfd-0800200c9a66,
{ '2013-9-22 12:01' : 'The Fillmore',
'2013-10-1 18:00' : 'The Apple Barrel'});
Inserting data into the map replaces the entire map.
Querying a collection
To query a collection, include the name of the collection column in the select expression. For example, selecting the
tags set returns the set of tags, sorted alphabetically in this case because the tags set is of the text data type:
SELECT id, tags FROM songs;
 id                                   | tags
--------------------------------------+----------------
 7db1a490-5878-11e2-bcfd-0800200c9a66 | {rock}
 a3e64f8f-bd44-4f28-b8d9-6938726e34d4 | {blues, 1973}
 8a172618-b121-4136-bb10-f665cfc469eb | {2007, covers}
SELECT id, venue FROM songs;
 id          | venue
-------------+--------------------------------------------------------------------------------------
 7db1a490... | {2013-10-01 18:00:00-0700: The Apple Barrel, 2013-09-22 12:01:00-0700: The Fillmore}
 a3e64f8f... | null
 8a172618... | null
The collection types are described in more detail in Using collections: set, list, and map.
Expiring columns
Data in a column can have an optional expiration date called TTL (time to live). Whenever a column is inserted, the
client request can specify an optional TTL value, defined in seconds, for the data in the column. TTL columns are
marked as having the data deleted (with a tombstone) after the requested amount of time has expired. After columns
are marked with a tombstone, they are automatically removed during the normal compaction (defined by the
gc_grace_seconds) and repair processes.
Use CQL to set the TTL for a column.
If you want to change the TTL of an expiring column, you have to re-insert the column with a new TTL. In Cassandra,
the insertion of a column is actually an insertion or update operation, depending on whether or not a previous version of
the column exists. This means that to update the TTL for a column with an unknown value, you have to read the column
and then re-insert it with the new TTL value.
TTL columns have a precision of one second, as calculated on the server. Therefore, a very small TTL probably does
not make much sense. Moreover, the clocks on the servers should be synchronized; otherwise reduced precision could
be observed because the expiration time is computed on the primary host that receives the initial insertion but is then
interpreted by other hosts on the cluster.
An expiring column has an additional overhead of 8 bytes in memory and on disk (to record the TTL and expiration time)
compared to standard columns.
Counter columns
A counter is a special kind of column used to store a number that incrementally counts the occurrences of a particular
event or process. For example, you might use a counter column to count the number of times a page is viewed.
Counter column tables must use the counter data type. Counters may only be stored in dedicated tables.
After a counter is defined, the client application then updates the counter column value by incrementing (or
decrementing) it. A client update to a counter column passes the name of the counter and the increment (or decrement)
value; no timestamp is required.
About indexes
An index is a data structure that allows for fast, efficient lookup of data matching a given condition.
CREATE INDEX ON users (lname);
SELECT * FROM users
WHERE fname = 'bob' AND lname = 'smith'
ALLOW FILTERING;
When there are multiple conditions in a WHERE clause, Cassandra selects the least-frequent occurrence of a condition
for processing first for efficiency. In this example, Cassandra queries on the last name first if there are fewer Smiths than
Bobs in the database or on the first name first if there are fewer Bobs than Smiths. When you attempt a potentially
expensive query, such as searching a range of rows, Cassandra requires the ALLOW FILTERING directive.
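The planner behavior described above amounts to picking the condition that matches the fewest rows to drive the lookup. A minimal sketch, with hypothetical index cardinalities:

```python
# Sketch: Cassandra processes the least-frequent indexed condition first.
# These row counts are hypothetical, standing in for index cardinalities.
index_counts = {"fname = 'bob'": 120, "lname = 'smith'": 40}

# Drive the query with the rarer condition, then filter the remaining rows.
driving_condition = min(index_counts, key=index_counts.get)
print(driving_condition)  # the lname condition: fewer Smiths than Bobs
```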
Denormalize to optimize
In the relational world, the data model is usually designed up front with the goal of normalizing the data to minimize
redundancy. Normalization typically involves creating smaller, well-structured tables and then defining relationships
between them. During queries, related tables are joined to satisfy the request.
Cassandra does not have foreign key relationships like a relational database does, which means you cannot join
multiple tables to satisfy a given query request. Cassandra performs best when the data needed to satisfy a given query
is located in the same table. Try to plan your data model so that one or more rows in a single table are used to answer
each query. This sacrifices disk space (one of the cheapest resources for a server) in order to reduce the number of disk
seeks and the amount of network traffic.
Querying Cassandra
You use cqlsh to query the Cassandra database from the command line. All of the commands included in CQL are
available on the cqlsh command line. Several cqlsh commands, however, are not included in CQL; the command table
of contents lists these commands by type. You can run cqlsh-only commands only from the command line.
This document describes CQL 3.0.0, cqlsh, and the Command Line Interface (CLI) for querying Cassandra.
Note
DataStax Enterprise 3.0.x supports CQL 3 in Beta form. Users need to refer to Cassandra 1.1 documentation for CQL
information.
Activating CQL 3
You activate the CQL mode in one of these ways:
Start cqlsh, a Python-based command-line client.
Use the set_cql_version Thrift method.
Specify the desired CQL mode in the connect() call to the Python driver:
connection = cql.connect('localhost:9160', cql_version='3.0')
CQL 3 supports compound keys and clustering. Also, super columns are not supported by either CQL version;
column_type and subcomparator arguments are not valid.
Running CQL
Developers can access CQL commands in a variety of ways. Drivers are available for Python, PHP, Ruby, Node.js, and
JDBC-based client programs. For the purposes of administrators, cqlsh is the most direct way to run simple CQL
commands. Using cqlsh, you can run CQL commands from the command line. The location of cqlsh is
<install_location>/bin for tarball installations, or /usr/bin for packaged installations.
When you start cqlsh, you can provide the IP address of a Cassandra node to connect to. The default is localhost. You
can also provide the RPC connection port (default is 9160), and the cql specification number.
Using a keyspace
After creating a keyspace, select the keyspace for use, just as you connect to a database in SQL:
cqlsh> USE demodb;
Next, create a table and populate it with data.
Creating a table
Continuing with the previous example, create a users table in the newly created keyspace:
CREATE TABLE users (
user_name varchar,
password varchar,
gender varchar,
session_token varchar,
state varchar,
birth_year bigint,
PRIMARY KEY (user_name));
The users table has a single primary key.
Column names
The system keyspace includes these tables:
schema_keyspaces
local
peers
schema_columns
schema_columnfamilies
Keyspace Information
An alternative to the Thrift API describe_keyspaces function is querying the system tables directly in CQL 3. For
example, you can query the defined keyspaces:
SELECT * from system.schema_keyspaces;
The cqlsh output includes information about defined keyspaces. For example:
 keyspace | durable_writes | name    | strategy_class | strategy_options
----------+----------------+---------+----------------+----------------------------
  history | True           | history | SimpleStrategy | {"replication_factor":"1"}
  ks_info | True           | ks_info | SimpleStrategy | {"replication_factor":"1"}
You can also retrieve information about tables by querying system.schema_columnfamilies and about column metadata
by querying system.schema_columns.
Cluster information
You can query system tables to get cluster topology information. You can get the IP address of peer nodes, data center
and rack names, token values, and other information.
For example, after setting up a 3-node cluster using ccm on Mac OS X, query the peers and local tables:
USE system;
select * from peers;
Output from querying the peers table looks something like this:
 peer      | data_center | rack  | release_version | ring_id        | rpc_address | schema_version | tokens
-----------+-------------+-------+-----------------+----------------+-------------+----------------+------------
 127.0.0.3 | datacenter1 | rack1 | 1.2.0-beta2     | 53d171bc-ff... | 127.0.0.3   | 59adb24e-f3... | {3074...
 127.0.0.2 | datacenter1 | rack1 | 1.2.0-beta2     | 3d19cd8f-c9... | 127.0.0.2   | 59adb24e-f3... | {-3074...}
For more information about system keyspaces, see The data dictionary article.
Retrieving columns
To retrieve results, use the SELECT statement.
SELECT * FROM users WHERE first_name = 'jane' and last_name='smith';
Removing data
To remove data, you can set column values for automatic removal using the TTL (time to live) table attribute. You
can also drop a table or keyspace, and delete keyspace column metadata.
Expiring columns
Both the INSERT and UPDATE commands support setting a time for data in a column to expire. The expiration time
(TTL) is set using CQL. The following example first shows an INSERT statement that sets the password column in the
users table to expire in 86400 seconds, or one day. To extend the expiration period to five days, use the
UPDATE command as shown in the second example:
cqlsh:demodb> INSERT INTO users
(user_name, password)
VALUES ('cbrown', 'ch@ngem4a') USING TTL 86400;
cqlsh:demodb> UPDATE users USING TTL 432000 SET password = 'ch@ngem4a'
WHERE user_name = 'cbrown';
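The TTL arguments above are plain second counts, so the two values can be derived directly:

```python
# Sketch: the TTL values used in the examples above are counts of seconds.
ONE_DAY = 24 * 60 * 60     # 86400 seconds, the INSERT example's TTL
FIVE_DAYS = 5 * ONE_DAY    # 432000 seconds, the UPDATE example's TTL
print(ONE_DAY, FIVE_DAYS)
```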
Insertion
Addition
To add an element to a set, use the UPDATE command and the addition (+) operator:
UPDATE users
SET emails = emails + {'[email protected]'} WHERE user_id = 'frodo';
Deletion
To remove an element from a set, use the subtraction (-) operator.
UPDATE users
SET emails = emails - {'[email protected]'} WHERE user_id = 'frodo';
Retrieval
When you query a table containing a collection, Cassandra retrieves the collection in its entirety; consequently, keep
collections small enough to be manageable, or construct a data model to replace collections that can accommodate
large amounts of data.
To return the set of emails belonging to frodo, for example:
SELECT user_id, emails FROM users WHERE user_id = 'frodo';
Cassandra returns results in an order based on the type of the elements in the collection. For example, a set of text
elements is returned in alphabetical order.
 user_id | emails
---------+---------------------------------------------------------------------
 frodo   | {"[email protected]", "[email protected]", "[email protected]"}
If you want elements of the collection returned in insertion order, use a list.
An empty set
To remove all elements from a set, you can use the UPDATE or DELETE statement:
UPDATE users SET emails = {} WHERE user_id = 'frodo';
DELETE emails FROM users WHERE user_id = 'frodo';
A set, list, or map needs to have at least one element; otherwise, Cassandra cannot distinguish the set from a null
value.
SELECT user_id, emails FROM users WHERE user_id = 'frodo';
 user_id | emails
---------+--------
 frodo   | null
Insertion
To add a list declaration to a table, add a column top_places of the list type to the users table:
ALTER TABLE users ADD top_places list<text>;
Next, use the UPDATE command to insert values into the list.
UPDATE users
SET top_places = [ 'rivendell', 'rohan' ] WHERE user_id = 'frodo';
Addition
To prepend an element to the list, enclose it in square brackets, and use the addition (+) operator:
UPDATE users
SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
To append an element to the list, switch the order of the new element data and the list name in the UPDATE command:
UPDATE users
SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';
These update operations are implemented internally without any read-before-write. Appending and prepending a new
element to the list writes only the new element.
To add an element at a particular position, use the list index position in square brackets:
UPDATE users SET top_places[2] = 'riddermark' WHERE user_id = 'frodo';
When you add an element at a particular position, Cassandra reads the entire list, and then writes only the updated
element. Consequently, adding an element at a particular position results in greater latency than appending or prefixing
an element to a list.
Deletion
To remove an element from a list, use the DELETE command and the list index position in square brackets:
DELETE top_places[3] FROM users WHERE user_id = 'frodo';
To remove all elements having a particular value, use the UPDATE command, the subtraction operator (-), and the list
value in square brackets:
UPDATE users
SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';
The former, indexed method of removing elements from a list requires a read internally. Using the UPDATE command
as shown here is recommended over emulating the operation client-side by reading the whole list, finding the indexes
that contain the value to remove, and then removing those indexes. This emulation would not be thread-safe. If another
thread/client prefixes elements to the list between the read and the write, the wrong elements are removed. Using the
UPDATE command as shown here does not suffer from that problem.
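The race described above can be sketched with an in-memory list standing in for the stored row (purely illustrative; Cassandra's storage is not a Python list):

```python
# Sketch: why emulating remove-by-value with indexed deletes is not
# thread-safe. A plain Python list stands in for the stored Cassandra list.
stored = ['rivendell', 'riddermark', 'rohan']

# Client A reads the list and records the indexes holding 'riddermark'.
indexes = [i for i, v in enumerate(stored) if v == 'riddermark']

# Before client A writes, client B prepends an element.
stored.insert(0, 'the shire')

# Client A's now-stale indexes delete the wrong element.
for i in reversed(indexes):
    del stored[i]

print(stored)  # 'riddermark' survives and 'rivendell' is gone instead
```

The value-based UPDATE ... - ['riddermark'] form avoids this because the server removes matching elements atomically, without the client holding indexes between a read and a write.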
Retrieval
A query returns a list of top places.
SELECT user_id, top_places FROM users WHERE user_id = 'frodo';
Insertion
To add a simple todo list to every user profile in an existing users table, use the CREATE TABLE or ALTER TABLE
statement, specifying the map type and enclosing the data types for the name-value pair in angle brackets. For example,
enclose the timestamp and text types in angle brackets:
ALTER TABLE users ADD todo map<timestamp, text>;
Deletion
To delete an element from the map, use the DELETE command and enclose the timestamp of the element in square
brackets:
DELETE todo['2012-9-24'] FROM users WHERE user_id = 'frodo';
Retrieval
Like the output of a query on a set, the order of the output of a map is based on the type of the map. To retrieve the todo
map:
Setting Expiration
Each element of the map is internally stored as one Cassandra column, and each element can have an individual TTL.
If you want elements of the todo map to expire on the day of their timestamp, you can compute the correct TTL and
set it as follows:
UPDATE users USING TTL <computed_ttl>
SET todo['2012-10-1'] = 'find water' WHERE user_id = 'frodo';
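One way to derive the <computed_ttl> placeholder above is to count the seconds from now until the day after the entry's timestamp. The helper below is an illustrative sketch, not part of Cassandra:

```python
# Sketch (assumption): compute a TTL in seconds so a todo entry expires at
# the end of the day named by its map key, e.g. '2012-10-1' above.
from datetime import datetime, timedelta

def ttl_until_end_of_day(entry_date, now):
    """Seconds from `now` until midnight after entry_date (YYYY-M-D)."""
    day = datetime.strptime(entry_date, "%Y-%m-%d")
    expires_at = day + timedelta(days=1)
    return max(0, int((expires_at - now).total_seconds()))

# Substitute the result for <computed_ttl> in the UPDATE statement above.
ttl = ttl_until_end_of_day('2012-10-1', datetime(2012, 9, 30, 12, 0, 0))
print(ttl)
```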
Indexing a column
You can use cqlsh to create a secondary index (an index on column values). This example creates indexes on the
state and birth_year columns in the users table.
cqlsh:demodb> CREATE INDEX state_key ON users (state);
cqlsh:demodb> CREATE INDEX birth_year_key ON users (birth_year);
Because you created the secondary index on the two columns, the column values can be queried directly:
cqlsh:demodb> SELECT * FROM users
WHERE gender = 'f' AND
state = 'TX' AND
birth_year > 1968
ALLOW FILTERING;
Creating a keyspace
You can use the Cassandra CLI commands described in this section to create a keyspace. This example creates a
keyspace called demo, with a replication factor of 1 and using the SimpleStrategy replica placement strategy.
Note the single quotes around the string value of placement_strategy:
[default@unknown] CREATE KEYSPACE demo
with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = {replication_factor:1};
You can verify the creation of a keyspace with the SHOW KEYSPACES command. The new keyspace is listed along
with the system keyspace and any other existing keyspaces.
Internal Type     | CQL Name      | Description
------------------+---------------+-----------------------------
BytesType         | blob          |
AsciiType         | ascii         |
UTF8Type          | text, varchar |
IntegerType       | varint        | Arbitrary-precision integer
Int32Type         | int           | 4-byte integer
InetAddressType   | inet          |
LongType          | bigint        | 8-byte long
UUIDType          | uuid          |
TimeUUIDType      | timeuuid      |
DateType          | timestamp     |
BooleanType       | boolean       | true or false
FloatType         | float         |
DoubleType        | double        |
DecimalType       | decimal       | Variable-precision decimal
CounterColumnType | counter       |
About validators
Using the CLI you can define a default row key validator for a table using the key_validation_class property. Using CQL,
you use built-in key validators to validate row key values. For static tables, define each column and its associated type
when you define the table using the column_metadata property.
Key and column validators can be added or changed in a table definition at any time. If you specify an invalid validator
on your table, client requests that respect that metadata get confused, and data inserts or updates that do not conform
to the specified validator are rejected.
You cannot know the column names of dynamic tables ahead of time, so specify a default_validation_class instead of
defining the per-column data types.
Creating a table
First, connect to the keyspace where you want to define the table with the USE command.
[default@unknown] USE demo;
In this example, we create a users table in the demo keyspace. This table defines a few columns: full_name, email,
state, gender, and birth_year. This is considered a static table because the column names are specified and most rows
are expected to have more-or-less the same columns.
Notice the settings of comparator, key_validation_class, and validation_class. These values set the default encoding
used for column names, row key values, and column values. In the case of column names, the comparator also
determines the sort order. To create a table using the CLI, you use the CREATE COLUMN FAMILY command.
[default@unknown] USE demo;
[default@demo] CREATE COLUMN FAMILY users
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [
{column_name: full_name, validation_class: UTF8Type},
{column_name: email, validation_class: UTF8Type},
{column_name: state, validation_class: UTF8Type},
{column_name: gender, validation_class: UTF8Type},
{column_name: birth_year, validation_class: LongType}
];
Next, create a dynamic table called blog_entry. Notice that here we do not specify column definitions as the column
names are expected to be supplied later by the client application.
[default@demo] CREATE COLUMN FAMILY blog_entry
WITH comparator = TimeUUIDType
AND key_validation_class=UTF8Type
AND default_validation_class = UTF8Type;
Note
The Cassandra CLI sets the consistency level for the client. The level defaults to ONE for all write and read
operations. For more information, see About data consistency.
Indexing a column
The CLI can be used to create secondary indexes (indexes on column values). You can add a secondary index when
you create a table or add it later using the UPDATE COLUMN FAMILY command.
For example, to add a secondary index to the birth_year column of the users column family:
[default@demo] UPDATE COLUMN FAMILY users WITH comparator = UTF8Type
AND column_metadata =
[{column_name: birth_year,
validation_class: LongType,
index_type: KEYS
}
];
Because of the secondary index created for the column birth_year, its values can be queried directly for users born in a
given year as follows:
[default@demo] GET users WHERE birth_year = 1969;
CQL 3 Reference
This document covers the CQL 3.0.1 reference for querying Cassandra. For information about CQL 2.0.0, see the CQL
reference for Cassandra 1.0.
Uppercase/lowercase sensitivity
Keyspace, column, and table names created using CQL 3 are case-insensitive unless enclosed in double quotation
marks. If you enter names for these objects using any uppercase letters, Cassandra stores the names in lowercase. You
can force the case by using double quotation marks. For example:
CREATE TABLE test (
Foo int PRIMARY KEY,
"Bar" int
);
The following partial queries illustrate which forms return results from the test table. Queries that work:
SELECT foo FROM test (unquoted names are folded to lowercase, matching the stored foo) and
SELECT "Bar" FROM test (quoting preserves the case used at creation). Queries that do not work:
SELECT "Foo" FROM test and SELECT bar FROM test (no column is stored with those exact names).
Valid literals
Valid literals consist of these kinds of values:
blob: hexadecimal.
boolean: true or false, case-insensitive. In CQL 3 prior to release 1.2.2, enclosing a boolean in single quotation
marks was allowed but not required. In 1.2.2 and later, using quotation marks is not allowed.
A numeric constant can consist of integers 0-9 and a minus sign prefix. A numeric constant can also be a float: a
series of one or more decimal digits, followed by a period (.), and one or more decimal digits. There
is no optional + sign. The forms .42 and 42. are unacceptable. You can use leading or trailing zeros before and
after the decimal point, for example 0.42 and 42.0. A float constant, expressed in E notation, consists of the
characters in this regular expression:
'-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)?
The next section presents an example.
identifier: A letter followed by any sequence of letters, digits, or underscores. Names of tables, columns, and
other objects are identifiers. To preserve case, enclose them in double quotation marks.
integer: An optional minus sign, -, followed by one or more digits.
string literal: Characters enclosed in single quotation marks. To use a single quotation mark itself in a string literal,
escape it using a single quotation mark. For example, use '' to make dog possessive: dog''s.
uuid: 32 hex digits, 0-9 or a-f, which are case-insensitive, separated by dashes, -, after the 8th, 12th, 16th, and
20th digits. For example: 01234567-0123-0123-0123-0123456789ab
timeuuid: Uses the time in 100-nanosecond intervals since 00:00:00.00 UTC (60 bits), a clock sequence number
for prevention of duplicates (14 bits), plus the IEEE 802 MAC address (48 bits) to generate a unique identifier. For
example: d2177dd0-eaa2-11de-a572-001b779c76e3
whitespace: Separates terms and used inside string literals, but otherwise CQL ignores whitespace.
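The float-constant grammar above can be checked mechanically. This translates it into a Python regular expression, as an illustration rather than the parser Cassandra actually uses:

```python
# Sketch: the E-notation float grammar from the reference, as a Python regex.
import re

FLOAT_CONSTANT = re.compile(r"^-?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?$")

# Accepts 0.42, 42.0, and E-notation; rejects the leading-period form .42.
for text in ('0.42', '42.0', '-2.6034345E+38', '.42'):
    print(text, bool(FLOAT_CONSTANT.match(text)))
```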
Cassandra 1.2.1 and later support exponential notation. This example shows exponential notation in the output from a
cqlsh 3.0.1 command:
CREATE TABLE test(
id varchar PRIMARY KEY,
value_double double,
value_float float
);
INSERT INTO test (id, value_float, value_double)
VALUES ('test1', -2.6034345E+38, -2.6034345E+38);
SELECT * FROM test;
 id    | value_double | value_float
-------+--------------+-------------
 test1 |  -2.6034e+38 | -2.6034e+38
CQL Keywords
Here is a list of keywords and whether or not the words are reserved. A reserved keyword cannot be used as an
identifier unless you enclose the word in double quotation marks. Non-reserved keywords have a specific meaning in
certain context but can be used as an identifier outside this context.
Word
Reserved
ADD
yes
ALL
no
ALLOW
yes
ALTER
yes
AND
yes
APPLY
yes
ASC
yes
ASCII
no
AUTHORIZE
yes
BATCH
yes
BEGIN
yes
BIGINT
no
BLOB
no
BOOLEAN
no
BY
yes
CLUSTERING
no
COLUMNFAMILY
yes
COMPACT
no
COUNT
no
COUNTER
no
CREATE
yes
DECIMAL
no
DELETE
yes
DESC
yes
DOUBLE
no
DROP
yes
FILTERING
no
FLOAT
no
FROM
yes
GRANT
yes
IN
yes
INDEX
yes
INET
yes
INSERT
yes
INT
no
INTO
yes
KEY
no
KEYSPACE
yes
KEYSPACES
yes
LIMIT
yes
LIST
no
MAP
no
MODIFY
yes
NORECURSIVE
yes
NOSUPERUSER
no
OF
yes
ON
yes
ORDER
yes
PASSWORD
yes
PERMISSION
no
PERMISSIONS
no
PRIMARY
yes
RENAME
yes
REVOKE
yes
SCHEMA
yes
SELECT
yes
SET
yes
STORAGE
no
SUPERUSER
no
TABLE
yes
TEXT
no
TIMESTAMP
no
TIMEUUID
no
TO
yes
TOKEN
yes
TRUNCATE
yes
TTL
no
TYPE
no
UNLOGGED
yes
UPDATE
yes
USE
yes
USER
no
USERS
no
USING
yes
UUID
no
VALUES
no
VARCHAR
no
VARINT
no
WHERE
yes
WITH
yes
WRITETIME
no
Type      | Constants Supported | Description
----------+---------------------+-----------------------------
ascii     | strings             |
bigint    | integers            |
blob      | blobs               |
boolean   | booleans            | true or false
counter   | integers            |
decimal   | integers, floats    | Variable-precision decimal
double    | integers            |
float     | integers, floats    |
inet      | strings             |
int       | integers            |
list      | n/a                 |
map       | n/a                 |
set       | n/a                 |
text      | strings             |
timestamp | integers, strings   |
uuid      | uuids               |
timeuuid  | uuids               |
varchar   | strings             |
varint    | integers            | Arbitrary-precision integer
In addition to the CQL types listed in this table, you can use a string containing the name of a Java class (a subclass of
AbstractType loadable by Cassandra) as a CQL type. The class name should either be fully qualified or relative to the
org.apache.cassandra.db.marshal package.
Enclose ASCII text, timestamp, and inet values in single quotation marks. Enclose names of a keyspace, table, or
column in double quotation marks.
Blob
Cassandra 1.2.3 still supports blobs as string constants for input (to allow a smoother transition to blob constants). Blobs
as strings are now deprecated and will not be supported in the near future. If you were using strings as blobs, update
your client code to switch to blob constants.
A blob constant is a hexadecimal number defined by 0[xX](hex)+ where hex is a hexadecimal character, such as
[0-9a-fA-F]. For example: 0xcafe.
Currently, you cannot create a secondary index on a column of type map, set, or list.
Timeuuid type
A value of the timeuuid type is a Type 1 UUID. A Type 1 UUID includes the time of its generation and is sorted by
timestamp, making it ideal for use in applications requiring conflict-free timestamps. For example, you can use this
type to identify a column (such as a blog entry) by its timestamp and allow multiple clients to write to the same row key
simultaneously. Collisions that would potentially overwrite data that was not intended to be overwritten cannot occur.
A valid timeuuid conforms to the timeuuid format shown in valid expressions.
Timeuuid functions
You can use these functions with the timeuuid type:
dateOf()
Used in a SELECT clause, this function extracts the timestamp of a timeuuid column in a resultset. This function
returns the extracted timestamp as a date. Use unixTimestampOf() to get a raw timestamp.
now()
Generates a new unique timeuuid when the statement is executed. This method is useful for inserting values. The
value returned by now() is guaranteed to be unique.
minTimeuuid() and maxTimeuuid()
Returns a UUID-like result given a conditional time component as an argument. For example:
SELECT * FROM myTable
WHERE t > maxTimeuuid('2013-01-01 00:05+0000')
AND t < minTimeuuid('2013-02-02 10:00+0000')
This example selects all rows where the timeuuid column, t, is strictly later than 2013-01-01 00:05+0000 but
strictly earlier than 2013-02-02 10:00+0000. The t >= maxTimeuuid('2013-01-01 00:05+0000') does not select a
timeuuid generated exactly at 2013-01-01 00:05+0000 and is essentially equivalent to t >
maxTimeuuid('2013-01-01 00:05+0000').
The values returned by minTimeuuid and maxTimeuuid functions are not true UUIDs in that the values do not
conform to the Time-Based UUID generation process specified by the RFC 4122.
Warning
The values returned by these methods are not unique. Use these methods for querying only. Inserting the result
of these methods in the database is not recommended.
unixTimestampOf()
Used in a SELECT clause, this function extracts the timestamp of a timeuuid column in a resultset. It returns the
value as a raw, 64-bit integer timestamp.
Timestamp type
Values for the timestamp and timeuuid types are encoded as 64-bit signed integers representing a number of
milliseconds since the standard base time known as the epoch: January 1 1970 at 00:00:00 GMT. Timestamp and
timeuuid types can be entered as integers for CQL input.
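For example, either input form below is accepted for a timestamp column; the logins table is illustrative:

```sql
CREATE TABLE logins (
  user_name text PRIMARY KEY,
  last_login timestamp
);

-- As an integer: milliseconds since the epoch.
INSERT INTO logins (user_name, last_login) VALUES ('jdoe', 1365027600000);

-- As an ISO 8601 formatted string.
INSERT INTO logins (user_name, last_login) VALUES ('jsmith', '2013-04-03 21:00:00+0000');
```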
counter type
To use counter types, see the DataStax blog about counters. Do not assign this type to a column that serves as the
primary key. Also, do not use the counter type in a table that contains anything other than counter types (and primary
key). To generate sequential numbers for surrogate keys, use the timeuuid type instead of the counter type.
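A minimal sketch of a table that follows these rules; the table and column names are illustrative:

```sql
-- The table contains only the primary key and counter columns.
CREATE TABLE page_view_counts (
  url text PRIMARY KEY,
  view_count counter
);

-- Counter columns are modified with UPDATE, not INSERT.
UPDATE page_view_counts SET view_count = view_count + 1
WHERE url = 'http://example.com/';
```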
Format of CQL table storage properties:

Property                       Format
bloom_filter_fp_chance         name: value
caching                        name: value
comment                        name: value
compaction                     map
compression                    map
dclocal_read_repair_chance     name: value
gc_grace_seconds               name: value
read_repair_chance             name: value
replicate_on_write             name: value
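To illustrate the two formats, a sketch of setting name: value properties and a map property on a hypothetical table:

```sql
CREATE TABLE monkey_species (
  species text PRIMARY KEY,
  common_name text
)
WITH comment = 'Important biological records'
AND read_repair_chance = 0.2
AND compression = { 'sstable_compression' : 'SnappyCompressor' };
```

comment and read_repair_chance use the name: value format; compression takes a map of subproperties.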
CQL comments
Comments can be used to document CQL statements in your application code. Single line comments can begin with a
double dash (--) or a double slash (//) and extend to the end of the line. Multi-line comments can be enclosed in /*
and */ characters.
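For example, each comment style applied around a statement:

```sql
-- Single-line comment using a double dash
// Single-line comment using a double slash
/* Multi-line comment:
   extends until the closing delimiter. */
SELECT * FROM users; -- trailing comment after a statement
```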
Compaction subproperties and the strategy that supports each:

Subproperty                       Supported Strategy
bucket_high                       SizeTieredCompactionStrategy
bucket_low                        SizeTieredCompactionStrategy
max_threshold                     SizeTieredCompactionStrategy
min_threshold                     SizeTieredCompactionStrategy
min_sstable_size                  SizeTieredCompactionStrategy
sstable_size_in_mb                LeveledCompactionStrategy
tombstone_compaction_interval     all
tombstone_threshold               all
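Subproperties are supplied inside the compaction map. As a sketch, switching a hypothetical table to leveled compaction with an illustrative SSTable size:

```sql
ALTER TABLE users WITH
compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
```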
cqlsh Commands

CQL statements:
ALTER KEYSPACE, ALTER TABLE, ALTER USER, BATCH, CREATE INDEX, CREATE KEYSPACE, CREATE TABLE,
CREATE USER, DELETE, DROP INDEX, DROP KEYSPACE, DROP TABLE, DROP USER, GRANT, INSERT,
LIST PERMISSIONS, LIST USERS, REVOKE, SELECT, TRUNCATE, UPDATE, USE

cqlsh-only commands:
ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE, EXIT, SHOW, SOURCE, TRACING
ALTER KEYSPACE
Change property values of a keyspace.
Synopsis
ALTER KEYSPACE | SCHEMA keyspace_name
WITH REPLICATION = map
| WITH DURABLE_WRITES = true | false
| WITH REPLICATION = map
AND DURABLE_WRITES = true | false
map is a map collection:
{ property : value, property : value, ... }
Synopsis legend
In the synopsis section of each statement, formatting has the following meaning:
Uppercase means literal
Lowercase means not literal
Italics mean optional
The pipe (|) symbol means OR or AND/OR
Ellipsis (...) means repeatable
( means a non-literal, open parenthesis used to indicate scope
) means a non-literal, close parenthesis used to indicate scope
A semicolon that terminates CQL statements is not included in the synopsis.
Description
ALTER KEYSPACE changes the map that defines the replica placement strategy and/or the durable_writes value. You
can also use the alias ALTER SCHEMA. Use these properties and values to construct the map. To set the replica
placement strategy, construct a map of properties and values, as shown in the table of map properties on the CREATE
KEYSPACE reference page.
You cannot change the name of the keyspace.
Example
Continuing with the example in CREATE KEYSPACE, change the definition of the Excalibur keyspace to use the
SimpleStrategy and a replication factor of 3.
ALTER KEYSPACE "Excalibur" WITH REPLICATION =
{ 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
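Similarly, a sketch of changing the durable_writes option for the same keyspace:

```sql
ALTER KEYSPACE "Excalibur" WITH DURABLE_WRITES = false;
```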
ALTER TABLE
Modifies the column metadata of a table.
Synopsis
ALTER TABLE keyspace_name.table_name
ALTER column_name TYPE cql_type
| ADD column_name cql_type
| DROP column_name
| RENAME column_name TO column_name
| WITH property AND property ...
Description
ALTER TABLE manipulates the table metadata. You can change the data storage type of columns, add new columns,
drop existing columns, and change table properties. No results are returned.
You can also use the alias ALTER COLUMNFAMILY.
See CQL data types for the available data types and CQL 3 table storage properties for column properties and their
default values.
First, specify the name of the table to be changed after the ALTER TABLE keywords, followed by the type of change:
ALTER, ADD, DROP, RENAME, or WITH. Next, provide the rest of the needed information, as explained in the following
sections.
You can qualify table names by keyspace. For example, to alter the addamsFamily table in the monsters keyspace:
ALTER TABLE monsters.addamsFamily ALTER lastKnownLocation TYPE uuid;
Examples:
Changing the type of a column
Adding a column
Dropping a column, not available in this release.
Modifying table options
Renaming a column
Modifying the compression or compaction setting
Adding a column
To add a column, other than a column of a collection type, to a table, use ALTER TABLE and the ADD keyword in the
following way:
ALTER TABLE addamsFamily ADD gravesite varchar;
To add a column of the collection type:
ALTER TABLE users ADD top_places list<text>;
The column may or may not already exist in current rows. No validation of existing data occurs.
These additions to a table are not allowed:
Adding a column having the same name as an existing column.
Adding columns to tables defined with COMPACT STORAGE.
Dropping a column
This feature is not ready in Cassandra 1.2 but will be available in a subsequent version.
To drop a column from the table, use ALTER TABLE and the DROP keyword in the following way:
ALTER TABLE addamsFamily DROP gender;
Dropping a column removes the column from current rows.
Renaming a column
The main purpose of the RENAME clause is to change the names of CQL 3-generated row key and column names that
are missing from legacy tables.
ALTER USER
Alter existing user options.
Synopsis
ALTER USER user_name
WITH PASSWORD 'password' NOSUPERUSER | SUPERUSER
Synopsis legend
Description
Superusers can change a user's password or superuser status. To prevent disabling all superusers, superusers cannot
change their own superuser status. Ordinary users can change only their own password.
Enclose the user name in single quotation marks if it contains non-alphanumeric characters. Enclose the password in
single quotation marks.
Example
ALTER USER moss WITH PASSWORD 'bestReceiver';
BATCH
Writes multiple DML statements and sets a client-supplied timestamp for all columns written by the statements in the
batch.
Synopsis
BEGIN BATCH
| BEGIN UNLOGGED
| BEGIN COUNTER
USING TIMESTAMP timestamp;
dml_statement
dml_statement
...
APPLY BATCH;
dml_statement is:
INSERT
UPDATE
DELETE
Synopsis legend
Description
A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE, DELETE)
into a single logical operation. Batching multiple statements saves network exchanges between the client and the
coordinator, and between the coordinator and the replicas.
In Cassandra 1.2 and later, batches are atomic by default. In the context of a Cassandra batch operation, atomic means
that if any of the batch succeeds, all of it will. To achieve atomicity, Cassandra first writes the serialized batch to the
batchlog system table that consumes the serialized batch as blob data. When the rows in the batch have been
successfully written and persisted (or hinted) the batchlog data is removed. There is a performance penalty for atomicity.
If you do not want to incur this penalty, prevent Cassandra from writing to the batchlog system by using the UNLOGGED
option: BEGIN UNLOGGED BATCH
Although an atomic batch guarantees that if any part of the batch succeeds, all of it will, no other transactional
enforcement is done at the batch level. For example, there is no batch isolation. Other clients are able to read the first
updated rows from the batch, while other rows are in progress. However, transactional row updates within a single row
are isolated: a partial row update cannot be read.
Using a timestamp
BATCH supports setting a client-supplied timestamp, an integer, in the USING clause that is used by all batched
operations. If not specified, the current time of the insertion (in microseconds) is used. Individual DML statements inside
a BATCH cannot specify a timestamp.
Individual statements can specify a TTL (time to live). TTL columns are automatically marked as deleted (with a
tombstone) after the requested amount of time has expired.
Example
BEGIN BATCH
INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user')
UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2'
INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c')
DELETE name FROM users WHERE userID = 'user2'
INSERT INTO users (userID, password, name) VALUES ('user4', 'ch@ngem3c', 'Andrew')
APPLY BATCH;
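A sketch of the same pattern with a client-supplied timestamp applied to every statement in the batch; the timestamp value is illustrative:

```sql
BEGIN BATCH USING TIMESTAMP 1355616727845004
  INSERT INTO users (userID, password) VALUES ('user5', 'ch@ngem3d')
  UPDATE users SET password = 'ps33dhds' WHERE userID = 'user4'
APPLY BATCH;
```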
CREATE INDEX
Define a new, secondary index on a single column of a table.
Synopsis
CREATE INDEX index_name
ON keyspace_name.table_name ( column_name )
Synopsis legend
Description
CREATE INDEX creates a new, automatic secondary index on the given table for the named column. Optionally, specify
a name for the index itself before the ON keyword. Enclose a single column name in parentheses. It is not necessary for
the column to exist on any current rows. The column and its data type must be specified when the table is created, or
added afterward by altering the table.
If data already exists for the column, Cassandra indexes the data during the execution of this statement. After the index
is created, Cassandra indexes new data for the column automatically when new data is inserted.
In this release, Cassandra supports creating an index on a table having a compound primary key. You cannot create a
secondary index on the primary key itself. Cassandra does not support secondary indexes on collections.
Examples
Define a table and then create a secondary index on two of its named columns:
CREATE TABLE myschema.users (
userID uuid,
fname text,
lname text,
email text,
address text,
zip int,
state text,
PRIMARY KEY (userID)
);
CREATE INDEX user_state
ON myschema.users (state);
CREATE INDEX ON myschema.users (zip);
Define a table having a compound primary key and create a secondary index on it.
USE myschema;
CREATE TABLE timeline (
user_id varchar,
email_id uuid,
author varchar,
body varchar,
PRIMARY KEY (user_id, email_id)
);
CREATE INDEX ON timeline (author);
CREATE KEYSPACE
Define a new keyspace and its replica placement strategy.
Synopsis
CREATE KEYSPACE | SCHEMA keyspace_name WITH REPLICATION = map
AND DURABLE_WRITES = true | false
map is a map collection:
{ property : value, property, value : property, value ... }
Synopsis legend
Description
CREATE KEYSPACE creates a top-level namespace and sets the keyspace name, replica placement strategy class,
replication options, and durable_writes options for the keyspace. Keyspace names are 32 or fewer alpha-numeric
characters and underscores, the first of which is an alpha character. Keyspace names are case-insensitive. To make a
name case-sensitive, enclose it in double quotation marks.
To set the replica placement strategy, construct a map of properties and values.
Property | Value | Value Description
'class' | 'SimpleStrategy' or 'NetworkTopologyStrategy' | Required. The replica placement strategy class.
'replication_factor' | An integer | Required if class is SimpleStrategy; otherwise, not used. The number of replicas of data.
'<datacenter name>' | An integer | Required if class is NetworkTopologyStrategy; otherwise, not used. The number of replicas of data on each node in the data center.
'<datacenter name>' | An integer | Optional if class is NetworkTopologyStrategy. The number of replicas of data on each node in the data center.
... | ... | ...
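For example, a sketch of both strategies; the data center names dc1 and dc2 are illustrative and must match the names your snitch reports:

```sql
CREATE KEYSPACE excelsior
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

CREATE KEYSPACE "Excalibur"
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 };
```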
Cassandra converted the excelsior keyspace to lowercase because quotation marks were not used when creating the
keyspace, and retained the initial capital letter for Excalibur because quotation marks were used.
CREATE TABLE
Define a new table.
Synopsis
CREATE TABLE keyspace_name.table_name
( column_definition, column_definition, ...)
WITH property AND property ...
column_definition is:
column_name cql_type
| column_name cql_type PRIMARY KEY
| PRIMARY KEY ( partition_key )
| column_name collection_type
cql_type is a type, other than a collection or counter type, listed in CQL data types. Exceptions: ADD supports a
collection type and also, if the table is a counter table, a counter type.
partition_key is:
column_name
| ( column_name1, column_name2, column_name3 ...)
| ((column_name1, column_name2), column_name3,
column_name4 . . .)
In the second form, column_name1 is the partition key and column_name2, column_name3 ... are clustering keys. In
the third form, column_name1 and column_name2 together are the partition key, and column_name3, column_name4
... are clustering keys.
collection_type is:
LIST <cql_type>
| SET <cql_type>
| MAP <cql_type, cql_type>
property is one of the CQL table storage options or a directive. A directive is either:
COMPACT STORAGE
CLUSTERING ORDER followed by the clustering order specification.
Synopsis legend
Description
CREATE TABLE creates a new table under the current keyspace. You can also use the alias CREATE
COLUMNFAMILY. Valid table names are strings of alphanumeric characters and underscores, which begin with a letter.
If you add the keyspace name followed by a period to the name of the table, Cassandra creates the table in the
specified keyspace, but does not change the current keyspace; otherwise, if you do not use a keyspace name,
Cassandra creates the table within the current keyspace.
Examples:
Defining a primary key column
Using compound primary keys
Defining columns
Setting table options
Using compact storage
Using clustering order
Using collections: set, list, and map
Defining columns
You assign columns a type during table creation. Column types, other than collection-type columns, are specified as a
parenthesized, comma-separated list of column name and type pairs. See CQL data types for the available types.
This example shows how to create a table that includes collection-type columns: map, set, and list.
CREATE TABLE users (
userid text PRIMARY KEY,
first_name text,
last_name text,
emails set<text>,
top_scores list<int>,
todo map<timestamp, text>
);
For information about using collections, see Using collections: set, list, and map.
CREATE USER
Create a new user.
Synopsis
CREATE USER user_name
WITH PASSWORD 'password' NOSUPERUSER | SUPERUSER
Synopsis legend
Description
CREATE USER defines a new database user account. By default, user accounts do not have superuser status. Only a
superuser can issue CREATE USER requests.
User accounts are required for logging in under internal authentication and authorization.
Enclose the user name in single quotation marks if it contains non-alphanumeric characters. You cannot recreate an
existing user. To change the superuser status or password, use ALTER USER.
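Example

A sketch using the user names from the GRANT examples; the passwords are illustrative:

```sql
CREATE USER spillman WITH PASSWORD 'Niner27';
CREATE USER akers WITH PASSWORD 'Niner2' SUPERUSER;
CREATE USER boone WITH PASSWORD 'Niner75' NOSUPERUSER;
```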
DELETE
Removes entire rows or one or more columns from one or more rows.
Synopsis
DELETE column_name, ... | collection_column_name [ term ]
FROM keyspace_name.table_name
USING TIMESTAMP integer
WHERE row_specification
term is:
list_index_position | list_value
row_specification is:
primary_key_name = key_value
primary_key_name IN ( key_value, key_value, ...)
Synopsis legend
Description
A DELETE statement removes one or more columns from one or more rows in a table, or it removes the entire row if no
columns are specified. Cassandra applies deletions within the same partition key atomically and in isolation.
Specifying Columns
After the DELETE keyword, optionally list column names, separated by commas.
DELETE col1, col2, col3 FROM Planeteers WHERE userID = 'Captain';
When no column names are specified, the entire row(s) specified in the WHERE clause are deleted.
DELETE FROM MastersOfTheUniverse WHERE mastersID IN ('Man-At-Arms', 'Teela');
When a column is deleted, it is not removed from disk immediately. The deleted column is marked with a tombstone and
then removed after the configured grace period has expired. The optional timestamp defines the new tombstone record.
See About deletes for more information about how Cassandra handles deleted columns and rows.
Example
DELETE phone FROM users WHERE user_name IN ('jdoe', 'jsmith');
To remove all elements from a set, you can use the DELETE statement:
DELETE emails FROM users WHERE user_id = 'frodo';
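To remove a single element from a list column, a sketch using the list index in square brackets; this assumes the top_places list column from the UPDATE examples:

```sql
DELETE top_places[3] FROM users WHERE user_id = 'frodo';
```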
DROP TABLE
Removes the named table.
Synopsis
DROP TABLE keyspace_name.table_name
Synopsis legend
Description
A DROP TABLE statement results in the immediate, irreversible removal of a table, including all data contained in the
table. You can also use the alias DROP COLUMNFAMILY.
Example
DROP TABLE worldSeriesAttendees;
DROP INDEX
Drops the named secondary index.
Synopsis
DROP INDEX name
Synopsis legend
Description
A DROP INDEX statement removes an existing secondary index. If the index was not given a name during creation, the
index name is <table_name>_<column_name>_idx.
Example
DROP INDEX user_state;
DROP INDEX users_zip_idx;
DROP KEYSPACE
Removes the keyspace.
Synopsis
DROP KEYSPACE keyspace_name
Description
A DROP KEYSPACE statement results in the immediate, irreversible removal of a keyspace, including all tables and
data contained in the keyspace. You can also use the alias DROP SCHEMA.
Example
DROP KEYSPACE MyTwitterClone;
DROP USER
Synopsis
DROP USER user_name
Synopsis legend
Description
DROP USER removes an existing user. You have to be logged in as a superuser to issue a DROP USER statement. A
user cannot drop themselves.
Enclose the user name in single quotation marks only if it contains non-alphanumeric characters.
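Example

A sketch that removes the user from the ALTER USER example:

```sql
DROP USER moss;
```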
GRANT
Provides users access to database objects.
Synopsis
GRANT permission_name PERMISSION
| GRANT ALL PERMISSIONS
ON resource TO user
permission_name is one of these:
ALTER
AUTHORIZE
CREATE
DROP
MODIFY
SELECT
resource is one of these:
ALL KEYSPACES
KEYSPACE keyspace_name
TABLE keyspace_name.table_name
Synopsis legend
Description
Permissions to access all keyspaces, a named keyspace, or a table can be granted to a user. Enclose the user name in
single quotation marks if it contains non-alphanumeric characters.
This table lists the permissions needed to use CQL statements:

Permission     CQL Statements
ALTER          ALTER KEYSPACE, ALTER TABLE, CREATE INDEX, DROP INDEX
AUTHORIZE      GRANT, REVOKE
CREATE         CREATE KEYSPACE, CREATE TABLE
DROP           DROP KEYSPACE, DROP TABLE
MODIFY         INSERT, DELETE, UPDATE, TRUNCATE
SELECT         SELECT
To be able to perform SELECT queries on a table, you have to have SELECT permission on the table, on its parent
keyspace, or on ALL KEYSPACES. To be able to CREATE TABLE you need CREATE permission on its parent
keyspace or ALL KEYSPACES. You need to be a superuser or to have AUTHORIZE permission on a resource (or one
of its parents in the hierarchy) plus the permission in question to be able to GRANT or REVOKE that permission to or
from a user. GRANT, REVOKE and LIST permissions check for the existence of the table and keyspace before
execution. GRANT and REVOKE check that the user exists.
Examples
Give 'spillman' permission to perform SELECT queries on all tables in all keyspaces:
GRANT SELECT ON ALL KEYSPACES TO spillman;
Give 'akers' permission to perform INSERT, UPDATE, DELETE and TRUNCATE queries on all tables in the 'field'
keyspace:
GRANT MODIFY ON KEYSPACE field TO akers;
Give 'boone' permission to perform ALTER KEYSPACE queries on the 'forty9ers' keyspace, and also ALTER TABLE,
CREATE INDEX and DROP INDEX queries on all tables in 'forty9ers' keyspace:
GRANT ALTER ON KEYSPACE forty9ers TO boone;
Give 'boone' permission to run all types of queries on ravens.plays table:
GRANT ALL PERMISSIONS ON ravens.plays TO boone;
To grant access to a keyspace to just one user, assuming nobody else has ALL KEYSPACES access, you use this
statement:
GRANT ALL ON KEYSPACE keyspace_name TO user_name;
INSERT
Adds or updates one or more columns in the identified row of a table.
Synopsis
INSERT INTO keyspace_name.table_name
( identifier, identifier...)
VALUES ( value, value ... )
USING option AND option
identifier is a column or a collection name.
Value is one of:
a column value
a set:
{ item1, item2, . . . }
a list:
[ item1, item2, . . . ]
a map:
{ name : value, name : value, . . . }
option is one of:
TIMESTAMP string
TTL seconds
Synopsis legend
Description
An INSERT writes one or more columns to a record in a Cassandra table atomically and in isolation. No results are
returned. You do not have to define all columns, except those that make up the key. Missing columns occupy no space
on disk.
If the column exists, it is updated. You can qualify table names by keyspace. INSERT does not support counters, but
UPDATE does.
Example
INSERT INTO playlists (id, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
a3e64f8f-bd44-4f28-b8d9-6938726e34d4, 'La Grange', 'ZZ Top', 'Tres Hombres');
INSERT INTO playlists (id, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
8a172618-b121-4136-bb10-f665cfc469eb, 'Moving in Stereo', 'Fu Manchu', 'We Must Obey');
INSERT INTO playlists (id, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
2b09185b-fb5a-4734-9b56-49077de9edbf, 'Outside Woman Blues', 'Back Door Slam', 'Roll Away');
TTL and Timestamp
INSERT INTO Hollywood.NerdMovies (user_uuid, fan)
VALUES (cfd66ccc-d857-4e90-b1e5-df98a3d40cd6, 'johndoe')
USING TTL 86400;
TTL input is in seconds. TTL column values are automatically marked as deleted (with a tombstone) after the requested
amount of time has expired. TTL marks the inserted values, not the column itself, for expiration. Any subsequent update
of the column resets the TTL to the TTL specified in the update. By default, values never expire.
The TIMESTAMP input is in one of the following formats:
A string in ISO 8601 format.
An integer representing microseconds.
If not specified, the time (in microseconds) that the write occurred to the column is used.
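A sketch of supplying an explicit write timestamp, in microseconds; the value is illustrative:

```sql
INSERT INTO Hollywood.NerdMovies (user_uuid, fan)
VALUES (cfd66ccc-d857-4e90-b1e5-df98a3d40cd6, 'johndoe')
USING TIMESTAMP 1365027600000000;
```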
LIST PERMISSIONS
Lists permissions granted to a user.
Synopsis
LIST permission_name PERMISSION
| LIST ALL PERMISSIONS
ON resource OF user_name
NORECURSIVE
permission_name is one of these:
ALTER
AUTHORIZE
CREATE
DROP
MODIFY
SELECT
resource is one of these:
ALL KEYSPACES
KEYSPACE keyspace_name
TABLE keyspace_name.table_name
Synopsis legend
Description
Permissions checks are recursive. If you omit the NORECURSIVE specifier, permissions on the requested resource and
its parents in the hierarchy are shown.
Omitting the resource name (ALL KEYSPACES, keyspace, or table) lists permissions on all tables and all
keyspaces.
Omitting the user name lists permissions of all users. You need to be a superuser to list permissions of all users. If
you are not, you must add OF <myusername>.
Omitting the NORECURSIVE specifier lists permissions on the resource and its parent resources.
Enclose the user name in single quotation marks only if it contains non-alphanumeric characters.
After creating users in Creating internal user accounts and granting the permissions in the GRANT examples, you can
list permissions that users have on resources and their parents.
Example
Assuming you completed the examples in Examples, list all permissions given to akers:
LIST ALL PERMISSIONS OF akers;
Output

 username | resource         | permission
----------+------------------+------------
    akers | <keyspace field> |     MODIFY
List permissions given to all the users:
LIST ALL PERMISSIONS;
Output

 username | resource             | permission
----------+----------------------+------------
    akers | <keyspace field>     |     MODIFY
    boone | <keyspace forty9ers> |      ALTER
    boone | <table ravens.plays> |     CREATE
    boone | <table ravens.plays> |      ALTER
    boone | <table ravens.plays> |       DROP
    boone | <table ravens.plays> |     SELECT
    boone | <table ravens.plays> |     MODIFY
    boone | <table ravens.plays> |  AUTHORIZE
 spillman | <all keyspaces>      |     SELECT
List all permissions on the plays table:
LIST ALL PERMISSIONS ON ravens.plays;
 username | resource             | permission
----------+----------------------+------------
    boone | <table ravens.plays> |     CREATE
    boone | <table ravens.plays> |      ALTER
    boone | <table ravens.plays> |       DROP
    boone | <table ravens.plays> |     SELECT
    boone | <table ravens.plays> |     MODIFY
    boone | <table ravens.plays> |  AUTHORIZE
 spillman | <all keyspaces>      |     SELECT
LIST USERS
Lists existing users and their superuser status.
Synopsis
LIST USERS
Synopsis legend
Description
Assuming you use internal authentication, created the users in Creating internal user accounts, and have not yet
changed the default user, the following example shows the output of LIST USERS.
Example
LIST USERS;
Output

 name      | super
-----------+-------
 cassandra |  True
     boone | False
     akers |  True
  spillman | False
REVOKE
Synopsis
REVOKE permission_name PERMISSION
| REVOKE ALL PERMISSIONS
ON resource FROM user_name
permission_name is one of these:
ALTER
AUTHORIZE
CREATE
DROP
MODIFY
SELECT
resource is one of these:
ALL KEYSPACES
KEYSPACE keyspace_name
TABLE keyspace_name.table_name
Synopsis legend
Description
Permissions to access all keyspaces, a named keyspace, or a table can be revoked from a user. Enclose the user name
in single quotation marks if it contains non-alphanumeric characters.
This table lists the permissions needed to use CQL statements:

Permission     CQL Statements
ALTER          ALTER KEYSPACE, ALTER TABLE, CREATE INDEX, DROP INDEX
AUTHORIZE      GRANT, REVOKE
CREATE         CREATE KEYSPACE, CREATE TABLE
DROP           DROP KEYSPACE, DROP TABLE
MODIFY         INSERT, DELETE, UPDATE, TRUNCATE
SELECT         SELECT
Example
REVOKE SELECT ON ravens.plays FROM boone;
The user boone can no longer perform SELECT queries on the ravens.plays table. Exceptions: Because of inheritance,
the user can perform SELECT queries on ravens.plays if one of these conditions is met:
The user is a superuser.
The user has SELECT on ALL KEYSPACES permission.
The user has SELECT on the ravens keyspace.
SELECT
Retrieves data, including Solr data, from a Cassandra table.
Synopsis
SELECT select_expression
FROM keyspace_name.table_name
WHERE clause AND clause ...
ORDER BY compound_key_2 ASC | DESC
LIMIT n
ALLOW FILTERING
select_expression is:
selection_list
| COUNT ( * | 1 )
selection_list is:
selector , selector, ...| *
selector is:
WRITETIME (col_name)
| TTL (col_name) | * | function (selector , selector, ...)
function is a timeuuid function, a token function, or a blob conversion function.
WHERE clause syntax is:
relation AND relation ...
relation is:
primary_key_name = | < | > | <= | >= key_value
| primary_key_name IN ( key_value, ... )
| TOKEN ( partitioner_key ) = | < | > | <= | >= TOKEN ( term )
Synopsis legend
Description
A SELECT statement reads one or more records from a Cassandra table. The input to the SELECT statement is the
select expression. The output of the select statement depends on the select expression:
Select Expression          Output
COUNT                      One row with a column that has the value of the number of rows in the resultset
WRITETIME function         The date/time that a write to the column occurred
TTL function               The remaining time-to-live for the column
Specifying columns
The SELECT expression determines which columns, if any, appear in the result. Using the asterisk specifies selection of
all columns:
SELECT * from People;
Select two columns, Name and Occupation, from three rows having employee ids (primary key) 199, 200, or 207:
SELECT Name, Occupation FROM People WHERE empID IN (199, 200, 207);
A simple form is a comma-separated list of column names. The list can consist of a range of column names.
SELECT
Or, create an index on playlist artists, and use this query to get titles of Fu Manchu songs on the playlist:
CREATE INDEX ON playlists (artist);
SELECT title FROM playlists WHERE artist = 'Fu Manchu';
Output

 title
------------------
 Ojo Rojo
 Moving in Stereo
TRUNCATE
Removes all data from a table.
Synopsis
TRUNCATE keyspace_name.table_name
Synopsis legend
Description
A TRUNCATE statement results in the immediate, irreversible removal of all data in the named table.
Example
TRUNCATE user_activity;
UPDATE
Updates one or more columns in the identified row of a table.
Synopsis
UPDATE keyspace_name.table_name
USING TTL seconds
SET assignment , assignment, ...
WHERE row_specification
assignment is one of:
column_name = value
| set_or_list_item = set_or_list_item + | - set | list
| map_name = map_name + map
| collection_column_name [ term ] = value
| counter_column_name = counter_column_name + | - integer
set is:
{ item1, item2, . . . }
list is:
[ item1, item2, . . . ]
map is:
{ name : value, name : value, . . . }
term is:
list_index_position | list_value
row_specification is:
primary_key_name = key_value
primary_key_name IN (key_value ,...)
Synopsis legend
Description
An UPDATE writes one or more column values to existing columns in a Cassandra table. No results are returned. A
statement begins with the UPDATE keyword followed by a Cassandra table name. To update multiple columns,
separate the name-value pairs using commas.
The SET clause specifies the column name-value pairs to update. Separate multiple name-value pairs using commas. If
the named column exists, its value is updated. If the column does not exist, use ALTER TABLE to create the new
column.
To update a counter column value in a counter table, specify a value to increment or decrement the current value
of the counter column.
UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523;
In an UPDATE statement, you can specify these options:
TTL seconds
Timestamp
TTL input is in seconds. TTL column values are automatically marked as deleted (with a tombstone) after the requested
amount of time has expired. TTL marks the inserted values, not the column itself, for expiration. Any subsequent update
of the column resets the TTL to the TTL specified in the update. By default, values never expire.
The TIMESTAMP input is in one of the following formats:
A string in ISO 8601 format
An integer representing microseconds
If not specified, the time (in microseconds) that the write occurred to the column is used.
Each update statement requires a precise set of row keys to be specified using a WHERE clause. You need to specify
all keys in a table having compound and clustering columns. For example, update the value of a column in a table
having a compound primary key, userid and url:
UPDATE excelsior.clicks USING TTL 432000
SET user_name = 'bob'
WHERE userid=cfd66ccc-d857-4e90-b1e5-df98a3d40cd6 AND
url='http://google.com';
UPDATE Movies SET col1 = val1, col2 = val2 WHERE movieID = key1;
UPDATE Movies SET col3 = val3 WHERE movieID IN (key1, key2, key3);
UPDATE Movies SET col4 = 22 WHERE movieID = key4;
Examples
Update a column in several rows at once:
UPDATE users
SET state = 'TX'
WHERE user_uuid
IN (88b8fd18-b1ed-4e96-bf79-4280797cba80,
06a8913c-c0d6-477c-937d-6c1b69a95d43,
bc108776-7cb5-477f-917d-869c12dfffa8);
Update several columns in a single row:
UPDATE users
SET name = 'John Smith',
email = '[email protected]'
WHERE user_uuid = 88b8fd18-b1ed-4e96-bf79-4280797cba80;
Update the value of a counter column:
UPDATE page_views
USING TIMESTAMP 1355384955054
SET "index.html" = "index.html" + 1
WHERE url_key = 'www.datastax.com';
To insert values into the list:
UPDATE users
SET top_places = [ 'rivendell', 'rohan' ] WHERE user_id = 'frodo';
To prepend an element to the list, enclose it in square brackets, and use the addition (+) operator:
UPDATE users
SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';
To append an element to the list, switch the order of the new element data and the list name in the UPDATE command:
UPDATE users
SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';
To set an element at a particular position, use the list index position in square brackets:
UPDATE users SET top_places[2] = 'riddermark' WHERE user_id = 'frodo';
To remove all elements having a particular value, use the UPDATE command, the subtraction operator (-), and the list
value in square brackets:
UPDATE users
SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';
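Map columns are updated in a similar way. A sketch using the todo map from the CREATE TABLE collections example; the keys and values are illustrative:

```sql
-- Replace the whole map.
UPDATE users
SET todo = { '2013-09-22 12:01' : 'birthday wishes to Bilbo' }
WHERE user_id = 'frodo';

-- Set or replace a single map element.
UPDATE users SET todo['2013-10-02 12:10'] = 'find water' WHERE user_id = 'frodo';
```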
USE
Connects the current client session to a keyspace.
Synopsis
USE keyspace_name | "keyspace_name"
Description
A USE statement identifies the keyspace that contains the tables to query for the current client session. All subsequent
operations on tables and indexes are in the context of the named keyspace, unless otherwise specified or until the client
connection is terminated or another USE statement is issued.
To use a case-sensitive keyspace, enclose the keyspace name in double quotation marks.
Example
USE PortfolioDemo;
Continuing with the example in Checking created keyspaces:
USE "Excalibur";
ASSUME
Treats a column name or value as a specified type, even if that type information is not specified in the table's metadata.
Synopsis
ASSUME keyspace_name.table_name
storage_type_definition , storage_type_definition , ...
storage_type_definition is:
(column_name) VALUES ARE datatype
| NAMES ARE datatype
| VALUES ARE datatype
Synopsis legend
Description
ASSUME treats all values in the given column in the given table as being of the specified type when the
storage_type_definition is:
(column_name) VALUES ARE datatype
This overrides any other information about the type of a value.
ASSUME treats all column names in the given table as being of the given type when the storage_type_definition is:
NAMES ARE datatype
ASSUME treats all column values in the given table as being of the given type unless overridden by a column-specific
ASSUME or column-specific metadata in the table's definition.
Examples
ASSUME users NAMES ARE text, VALUES are text;
ASSUME users(user_id) VALUES are uuid;
CAPTURE
Captures command output and appends it to a file.
Synopsis
CAPTURE '<file>' | OFF
Synopsis legend
Description
To start capturing the output of a query, specify the path of the file relative to the current directory. Enclose the file name
in single quotation marks. The shorthand notation in this example is supported for referring to $HOME:
Example
CAPTURE '~/mydir/myfile.txt';
Output is not shown on the console while it is captured. Only query result output is captured. Errors and output from
cqlsh-only commands still appear.
To stop capturing output and return to normal display of output, use CAPTURE OFF.
To determine the current capture state, use CAPTURE with no arguments.
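For example, a complete capture session might look like this (the file path is illustrative):

CAPTURE '~/mydir/myfile.txt';
SELECT * FROM users;   -- query results are appended to the file
CAPTURE;               -- reports the current capture state
CAPTURE OFF;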
CONSISTENCY
Shows the current consistency level, or given a level, sets it.
Synopsis
CONSISTENCY level
Synopsis legend
Description
Providing an argument to the CONSISTENCY command overrides the default consistency level of ONE, setting the
consistency level for future requests. Valid values are: ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM
and EACH_QUORUM. See About data consistency for more information about these settings.
Providing no argument shows the current consistency level.
Example
CONSISTENCY
If you haven't changed the default, the output of the CONSISTENCY command with no arguments is:
Current consistency level is ONE.
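To raise the level for subsequent requests, for example to QUORUM:

CONSISTENCY QUORUM;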
COPY
Imports and exports CSV (comma-separated values) data to and from Cassandra 1.1.3 and higher.
Synopsis
COPY table_name ( column, ...)
FROM ( 'file_name' | STDIN )
WITH option = 'value' AND ...
COPY table_name ( column , ... )
TO ( 'file_name' | STDOUT )
WITH option = 'value' AND ...
Synopsis legend
Description
Using the COPY options in a WITH clause, you can change the CSV format. This table describes these options:
COPY Options   Default Value        Use To
QUOTE          quotation mark (")   Set the character that encloses field values.
ESCAPE         backslash (\)        Set the character that escapes literal uses of the QUOTE character.
HEADER         false                Set true to indicate that the first row of the data contains column names.
ENCODING       UTF8                 Set the character encoding (COPY TO only).
NULL           an empty string      Set the string placeholder for missing values.
When importing data from STDIN, signal the end of the CSV data with a backslash and period (\.) on a separate line. If the data is being imported into a table
that already contains data, COPY FROM does not truncate the table beforehand.
You can copy only a partial set of columns. Specify the entire set or a subset of column names in parentheses after the
table name in the order you want to import or export them. By default, when you use the COPY TO command,
Cassandra copies data to the CSV file in the order defined in the Cassandra table metadata. In version 1.1.6 and later,
you can also omit listing the column names when you want to import or export all the columns in the order they appear
in the source table or CSV file.
Examples
Copy a table to a CSV file
1. Using CQL 3, create a table named airplanes and copy it to a CSV file.
CREATE KEYSPACE test
WITH REPLICATION = {'class' : 'SimpleStrategy',
'replication_factor' : 3 };
USE test;
CREATE TABLE airplanes (
name text PRIMARY KEY,
manufacturer ascii,
year int,
mach float
);
INSERT INTO airplanes
(name, manufacturer, year, mach)
VALUES ('P38-Lightning', 'Lockheed', 1937, .7);
COPY airplanes (name, manufacturer, year, mach) TO 'temp.csv';
1 rows exported in 0.004 seconds.
2. Clear the data from the airplanes table and import the data from the temp.csv file:
TRUNCATE airplanes;
COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';
1 rows imported in 0.087 seconds.
Copy data from standard input to a table
1. Enter data directly during an interactive cqlsh session, using the COPY command defaults.
COPY airplanes (name, manufacturer, year, mach) FROM STDIN;
2. At the [copy] prompt, enter the following data:
"F-14D Super Tomcat", Grumman,"1987", "2.34"
"MiG-23 Flogger", Russian-made, "1964", "2.35"
"Su-27 Flanker", U.S.S.R.,"1981", "2.35"
\.
3 rows imported in 55.204 seconds.
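The WITH clause can combine the options described above. For example, a sketch of an export that writes a header row and a custom placeholder for missing values:

COPY airplanes (name, manufacturer, year, mach) TO 'temp.csv'
WITH HEADER = 'true' AND NULL = 'N/A';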
DESCRIBE
Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.
Synopsis
DESCRIBE CLUSTER | SCHEMA
| KEYSPACES
| KEYSPACE keyspace_name
| TABLES
| TABLE table_name
Synopsis legend
Description
The DESCRIBE or DESC command outputs information about the connected Cassandra cluster, or about the data
stored on it. To query the system tables directly, use SELECT.
The keyspace and table name arguments are case-sensitive and need to match the upper- or lowercase names
stored internally. Use the DESCRIBE commands to list objects by their internal names.
DESCRIBE functions in the following ways:
DESCRIBE CLUSTER
Output is the information about the connected Cassandra cluster, such as the cluster name, and the partitioner
and snitch in use. When you are connected to a non-system keyspace, this command also shows endpoint-range
ownership information for the Cassandra ring.
DESCRIBE SCHEMA
Output is a list of CQL commands that could be used to recreate the entire schema. Works as though DESCRIBE
KEYSPACE <k> was invoked for each keyspace k.
DESCRIBE KEYSPACES
Output is a list of all keyspace names.
DESCRIBE KEYSPACE keyspace_name
Output is a list of CQL commands that could be used to recreate the given keyspace, and the tables in it. In some
cases, as the CQL interface matures, there will be some metadata about a keyspace that is not representable with
CQL. That metadata will not be shown.
The keyspace_name argument can be omitted when using a non-system keyspace; in that case, the current
keyspace is described.
DESCRIBE TABLES
Output is a list of the names of all tables in the current keyspace, or in all keyspaces if there is no current
keyspace.
DESCRIBE TABLE table_name
Output is a list of CQL commands that could be used to recreate the given table. In some cases, there might be
table metadata that is not representable, and it is not shown.
Examples
DESCRIBE CLUSTER;
DESCRIBE KEYSPACES;
DESCRIBE KEYSPACE PortfolioDemo;
DESCRIBE TABLES;
DESCRIBE TABLE Stocks;
EXIT
Terminates cqlsh.
Synopsis
EXIT | QUIT
SHOW
Shows the Cassandra version, host, or data type assumptions for the current cqlsh client session.
Synopsis
SHOW VERSION
| HOST
| ASSUMPTIONS
Description
A SHOW command displays this information about the current cqlsh client session:
The version and build number of the connected Cassandra instance, as well as the CQL mode for cqlsh and the
Thrift protocol used by the connected Cassandra instance.
The host information of the Cassandra node that the cqlsh session is currently connected to.
The data type assumptions for the current cqlsh session as specified by the ASSUME command.
Examples
SHOW VERSION;
SHOW HOST;
SHOW ASSUMPTIONS;
SOURCE
Executes a file containing CQL statements.
Synopsis
SOURCE 'file'
Description
To execute the contents of a file, specify the path of the file relative to the current directory. Enclose the file name in
single quotation marks. The shorthand notation in this example is supported for referring to $HOME:
Example
SOURCE '~/mydir/myfile.txt';
The output for each statement, if there is any, appears in turn, including any error messages. Errors do not abort
execution of the file.
Alternatively, use the --file option to execute a file while starting CQL.
TRACING
Enables or disables request tracing.
Synopsis
TRACING ON | OFF
Synopsis legend
Description
To turn tracing read/write requests on or off, use the TRACING command. After turning on tracing, database activity
creates output that can help you understand Cassandra internal operations and troubleshoot performance problems.
Cassandra saves tracing information for 24 hours in the following tables in the system_traces keyspace:
CREATE TABLE sessions (
session_id uuid PRIMARY KEY,
coordinator inet,
duration int,
parameters map<text, text>,
request text,
started_at timestamp
);
CREATE TABLE events (
session_id uuid,
event_id timeuuid,
activity text,
source inet,
source_elapsed int,
thread text,
PRIMARY KEY (session_id, event_id)
);
To keep tracing information, copy the data in the sessions and events tables to another location.
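For example, assuming the cqlsh COPY command and illustrative file names, you might export the tracing tables like this:

COPY system_traces.sessions TO 'sessions.csv';
COPY system_traces.events TO 'events.csv';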
Sample tracing output:

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
 Message received from /127.0.0.2    | 16:41:00,765 | 127.0.0.1 |          10966
 Message received from /127.0.0.3    | 16:41:00,765 | 127.0.0.1 |          10966
 Processing response from /127.0.0.2 | 16:41:00,765 | 127.0.0.1 |          11063
 Processing response from /127.0.0.3 | 16:41:00,765 | 127.0.0.1 |          11066
 Request complete                    | 16:41:00,765 | 127.0.0.1 |          11139
About writes
Cassandra delivers high availability for writing through its data replication strategy. Cassandra duplicates data on
multiple peer nodes to ensure reliability and fault tolerance. Relational databases, on the other hand, typically structure
tables to keep data duplication at a minimum. The relational database server has to do additional work to ensure data
integrity across the tables. In Cassandra, maintaining integrity between related tables is not an issue. Cassandra tables
are not related. Usually, Cassandra performs better on writes than relational databases.
Memtables and SSTables are maintained per table. SSTables are immutable, not written to again after the memtable is
flushed. Consequently, a row is typically stored across multiple SSTable files.
For each SSTable, Cassandra creates these in-memory structures:
Primary index - A list of row keys and the start position of rows in the data file.
Index summary - A subset of the primary index. By default 1 row key out of every 128 is sampled.
Any number of columns may be inserted at the same time. When inserting or updating columns in a table, the client
application specifies the row key to identify which column records to update. The row key is similar to a primary key in
that it must be unique for each row within a table. However, unlike inserting a primary key, inserting a duplicate row key
does not result in a primary key constraint violation.
The write path of an update
Inserting a duplicate row key is treated as an upsert. Eventually, the updates are streamed to disk using sequential I/O
and stored in a new SSTable.
Columns are overwritten only if the timestamp in the new version of the column is more recent than the existing column,
so precise timestamps are necessary if updates (overwrites) are frequent. The timestamp is provided by the client, so
the clocks of all client machines should be synchronized using NTP (network time protocol), for example.
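A client can also supply the timestamp explicitly in CQL; for example (the table and values are illustrative):

INSERT INTO users (user_id, email)
VALUES ('jsmith', '[email protected]')
USING TIMESTAMP 1355384955054;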
About deletes
Cassandra deletes data in a different way from a traditional, relational database. A relational database might spend time
scanning through data looking for expired data and throwing it away or an administrator might have to partition expired
data by month, for example, to clear it out faster. In Cassandra, you do not have to manually remove expired data. Two
facts about deleted Cassandra data to keep in mind are:
Cassandra does not immediately remove deleted data from disk.
A deleted column can reappear if you do not run node repair routinely.
After an SSTable is written, it is immutable (the file is not updated by further DML operations). Consequently, a deleted
column is not removed immediately. Instead a tombstone is written to indicate the new column status. Columns marked
with a tombstone exist for a configured time period (defined by the gc_grace_seconds value set on the table). When the
grace period expires, the tombstones become eligible for deletion during the normal compaction process.
Note
By default, hints are only saved for three hours after a replica fails because if the replica is down longer than that, it is
likely permanently dead. In this case, you should run a repair to re-replicate the data before the failure occurred. You
can configure this time using the max_hint_window_in_ms property in the cassandra.yaml file.
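For example, to extend the hint window to six hours, the cassandra.yaml entry would be:

max_hint_window_in_ms: 21600000
# 21600000 ms = 6 hours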
Hint creation does not count towards any consistency level besides ANY. For example, if no replicas respond to a write
at a consistency level of ONE, hints are created for each replica but the request is reported to the client as timed out.
However, since hints are replayed at the earliest opportunity, a timeout here represents a write-in-progress, rather than
failure. The only time Cassandra will fail a write entirely is when too few replicas are alive when the coordinator receives
the request. For a complete explanation of how Cassandra deals with replica failure, see When a timeout is not a failure:
how Cassandra delivers high availability.
When a replica that is storing hints detects via gossip that the failed node is alive again, it will begin streaming the
missed writes to catch up the out-of-date replica.
Note
Hinted handoff does not completely replace the need for regular node repair operations. In addition to the time set by
max_hint_window_in_ms, the coordinator node storing hints could fail before replay. You should always run a full
repair after losing a node or disk.
About reads
Cassandra performs random reads from SSD in parallel with extremely low latency, unlike most databases. Rotational
disks are not recommended.
Cassandra reads, as well as writes, data by primary key, eliminating complex queries required by a relational database.
First, Cassandra checks the Bloom filter. Each SSTable has a Bloom filter associated with it that checks if any data for
the requested row exists in the SSTable before doing any disk I/O.
Next, Cassandra checks the global key cache. If the requested data is not in the key cache, Cassandra performs a
binary search of the index summary to find a row. By default, 1 row key out of every 128 is sampled from the primary
index to create the index summary. You configure sample frequency by changing the index_interval property in the
cassandra.yaml file.
Disk reads take place on a block level. One disk read of the index block corresponds to the closest sampled entry.
Cassandra reads a row, plus some selection of columns or a range of columns. This process, in conjunction with fast
lookup of data through primary and secondary indexes, makes Cassandra very performant on reads when compared
to other storage systems, even for read-heavy workloads. Faster startup/bootup times for each node in a cluster are
realized through the efficient sampling and loading of SSTable indexes into memory caches. The SSTable index load
time is improved dramatically by eliminating the need to go through the whole primary index.
For example, suppose you have a row of user data and need to update the user email address. Cassandra doesn't rewrite the
entire row into a new data file; it writes only the new email address to a new data file. The user name and password remain
in the old data file.
The red lines in the SSTables in this diagram are fragments of a row that Cassandra needs to combine to give the user
the requested results. Cassandra caches the merged value, not the raw row fragments. That saves some CPU and disk
I/O.
The row cache is a write-through cache: if you update a cached row, the cache is updated in place, so the row does not
need to be merged again.
For a detailed explanation of how client read and write requests are handled in Cassandra, also see About client
requests.
Atomicity in Cassandra
In Cassandra, a write is atomic at the row-level, meaning inserting or updating columns for a given row key will be
treated as one write operation. Cassandra does not support transactions in the sense of bundling multiple row updates
into one all-or-nothing operation. Nor does it roll back when a write succeeds on one replica, but fails on other replicas.
It is possible in Cassandra to have a write operation report a failure to the client, but still actually persist the write to a
replica.
For example, if using a write consistency level of QUORUM with a replication factor of 3, Cassandra sends the write
to the replicas and waits for acknowledgment from 2 of them. If the write succeeds on only one replica, Cassandra reports
a write failure to the client. However, the write is not automatically rolled back on the replica where it succeeded.
Cassandra uses timestamps to determine the most recent update to a column. The timestamp is provided by the client
application. The latest timestamp always wins when requesting data, so if multiple client sessions update the same
columns in a row concurrently, the most recent update is the one that will eventually persist.
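You can check the timestamp of the most recent update to a column with the WRITETIME function; for example (the table and key are illustrative):

SELECT WRITETIME (email) FROM users WHERE user_id = 'jsmith';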
Isolation in Cassandra
Prior to Cassandra 1.1, it was possible to see partial updates in a row when one user was updating the row while
another user was reading that same row. For example, if one user was writing a row with two thousand columns,
another user could potentially read that same row and see some of the columns, but not all of them if the write was still
in progress.
Full row-level isolation is now in place so that writes to a row are isolated to the client performing the write and are not
visible to any other user until they are complete.
From a transactional ACID (atomic, consistent, isolated, durable) standpoint, this enhancement now gives Cassandra
transactional AID support. A write is isolated at the row-level in the storage engine.
Durability in Cassandra
Writes in Cassandra are durable. All writes to a replica node are recorded both in memory and in a commit log on disk
before they are acknowledged as a success. If a crash or server failure occurs before the memory tables are flushed to
disk, the commit log is replayed on restart to recover any lost writes. In addition to the local durability (data immediately
written to disk), the replication of data on other nodes strengthens durability.
Write consistency levels:

Level
Description

ANY
    A write must be written to at least one node. If all replica nodes for the given row key are down,
    the write can still succeed once a hinted handoff has been written. Note that if all replica nodes
    are down at write time, an ANY write will not be readable until the replica nodes for that row
    key have recovered.
ONE
    A write must be written to the commit log and memory table of at least one replica node.
TWO
    A write must be written to the commit log and memory table of at least two replica nodes.
THREE
    A write must be written to the commit log and memory table of at least three replica nodes.
QUORUM
    A write must be written to the commit log and memory table on a quorum of replica nodes.
LOCAL_QUORUM
    A write must be written to the commit log and memory table on a quorum of replica nodes in
    the same data center as the coordinator node. Avoids latency of inter-data center
    communication.
EACH_QUORUM
    A write must be written to the commit log and memory table on a quorum of replica nodes in all
    data centers.
ALL
    A write must be written to the commit log and memory table on all replica nodes in the cluster
    for that row key.
Read consistency levels:

Level
Description

ONE
    Returns a response from the closest replica (as determined by the snitch). By default, a read
    repair runs in the background to make the other replicas consistent.
TWO
    Returns the most recent data from two of the closest replicas.
THREE
    Returns the most recent data from three of the closest replicas.
QUORUM
    Returns the record with the most recent timestamp once a quorum of replicas has responded.
LOCAL_QUORUM
    Returns the record with the most recent timestamp once a quorum of replicas in the same
    data center as the coordinator node has responded. Avoids latency of inter-data center
    communication.
EACH_QUORUM
    Returns the record with the most recent timestamp once a quorum of replicas in each data
    center of the cluster has responded.
ALL
    Returns the record with the most recent timestamp once all replicas have responded. The read
    operation will fail if a replica does not respond.
Note
LOCAL_QUORUM and EACH_QUORUM are designed for use in multiple data center clusters using a rack-aware
replica placement strategy (such as NetworkTopologyStrategy) and a properly configured snitch. These
consistency levels will fail when using SimpleStrategy.
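For example, a keyspace suitable for LOCAL_QUORUM or EACH_QUORUM operations across two data centers might be created as follows (the keyspace and data center names are illustrative; the names must match those reported by the snitch):

CREATE KEYSPACE demo
WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };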
About CQL
Cassandra 0.8 was the first release to include the Cassandra Query Language (CQL). As with SQL, clients built on CQL
only need to know how to interpret query resultset objects. CQL is the future of Cassandra client API development. CQL
drivers are hosted within the Apache Cassandra project.
CQL version 2.0, which has improved support for several commands, is compatible with Cassandra version 1.0 but not
version 0.8.x.
In Cassandra 1.1, CQL became the primary interface into the DBMS. The CQL mode was promoted to CQL 3, although
CQL 2 remained the default because CQL 3 is not backward compatible. The most significant enhancement of CQL 3
was support for compound and clustering columns.
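For example, a table with a compound primary key might be defined as follows (the table is illustrative); the first column of the primary key is the partition key and the second is a clustering column:

CREATE TABLE playlists (
  playlist_id uuid,
  song_order int,
  song_title text,
  PRIMARY KEY (playlist_id, song_order)
);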
In Cassandra 1.2, CQL 3 became the default interface into the DBMS.
The Python driver includes a command-line interface, cqlsh.
Configuration
Like any modern server-based software, Cassandra has a number of configuration options to tune the system toward
specific workloads and environments. Substantial efforts have been made to provide meaningful default configuration
values, but given the inherently complex nature of distributed systems coupled with the wide variety of possible
workloads, most production deployments require some modifications of the default configuration. For information about
JVM configuration, see Tuning Java resources.
Note
** Some default values are set at the class level and may be missing or commented out in the cassandra.yaml file.
Additionally, values in commented out options may not match the default value: they are the recommended value
when changing from the default.
Option
authenticator
authorizer
auto_bootstrap
auto_snapshot
broadcast_address
client_encryption_options
cluster_name
column_index_size_in_kb
commitlog_directory
commitlog_segment_size_in_mb
commitlog_sync
commitlog_total_space_in_mb
compaction_preheat_key_cache
compaction_throughput_mb_per_sec
concurrent_compactors
concurrent_reads
concurrent_writes
cross_node_timeout
data_file_directories
disk_failure_policy
dynamic_snitch_badness_threshold
dynamic_snitch_reset_interval_in_ms
dynamic_snitch_update_interval_in_ms
endpoint_snitch
flush_largest_memtables_at
hinted_handoff_enabled
hinted_handoff_throttle_in_kb
in_memory_compaction_limit_in_mb
incremental_backups
index_interval
initial_token
inter_dc_tcp_nodelay
internode_compression
internode_recv_buff_size_in_bytes
internode_send_buff_size_in_bytes
key_cache_keys_to_save
key_cache_save_period
key_cache_size_in_mb
listen_address
max_hint_window_in_ms
max_hints_delivery_threads
memtable_flush_queue_size
memtable_total_space_in_mb
multithreaded_compaction
native_transport_max_threads
native_transport_port
num_tokens
partitioner
permissions_validity_in_ms
phi_convict_threshold
range_request_timeout_in_ms
read_request_timeout_in_ms
reduce_cache_capacity_to
reduce_cache_sizes_at
request_scheduler
request_scheduler_id
request_scheduler_options
request_timeout_in_ms
row_cache_keys_to_save
row_cache_provider
row_cache_save_period
row_cache_size_in_mb
rpc_address
rpc_keepalive
rpc_max_threads
rpc_min_threads
rpc_port
rpc_recv_buff_size_in_bytes
rpc_send_buff_size_in_bytes
rpc_server_type
saved_caches_directory
seed_provider
server_encryption_options
snapshot_before_compaction
ssl_storage_port
start_native_transport
start_rpc
storage_port
stream_throughput_outbound_megabits_per_sec
streaming_socket_timeout_in_ms
thrift_framed_transport_size_in_mb
thrift_max_message_length_in_mb
trickle_fsync
truncate_request_timeout_in_ms
write_request_timeout_in_ms
auto_bootstrap
(Default: true) This setting has been removed from default configuration. It makes new (non-seed) nodes automatically
migrate the right data to themselves. It is referenced here because auto_bootstrap: true is explicitly added to the
cassandra.yaml file in an AMI installation. Setting this property to false is not recommended and is necessary only in
rare instances.
broadcast_address
(Default: listen_address**) If your Cassandra cluster is deployed across multiple Amazon EC2 regions and you use the
EC2MultiRegionSnitch, set the broadcast_address to public IP address of the node and the listen_address to the private
IP.
cluster_name
(Default: Test Cluster) The name of the cluster; used to prevent machines in one logical cluster from joining another. All
nodes participating in a cluster must have the same value.
commitlog_directory
(Default: /var/lib/cassandra/commitlog) The directory where the commit log is stored. For optimal write performance, it is
recommended the commit log be on a separate disk partition (ideally, a separate physical device) from the data file
directories.
data_file_directories
(Default: /var/lib/cassandra/data) The directory location where table data (SSTables) is stored.
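For example, a configuration that places the commit log and data files on separate devices might look like this in cassandra.yaml (the paths are illustrative):

commitlog_directory: /commitlog/cassandra
data_file_directories:
    - /data1/cassandra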
disk_failure_policy
(Default: stop) Sets how Cassandra responds to disk failure.
stop: Shuts down gossip and Thrift, leaving the node effectively dead, but it can still be inspected using JMX.
best_effort: Cassandra does its best in the event of disk errors. If it cannot write to a disk, the disk is blacklisted
for writes and the node continues writing elsewhere. If Cassandra cannot read from the disk, those SSTables are
marked unreadable, and the node continues serving data from readable SSTables. This means you will see
obsolete data at consistency level of ONE.
ignore: Use for upgrading. Cassandra acts as in versions prior to 1.2. Ignores fatal errors and lets the requests
fail; all file system errors are logged but otherwise ignored. Using stop or best_effort is recommended.
endpoint_snitch
(Default: org.apache.cassandra.locator.SimpleSnitch) Sets which snitch Cassandra uses for locating nodes and routing
requests. It must be set to a class that implements IEndpointSnitch. For descriptions of the snitches, see Types of
snitches.
initial_token
(Default: n/a) Used in versions prior to 1.2. If you haven't specified num_tokens or have set it to the default value of 1,
you should always specify this parameter when setting up a production cluster for the first time and when adding
capacity. For more information, see this parameter in the 1.1 Node and Cluster Configuration topic.
listen_address
(Default: localhost) The IP address or hostname that other Cassandra nodes use to connect to this node. If left unset,
the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS. Do not specify
0.0.0.0.
num_tokens
(Default: 1**) Defines the number of tokens randomly assigned to this node on the ring. The more tokens, relative to
other nodes, the larger the proportion of data that the node stores. Generally all nodes should have the same number of
tokens assuming they have equal hardware capability. Specifying the initial_token overrides this setting. The
recommended value is 256.
If left unspecified, Cassandra uses the default value of 1 token (for legacy compatibility) and uses the initial_token. If you
already have a cluster with one token per node, and wish to migrate to multiple tokens per node, see
http://wiki.apache.org/cassandra/Operations.
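For example, to set up a new cluster with virtual nodes, set the recommended value in cassandra.yaml and leave initial_token unset:

num_tokens: 256
# initial_token: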
partitioner
(Default: org.apache.cassandra.dht.Murmur3Partitioner) Distributes rows (by key) across nodes in the cluster. Any
IPartitioner may be used, including your own as long as it is on the classpath. Cassandra provides the following
partitioners:
org.apache.cassandra.dht.Murmur3Partitioner
org.apache.cassandra.dht.RandomPartitioner
org.apache.cassandra.dht.ByteOrderedPartitioner
org.apache.cassandra.dht.OrderPreservingPartitioner (deprecated)
org.apache.cassandra.dht.CollatingOrderPreservingPartitioner (deprecated)
rpc_address
(Default: localhost) The listen address for client connections (Thrift remote procedure calls). Valid values are:
0.0.0.0: Listens on all configured interfaces.
IP address
hostname
unset: Resolves the address using the hostname configuration of the node.
If left unset, the hostname must resolve to the IP address of this node using /etc/hostname, /etc/hosts, or DNS.
rpc_port
(Default: 9160) The port for the Thrift RPC service, which is used for client connections.
start_rpc
(Default: true) Starts the Thrift RPC server.
saved_caches_directory
(Default: /var/lib/cassandra/saved_caches) The directory location where table key and row caches are stored.
seed_provider
(Default: org.apache.cassandra.locator.SimpleSeedProvider) A list of comma-delimited hosts (IP addresses) to use as
contact points when a node joins a cluster. Cassandra also uses this list to learn the topology of the ring. When running
multiple nodes, you must change the - seeds list from the default value (127.0.0.1). In multiple data-center clusters, the seeds list should include at least one node from each data center (replication group).
start_native_transport
(Default: false) Enable or disable the native transport server. Currently, only the Thrift server is started by default
because the native transport is considered beta. Note that the address on which the native transport is bound is the
same as the rpc_address. However, the port is different from the rpc_port and specified in native_transport_port.
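For example, to enable the native transport in cassandra.yaml:

start_native_transport: true
native_transport_port: 9042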
native_transport_port
(Default: 9042) Port on which the CQL native transport listens for clients.
native_transport_max_threads
(Default: unlimited**) The maximum number of threads handling requests. The meaning is the same as for
rpc_max_threads.
storage_port
(Default: 7000) The port for inter-node communication.
key_cache_keys_to_save
(Default: disabled - all keys are saved**) Number of keys from the key cache to save.
key_cache_save_period
(Default: 14400 - 4 hours) Duration in seconds that keys are saved in cache. Caches are saved to
saved_caches_directory. Saved caches greatly improve cold-start speeds and have relatively little effect on I/O.
key_cache_size_in_mb
(Default: empty, which automatically sets it to the smaller of 5% of the available heap, or 100MB) A global cache setting
for tables. It is the maximum size of the key cache in memory. To disable set to 0.
row_cache_keys_to_save
(Default: disabled - all keys are saved**) Number of keys from the row cache to save.
row_cache_size_in_mb
(Default: 0 - disabled) A global cache setting for tables.
row_cache_save_period
(Default: 0 - disabled) Duration in seconds that rows are saved in cache. Caches are saved to saved_caches_directory.
row_cache_provider
(Default: SerializingCacheProvider) Specifies what kind of implementation to use for the row cache.
SerializingCacheProvider: Serializes the contents of the row and stores it in native memory, that is, off the JVM
Heap. Serialized rows take significantly less memory than live rows in the JVM, so you can cache more rows in a
given memory footprint. Storing the cache off-heap means you can use smaller heap sizes, which reduces the
impact of garbage collection pauses. It is valid to specify the fully-qualified class name to a class that implements
org.apache.cassandra.cache.IRowCacheProvider.
ConcurrentLinkedHashCacheProvider: Rows are cached using the JVM heap, providing the same row cache
behavior as Cassandra versions prior to 0.8.
The SerializingCacheProvider is 5 to 10 times more memory-efficient than ConcurrentLinkedHashCacheProvider for
applications that are not blob-intensive. However, SerializingCacheProvider may perform worse in update-heavy
workload situations because it invalidates cached rows on update instead of updating them in place as
ConcurrentLinkedHashCacheProvider does.
column_index_size_in_kb
(Default: 64) Add column indexes to a row when the data reaches this size. This value defines how much row data must
be deserialized to read the column. Increase this setting if your column values are large or if you have a very large
number of columns. If consistently reading only a few columns from each row or doing many partial-row reads, keep it
small. All index data is read for each access, so take that into consideration when setting the index size.
commitlog_segment_size_in_mb
(Default: 32 for 32-bit JVMs, 1024 for 64-bit JVMs) Sets the size of the individual commitlog file segments. A commitlog
segment may be archived, deleted, or recycled after all its data has been flushed to SSTables. This amount of data can
potentially include commitlog segments from every table in the system. The default size is usually suitable for most
commitlog archiving, but if you want a finer granularity, 8 or 16 MB is reasonable. See Commit log archive configuration.
commitlog_sync
(Default: periodic) The method that Cassandra uses to acknowledge writes:
periodic: Used with commitlog_sync_period_in_ms (default: 10000 - 10 seconds) to control how often the commit
log is synchronized to disk. Periodic syncs are acknowledged immediately.
batch: Used with commitlog_sync_batch_window_in_ms (default: disabled**) to control how long Cassandra waits
for other writes before performing a sync. When using this method, writes are not acknowledged until fsynced to
disk.
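The two methods are mutually exclusive; a cassandra.yaml sketch of each (the 50 ms batch window is an illustrative value, since the batch window is disabled by default):

```yaml
# Periodic (default): acknowledge writes immediately, fsync every 10 seconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Batch: acknowledge only after fsync; wait up to 50 ms to group writes.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```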
commitlog_total_space_in_mb
(Default: 32 for 32-bit JVMs, 1024 for 64-bit JVMs**) Total space used for commitlogs. If the used space goes above this
value, Cassandra rounds up to the next nearest segment multiple and flushes memtables to disk for the oldest
commitlog segments, removing those log segments. This reduces the amount of data to replay on startup, and prevents
infrequently-updated tables from indefinitely keeping commitlog segments. A small total commitlog space tends to cause
more flush activity on less-active tables.
compaction_preheat_key_cache
(Default: true) When set to true, cached row keys are tracked during compaction, and re-cached to their new positions in
the compacted SSTable. If you have extremely large key caches for tables, set the value to false; see Global row and
key caches properties.
compaction_throughput_mb_per_sec
(Default: 16) Throttles compaction to the given total throughput across the entire system. The faster you insert data, the
faster you need to compact in order to keep the SSTable count down. The recommended value is 16 to 32 times the
rate of write throughput (in MB/second). Setting the value to 0 disables compaction throttling.
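As a worked example of the 16x-32x rule, consider a hypothetical node sustaining roughly 3 MB/second of writes:

```yaml
# 3 MB/s of writes; 16-32x that rate suggests a value between 48 and 96.
compaction_throughput_mb_per_sec: 64
```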
concurrent_compactors
(Default: 1 per CPU core**) Sets the number of concurrent compaction processes allowed to run simultaneously on a
node, not including validation compactions for anti-entropy repair. Simultaneous compactions help preserve read
performance in a mixed read-write workload by mitigating the tendency of small SSTables to accumulate during a single
long-running compaction. If compactions run too slowly or too fast, change compaction_throughput_mb_per_sec first.
concurrent_reads
(Default: 32) For workloads with more data than can fit in memory, the bottleneck is reads fetching data from disk.
Setting to (16 * number_of_drives) allows operations to queue low enough in the stack so that the OS and drives can
reorder them.
concurrent_writes
(Default: 32) Writes in Cassandra are rarely I/O bound, so the ideal number of concurrent writes depends on the number
of CPU cores in your system. The recommended value is (8 * number_of_cpu_cores).
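Applying the two sizing rules above to a hypothetical node with 4 data drives and 8 CPU cores gives:

```yaml
concurrent_reads: 64    # 16 * 4 data drives
concurrent_writes: 64   # 8 * 8 CPU cores
```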
cross_node_timeout
(Default: false) Enable or disable operation timeout information exchange between nodes (to accurately measure
request timeouts). If disabled, Cassandra assumes the request was forwarded to the replica instantly by the coordinator.
Warning
Before enabling this property make sure NTP (network time protocol) is installed and the times are synchronized
between the nodes.
flush_largest_memtables_at
(Default: 0.75) When Java heap usage (after a full concurrent mark sweep (CMS) garbage collection) exceeds the set
value, Cassandra flushes the largest memtables to disk to free memory. This parameter is an emergency measure to
prevent sudden out-of-memory (OOM) errors. Do not use it as a tuning mechanism. It is most effective under light to
moderate loads or read-heavy workloads; it will fail under massive write loads. A value of 0.75 flushes memtables when
Java heap usage is above 75% total heap size. Set to 1.0 to disable. Other emergency measures are
reduce_cache_capacity_to and reduce_cache_sizes_at.
in_memory_compaction_limit_in_mb
(Default: 64) Size limit for rows being compacted in memory. Larger rows spill to disk and use a slower two-pass
compaction process. When this occurs, a message is logged specifying the row key. The recommended value is 5 to 10
percent of the available Java heap size.
index_interval
(Default: 128) Controls the sampling of entries from the primary row index. The interval corresponds to the number of
index entries that are skipped between taking each sample. By default Cassandra samples one row key out of every
128. The larger the interval, the smaller and less effective the sampling. The larger the sampling, the more effective the
index, but with increased memory usage. Generally, the best trade off between memory usage and performance is a
value between 128 and 512 in combination with a large table key cache. However, if you have small rows (many to an
OS page), you may want to increase the sample size, which often lowers memory usage without an impact on
performance. For large rows, decreasing the sample size may improve read performance.
memtable_flush_queue_size
(Default: 4) The number of full memtables to allow pending flush (memtables waiting for a write thread). At a minimum,
set to the maximum number of secondary indexes created on a single table.
memtable_flush_writers
(Default: 1 per data directory**) Sets the number of memtable flush writer threads. These threads are blocked by disk
I/O, and each one holds a memtable in memory while blocked. If you have a large Java heap size and many data
directories, you can increase the value for better flush performance.
memtable_total_space_in_mb
(Default: 1/3 of the heap**) Specifies the total memory used for all memtables on a node. This replaces the per-table
storage settings memtable_operations_in_millions and memtable_throughput_in_mb.
multithreaded_compaction
(Default: false) When set to true, each compaction operation uses one thread per core and one thread per SSTable
being merged. This is typically useful only on nodes with SSD hardware. With HDD hardware, the goal is to limit the disk
I/O for compaction (see compaction_throughput_mb_per_sec).
populate_io_cache_on_flush
(Default: false**) Populates the page cache on memtable flush and compaction. Enable this setting only when the whole
node's data fits in memory.
reduce_cache_capacity_to
(Default: 0.6) Sets the size percentage to which maximum cache capacity is reduced when Java heap usage reaches
the threshold defined by reduce_cache_sizes_at. Together with flush_largest_memtables_at, these properties constitute
an emergency measure for preventing sudden out-of-memory (OOM) errors.
reduce_cache_sizes_at
(Default: 0.85) When Java heap usage (after a full concurrent mark sweep (CMS) garbage collection) exceeds this
percentage, Cassandra reduces the cache capacity to the fraction of the current size as specified by
reduce_cache_capacity_to. To disable, set the value to 1.0.
stream_throughput_outbound_megabits_per_sec
(Default: 400**) Throttles all outbound streaming file transfers on a node to the specified throughput. Cassandra does
mostly sequential I/O when streaming data during bootstrap or repair, which can lead to saturating the network
connection and degrading client (RPC) performance.
trickle_fsync
(Default: false) When doing sequential writing, enabling this option tells fsync to force the operating system to flush the
dirty buffers at a set interval (trickle_fsync_interval_in_kb [default: 10240]). Enable this parameter to avoid sudden dirty
buffer flushing from impacting read latencies. Recommended to use on SSDs, but not on HDDs.
read_request_timeout_in_ms
(Default: 10000) The time in milliseconds that the coordinator waits for read operations to complete.
range_request_timeout_in_ms
(Default: 10000) The time in milliseconds that the coordinator waits for sequential or index scans to complete.
request_timeout_in_ms
(Default: 10000) The default timeout for other, miscellaneous operations.
truncate_request_timeout_in_ms
(Default: 60000) The time in milliseconds that the coordinator waits for truncates to complete. The long default value
allows for flushing of all tables, which ensures that anything in the commitlog is removed that could cause truncated data
to reappear. If auto_snapshot is disabled, you can reduce this time.
write_request_timeout_in_ms
(Default: 10000) The time in milliseconds that the coordinator waits for write operations to complete.
request_scheduler
(Default: org.apache.cassandra.scheduler.NoScheduler) Defines a scheduler to handle incoming client requests
according to a defined policy. This scheduler is useful for throttling client requests in single clusters containing multiple
keyspaces. Valid values are:
org.apache.cassandra.scheduler.NoScheduler: No scheduling takes place and does not have any options.
org.apache.cassandra.scheduler.RoundRobinScheduler: See request_scheduler_options properties.
A Java class that implements the RequestScheduler interface.
request_scheduler_id
(Default: keyspace**) An identifier on which to perform request scheduling. Currently the only valid value is keyspace.
request_scheduler_options
(Default: disabled) Contains a list of properties that define configuration options for request_scheduler:
throttle_limit: (Default: 80) The number of active requests per client. Requests beyond this limit are queued up
until running requests complete. Recommended value is ((concurrent_reads + concurrent_writes) * 2).
default_weight: (Default: 1**) How many requests are handled during each turn of the RoundRobin.
weights: (Default: 1 or default_weight) How many requests are handled during each turn of the RoundRobin,
based on the request_scheduler_id. Takes a list of keyspaces: weights.
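A cassandra.yaml sketch of round-robin scheduling across keyspaces; the keyspace names and weights here are hypothetical:

```yaml
request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
request_scheduler_id: keyspace
request_scheduler_options:
  throttle_limit: 80
  default_weight: 1
  weights:
    analytics_ks: 5   # hypothetical keyspace served 5 requests per turn
    web_ks: 1
```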
rpc_keepalive
(Default: true) Enable or disable keepalive on client connections.
rpc_max_threads
(Default: unlimited**) Regardless of your choice of RPC server (rpc_server_type), the maximum number of threads in
the RPC thread pool dictates how many concurrent requests are possible. However, if rpc_server_type is sync, it also
dictates the number of clients that can be connected simultaneously. For a large number of client
connections, this could cause excessive memory usage for the thread stack. Connection pooling on the client side is
highly recommended. Setting a maximum thread pool size acts as a safeguard against misbehaved clients. If the
maximum is reached, Cassandra blocks additional connections until a client disconnects.
rpc_min_threads
(Default: 16**) Sets the minimum thread pool size for remote procedure calls.
rpc_recv_buff_size_in_bytes
(Default: N/A**) Sets the receiving socket buffer size for remote procedure calls.
rpc_send_buff_size_in_bytes
(Default: N/A**) Sets the sending socket buffer size in bytes for remote procedure calls.
streaming_socket_timeout_in_ms
(Default: 0 - never timeout streams**) Enable or disable socket timeout for streaming operations. When a timeout occurs
during streaming, streaming is retried from the start of the current file. Avoid setting this value too low, as it can result in
a significant amount of data re-streaming.
rpc_server_type
(Default: sync) Cassandra provides three options for the RPC server. On Windows, sync is about 30% slower than
hsha. On Linux, sync and hsha performance is about the same, but hsha uses less memory.
sync: (default) One connection per thread in the RPC pool. For a very large number of clients, memory is the
limiting factor. On a 64 bit JVM, 128KB is the minimum stack size per thread. Connection pooling is strongly
recommended.
hsha: Half synchronous, half asynchronous. The RPC thread pool is used to manage requests, but the threads
are multiplexed across the different clients. All Thrift clients are handled asynchronously using a small number of
threads that does not vary with the number of clients (and thus scales well to many clients). The RPC requests are
synchronous (one thread per active request).
Your own RPC server: You must provide a fully-qualified class name of an o.a.c.t.TServerFactory that can create
a server instance.
thrift_framed_transport_size_in_mb
(Default: 15) Frame size (maximum field length) for Thrift. The frame is the row or part of the row the application is
inserting.
thrift_max_message_length_in_mb
(Default: 16) The maximum length of a Thrift message in megabytes, including all fields and internal Thrift overhead (1
byte of overhead for each frame).
Message length is usually used in conjunction with batches. A frame length greater than or equal to 24 accommodates a
batch with four inserts, each of which is 24 bytes. The required message length is greater than or equal to
24+24+24+24+4 (number of frames).
dynamic_snitch_badness_threshold
(Default: 0.0) Sets the performance threshold for dynamically routing requests away from a poorly performing node. For
example, a value of 0.2 means Cassandra continues to prefer the static snitch values until the node response time is
20% worse than that of the best-performing node. Until the threshold is reached, incoming client requests are statically
routed to the closest replica (as determined by the snitch). Consistently routing requests to a given replica can help
keep a working set of data hot when read_repair_chance is less than 1.
dynamic_snitch_reset_interval_in_ms
(Default: 600000) Time interval in milliseconds to reset all node scores, which allows a bad node to recover.
dynamic_snitch_update_interval_in_ms
(Default: 100) The time interval in milliseconds for calculating read latency.
hinted_handoff_enabled
(Default: true) Enable or disable hinted handoff. A hint indicates that the write needs to be replayed to an unavailable
node. Where Cassandra writes the hint depends on the version:
Prior to 1.0: Writes to a live replica node.
1.0 and later: Writes to the coordinator node.
hinted_handoff_throttle_in_kb
(Default: 1024) Rate per delivery thread that hints are sent to the node in kilobytes per second.
max_hint_window_in_ms
(Default: 10800000 - 3 hours) Defines how long in milliseconds to generate and save hints for an unresponsive node.
After this interval, new hints are no longer generated until the node is back up and responsive. If the node goes down
again, a new interval begins. This setting can prevent a sudden demand for resources when a node is brought back
online and the rest of the cluster attempts to replay a large volume of hinted writes.
max_hints_delivery_threads
(Default: 2) Number of threads with which to deliver hints. For multiple data center deployments, consider increasing this
number because cross data-center handoff is generally slower.
phi_convict_threshold
(Default: 8**) Adjusts the sensitivity of the failure detector on an exponential scale. Lower values increase the likelihood
that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures
will cause a node failure. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps
prevent false failures. Values higher than 12 and lower than 5 are not recommended.
incremental_backups
(Default: false) Backs up data updated since the last snapshot was taken. When enabled, Cassandra creates a hard link
to each SSTable flushed or streamed locally in a backups/ subdirectory of the keyspace data. Removing these links is
the operator's responsibility.
snapshot_before_compaction
(Default: false) Enable or disable taking a snapshot before each compaction. This option is useful to back up data when
there is a data format change. Be careful using this option because Cassandra does not clean up older snapshots
automatically.
Security properties
authenticator
(Default: org.apache.cassandra.auth.AllowAllAuthenticator) The authentication backend. It implements IAuthenticator,
which is used to identify users.
authorizer
(Default: org.apache.cassandra.auth.AllowAllAuthorizer) The authorization backend. It implements IAuthorizer, which
limits access and provides permissions.
permissions_validity_in_ms
(Default: 2000) How long permissions in cache remain valid. Depending on the authorizer, fetching permissions can be
resource intensive. This setting is automatically disabled when AllowAllAuthorizer is set.
server_encryption_options
Enable or disable inter-node encryption. You must also generate keys and provide the appropriate key and trust store
locations and passwords. No custom encryption options are currently enabled.
The available options are:
internode_encryption: (Default: none) Enable or disable encryption of inter-node communication using the
TLS_RSA_WITH_AES_128_CBC_SHA cipher suite for authentication, key exchange, and encryption of data
transfers. The available inter-node options are:
all: Encrypt all inter-node communications.
none: No encryption.
dc: Encrypt the traffic between the data centers (server only).
rack: Encrypt the traffic between the racks (server only).
keystore: (Default: conf/.keystore) The location of a Java keystore (JKS) suitable for use with Java Secure Socket
Extension (JSSE), which is the Java version of the Secure Sockets Layer (SSL), and Transport Layer Security
(TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.
keystore_password: (Default: cassandra) Password for the keystore.
truststore: (Default: conf/.truststore) Location of the truststore containing the trusted certificate for authenticating
remote servers.
truststore_password: (Default: cassandra) Password for the truststore.
The passwords used in these options must match the passwords used when generating the keystore and truststore. For
instructions on generating these files, see: Creating a Keystore to Use with JSSE.
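Putting the options above together, a server_encryption_options block enabling cross-data-center-only encryption might look like the following sketch (the passwords shown are the documented defaults and should be replaced with the ones used when generating your keystore and truststore):

```yaml
server_encryption_options:
  internode_encryption: dc       # encrypt only traffic between data centers
  keystore: conf/.keystore
  keystore_password: cassandra   # must match the password used at key generation
  truststore: conf/.truststore
  truststore_password: cassandra
```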
The advanced settings are:
protocol: (Default: TLS)
algorithm: (Default: SunX509)
store_type: (Default: JKS)
cipher_suites: (Default: TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
client_encryption_options
Enable or disable client-to-node encryption. You must also generate keys and provide the appropriate key and trust
store locations and passwords. No custom encryption options are currently enabled.
enabled: (Default: false) To enable, set to true.
keystore: (Default: conf/.keystore) The location of a Java keystore (JKS) suitable for use with Java Secure Socket
Extension (JSSE), which is the Java version of the Secure Sockets Layer (SSL), and Transport Layer Security
(TLS) protocols. The keystore contains the private key used to encrypt outgoing messages.
keystore_password: (Default: cassandra) Password for the keystore. This must match the password used when
generating the keystore and truststore.
require_client_auth: (Default: false) Enables or disables certificate authentication. (Available starting with
Cassandra 1.2.3.)
truststore: (Default: conf/.truststore) Set if require_client_auth is true.
truststore_password: <truststore_password> Set if require_client_auth is true.
The advanced settings are:
protocol: (Default: TLS)
algorithm: (Default: SunX509)
store_type: (Default: JKS)
cipher_suites: (Default: TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA)
internode_send_buff_size_in_bytes
(Default: N/A**) Sets the sending socket buffer size in bytes for inter-node calls.
internode_recv_buff_size_in_bytes
(Default: N/A**) Sets the receiving socket buffer size in bytes for inter-node calls.
internode_compression
(Default: all) Controls whether traffic between nodes is compressed. The valid values are:
all: All traffic is compressed.
dc: Traffic between data centers is compressed.
none: No compression.
inter_dc_tcp_nodelay
(Default: false) Enable or disable tcp_nodelay for inter-data center communication. When disabled, larger but fewer
network packets are sent, which reduces overhead from the TCP protocol itself. However, if cross data-center
responses are blocked, latency increases.
ssl_storage_port
(Default: 7001) The SSL port for encrypted communication. Unused unless enabled in server_encryption_options.
Keyspace attributes
A keyspace must have a user-defined name, a replica placement strategy, and options that specify the number of
copies per data center or node.
Attribute            Default Value
name                 n/a
placement_strategy   SimpleStrategy
strategy_options     n/a
durable_writes       true
name
Required. The name for the keyspace.
placement_strategy
Required. Called strategy_class in CQL. Determines how Cassandra distributes replicas for a keyspace among nodes in
the ring.
Values are:
SimpleStrategy or org.apache.cassandra.locator.SimpleStrategy
NetworkTopologyStrategy or org.apache.cassandra.locator.NetworkTopologyStrategy
NetworkTopologyStrategy requires a properly configured snitch to be able to determine rack and data center locations of
a node. For more information about replication placement strategy, see About data replication.
strategy_options
Specifies configuration options for the chosen replication strategy class. The replication factor option is the total number
of replicas across the cluster. A replication factor of 1 means that there is only one copy of each row on one node. A
replication factor of 2 means there are two copies of each row, where each copy is on a different node. All replicas are
equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the
number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of
nodes.
When the replication factor exceeds the number of nodes, writes are rejected, but reads are served as long as the
desired consistency level can be met.
To set a placement strategy and options using CQL, see CREATE KEYSPACE. For more information about configuring
the replication placement strategy for a cluster and data centers, see Choosing keyspace replication options.
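As a sketch of such a CQL 3 statement, the keyspace name and the data center names below are assumptions (data center names must match those reported by the snitch):

```sql
CREATE KEYSPACE demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2}
  AND durable_writes = true;
```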
durable_writes
(Default: true) When set to false, data written to the keyspace bypasses the commit log. Be careful using this option
because you risk losing data.
Table attributes
The following attributes can be declared per table.
Option                           Default Value
bloom_filter_fp_chance
bucket_high                      1.5
bucket_low                       0.5
caching                          keys_only
column_metadata                  n/a
column_type                      Standard
comment                          n/a
compaction_strategy              SizeTieredCompactionStrategy
compaction_strategy_options      n/a
comparator                       BytesType
compare_subcolumns_with          BytesType [1]
compression_options              sstable_compression='SnappyCompressor'
default_validation_class         n/a
dclocal_read_repair_chance       0.0
gc_grace_seconds                 864000 (10 days)
key_validation_class             n/a [2]
max_compaction_threshold         32 [2]
max_threshold                    32 [3]
min_compaction_threshold         4 [2]
min_threshold                    4 [3]
memtable_flush_after_mins        n/a [1]
memtable_operations_in_millions  n/a [1]
memtable_throughput_in_mb        n/a [1]
min_sstable_size                 50MB
name                             n/a
read_repair_chance               0.1
replicate_on_write               true
sstable_size_in_mb               5MB
tombstone_compaction_interval    1 day
tombstone_threshold              0.2
bloom_filter_fp_chance
bucket_high
(Default: 1.5) Size-tiered compaction considers SSTables to be within the same bucket if the SSTable size falls within
the range [average_size * bucket_low, average_size * bucket_high] defined by the bucket_low and bucket_high values.
For example, with an average bucket size of 100 MB, the defaults group SSTables between 50 MB and 150 MB into the
same bucket.
bucket_low
(Default: 0.5) See bucket_high for a description.
caching
(Default: keys_only) Optimizes the use of cache memory without manual tuning. Set caching to one of the following
values:
all
keys_only
rows_only
none
Cassandra weights the cached data by size and access frequency. In Cassandra 1.1 and later, use this parameter to
specify a key or row cache instead of a table cache, as in earlier versions.
[1] Ignored in Cassandra 1.2, but can still be declared for backward compatibility.
[2] Used by Thrift and CQL 2; ignored in CQL 3.
[3] The CQL 3 attribute name for the max_compaction_threshold and min_compaction_threshold Cassandra storage
options.
chunk_length_kb
(Default: 64KB) On disk, SSTables are compressed by block (to allow random reads). This subproperty of compression
defines the size (in KB) of the block. Values larger than the default might improve the compression rate, but increase
the minimum amount of data that must be read from disk during a read. The default value (64) is a good middle ground
for compressing tables. Adjust the compression chunk size to account for read/write access patterns (how much data is
typically requested at once) and the average size of rows in the table.
column_metadata
(Default: N/A - container attribute) Column metadata defines these attributes of a column: name, validation_class,
index_name, and index_type.
Setting a value for the name option is required. The validation_class is set to the default_validation_class of the table if
you do not set the validation_class option explicitly. The value of index_type must be set to create a secondary index for
a column. The value of index_name is not valid unless index_type is also set.
Setting and updating column metadata with the Cassandra CLI requires a slightly different command syntax than other
attributes; note the brackets and curly braces in this example:
[default@demo] UPDATE COLUMN FAMILY users WITH comparator=UTF8Type
AND column_metadata=[{column_name: full_name, validation_class: UTF8Type, index_type: KEYS}];
column_type
(Default: Standard) The standard type of table contains regular columns.
comment
(Default: N/A) A human readable comment describing the table.
compaction_strategy
(Default: SizeTieredCompactionStrategy) Sets the compaction strategy for the table. The available strategies are:
SizeTieredCompactionStrategy: The default compaction strategy and the only compaction strategy available in
releases earlier than Cassandra 1.0. This strategy triggers a minor compaction whenever there are a number of
similar sized SSTables on disk (as configured by min_threshold). Using this strategy causes bursts in I/O activity
while a compaction is in process, followed by longer and longer lulls in compaction activity as SSTable files grow
larger in size. These I/O bursts can negatively affect read-heavy workloads, but typically do not impact write
performance. Watching disk capacity is also important when using this strategy, as compactions can temporarily
double the size of SSTables for a table while a compaction is in progress.
LeveledCompactionStrategy: The leveled compaction strategy creates SSTables of a fixed, relatively small size
(5 MB by default) that are grouped into levels. Within each level, SSTables are guaranteed to be non-overlapping.
Each level (L0, L1, L2 and so on) is 10 times as large as the previous. Disk I/O is more uniform and predictable as
SSTables are continuously being compacted into progressively larger levels. At each level, row keys are merged
into non-overlapping SSTables. This can improve performance for reads, because Cassandra can determine
which SSTables in each level to check for the existence of row key data. This compaction strategy is modeled
after Google's leveldb implementation. For more information, see the articles When to Use Leveled Compaction
and Leveled Compaction in Apache Cassandra.
compaction_strategy_options
(Default: N/A - container attribute) Sets attributes related to the chosen compaction_strategy. Attributes are:
bucket_high
bucket_low
max_threshold
min_threshold
min_sstable_size
sstable_size_in_mb
tombstone_compaction_interval
tombstone_threshold
CQL examples show how to set and update compaction properties.
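As a sketch of one such CQL 3 statement (the table name users is hypothetical), switching a table to leveled compaction and overriding the SSTable target size:

```sql
ALTER TABLE users
  WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 10};
```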
comparator
(Default: BytesType) Defines the data types used to validate and sort column names. There are several built-in column
comparators available. The comparator cannot be changed after you create a table.
compare_subcolumns_with
(Default: BytesType) Required when the column_type attribute is set to Super. Same as comparator but for the
sub-columns of a super column. Ignored by Cassandra 1.2, but can be declared for backward compatibility.
compression_options
(Default: N/A - container attribute) Sets the compression algorithm and subproperties for the table. Choices are:
sstable_compression, chunk_length_kb, and crc_check_chance.
Using CQL presents examples of setting and updating compression properties.
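A sketch of one such CQL 3 statement (the table name users is hypothetical), switching a table to Deflate compression with a larger chunk size:

```sql
ALTER TABLE users
  WITH compression = {'sstable_compression': 'DeflateCompressor', 'chunk_length_kb': 128};
```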
crc_check_chance
(Default 1.0) When compression is enabled, each compressed block includes a checksum of that block for the purpose
of detecting disk bitrot and avoiding the propagation of corruption to other replica. This option defines the probability with
which those checksums are checked during read. By default they are always checked. Set to 0 to disable checksum
checking and to 0.5, for instance, to check them on every other read.
default_validation_class
(Default: N/A) Defines the data type used to validate column values. There are several built-in column validators
available.
dclocal_read_repair_chance
(Default: 0.0) Specifies the probability of read repairs being invoked over all replicas in the current data center. Contrast
read_repair_chance.
gc_grace_seconds
(Default: 864000 [10 days]) Specifies the time to wait before garbage collecting tombstones (deletion markers). The
default value allows a great deal of time for consistency to be achieved prior to deletion. In many deployments this
interval can be reduced, and in a single-node cluster it can be safely set to zero. When using the CLI, use gc_grace
instead of gc_grace_seconds.
key_validation_class
(Default: N/A) Defines the data type used to validate row key values. There are several built-in key validators
available; however, CounterColumnType (distributed counters) cannot be used as a row key validator.
max_compaction_threshold
(Default: 32) Used by Thrift and CQL2. Ignored in CQL3; replaced by max_threshold. Sets the maximum number of
SSTables processed by one minor compaction.
max_threshold
(Default: 32) Maximum number of SSTables processed by one minor compaction when using
SizeTieredCompactionStrategy.
min_compaction_threshold
(Default: 4) Used by Thrift and CQL2. Ignored in CQL3; replaced by min_threshold. Sets the minimum number of
SSTables to trigger a minor compaction when compaction_strategy=SizeTieredCompactionStrategy.
min_threshold
(Default: 4) Sets the minimum number of SSTables needed to start a minor compaction when using
SizeTieredCompactionStrategy. Raising this value causes minor compactions to start less frequently and be more
I/O-intensive.
memtable_flush_after_mins
Deprecated as of Cassandra 1.0. Can still be declared (for backward compatibility) but is ignored. Use
commitlog_total_space_in_mb.
memtable_operations_in_millions
Deprecated as of Cassandra 1.0. Can still be declared (for backward compatibility) but is ignored. Use
commitlog_total_space_in_mb.
memtable_throughput_in_mb
Deprecated as of Cassandra 1.0. Can still be declared (for backward compatibility) but is ignored. Use
commitlog_total_space_in_mb.
min_sstable_size
(Default: 50MB) The size-tiered compaction strategy groups SSTables for compaction into buckets. The bucketing
process groups SSTables that differ in size by less than 50%. This results in a bucketing process that is too fine grained
for small SSTables. If your SSTables are small, use min_sstable_size to define a size threshold (in bytes) below which
all SSTables belong to one unique bucket.
name
(Default: N/A) Required. The user-defined name of the table.
read_repair_chance
(Default: 0.1) Specifies the probability of read repairs being invoked over all replicas, regardless of data center.
Contrast dclocal_read_repair_chance.
replicate_on_write
(Default: true) Applies only to counter tables. When set to true, replicates writes to all affected replicas regardless of the
consistency level specified by the client for a write request. For counter tables, this should always be set to true.
sstable_size_in_mb
(Default: 5MB) The target size for sstables that use the leveled compaction strategy. Although SSTable sizes should be
less or equal to sstable_size_in_mb, it is possible to have a larger SSTable during compaction. This occurs when data
for a given partition key is exceptionally large. The data is not split into two SSTables.
sstable_compression
(Default: SnappyCompressor) The compression algorithm to use. Valid values are SnappyCompressor (Snappy
compression library) and DeflateCompressor (Java zip implementation). Use an empty string ('') to disable compression.
Snappy compression offers faster compression/decompression while the Java zip compression offers better
compression ratios. Choosing the right one depends on your requirements for space savings over read performance.
For read-heavy workloads, Snappy compression is recommended. Developers can also implement custom compression
classes using the org.apache.cassandra.io.compress.ICompressor interface. Specify the full class name as
a "string constant".
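In CQL 3, sstable_compression is set as a subproperty of the compression map. A minimal sketch, assuming a table named users:

```sql
ALTER TABLE users
  WITH compression = { 'sstable_compression' : 'DeflateCompressor' };
```

To disable compression instead, set 'sstable_compression' to the empty string ''.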
tombstone_compaction_interval
(Default: 1 day) The minimum time to wait after an SSTable creation time before considering the SSTable for tombstone
compaction. Tombstone compaction is the compaction triggered if the SSTable has more garbage-collectable
tombstones than tombstone_threshold.
tombstone_threshold
(Default: 0.2) A ratio of garbage-collectable tombstones to all contained columns, which if exceeded by the SSTable
triggers compaction (with no other sstables) for the purpose of purging the tombstones.
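Both tombstone settings are subproperties of the compaction map in CQL 3. A hedged example (the table name and values are assumptions for illustration):

```sql
ALTER TABLE users
  WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
                      'tombstone_threshold' : 0.3,
                      'tombstone_compaction_interval' : 172800 };
```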
1. Open the cassandra-env.sh file for editing. This file is located in:
Packaged installs
/etc/dse/cassandra
Binary installs
<install_location>/resources/cassandra/conf
2. Scroll down to the comment about the heap dump path:
# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
3. On the line after the comment, set the CASSANDRA_HEAPDUMP_DIR to the path you want to use:
# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
CASSANDRA_HEAPDUMP_DIR=<path>
4. Save the cassandra-env.sh file and restart DataStax Enterprise.
access.properties
This file contains entries in the format:
KEYSPACE[.TABLE].PERMISSION=USERS
KEYSPACE is the keyspace name.
TABLE is the table name.
PERMISSION is one of <ro> or <rw> for read-only or read-write respectively.
USERS is a comma delimited list of users from passwd.properties.
For example, to control access to Keyspace1 and give jsmith and Elvis read-only permissions while allowing dilbert full
read-write access to add and remove tables, you would create the following entries:
Keyspace1.<ro>=jsmith,Elvis Presley
Keyspace1.<rw>=dilbert
To provide a finer level of access control to the Standard1 table in Keyspace1, you would create the following entry to
allow the specified users read-write access:
Keyspace1.Standard1.<rw>=jsmith,Elvis Presley,dilbert
The access.properties file also contains a simple list of users who have permissions to modify the list of
keyspaces:
<modify-keyspaces>=jsmith
passwd.properties
This file contains name/value pairs in which the names match users defined in access.properties and the values
are user passwords. Passwords are in clear text unless the passwd.mode=MD5 system property is provided.
jsmith=havebadpass
Elvis Presley=graceland4ever
dilbert=nomoovertime
Logging Configuration
Cassandra provides logging functionality using Simple Logging Facade for Java (SLF4J) with a log4j backend.
Additionally, the output.log captures the stdout of the Cassandra process, which is configurable using the standard
Linux logrotate facility. You can also change logging levels via JMX using the JConsole tool.
The default configuration rolls the log file once the size exceeds 20MB and maintains up to 50 backups. When the
maxFileSize is reached, the current log file is renamed to system.log.1 and a new system.log is started. Any
previous backups are renumbered from system.log.n to system.log.n+1, which means the higher the number,
the older the file. When the maximum number of backups is reached, the oldest file is deleted.
If an issue occurred but has already been rotated out of the current system.log, check to see if it is captured in an
older backup. If you want to keep more history, increase the maxFileSize, maxBackupIndex, or both. However, make
sure you have enough space to store the additional logs.
By default, logging output is placed in /var/log/cassandra/system.log. You can change the location of the
output by editing the log4j.appender.R.File path. Be sure that the directory exists and is writable by the process
running Cassandra.
output.log
The output.log stores the stdout of the Cassandra process; it is not controllable from log4j. However, you can rotate
it using the standard Linux logrotate facility. To configure logrotate to work with Cassandra, create a file called
/etc/logrotate.d/cassandra with the following contents:
/var/log/cassandra/output.log {
size 10M
rotate 9
missingok
copytruncate
compress
}
The copytruncate directive is critical because it allows the log to be rotated without any support from Cassandra for
closing and reopening the file. For more information, refer to the logrotate man page.
Note
Be aware that increasing logging levels can generate a lot of logging output on even a moderately trafficked cluster.
Logging Levels
The default logging level is determined by the following line in the log4j-server.properties file:
log4j.rootLogger=INFO,stdout,R
To exert more fine-grained control over your logging, you can specify the logging level for specific categories. The
categories usually (but not always) correspond to the package and class name of the code doing the logging.
For example, the following setting logs DEBUG messages from all classes in the org.apache.cassandra.db package:
log4j.logger.org.apache.cassandra.db=DEBUG
2. To narrow the category further, specify the class. For example, the following setting ensures that DEBUG messages
are logged specifically from the StorageProxy class in the org.apache.cassandra.service package:
log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG
3. If you find that a particular class logs too many messages, set a less verbose logging level for that class by
adding a line in the following format:
log4j.logger.package.class=WARN
For example, a busy Solr node can log numerous INFO messages from the SolrCore,
LogUpdateProcessorFactory, and SolrIndexSearcher classes. To suppress these messages, add the following
lines:
log4j.logger.org.apache.solr.core.SolrCore=WARN
log4j.logger.org.apache.solr.update.processor.LogUpdateProcessorFactory=WARN
log4j.logger.org.apache.solr.search.SolrIndexSearcher=WARN
4. After determining which category a particular message belongs to, you may want to revert the messages back to
the default format. Do this by removing %c from the ConversionPattern.
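As an illustration, suppose you had appended %c to the appender's ConversionPattern in log4j-server.properties to see category names; the surrounding pattern below is assumed from a stock 1.2 install:

```
# Pattern with the category (%c) included while diagnosing:
log4j.appender.R.layout.ConversionPattern=%5p %c [%t] %d{ISO8601} %F (line %L) %m%n
# Default-style pattern with %c removed:
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
```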
Commands
The commands archive_command and restore_command expect only a single command with arguments; they cannot
use STDOUT or STDIN, and cannot execute multiple commands. As a workaround, put multiple commands in a script
and point the command at that script. To disable a command, leave its value blank.
Archive a segment
Command
archive_command=
Parameters
<path>
<name>
Restore an archived segment
Command
restore_command=
Parameters
<from>
<to>
Restore directory location
Command
restore_directories=
Format
restore_directories=<restore_directory location>
Restore Mutations
Restore mutations created up to and including the specified timestamp.
Command
restore_point_in_time=
Format
<timestamp>
Example
restore_point_in_time=2012-08-16 20:43:12
Restore stops when the first client-supplied timestamp is greater than the restore point timestamp. Because the order in
which Cassandra receives mutations does not strictly follow the timestamp order, this can leave some mutations
unrecovered.
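Putting the commands together, a minimal commitlog_archiving.properties sketch might look like the following (the backup path and timestamp are assumptions; archiving by hard link and restoring by copy are one common choice, not the only one):

```
archive_command=/bin/ln <path> /backup/commitlog/<name>
restore_command=cp -f <from> <to>
restore_directories=/backup/commitlog
restore_point_in_time=2012-08-16 20:43:12
```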
Operations
Monitoring a Cassandra cluster
Understanding the performance characteristics of your Cassandra cluster is critical to diagnosing issues and planning
capacity.
Cassandra exposes a number of statistics and management operations via Java Management Extensions (JMX). Java
Management Extensions (JMX) is a Java technology that supplies tools for managing and monitoring Java applications
and services. Any statistic or operation that a Java application has exposed as an MBean can then be monitored or
manipulated using JMX.
During normal operation, Cassandra outputs information and statistics that you can monitor using JMX-compliant tools,
such as:
JConsole
The nodetool utility
DataStax OpsCenter management console.
Using the same tools, you can perform certain administrative commands and operations such as flushing caches or
doing a node repair.
Within OpsCenter you can customize the performance metrics viewed to meet your monitoring needs. Administrators
can also perform routine node administration tasks from OpsCenter. Metrics within OpsCenter are divided into three
general categories: table metrics, cluster metrics, and OS metrics. For many of the available metrics, you can view
aggregated cluster-wide information or view information on a per-node basis.
The nodetool utility provides commands for viewing detailed metrics for tables, server metrics, and compaction
statistics, and for checking general cluster health with the status command. Commands include decommissioning a
node, running repair, and moving partitioning tokens.
If you choose to monitor Cassandra using JConsole, keep in mind that JConsole consumes a significant amount of
system resources. For this reason, DataStax recommends running JConsole on a remote machine rather than on the
same host as a Cassandra node.
The JConsole CompactionManagerMBean exposes compaction metrics that can indicate when you need to add
capacity to your cluster.
Compaction metrics
Monitoring compaction performance is an important aspect of knowing when to add capacity to your cluster. The
following attributes are exposed through CompactionManagerMBean:
Attribute
Description
CompletedTasks
Number of completed compactions since the last start of this Cassandra instance.
PendingTasks
Estimated number of compaction tasks remaining to perform.
ColumnFamilyInProgress
The table currently being compacted. This attribute is null if no compactions are in progress.
BytesTotalInProgress
Total number of data bytes (index and filter are not included) being compacted. This attribute is null if no
compactions are in progress.
BytesCompacted
The progress of the current compaction. This attribute is null if no compactions are in progress.
Thread Pool
Description
AE_SERVICE_STAGE
CONSISTENCY-MANAGER
Handles the background consistency checks if they were triggered from the client's
consistency level.
FLUSH-SORTER-POOL
FLUSH-WRITER-POOL
GOSSIP_STAGE
LB-OPERATIONS
LB-TARGET
MEMTABLE-POST-FLUSHER
MIGRATION_STAGE
Tasks resulting from the call of system_* methods in the API that have modified
the schema.
MISC_STAGE
MUTATION_STAGE
READ_STAGE
RESPONSE_STAGE
Response tasks from other nodes to message streaming from this node.
STREAM_STAGE
Table Statistics
For individual tables, ColumnFamilyStoreMBean provides the same general latency attributes as StorageProxyMBean.
Unlike StorageProxyMBean, ColumnFamilyStoreMBean has a number of other statistics that are important to monitor
for performance trends. The most important of these are:
Attribute
Description
MemtableDataSize
The total size consumed by this table's data (not including metadata).
MemtableColumnsCount
Returns the total number of columns present in the memtable (across all keys).
MemtableSwitchCount
How many times the memtable has been flushed out.
RecentReadLatencyMicros
The average read latency since the last call to this bean.
RecentWriteLatencyMicros
The average write latency since the last call to this bean.
LiveSSTableCount
The number of live SSTables for this table.
The recent read latency and write latency counters are important in making sure operations are happening in a
consistent manner. If these counters start to increase after a period of staying flat, you probably need to add capacity to
the cluster.
You can set a threshold and monitor LiveSSTableCount to ensure that the number of SSTables for a given table does
not become too great.
Configuring caches
One read operation hits the row cache, returning the requested row without a disk seek. The other read operation
requests a row that is not present in the row cache but is present in the key cache. After accessing the row in the
SSTable, the system returns the data and populates the row cache with this read operation.
Compacting SSTables
In the background, Cassandra periodically merges SSTables together into larger SSTables using a process called
compaction. Compaction merges row fragments, removes expired tombstones, and rebuilds primary and secondary
indexes. Because the SSTables are sorted by row key, this merge is efficient (no random disk I/O). After a newly
merged SSTable is complete, the input SSTables are marked as obsolete and eventually deleted by the JVM garbage
collection (GC) process. However, during compaction, there is a temporary spike in disk space usage and disk I/O.
Cassandra 1.2 tracks the times that tombstones can be dropped for TTL-configured and deleted columns and performs
compaction when columns exceed a CQL-configurable threshold. Also, as of 1.2, you can better manage tombstone
removal and avoid manually performing user-defined compaction to recover disk space. The CQL-configurable
tombstone_compaction_interval sets the minimum time to wait after an SSTable creation time before considering the
SSTable for tombstone compaction.
The capability to perform multiple, independent leveled compactions in parallel promotes full I/O utilization when using
SSD hardware, which is not bottlenecked by I/O. Cassandra's leveled compaction strategy creates SSTables of a fixed,
relatively small size that are grouped into levels. Each level (L0, L1, L2 and so on) is 10 times as large as the previous.
Cassandra executes compactions in parallel between different levels, and performs multiple compactions per level. To
configure this feature, set the multithreaded_compaction setting to true in the cassandra.yaml configuration file and set
the compaction_strategy as described in Configuring compaction below.
Compaction impacts reads in two ways. During compaction temporary increases in disk I/O and disk utilization can
impact read performance for reads that are not fulfilled by the cache. However, after a compaction has been completed,
off-cache read performance improves since there are fewer SSTable files on disk that need to be checked to complete a
read request.
Configuring compaction
In addition to consolidating SSTables, the compaction process merges keys, combines columns, discards tombstones,
and creates a new index in the merged SSTable.
There are two different compaction strategies that you can configure on a table:
Size-tiered compaction
Leveled compaction
To set compaction, construct a property map using CQL. Set compaction properties using a map collection:
compaction = { 'name' : value, 'name' : value, ... }
To create or update a table to set the compaction strategy, use the ALTER or CREATE TABLE statements. For
example:
ALTER TABLE users WITH
compaction =
{ 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 10 }
For the list of options and more information, see CQL 3 table storage properties.
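A CREATE TABLE variant of the same idea, as a sketch (the schema and threshold value are assumptions for illustration):

```sql
CREATE TABLE users (
  user_id text PRIMARY KEY,
  name text
) WITH compaction =
  { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6 };
```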
Configuring compression
Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk and disk I/O,
particularly for read-dominated workloads. Cassandra quickly finds the location of rows in the SSTable index and
decompresses the relevant row chunks.
Write performance is not negatively impacted by compression in Cassandra as it is in traditional databases. In traditional
relational databases, writes require overwrites to existing data files on disk. The database has to locate the relevant
pages on disk, decompress them, overwrite the relevant data, and finally recompress. In a relational database,
compression is an expensive operation in terms of CPU cycles and disk I/O. Because Cassandra SSTable data files are
immutable (they are not written to again after they have been flushed to disk), there is no recompression cycle
necessary in order to process writes. SSTables are compressed only once when they are written to disk. Writes on
compressed tables can show up to a 10 percent performance improvement.
Heap Size
As a guideline, systems with 2GB to 4GB of RAM generally use a 1GB heap.
General guidelines
Many users new to Cassandra are tempted to turn up Java heap size too high, which consumes the majority of the
underlying system's RAM. In most cases, increasing the Java heap size is actually detrimental for these reasons:
In most cases, the capability of Java 6 to gracefully handle garbage collection above 8GB quickly diminishes.
Modern operating systems maintain the OS page cache for frequently accessed data and are very good at
keeping this data in memory, but can be prevented from doing so by an elevated Java heap size.
If you have more than 2GB of system memory, which is typical, keep the size of the Java heap relatively small to allow
more memory for the page cache.
Guidelines for DSE Search/Solr users
Some Solr users have reported that increasing the stack size improves performance under Tomcat. To increase the
stack size, uncomment and modify the default -Xss128k setting in the cassandra-env.sh file. Also, decreasing the
memtable space to make room for Solr caches might improve performance. Modify the memtable space using the
memtable_total_space_in_mb property in the cassandra.yaml file.
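A sketch of the corresponding cassandra-env.sh edit (the 256k value is an assumption to illustrate the change, not a recommendation):

```shell
# Larger per-thread stack size for Solr-heavy nodes (stock setting is -Xss128k)
JVM_OPTS="$JVM_OPTS -Xss256k"
```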
Guidelines for Analytics/Hadoop users
Because MapReduce runs outside the JVM, changes to the JVM do not affect Analytics/Hadoop operations directly.
Cassandra's GCInspector class logs information about garbage collection whenever a garbage collection takes longer
than 200ms. Garbage collections that occur frequently and take a moderate length of time to complete (such as
ConcurrentMarkSweep taking a few seconds), indicate that there is a lot of garbage collection pressure on the JVM.
Remedies include adding nodes, lowering cache sizes, or adjusting the JVM options regarding garbage collection.
JMX options
Cassandra exposes a number of statistics and management operations via Java Management Extensions (JMX), as
described in Monitoring a Cassandra cluster. JConsole, the nodetool utility, and DataStax OpsCenter are examples of
JMX-compliant management tools.
By default, you can modify the following properties in the conf/cassandra-env.sh file to configure JMX to listen on port
7199 without authentication.
com.sun.management.jmxremote.port
The port on which Cassandra listens for JMX connections.
com.sun.management.jmxremote.ssl
Enable/disable SSL for JMX.
com.sun.management.jmxremote.authenticate
Enable/disable remote authentication for JMX.
-Djava.rmi.server.hostname
Sets the interface hostname or IP that JMX should use to connect. Uncomment and set if you are having trouble
connecting.
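These properties appear in cassandra-env.sh roughly as follows (a sketch based on a stock 1.2 file; adjust the port and hostname for your environment):

```shell
JMX_PORT="7199"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
# Uncomment and set if you are having trouble connecting:
# JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
```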
Repairing nodes
This section discusses running routine node repair.
In systems that seldom delete or overwrite data, it is possible to raise the value of gc_grace_seconds with minimal
impact to disk space. This allows wider intervals for scheduling repair operations with the nodetool utility.
Note
If you do not use virtual nodes, follow the instructions in the 1.1 topic Adding Capacity to an Existing Cluster.
To add nodes to a cluster
1. Install Cassandra on the new nodes, but do not start Cassandra. (If you used a packaged install, Cassandra starts
automatically and you must stop the node and clear the data.)
2. Set the following properties in the cassandra.yaml and cassandra-topology.properties configuration files:
cluster_name: The name of the cluster the new node is joining.
listen_address/broadcast_address: The IP address or host name that other Cassandra nodes use to
connect to the new node.
endpoint_snitch: The snitch Cassandra uses for locating nodes and routing requests.
num_tokens: The number of virtual nodes to assign to the node. If the hardware capabilities vary among the
nodes in your cluster, you can assign a proportional number of virtual nodes to the larger machines.
seed_provider: The - seeds list in this setting determines which nodes the new node should contact to
learn about the cluster and establish the gossip process.
Change other non-default settings you have made to your existing cluster in the cassandra.yaml file and
cassandra-topology.properties files. Use the diff command to find and merge (by hand) any differences
between existing and new nodes.
3. Start Cassandra on each new node. Allow two minutes between node initializations. You can monitor the startup
and data streaming process using nodetool netstats.
4. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys
no longer belonging to those nodes. Wait for cleanup to complete on one node before doing the next.
Cleanup may be safely postponed for low-usage hours.
1. Ensure that you are using NetworkTopologyStrategy for all of your keyspaces.
2. For each new node, edit the configuration properties in the cassandra.yaml file:
Add (or edit) auto_bootstrap: false. By default, this setting is true and not listed in the cassandra.yaml file.
Setting this parameter to false prevents the new nodes from attempting to get all the data from the other
nodes in the data center. When you run nodetool rebuild in the last step, each node is properly mapped.
Set the necessary properties as described in step 2 above.
3. If using the PropertyFileSnitch, update the cassandra-topology.properties file on all servers to include the new
nodes. You do not need to restart.
The location of this file depends on the type of installation; see Cassandra Configuration Files Locations or
DataStax Enterprise Configuration Files Locations.
4. Ensure that your client does not auto-detect the new nodes so that they aren't contacted by the client until
explicitly directed. For example in Hector, set hostConfig.setAutoDiscoverHosts(false);
5. If using a QUORUM consistency level for reads or writes, check the LOCAL_QUORUM or EACH_QUORUM
consistency level to see if the level meets your requirements for multiple data centers.
6. Start the new nodes.
7. After all nodes are running in the cluster:
a. Change the strategy_options for your keyspace to the desired replication factor for the new data center. For
example: set strategy options to DC1:2, DC2:2. For more information, see ALTER KEYSPACE.
b. Run nodetool rebuild on all nodes in the new data center.
2. Add and start the replacement node as described in the steps above.
3. Remove the dead node from the cluster using the removenode command. Use the force option of this command if
necessary to remove the node.
Note
If JNA is enabled, snapshots are performed by hard links. If not enabled, I/O activity increases as the files are copied
from one location to another, which significantly reduces efficiency.
Taking a snapshot
Snapshots are taken per node using the nodetool snapshot command. To take a global snapshot, run the
nodetool snapshot command using a parallel ssh utility, such as pssh.
A snapshot first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each keyspace. By
default the snapshot files are stored in the /var/lib/cassandra/data/<keyspace_name>/<table_name>/snapshots
directory.
You must have enough free disk space on the node to accommodate making snapshots of your data files. A single
snapshot requires little disk space. However, snapshots can cause your disk usage to grow more quickly over time
because a snapshot prevents old obsolete data files from being deleted. After the snapshot is complete, you can move
the backup files to another location if needed, or you can leave them in place.
To create a snapshot of a node
Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace. For example:
$ nodetool -h localhost -p 7199 snapshot demdb
The snapshot is created in <data_directory_location>/<keyspace_name>/<table_name>/snapshots/<snapshot_name>.
Each snapshot folder contains numerous .db files that contain the data at the time of the snapshot.
Note
Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the node
being restored.
To restore a node from a snapshot and incremental backups:
1. Shut down the node.
2. Clear all files in /var/lib/cassandra/commitlog.
3. Delete all *.db files in this directory:
<data_directory_location>/<keyspace_name>/<table_name>
DO NOT delete the /snapshots and /backups subdirectories.
4. Locate the most recent snapshot folder in this directory:
<data_directory_location>/<keyspace_name>/<table_name>/snapshots/<snapshot_name>
Copy its contents into this directory:
<data_directory_location>/<keyspace_name>/<table_name>
5. If using incremental backups, copy all contents of this directory:
<data_directory_location>/<keyspace_name>/<table_name>/backups
Paste it into this directory:
<data_directory_location>/<keyspace_name>/<table_name>.
6. Restart the node.
Restarting causes a temporary burst of I/O activity and consumes a large amount of CPU resources.
References
The nodetool utility
The nodetool utility is a command line interface for Cassandra. You can use it to help manage a cluster.
In binary installations, nodetool is located in the <install_location>/bin directory. Square brackets indicate optional
parameters.
Standard usage:
nodetool -h HOSTNAME [-p JMX_PORT] COMMAND
RMI usage:
If a username and password for RMI authentication are set explicitly in the cassandra-env.sh file for the host, then you
must specify credentials:
nodetool -h HOSTNAME [-p JMX_PORT -u JMX_USERNAME -pw JMX_PASSWORD] COMMAND
Options
The available options are:
Flag
Option
Description
-h
--host arg
Hostname of the node or IP address.
-p
--port arg
Remote JMX agent port number.
-pr
--partitioner-range
Repair only the first range returned by the partitioner for the node.
-pw
--password arg
Remote JMX agent password.
-u
--username arg
Remote JMX agent username.
--column-family arg
-snapshot
--with-snapshot
-t
--tag arg
Command list
The available commands are:
Command List
cfhistograms
getendpoints
repair
cfstats
getsstables
ring
cleanup
gossipinfo
scrub
clearsnapshot
info
setcompactionthreshold
compact
invalidatekeycache
setcompactionthroughput
compactionstats
invalidaterowcache
setstreamthroughput
decommission
join
settraceprobability
describering
move
snapshot
disablegossip
netstats
status
disablethrift
rangekeysample
statusthrift
drain
rebuild
stop
enablegossip
rebuild_index
tpstats
enablethrift
refresh
upgradesstables
flush
removenode
version
getcompactionthreshold
Command details
Details for each command are listed below:
cfhistograms keyspace table
Displays statistics on the read/write latency for a table. These statistics, which include row size, column count, and
bucket offsets, can be useful for monitoring activity in a table.
cfstats
Displays statistics for every keyspace and table.
cleanup [keyspace][table]
Triggers the immediate cleanup of keys no longer belonging to this node. This has roughly the same effect on a node
that a major compaction does in terms of a temporary increase in disk space usage and an increase in disk I/O.
Optionally takes a list of table names.
clearsnapshot [keyspaces] -t [snapshotName]
Deletes snapshots for the specified keyspaces. You can remove all snapshots or remove the snapshots with the given
name.
compact [keyspace][table]
For tables that use the SizeTieredCompactionStrategy, initiates an immediate major compaction of all tables in
keyspace. For each table in keyspace, this compacts all existing SSTables into a single SSTable. This can cause
considerable disk I/O and can temporarily cause up to twice as much disk space to be used. Optionally takes a list of
table names.
compactionstats
Displays compaction statistics.
decommission
Tells a live node to decommission itself (streaming its data to the next node on the ring). Use netstats to monitor the
progress. See also:
http://wiki.apache.org/cassandra/NodeProbe#Decommission
http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely
describering [keyspace]
Shows the token ranges for a given keyspace.
disablegossip
Disable Gossip. Effectively marks the node dead.
disablethrift
Disables the Thrift server.
drain
Flushes all memtables for a node and causes the node to stop accepting write operations. Read operations will continue
to work. You typically use this command before upgrading a node to a new version of Cassandra.
enablegossip
Re-enables Gossip.
enablethrift
Re-enables the Thrift server.
flush [keyspace] [table]
Flushes all memtables for a keyspace to disk, allowing the commit log to be cleared. Optionally takes a list of table
names.
getcompactionthreshold keyspace table
Gets the current compaction threshold settings for a table. See http://wiki.apache.org/cassandra/MemtableSSTable.
getendpoints keyspace table key
Displays the endpoints that own the key. The key is only accepted in HEX format.
getsstables keyspace table key
Displays the sstable filenames that own the key.
gossipinfo
Shows the gossip information for the cluster.
info
Outputs node information including the token, load info (on disk storage), generation number (times started), uptime in
seconds, and heap memory usage.
invalidatekeycache [keyspace] [tables]
Invalidates, or deletes, the key cache. Optionally takes a keyspace or list of table names. Leave a blank space between
each table name.
invalidaterowcache [keyspace] [tables]
Invalidates, or deletes, the row cache. Optionally takes a keyspace or list of table names. Leave a blank space between
each table name.
join
Causes the node to join the ring. This assumes that the node was initially not started in the ring, that is, started with
-Djoin_ring=false. Note that the joining node should be properly configured with the desired options for seed list,
initial token, and auto-bootstrapping.
move new_token
Moves a node to a new token. This essentially combines decommission and bootstrap. See:
http://wiki.apache.org/cassandra/Operations#Moving_nodes
netstats [host]
Displays network information such as the status of data streaming operations (bootstrap, repair, move, and
decommission) as well as the number of active, pending, and completed commands and responses.
rangekeysample
Displays the sampled keys held across all keyspaces.
rebuild [source_dc_name]
Rebuilds data by streaming from other nodes (similar to bootstrap). Use this command to bring up a new data center in
an existing cluster. See Adding a data center to a cluster.
rebuild_index keyspace table_name.index_name,index_name1
Fully rebuilds native secondary indexes for a given table. Example of index_names:
Standard3.IdxName,Standard3.IdxName1
Usage
cassandra [OPTIONS]
Environment
Cassandra requires the following environment variables to be set:
JAVA_HOME - The path location of your Java Virtual Machine (JVM) installation
CLASSPATH - A path containing all of the required Java class files (.jar)
CASSANDRA_CONF - Directory containing the Cassandra configuration files
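For example, these variables might be exported from a login profile; the paths shown are illustrative examples only, not defaults:

```shell
# Illustrative values only; substitute the paths from your own installation.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export CASSANDRA_CONF=/etc/cassandra
export CLASSPATH="$CASSANDRA_CONF:/usr/share/cassandra/lib/*"
```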
For convenience, on Linux, Cassandra uses an include file, cassandra.in.sh, to source these environment variables.
It will check the following locations for this file:
Environment setting for CASSANDRA_INCLUDE if set
<install_location>/bin
/usr/share/cassandra/cassandra.in.sh
/usr/local/share/cassandra/cassandra.in.sh
/opt/cassandra/cassandra.in.sh
<USER_HOME>/.cassandra.in.sh
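The lookup can be pictured as a simple shell loop. This is a sketch of the assumed search order listed above, not the script Cassandra actually ships; the INSTALL_LOCATION default is a placeholder:

```shell
# Source the first readable cassandra.in.sh found, in the order listed above.
INSTALL_LOCATION=${INSTALL_LOCATION:-/usr/local/cassandra}   # assumed install path
find_include() {
  for f in "${CASSANDRA_INCLUDE:-}" \
           "$INSTALL_LOCATION/bin/cassandra.in.sh" \
           /usr/share/cassandra/cassandra.in.sh \
           /usr/local/share/cassandra/cassandra.in.sh \
           /opt/cassandra/cassandra.in.sh \
           "$HOME/.cassandra.in.sh"; do
    if [ -n "$f" ] && [ -r "$f" ]; then
      . "$f"        # source the include file into the current shell
      return 0
    fi
  done
  return 1          # no include file found
}
```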
Cassandra also uses the Java options set in $CASSANDRA_CONF/cassandra-env.sh. If you want to pass additional
options to the Java virtual machine, such as maximum and minimum heap size, edit the options in that file rather than
setting JVM_OPTS in the environment.
Options
-f Start the cassandra process in the foreground (by default, it starts as a background process).
-p <filename> Log the process ID in the named file. Useful for stopping Cassandra by killing its PID.
-v Print the version and exit.
-D <parameter>
Passes in one of the following startup parameters:
access.properties=<filename>
cassandra-pidfile=<filename>
cassandra.config=<directory>
cassandra.initial_token=<token>
Sets the initial partitioner token for a node the first time the node is started.
cassandra.join_ring=<true|false>
cassandra.load_ring_state=<true|false>
Set to false to clear all gossip state for the node on restart. Use if you have changed node information in
cassandra.yaml (such as listen_address).
cassandra.renew_counter_id=<true|false>
cassandra.replace_token=<token>
cassandra.write_survey=true
cassandra.framed
cassandra.host
cassandra.port=<port>
cassandra.rpc_port=<port>
cassandra.start_rpc=<true|false>
cassandra.storage_port=<port>
corrupt-sstable-root
legacy-sstable-root
mx4jaddress
mx4jport
passwd.mode
passwd.properties=<file>
Examples
Start Cassandra on a node and log its PID to a file:
cassandra -p ./cassandra.pid
Clear gossip state when starting a node. This is useful if the node has changed its configuration, such as its listen IP
address:
cassandra -Dcassandra.load_ring_state=false
Start Cassandra on a node in stand-alone mode (do not join the cluster configured in the cassandra.yaml file):
cassandra -Dcassandra.join_ring=false
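Because -p records the PID, stopping the node is a one-liner. The sketch below uses a sleep process as a stand-in, so it can run without a real node:

```shell
# Stand-in demonstration: 'sleep 300 &' plays the role of the Cassandra daemon.
sleep 300 &
echo $! > ./cassandra.pid        # what 'cassandra -p ./cassandra.pid' records
kill "$(cat ./cassandra.pid)"    # stop the process by its logged PID
rm -f ./cassandra.pid
```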
About sstableloader
The sstableloader tool streams a set of SSTable data files to a live cluster. It does not simply copy the set of SSTables
to every node, but transfers the relevant part of the data to each node, conforming to the replication strategy of the
cluster. The table into which the data is loaded does not need to be empty.
Because sstableloader uses Cassandra gossip, make sure that the cassandra.yaml configuration file is in the classpath
and set to communicate with the cluster. At least one node of the cluster must be configured as a seed. If necessary,
properly configure the following properties: listen_address, storage_port, rpc_address, and rpc_port.
If you use sstableloader to load external data, you must first generate SSTables. If you use DataStax Enterprise, you
can use Sqoop to migrate your data or if you use Cassandra, follow the procedure described in Using the Cassandra
Bulk Loader blog.
Before loading the data, you must define the schema of the column families with CLI, Thrift, or CQL.
To get the best throughput from SSTable loading, you can use multiple instances of sstableloader to stream across
multiple machines. No hard limit exists on the number of SSTables that sstableloader can run at the same time, so you
can add additional loaders until you see no further improvement.
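The fan-out across loaders can be scripted. In this sketch, echo stands in for the real bin/sstableloader invocation so the pattern can run anywhere, and the keyspace/table directory names are hypothetical:

```shell
# Run one loader per SSTable directory in parallel; 'echo' stands in for
# bin/sstableloader, and the directories are hypothetical examples.
for dir in Keyspace1/Standard1 Keyspace1/Standard2; do
  echo "bin/sstableloader $dir" &   # replace echo with the real command
done
wait                                # block until all loaders finish
```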
If you use sstableloader on the same machine as the Cassandra node, you can't use the same network interface as the
Cassandra node. However, you can use the JMX > StorageService > bulkload() call from that node. This method takes
the absolute path to the directory where the SSTables are located, and loads them just as sstableloader does. However,
because the node is both source and destination for the streaming, it increases the load on that node. This means that
you should load data from machines that are not Cassandra nodes when loading into a live cluster.
Using sstableloader
In binary installations, sstableloader is located in the <install_location>/bin directory.
--debug
-h,--help
Display help.
-i,--ignore <NODES>
--no-progress
-t,--throttle <throttle>
-v,--verbose
Verbose output.
Note
Starting with version 0.7, json2sstable and sstable2json must be run so that the schema can be loaded from system
tables. This means that the cassandra.yaml file must be in the classpath and refer to valid storage directories. For
more information, see the Import/Export section of http://wiki.apache.org/cassandra/Operations.
sstable2json
This converts the on-disk SSTable representation of a table into a JSON formatted document.
Usage
bin/sstable2json SSTABLE
[-k KEY [-k KEY [...]]] [-x KEY [-x KEY [...]]] [-e]
SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra's data directory. For example,
/var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.
-k allows you to include a specific set of keys. The KEY must be in HEX format. Limited to 500 keys.
-x allows you to exclude a specific set of keys. Limited to 500 keys.
-e causes keys to only be enumerated.
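Since -k and -x take keys in HEX, a text row key must be encoded first. One portable way uses the POSIX od utility; the key jsmith is a made-up example, not one from a real SSTable:

```shell
# Hex-encode a text row key for use with sstable2json -k / -x.
# 'jsmith' is an illustrative key, not one from a real SSTable.
hexkey=$(printf 'jsmith' | od -A n -t x1 | tr -d ' \n')
echo "$hexkey"    # prints 6a736d697468
```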
Output format
The output of sstable2json for tables is:
{
ROW_KEY:
{
[
[COLUMN_NAME, COLUMN_VALUE, COLUMN_TIMESTAMP, IS_MARKED_FOR_DELETE],
[COLUMN_NAME, ... ],
...
]
},
ROW_KEY:
{
...
},
...
}
Row keys, column names, and values are written as the HEX representation of their byte arrays. In the actual output,
line breaks appear only between row keys.
json2sstable
This converts a JSON representation of a table (also known as a column family) into Cassandra's SSTable format.
Usage
bin/json2sstable -K KEYSPACE -c COLUMN_FAMILY JSON SSTABLE
JSON should be a path to the JSON file.
SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra's data directory. For example,
/var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.
sstablekeys
The sstablekeys utility is shorthand for sstable2json with the -e option. Instead of dumping all of a table's data, it
dumps only the keys.
Usage
bin/sstablekeys SSTABLE
SSTABLE should be a full path to a {table-name}-Data.db file in Cassandra's data directory. For example,
/var/lib/cassandra/data/Keyspace1/Standard1-e-1-Data.db.
Install locations
Note
On Enterprise Linux systems, the Cassandra service runs as a java process. On Debian systems, the Cassandra
service runs as a jsvc process.
Troubleshooting Guide
This page contains recommended fixes and workarounds for issues commonly encountered with Cassandra:
Reads are getting slower while writes are still fast
Nodes seem to freeze after some period of time
Nodes are dying with OOM errors
Nodetool or JMX connections failing on remote nodes
View of ring differs between some nodes
Java reports an error saying there are too many open files
Insufficient user resource limits errors
Cannot initialize class org.xerial.snappy.Snappy
The memtable sizes are too large for the amount of heap allocated to the JVM. You can expect N + 2
memtables resident in memory, where N is the number of column families. Adding another 1GB on top of that for
Cassandra itself is a good estimate of total heap usage.
If none of these seem to apply to your situation, try loading the heap dump in MAT and see which class is consuming
the bulk of the heap for clues.
Java reports an error saying there are too many open files
Java is not allowed to open enough file descriptors. Cassandra generally needs more than the default (1024) amount.
To increase the number of file descriptors, change the security limits on your Cassandra nodes as described in the
Recommended settings section of Insufficient user resource limits errors.
Another, much less likely, possibility is a file descriptor leak in Cassandra. Run lsof -n | grep java to check that
the number of file descriptors opened by Java is reasonable, and report the error if the number is greater than a few
thousand.
Cassandra errors
Insufficient as (address space) or memlock setting:
ERROR [SSTableBatchOpen:1] 2012-07-25 15:46:02,913 AbstractCassandraDaemon.java (line 139)
Fatal exception in thread Thread[SSTableBatchOpen:1,5,main]
java.io.IOError: java.io.IOException: Map failed at ...
OpsCenter errors
See the OpsCenter Troubleshooting documentation.
Recommended settings
You can view the current limits using the ulimit -a command. Although limits can also be temporarily set using this
command, DataStax recommends permanently changing the settings by adding the following entries to your
/etc/security/limits.conf file:
* soft nofile 32768
* hard nofile 32768
root soft nofile 32768
root hard nofile 32768
* soft memlock unlimited
* hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited
* soft as unlimited
* hard as unlimited
root soft as unlimited
root hard as unlimited
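After editing limits.conf and starting a new login session (PAM applies these limits at session start), you can confirm the open-files limits took effect with the shell's ulimit builtin:

```shell
# Show the current soft and hard open-file limits; after the limits.conf
# change above (and a re-login) both should report 32768.
ulimit -Sn
ulimit -Hn
```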
In addition, you may need to run the following command:
sysctl -w vm.max_map_count=131072
This command enables more memory mappings; it is not set in the limits.conf file.
On CentOS, RHEL, and OEL systems, change the system limits from 1024 to 10240 in
/etc/security/limits.d/90-nproc.conf and then start a new shell for these changes to take effect:
* soft nproc 10240
snitch
The mapping from the IP addresses of nodes to physical and virtual locations, such as racks and data centers. There
are several types of snitches. The type of snitch affects the request routing mechanism.
SSTable
A sorted string table (SSTable) is an immutable data file to which Cassandra writes memtables periodically. SSTables
are stored on disk sequentially and maintained for each Cassandra table. See also About writes.
strong consistency
When reading data, Cassandra performs read repair before returning results.
superuser
By default, each installation of Cassandra includes a superuser account named cassandra whose password is also
cassandra. This account allows a user to perform any action on the database cluster and create new login accounts. It
is recommended that you change the password from the default.
table
In CQL 3, a collection of ordered (by name) columns. In previous versions of CQL, the column family was
synonymous, in many respects, with a table. In CQL 3 a table is sparse, including only those columns to which rows
have been assigned a value.
token
An element on the ring that depends on the partitioner. A token determines the node's position on the ring and the
portion of data it is responsible for. The range for the Murmur3Partitioner (default) is -2^63 to +2^63-1. The range for
the RandomPartitioner is 0 to 2^127-1.
tombstone
A marker in a row that indicates that a column was deleted. During compaction, marked columns are deleted.
TTL
Time-to-live. An optional expiration date for values inserted into a column. See also Expiring columns.
weak consistency
When reading data, Cassandra performs read repair after returning results.
upsert
A change in the database that updates a specified column in a row if the column exists or inserts the column if it does
not exist.
cqlsh Commands
ALTER KEYSPACE
ALTER TABLE
ALTER USER
ASSUME
BATCH
CAPTURE
CONSISTENCY
COPY
CREATE INDEX
CREATE KEYSPACE
CREATE TABLE
CREATE USER
DELETE
DESCRIBE
DROP INDEX
DROP KEYSPACE
DROP TABLE
DROP USER
EXIT
GRANT
INSERT
LIST PERMISSIONS
LIST USERS
REVOKE
SELECT
SHOW
SOURCE
TRACING
TRUNCATE
UPDATE
USE