Unit 5 Lecture 3
1. Bigtable Storage
2. Mutations and Deletions
3. Cloud Bigtable Architecture
4. Data Compression
5. Data Durability
6. Security and Backups
7. Important Questions
8. References
Bigtable Storage
Mutations, or changes, to a row take up extra storage space, because Cloud Bigtable stores mutations sequentially
and compacts them only periodically. When Cloud Bigtable compacts a table, it removes values that are no longer
needed. If you update the value in a cell, both the original value and the new value will be stored on disk for some
amount of time until the data is compacted.
Deletions also take up extra storage space, at least in the short term, because deletions are actually a specialized type
of mutation. Until the table is compacted, a deletion uses extra storage rather than freeing up space.
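As a concrete illustration, the minimal Python sketch below writes a new value to a cell and then deletes the row, using the google-cloud-bigtable client library. The project, instance, table, and column-family names are placeholders, and the "stats" column family is assumed to exist already.

from google.cloud import bigtable

client = bigtable.Client(project="my-project")           # placeholder project ID
table = client.instance("my-instance").table("my-table")

# This write is a mutation: the previous cell value stays on disk next to
# the new one until the next compaction removes it.
row = table.direct_row(b"user#1234")
row.set_cell("stats", b"score", b"42")                   # "stats" family assumed to exist
row.commit()

# A deletion is itself a mutation (a tombstone), so it briefly *adds*
# storage; the space is reclaimed only when the table is compacted.
row = table.direct_row(b"user#1234")
row.delete()
row.commit()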
Cloud Bigtable's powerful back-end servers offer several key advantages over a self-managed HBase
installation:
Incredible scalability. Cloud Bigtable scales in direct proportion to the number of machines in your cluster. A
self-managed HBase installation has a design bottleneck that limits performance after a certain threshold is
reached. Cloud Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and
writes.
Simple administration. Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains
high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts
automatically. No more managing replicas or regions; just design your table schemas, and Cloud Bigtable will
handle the rest for you.
Cluster resizing without downtime. You can increase the size of a Cloud Bigtable cluster for a few hours to
handle a large load, then reduce the cluster's size again—all without any downtime. After you change a cluster's
size, it typically takes just a few minutes under load for Cloud Bigtable to balance performance across all of the
nodes in your cluster.
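For instance, a resize is just a node-count update through the admin client. The sketch below (google-cloud-bigtable library; all IDs and node counts are placeholders) scales a cluster up for a heavy job and back down afterwards, with the cluster serving traffic throughout.

from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
cluster = client.instance("my-instance").cluster("my-cluster")
cluster.reload()                  # fetch the current state from the service

cluster.serve_nodes = 10          # scale up before the heavy load
cluster.update()

# ... run the large batch workload here ...

cluster.serve_nodes = 3           # scale back down afterwards
cluster.update()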
You can use Cloud Bigtable to store and query all of the following types of data:
Time-series data, such as CPU and memory usage over time for multiple servers (a write sketch follows this list).
Financial data, such as transaction histories, stock prices, and currency exchange rates.
Internet of Things data, such as usage reports from energy meters and home appliances.
Graph data, such as information about how users are connected to one another.
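As a sketch of the time-series case flagged above (Python, google-cloud-bigtable, with placeholder IDs and an assumed "metrics" column family), note how the row key leads with the server ID so one server's readings sort together; this ordering also helps compression, as discussed under Data Compression below.

import time
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

# Row key: server ID first, then the epoch second, so readings for a
# given server are contiguous and time-ordered. (Bucketing or reversing
# the timestamp are common refinements, omitted here for brevity.)
row_key = f"server-042#{int(time.time())}".encode()

row = table.direct_row(row_key)
row.set_cell("metrics", b"cpu", b"0.73")   # "metrics" family assumed to exist
row.set_cell("metrics", b"mem", b"0.41")
row.commit()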
Cloud Bigtable architecture
The following diagram shows a simplified version of Cloud Bigtable's overall architecture:
[Diagram: Cloud Bigtable architecture]
• As the diagram illustrates, all client requests go through a front-end server before they are sent to a Cloud
Bigtable node. (In the original Bigtable paper, these nodes are called "tablet servers.")
• The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, a
container for the cluster.
• Each node in the cluster handles a subset of the requests to the cluster.
• By adding nodes to a cluster, you can increase the number of simultaneous requests that the cluster can
handle, as well as the maximum throughput for the entire cluster.
• If you enable replication by adding a second cluster, you can also send different types of traffic to different
clusters, and you can fail over to one cluster if the other becomes unavailable (a routing sketch follows this list).
• A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload
of queries. (Tablets are similar to HBase regions.)
• Tablets are stored on Colossus, Google's file system, in SSTable format. An SSTable provides a persistent,
ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. Each tablet
is associated with a specific Cloud Bigtable node.
• In addition to the SSTable files, all writes are stored in Colossus's shared log as soon as they are
acknowledged by Cloud Bigtable, providing increased durability.
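The routing sketch promised above: one way to send a class of traffic to a specific cluster in a replicated instance is an app profile with single-cluster routing. This is a hedged illustration using the google-cloud-bigtable library; the instance, both clusters, and the profile ID are placeholders, and the second cluster is assumed to exist already.

from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Pin batch/analytics reads to the second cluster so they cannot
# interfere with latency-sensitive serving traffic on the first.
profile = instance.app_profile(
    "batch-reads",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="my-cluster-2",
    allow_transactional_writes=False,
)
profile.create(ignore_warnings=True)

# Requests made through this table object now go only to my-cluster-2.
table = instance.table("my-table", app_profile_id="batch-reads")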
DATA COMPRESSION
Cloud Bigtable compresses your data automatically using an intelligent algorithm. You cannot configure
compression settings for your table. However, it is useful to know how to store data so that it can be compressed
efficiently:
Random data cannot be compressed as efficiently as patterned data. Patterned data includes text, such as
the page you're reading right now.
Compression works best if identical values are near each other, either in the same row or in adjoining
rows. If you arrange your row keys so that rows with identical chunks of data are next to each other, the data
can be compressed efficiently.
Compress values larger than 1 MiB before storing them in Cloud Bigtable. This saves CPU cycles,
server memory and network bandwidth. Cloud Bigtable automatically turns off compression for values larger
than 1 MiB.
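A minimal sketch of that last point, compressing a large value client-side with Python's zlib before writing it (google-cloud-bigtable; the IDs and the "payload" column family are placeholders):

import zlib
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

large_blob = b"some repetitive payload " * 100_000   # stand-in for a value over 1 MiB
compressed = zlib.compress(large_blob)               # compress before writing

row = table.direct_row(b"doc#0001")
row.set_cell("payload", b"body", compressed)         # "payload" family assumed to exist
row.commit()

# Readers must undo the step themselves: zlib.decompress(cell.value)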
DATA DURABILITY
When you use Cloud Bigtable, your data is stored on Colossus, Google's internal, highly durable file system, using storage devices in
Google's data centers. You do not need to run an HDFS cluster or any other file system to use Cloud Bigtable. If your instance uses
replication, Cloud Bigtable maintains one copy of your data in Colossus for each cluster in the instance. Each copy is located in a
different zone or region, further improving durability.
Behind the scenes, Google uses proprietary storage methods to achieve data durability above and beyond what's provided by standard
HDFS three-way replication. Google also creates backups of your data to protect against catastrophic events and to provide for
disaster recovery.
SECURITY
Access to your Cloud Bigtable tables is controlled by your Google Cloud project and the
Identity and Access Management (IAM) roles that you assign to users. For example, you can assign IAM roles that prevent individual
users from reading from tables, writing to tables, or creating new instances. If someone does not have access to your project or does
not have an IAM role with appropriate permissions for Cloud Bigtable, they cannot access any of your tables.
You can manage security at the project, instance, and table levels. Cloud Bigtable does not support row-level, column-level, or
cell-level security restrictions.
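As an illustration, the sketch below grants a user read-only access at the instance level via IAM (Python, google-cloud-bigtable admin client; the member address is a placeholder, and the role constant is the client library's reader role):

from google.cloud import bigtable
from google.cloud.bigtable.policy import Policy, BIGTABLE_READER_ROLE

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

policy = instance.get_iam_policy()
# Reader role: this member can read rows but cannot write to tables
# or create instances.
policy[BIGTABLE_READER_ROLE] = [Policy.user("analyst@example.com")]
instance.set_iam_policy(policy)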
BACKUPS
Cloud Bigtable backups let you save a copy of a table's schema and data, then restore from the backup to a new table at a later time.
Backups can help you recover from application-level data corruption or from operator errors such as accidentally deleting a table.
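A minimal sketch of that workflow with the google-cloud-bigtable admin client (all IDs and the seven-day expiry are placeholders): create a backup on one cluster, then restore it to a new table in the same instance.

import datetime
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

expire = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=7)
backup = table.backup("my-backup", cluster_id="my-cluster", expire_time=expire)
backup.create().result()                 # block until the backup completes

# Restore the backup into a brand-new table.
backup.restore("my-table-restored").result()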
Important Questions
1. What is the recommended way to switch between SSD and HDD storage for a Google Cloud Bigtable instance?
2. What is Bigtable in cloud computing?
3. Does Google still use Bigtable?
4. What are the features of Bigtable?
5. What is the difference between Bigtable and BigQuery?
References
Rajkumar Buyya, James Broberg, Andrzej Goscinski: "Cloud Computing: Principles and Paradigms", Wiley, 2014.
https://www.ques10.com/p/13989/explain-architecture-of-google-file-system-1/
https://www.sciencedirect.com/topics/computer-science/google-file-system
https://www.researchgate.net/publication/220910111_The_Google_File_System
Gautam Shroff: "Enterprise Cloud Computing: Technology, Architecture, Applications", Cambridge University Press, 2010.
https://cloud.google.com/bigtable/docs/overview