Unit 5 Lecture 3


Subject Name :-Cloud Computing

Subject Code :- KCS 713


Unit No. :- 5
Lecture No. :- 3
Topic Name :- Big Table Storage
Contents

1. BigTable Storage
2. Mutations and Deletions
3. Cloud Bigtable Architecture
4. Data Compression
5. Data Durability
6. Security and Backups
7. Important Questions
8. References
Big Table Storage

• Each table is split into different row ranges, called tablets


• Each tablet is managed by a tablet server:
– Stores each column family for a given row range in a separate distributed file, called SSTable
• A single meta-data table is managed by a Meta-data server
– Locates the tablets of any user table in response to a read/write request
• The meta-data itself can be very large:
– Meta-data table can be similarly split into multiple tablets
– A root tablet points to other meta-data tablets
• Supports large-scale parallel reads and inserts, even simultaneously on the same table
• Insertions are done in sorted order, which requires more work than a simple append
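The tablet lookup performed by the meta-data server can be sketched as a search over sorted tablet boundary keys. A minimal illustrative model (not actual Bigtable code; the boundary keys are invented):

```python
import bisect

# Each tablet serves the row range [start_key, next_start_key).
# The first tablet starts at the empty byte string, the lowest possible key.
tablet_start_keys = [b"", b"g", b"p"]  # hypothetical tablet boundaries

def tablet_for(row_key: bytes) -> int:
    """Return the index of the tablet responsible for row_key."""
    return bisect.bisect_right(tablet_start_keys, row_key) - 1

print(tablet_for(b"apple"))   # first tablet: keys below b"g"
print(tablet_for(b"monkey"))  # middle tablet: b"g" <= key < b"p"
print(tablet_for(b"zebra"))   # last tablet: keys >= b"p"
```

Because the boundaries are sorted, locating the right tablet is a binary search rather than a scan, which is what keeps the meta-data lookup cheap even for very large tables.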
Big Table Storage
• Google Bigtable is a columnar (wide-column) database and is well suited to sparse data (cells with no value)
• Sparse table: a table in which a relatively high percentage of cells contain no actual data. In a dense storage
format, such "empty" or NA values would still take up space in the file; Bigtable simply does not store them
• Each value in Google Bigtable is indexed by a row key, a column key, and a timestamp
• Each table has only one index, the row key
• Rows are sorted lexicographically by row key, from the lowest to the highest byte string
• Column keys are grouped into sets called column families
• Each column is identified by a combination of the column family and a column qualifier, which is a unique name
within the column family
• Timestamp can be assigned automatically by Bigtable
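Lexicographic ordering of byte-string row keys has a practical consequence: numeric IDs must be zero-padded to sort in numeric order. A small illustration (the key names are invented):

```python
# Row keys are arbitrary byte strings, sorted lexicographically.
keys = [b"user2", b"user10", b"user1"]
print(sorted(keys))  # note: b"user10" sorts before b"user2"

# Zero-padding the numeric part restores the intended order.
padded = [b"user02", b"user10", b"user01"]
print(sorted(padded))
```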
BigTable
• Distributed structured storage system built on GFS
• Sparse, persistent, multi-dimensional sorted map (key-value pairs)
• Data is accessed by:
– Row key
– Column key
– Timestamp
MUTATIONS AND DELETIONS

Mutations, or changes, to a row take up extra storage space, because Cloud Bigtable stores mutations sequentially
and compacts them only periodically. When Cloud Bigtable compacts a table, it removes values that are no longer
needed. If you update the value in a cell, both the original value and the new value will be stored on disk for some
amount of time until the data is compacted.

Deletions also take up extra storage space, at least in the short term, because deletions are actually a specialized type
of mutation. Until the table is compacted, a deletion uses extra storage rather than freeing up space.
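The mutation-then-compaction behaviour described above can be modelled as an append-only log that a compaction pass collapses to the latest value per cell, dropping deleted cells. A simplified sketch (real compaction is far more involved):

```python
DELETE = object()  # sentinel marking a deletion mutation

# Mutations are stored sequentially: (row, column, timestamp, value).
log = [
    (b"r1", b"cf:q", 1, b"old"),
    (b"r1", b"cf:q", 2, b"new"),  # update: both versions occupy space for now
    (b"r2", b"cf:q", 1, b"x"),
    (b"r2", b"cf:q", 2, DELETE),  # deletion also occupies space until compaction
]

def compact(mutations):
    """Keep only the latest mutation per cell; drop cells whose latest is a delete."""
    latest = {}
    for row, col, ts, value in mutations:
        if (row, col) not in latest or ts > latest[(row, col)][0]:
            latest[(row, col)] = (ts, value)
    return {cell: v for cell, (ts, v) in latest.items() if v is not DELETE}

print(compact(log))  # only r1's newest value survives
```

Until `compact` runs, the update and the deletion both consume space in the log, which is exactly why mutations and deletions cost extra storage in the short term.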
Cloud Bigtable's powerful back-end servers offer several key advantages over a self-managed HBase
installation:

 Incredible scalability. Cloud Bigtable scales in direct proportion to the number of machines in your cluster. A
self-managed HBase installation has a design bottleneck that limits the performance after a certain threshold is
reached. Cloud Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and
writes.

 Simple administration. Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains
high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts
automatically. No more managing replicas or regions; just design your table schemas, and Cloud Bigtable will
handle the rest for you.

 Cluster resizing without downtime. You can increase the size of a Cloud Bigtable cluster for a few hours to
handle a large load, then reduce the cluster's size again—all without any downtime. After you change a cluster's
size, it typically takes just a few minutes under load for Cloud Bigtable to balance performance across all of the
nodes in your cluster.
You can use Cloud Bigtable to store and query all of the following types of data:

 Time-series data, such as CPU and memory usage over time for multiple servers.

 Marketing data, such as purchase histories and customer preferences.

 Financial data, such as transaction histories, stock prices, and currency exchange rates.

 Internet of Things data, such as usage reports from energy meters and home appliances.

 Graph data, such as information about how users are connected to one another.
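For time-series workloads like the CPU-usage example above, a common Bigtable schema idiom (a general design pattern, not something mandated by the service) is a row key of the form `server#reversed_timestamp`, so that the newest readings sort first:

```python
MAX_TS = 10**10  # hypothetical upper bound on the epoch-seconds timestamp

def row_key(server: str, ts: int) -> bytes:
    # Reversing the timestamp makes lexicographic order equal newest-first.
    return f"{server}#{MAX_TS - ts:010d}".encode()

keys = [row_key("server1", ts) for ts in (100, 200, 300)]
print(sorted(keys))  # the ts=300 reading sorts first
```

Since rows are sorted lexicographically by row key, a scan from the start of a server's range returns its most recent measurements first.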
Cloud Bigtable architecture

The following diagram shows a simplified version of Cloud Bigtable's overall architecture:

• As the diagram illustrates, all client requests go through a front-end server before they are sent to a Cloud
Bigtable node. (In the original Bigtable paper, these nodes are called "tablet servers.")

• The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, a
container for the cluster.

• Each node in the cluster handles a subset of the requests to the cluster.

• By adding nodes to a cluster, you can increase the number of simultaneous requests that the cluster can
handle, as well as the maximum throughput for the entire cluster.
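Why adding nodes raises throughput can be seen from a toy assignment of tablets to nodes; the round-robin scheme below is illustrative only (Cloud Bigtable's actual balancer is load-aware):

```python
def assign(num_tablets: int, num_nodes: int) -> dict:
    """Map tablet index -> node index, round-robin."""
    return {t: t % num_nodes for t in range(num_tablets)}

def max_load(assignment: dict) -> int:
    """Largest number of tablets handled by any single node."""
    counts = {}
    for node in assignment.values():
        counts[node] = counts.get(node, 0) + 1
    return max(counts.values())

# 6 tablets on 2 nodes vs. 3 nodes: per-node load drops, so throughput rises.
print(max_load(assign(6, 2)))  # 3 tablets per node
print(max_load(assign(6, 3)))  # 2 tablets per node
```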

• If you enable replication by adding a second cluster, you can also send different types of traffic to different
clusters, and you can fail over to one cluster if the other cluster becomes unavailable.

• A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets, to help balance the workload
of queries. (Tablets are similar to HBase regions.)

• Tablets are stored on Colossus, Google's file system, in SSTable format. An SSTable provides a persistent,
ordered immutable map from keys to values, where both keys and values are arbitrary byte strings. Each tablet
is associated with a specific Cloud Bigtable node.

• In addition to the SSTable files, all writes are stored in Colossus's shared log as soon as they are
acknowledged by Cloud Bigtable, providing increased durability.
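Putting the pieces together, a node's write path (log append for durability, an in-memory buffer, flushes to immutable sorted SSTables) can be sketched as follows; this is a conceptual model, not the Colossus implementation:

```python
class TabletNode:
    def __init__(self):
        self.shared_log = []  # a write is durable once appended here
        self.memtable = {}    # recent writes, held in memory
        self.sstables = []    # immutable sorted maps, newest last

    def write(self, key: bytes, value: bytes):
        self.shared_log.append((key, value))  # durability first
        self.memtable[key] = value

    def flush(self):
        """Freeze the memtable into an immutable, sorted SSTable."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: bytes):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            if key in sstable:
                return sstable[key]
        return None

node = TabletNode()
node.write(b"k1", b"v1")
node.flush()
node.write(b"k1", b"v2")   # newer value shadows the flushed one
print(node.read(b"k1"))
```

Reads consult the memtable first and then SSTables newest-first, so the most recent write always wins even though older SSTables are never modified in place.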
DATA COMPRESSION

Cloud Bigtable compresses your data automatically using an intelligent algorithm. You cannot configure
compression settings for your table. However, it is useful to know how to store data so that it can be compressed
efficiently:

 Random data cannot be compressed as efficiently as patterned data. Patterned data includes text, such as
the page you're reading right now.

 Compression works best if identical values are near each other, either in the same row or in adjoining
rows. If you arrange your row keys so that rows with identical chunks of data are next to each other, the data
can be compressed efficiently.

 Compress values larger than 1 MiB before storing them in Cloud Bigtable. This saves CPU cycles,
server memory and network bandwidth. Cloud Bigtable automatically turns off compression for values larger
than 1 MiB.
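The difference between patterned and random data is easy to demonstrate with a general-purpose compressor (zlib here merely stands in for Bigtable's undisclosed internal algorithm):

```python
import os
import zlib

patterned = b"status=OK latency_ms=12 " * 500  # repetitive, text-like data
random_data = os.urandom(len(patterned))       # incompressible noise

print(len(patterned), "->", len(zlib.compress(patterned)))      # shrinks dramatically
print(len(random_data), "->", len(zlib.compress(random_data)))  # barely changes
```

The same effect explains the row-key advice above: placing rows with identical chunks of data next to each other gives the compressor long repeated patterns to exploit.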
DATA DURABILITY

When you use Cloud Bigtable, your data is stored on Colossus, Google's internal, highly durable file system, using storage devices in
Google's data centers. You do not need to run an HDFS cluster or any other file system to use Cloud Bigtable. If your instance uses
replication, Cloud Bigtable maintains one copy of your data in Colossus for each cluster in the instance. Each copy is located in a
different zone or region, further improving durability.

Behind the scenes, Google uses proprietary storage methods to achieve data durability above and beyond what's provided by standard
HDFS three-way replication. In addition, we create backups of your data to protect against catastrophic events and provide for
disaster recovery.

SECURITY

Access to your Cloud Bigtable tables is controlled by your Google Cloud project and the
Identity and Access Management (IAM) roles that you assign to users. For example, you can assign IAM roles that prevent individual
users from reading from tables, writing to tables, or creating new instances. If someone does not have access to your project or does
not have an IAM role with appropriate permissions for Cloud Bigtable, they cannot access any of your tables.

You can manage security at the project, instance, and table levels. Cloud Bigtable does not support row-level, column-level, or cell-
level security restrictions.

BACKUPS

Cloud Bigtable backups let you save a copy of a table's schema and data, then restore from the backup to a new table at a later time.
Backups can help you recover from application-level data corruption or from operator errors such as accidentally deleting a table.
Important Questions

1. What is the recommended action to do in order to switch between SSD and HDD storage for your Google
Cloud Bigtable instance?
2. What is Bigtable in cloud computing?
3. Does Google still use Bigtable?
4. What are the features of Bigtable?
5. What is the difference between Bigtable and BigQuery?
References
 Rajkumar Buyya, James Broberg, Andrzej Goscinski: "Cloud Computing: Principles and Paradigms", Wiley, 2011.
 https://www.ques10.com/p/13989/explain-architecture-of-google-file-system-1/
 https://www.sciencedirect.com/topics/computer-science/google-file-system
 https://www.researchgate.net/publication/220910111_The_Google_File_System
 Enterprise Cloud Computing - Technology, Architecture, Applications, Gautam Shroff, Cambridge
University Press, 2010
 https://cloud.google.com/bigtable/docs/overview
