HBase (Unit 4)

HBase is a distributed column-oriented database built on top of HDFS that provides Bigtable-like capabilities for the Hadoop ecosystem, with data stored in tables containing rows, columns, and versions. It uses a master-slave architecture with a single master and multiple region servers that host regions, and allows for fast random reads and writes through its data model of keys, column families, and columns.

Uploaded by

The piano guy


HBase: Overview

• HBase is a distributed column-oriented data store built on top of HDFS
• HBase is an Apache open source project whose goal is to provide storage for Hadoop distributed computing
• Data is logically organized into tables, rows and columns

HBase: Part of Hadoop’s Ecosystem

• HBase is built on top of HDFS
• HBase files are internally stored in HDFS

HBase vs. HDFS
• Both are distributed systems that scale to hundreds or thousands of nodes
• HDFS is good for batch processing (scans over big files)
  • Not good for record lookup
  • Not good for incremental addition of small batches
  • Not good for updates

HBase vs. HDFS (Cont’d)
• HBase is designed to efficiently address the above points
  • Fast record lookup
  • Support for record-level insertion
  • Support for updates (not in place)
• HBase updates are done by creating new versions of values

HBase vs. HDFS (Cont’d)

• If the application has neither random reads nor random writes → stick to HDFS

HBase Data Model
• HBase is based on Google’s Bigtable model
• Key-value pairs: the key is made of the row key, the column family (and column) and a timestamp, and it maps to a value

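To make the key-value view concrete, here is a minimal sketch that sorts cells the way HBase orders them: by row key, then column family, then column, all ascending, and then by timestamp descending so the newest version of a cell comes first. The `CellOrdering` class and `Cell` record are hypothetical, and Strings stand in for HBase's byte arrays (String comparison matches byte order for ASCII keys).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified HBase cell: (row key, column family, column, timestamp) -> value.
// Real HBase keeps every part as a byte array; Strings are used here for readability.
class CellOrdering {
    record Cell(String row, String family, String column, long timestamp, String value) {}

    // Row key, family and column ascending; timestamp DESCENDING, so the newest version comes first.
    static final Comparator<Cell> KEY_ORDER =
            Comparator.comparing(Cell::row)
                    .thenComparing(Cell::family)
                    .thenComparing(Cell::column)
                    .thenComparing(Comparator.comparingLong(Cell::timestamp).reversed());

    // Returns a sorted copy; the input list is left untouched.
    static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(KEY_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("com.cnn.www", "contents", "", 3, "<html>..."),
                new Cell("com.apache.www", "anchor", "apache.com", 10, "APACHE"),
                new Cell("com.cnn.www", "contents", "", 6, "<html>..."),
                new Cell("com.apache.www", "contents", "", 12, "<html>..."));
        for (Cell c : sorted(cells)) {
            System.out.println(c.row() + "  " + c.family() + ":" + c.column() + "  t" + c.timestamp());
        }
    }
}
```

Running `main` lists both `com.apache.www` cells before the `com.cnn.www` cells, and within `com.cnn.www` the t6 version before t3.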
HBase Logical View

HBase: Keys and Column Families
• Each record is divided into Column Families
• Each row has a Key
• Each column family consists of one or more Columns

Example table: column family named “contents” and column family named “anchor” (“anchor:apache.com” is the column named “apache.com” in the “anchor” family)

Row key          | Time Stamp | Column “contents:” | Column “anchor:”
“com.apache.www” | t12        | “<html> …”         |
                 | t11        | “<html> …”         |
                 | t10        |                    | “anchor:apache.com” = “APACHE”
“com.cnn.www”    | t15        |                    | “anchor:cnnsi.com” = “CNN”
                 | t13        |                    | “anchor:my.look.ca” = “CNN.com”
                 | t6         | “<html> …”         |
                 | t5         | “<html> …”         |
                 | t3         | “<html> …”         |

• Key
  • Byte array
  • Serves as the primary key for the table
  • Indexed for fast lookup
• Column Family
  • Has a name (string)
  • Contains one or more related columns
• Column
  • Belongs to one column family
  • Included inside the row
  • familyName:columnName

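The example table can be read as nested sorted maps: row key → column (“familyName:columnName”) → timestamp → value. The sketch below, with a hypothetical `WebTable` class and Strings in place of byte arrays, loads the rows shown above and splits a column name at its first ':' into family and column.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory view of the example table (Strings instead of HBase's byte arrays):
// row key -> "familyName:columnName" -> timestamp (newest first) -> value.
class WebTable {
    final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // "anchor:cnnsi.com" -> ["anchor", "cnnsi.com"]; "contents:" -> ["contents", ""]
    static String[] splitColumn(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected familyName:columnName, got " + column);
        }
        return new String[] { column.substring(0, colon), column.substring(colon + 1) };
    }

    // The rows of the example table ("<html>..." stands for the page contents).
    static WebTable example() {
        WebTable t = new WebTable();
        t.put("com.apache.www", "contents:", 12, "<html>...");
        t.put("com.apache.www", "contents:", 11, "<html>...");
        t.put("com.apache.www", "anchor:apache.com", 10, "APACHE");
        t.put("com.cnn.www", "anchor:cnnsi.com", 15, "CNN");
        t.put("com.cnn.www", "anchor:my.look.ca", 13, "CNN.com");
        t.put("com.cnn.www", "contents:", 6, "<html>...");
        t.put("com.cnn.www", "contents:", 5, "<html>...");
        t.put("com.cnn.www", "contents:", 3, "<html>...");
        return t;
    }

    public static void main(String[] args) {
        example().rows.forEach((row, columns) -> columns.forEach((column, versions) ->
                System.out.println(row + " / " + column + " -> versions " + versions.keySet())));
    }
}
```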
Version number for each row

• Version Number
  • Unique within each key
  • By default the system’s timestamp
  • Data type is Long
• Value (Cell)
  • Byte array

(Figure: the example table above, with the Time Stamp column marked as the version number, the row key as the key, and each stored entry as the value.)

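The versioning rules above can be sketched for a single row key and column, assuming a hypothetical `VersionedCell` class: versions are `long` values, values are byte arrays, writing an existing version replaces its value (versions are unique within a key), and reads see the highest version first regardless of insertion order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of the versions stored for ONE row key + column.
// Version numbers are longs; values (cells) are byte arrays; newest (largest) version first.
class VersionedCell {
    private final NavigableMap<Long, byte[]> versions = new TreeMap<>(Collections.reverseOrder());

    // A user-supplied version; writing an existing version replaces its value,
    // because version numbers are unique within each key.
    void put(long version, byte[] value) {
        versions.put(version, value);
    }

    // The highest version's value, or null if nothing was written.
    byte[] latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(5L, "v5".getBytes(StandardCharsets.UTF_8));
        cell.put(12L, "v12".getBytes(StandardCharsets.UTF_8));
        cell.put(3L, "v3".getBytes(StandardCharsets.UTF_8));    // versions need not arrive in order
        cell.put(12L, "v12b".getBytes(StandardCharsets.UTF_8)); // same version: replaces "v12"
        String newest = new String(cell.latest(), StandardCharsets.UTF_8);
        System.out.println(newest + ", " + cell.versionCount() + " versions"); // v12b, 3 versions
    }
}
```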
Notes on Data Model
• HBase schema consists of several Tables
• Each table consists of a set of Column Families
  • Columns are not part of the schema
• HBase has Dynamic Columns
  • Because column names are encoded inside the cells
  • Different cells can have different columns

(Figure: the “Roles” column family has different columns in different cells.)

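A minimal sketch of “only column families are part of the schema”: a hypothetical `DynamicColumnsTable` is created with a fixed set of family names, rejects writes to an undeclared family, and accepts any column name inside a declared family, so different rows end up with different columns. The family, row and column names are illustrative assumptions, and only the latest value is kept to keep it short.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical table whose schema lists only column families; columns are created on write.
class DynamicColumnsTable {
    private final Set<String> families;
    // row key -> "family:column" -> value (latest value only, to keep the sketch short)
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    DynamicColumnsTable(Set<String> families) {
        this.families = Set.copyOf(families);
    }

    void put(String row, String family, String column, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        // Any column name is accepted inside a declared family.
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(family + ":" + column, value);
    }

    Set<String> columnsOf(String row) {
        return rows.getOrDefault(row, Map.of()).keySet();
    }

    public static void main(String[] args) {
        DynamicColumnsTable t = new DynamicColumnsTable(Set.of("info", "roles"));
        t.put("alice", "roles", "hadoop", "committer");
        t.put("bob", "roles", "hbase", "PMC");
        t.put("bob", "roles", "hadoop", "contributor");
        System.out.println(t.columnsOf("alice") + " " + t.columnsOf("bob"));
    }
}
```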
Notes on Data Model (Cont’d)
• The version number can be user-supplied
  • It does not even have to be inserted in increasing order
  • Version numbers are unique within each key
• Tables can be very sparse
  • Many cells are empty (in the example, a row has only two anchor columns: cnnsi.com & my.look.ca)
• Keys are indexed as the primary key

HBase Physical Model
• Each column family is stored in its own separate files (HFiles)
• Key & version numbers are replicated with each column family
• Empty cells are not stored

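One way to picture the physical model, as a sketch with a hypothetical `PhysicalLayout` class: a logical row is split into one store per column family (standing in for that family's files), every stored entry repeats the row key and version number, and an empty cell produces no entry. For brevity it assumes all cells of the row share one version number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the physical layout: one store per column family (standing in for
// that family's separate files). Every entry repeats the row key and version number,
// and an empty cell produces no entry at all.
class PhysicalLayout {
    record StoredCell(String rowKey, String column, long version, String value) {}

    // column family -> entries stored for that family
    final Map<String, List<StoredCell>> stores = new TreeMap<>();

    // logicalRow maps "family:column" to a value; a null value stands for an empty cell.
    // Simplification: all cells of the row share one version number.
    void writeRow(String rowKey, long version, Map<String, String> logicalRow) {
        for (Map.Entry<String, String> cell : logicalRow.entrySet()) {
            if (cell.getValue() == null) continue;             // empty cells are not stored
            String name = cell.getKey();
            int colon = name.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("expected family:column, got " + name);
            String family = name.substring(0, colon);
            stores.computeIfAbsent(family, f -> new ArrayList<>())
                  .add(new StoredCell(rowKey, name.substring(colon + 1), version, cell.getValue()));
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();   // TreeMap permits null values
        row.put("anchor:cnnsi.com", "CNN");
        row.put("anchor:my.look.ca", "CNN.com");
        row.put("anchor:apache.com", null);          // empty in this row
        row.put("contents:", "<html>...");
        PhysicalLayout layout = new PhysicalLayout();
        layout.writeRow("com.cnn.www", 6, row);
        layout.stores.forEach((family, entries) -> System.out.println(family + " -> " + entries));
    }
}
```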
Example

Column Families

HBase Regions
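• Each HTable is partitioned horizontally into regions (contiguous ranges of row keys); within a region, each column family is still stored separately
• Regions are the counterpart of HDFS blocks

(Figure: each horizontal partition of the table becomes one region.)
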
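Because a region holds a contiguous range of row keys, finding the region for a key amounts to finding the greatest region start key that is less than or equal to the row key. A minimal sketch using `TreeMap.floorEntry`, with hypothetical region names and start keys; real HBase compares keys as unsigned bytes, which String comparison matches for ASCII keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical region map: region start key -> region name.
// The first region starts at "" so every row key falls into some region.
class RegionLocator {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionLocator(Map<String, String> startKeyToRegion) {
        if (!startKeyToRegion.containsKey("")) {
            throw new IllegalArgumentException("the first region must start at the empty key");
        }
        regionsByStartKey.putAll(startKeyToRegion);
    }

    // A region holds rows in [startKey, nextStartKey): pick the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator(Map.of(
                "", "region-1",
                "com.c", "region-2",
                "org.", "region-3"));
        System.out.println(locator.regionFor("com.apache.www")); // region-1
        System.out.println(locator.regionFor("com.cnn.www"));    // region-2
        System.out.println(locator.regionFor("org.apache.www")); // region-3
    }
}
```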
HBase Architecture

Three Major Components
• The HBaseMaster
  • One master
• The HRegionServer
  • Many region servers
• The HBase client

HBase Components
• Region
  • A subset of a table’s rows, like horizontal range partitioning
  • Done automatically
• RegionServer (many slaves)
  • Manages data regions
  • Serves data for reads and writes (using a log)
• Master
  • Responsible for coordinating the slaves
  • Assigns regions, detects failures
  • Admin functions

Big Picture

ZooKeeper
• HBase depends on ZooKeeper
• By default HBase manages the ZooKeeper instance
  • E.g., starts and stops ZooKeeper
• HMaster and HRegionServers register themselves with ZooKeeper

Creating a Table
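// Assumes the 0.9x-era client API used on this slide (HBaseAdmin, HTableDescriptor)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

Configuration config = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
HBaseAdmin admin = new HBaseAdmin(config);
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
// Family names must not contain ':' (the colon only separates family and column in "family:column")
desc.addFamily(new HColumnDescriptor("columnFamily1"));
desc.addFamily(new HColumnDescriptor("columnFamily2"));
admin.createTable(desc);
admin.close();
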
Operations On Regions: Get()
• Given a key → return the corresponding record
• For each value, return the highest version
• Can control the number of versions you want

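A Get can be sketched against a hypothetical in-memory table: given a row key and column, return the highest version by default, or up to `maxVersions` values, newest first. The real client's Get has a comparable setting for the number of versions; the `GetSketch` class and its methods here are illustrative, not the HBase API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory table: row key -> column -> (version -> value), newest version first.
class GetSketch {
    private final Map<String, Map<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long version, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(version, value);
    }

    // Up to maxVersions values for (row, column), newest first; empty if the cell is absent.
    List<String> get(String row, String column, int maxVersions) {
        NavigableMap<Long, String> versions = rows.getOrDefault(row, Map.of()).get(column);
        List<String> result = new ArrayList<>();
        if (versions == null) return result;
        for (String value : versions.values()) {
            if (result.size() >= maxVersions) break;
            result.add(value);
        }
        return result;
    }

    // Default Get: only the highest version, or null if the cell is absent.
    String get(String row, String column) {
        List<String> newest = get(row, column, 1);
        return newest.isEmpty() ? null : newest.get(0);
    }

    public static void main(String[] args) {
        GetSketch table = new GetSketch();
        table.put("com.apache.www", "contents:", 11, "<html> v11");
        table.put("com.apache.www", "contents:", 12, "<html> v12");
        table.put("com.apache.www", "anchor:apache.com", 10, "APACHE");
        System.out.println(table.get("com.apache.www", "anchor:apache.com")); // APACHE
        System.out.println(table.get("com.apache.www", "contents:"));         // <html> v12
        System.out.println(table.get("com.apache.www", "contents:", 2));      // [<html> v12, <html> v11]
    }
}
```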
Operations On Regions: Scan()

Get(): Select value from table where key=‘com.apache.www’ AND label=‘anchor:apache.com’

Row key          | Time Stamp | Column “anchor:”
“com.apache.www” | t12        |
                 | t11        |
                 | t10        | “anchor:apache.com” = “APACHE”
“com.cnn.www”    | t9         | “anchor:cnnsi.com” = “CNN”
                 | t8         | “anchor:my.look.ca” = “CNN.com”
                 | t6         |
                 | t3         |

Result: “APACHE”

Scan(): Select value from table where anchor=‘cnnsi.com’ (over the same table)

Result: “CNN” (row “com.cnn.www”)

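Scan(), by contrast, walks rows in row-key order. The sketch below uses a hypothetical `ScanSketch` class that keeps only the newest value per column; it returns the rows that have a value in a given column, which is what the query where anchor=‘cnnsi.com’ asks for. The range variant follows the usual convention of an inclusive start row and exclusive stop row.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical Scan() sketch: row key -> "family:column" -> newest value (versions omitted).
class ScanSketch {
    private final NavigableMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Full-table scan: visit every row in row-key order.
    Map<String, String> scan(String column) {
        return collect(rows, column);
    }

    // Range scan: start row inclusive, stop row exclusive.
    Map<String, String> scan(String startRow, String stopRow, String column) {
        return collect(rows.subMap(startRow, true, stopRow, false), column);
    }

    // Keep row key -> value for rows that have a value in `column` (the "where" condition).
    private static Map<String, String> collect(Map<String, Map<String, String>> range, String column) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : range.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) result.put(row.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        ScanSketch table = new ScanSketch();
        table.put("com.apache.www", "anchor:apache.com", "APACHE");
        table.put("com.cnn.www", "anchor:cnnsi.com", "CNN");
        table.put("com.cnn.www", "anchor:my.look.ca", "CNN.com");
        System.out.println(table.scan("anchor:cnnsi.com")); // {com.cnn.www=CNN}
    }
}
```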
Operations On Regions: Put()
• Insert a new record (with a new key), or
• Insert a record for an existing key

(Figure: a put can use an implicit version number (timestamp) or an explicit version number.)

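Both kinds of Put can be sketched with a hypothetical `PutSketch` class: without a version number the put takes one from a clock (HBase defaults to the current system time, so the sketch injects `System::currentTimeMillis`), with one it stores the caller's version. A put on a new key creates the row; a put on an existing key adds a version to it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.LongSupplier;

// Hypothetical Put() sketch: row key -> column -> (version -> value), newest first.
class PutSketch {
    private final Map<String, Map<String, NavigableMap<Long, String>>> rows = new TreeMap<>();
    private final LongSupplier clock; // source of implicit version numbers

    PutSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Implicit version number: taken from the clock; returns the version that was used.
    long put(String row, String column, String value) {
        long version = clock.getAsLong();
        put(row, column, version, value);
        return version;
    }

    // Explicit version number supplied by the caller.
    void put(String row, String column, long version, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())   // new key: creates the row
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(version, value);                          // existing key: adds a version
    }

    int versionCount(String row, String column) {
        NavigableMap<Long, String> versions = rows.getOrDefault(row, Map.of()).get(column);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        PutSketch table = new PutSketch(System::currentTimeMillis);
        table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN");               // explicit version
        long v = table.put("com.cnn.www", "anchor:cnnsi.com", "CNN (new)");  // implicit version
        System.out.println("implicit version " + v + ", versions stored: "
                + table.versionCount("com.cnn.www", "anchor:cnnsi.com"));
    }
}
```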
Operations On Regions: Delete()

• Marking table cells as deleted
• Multiple levels
  • Can mark an entire column family as deleted
  • Can mark all column families of a given row as deleted

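In HBase a delete does not erase data in place; it writes a delete marker (a “tombstone”) and readers skip the cells it covers. The sketch below models only that marking step with hypothetical classes: a marker covers one column, a whole column family, or every family of a row, and it hides the covered cells whose version is not newer than the marker's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of delete markers at three levels: one column, one family, or a whole row.
// A marker hides every cell it covers whose version is <= the marker's version.
class DeleteMarkers {
    enum Level { COLUMN, FAMILY, ROW }

    record Cell(String row, String family, String column, long version, String value) {}

    // family/column may be null when the level does not need them (e.g. ROW markers).
    record Marker(Level level, String row, String family, String column, long version) {
        boolean covers(Cell c) {
            if (!row.equals(c.row()) || c.version() > version) return false;
            return switch (level) {
                case ROW -> true;
                case FAMILY -> family.equals(c.family());
                case COLUMN -> family.equals(c.family()) && column.equals(c.column());
            };
        }
    }

    private final List<Cell> cells = new ArrayList<>();
    private final List<Marker> markers = new ArrayList<>();

    void put(Cell c) { cells.add(c); }

    // Nothing is removed: the delete only adds a marker.
    void delete(Marker m) { markers.add(m); }

    // Cells still visible to readers (not covered by any marker).
    List<Cell> visible() {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (markers.stream().noneMatch(m -> m.covers(c))) out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        DeleteMarkers table = new DeleteMarkers();
        table.put(new Cell("com.cnn.www", "anchor", "cnnsi.com", 9, "CNN"));
        table.put(new Cell("com.cnn.www", "contents", "", 6, "<html>..."));
        table.delete(new Marker(Level.FAMILY, "com.cnn.www", "anchor", null, 100));
        System.out.println(table.visible()); // only the contents: cell remains
    }
}
```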
HBase: Joins
• HBase does not support joins
• Joins can be done in the application layer
  • Using scan() and get() operations

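An application-side join can be sketched as: scan one table and, for each row, get() the matching row of the other table by its row key. The “orders” and “customers” tables, the `info:customer` and `info:name` columns, and the `AppSideJoin` class are all hypothetical; plain sorted maps stand in for HBase tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical application-side join: scan the "orders" table, and for each order
// get() the customer row whose row key is stored in the order's "info:customer" column.
class AppSideJoin {
    record Joined(String orderId, String customerId, String customerName) {}

    static List<Joined> join(SortedMap<String, Map<String, String>> orders,
                             Map<String, Map<String, String>> customers) {
        List<Joined> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> order : orders.entrySet()) { // scan()
            String customerId = order.getValue().get("info:customer");
            if (customerId == null) continue;
            Map<String, String> customer = customers.get(customerId);              // get()
            if (customer == null) continue; // inner join: drop orders with no customer row
            result.add(new Joined(order.getKey(), customerId, customer.get("info:name")));
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, Map<String, String>> orders = new TreeMap<>();
        orders.put("order-1", Map.of("info:customer", "cust-7"));
        orders.put("order-2", Map.of("info:customer", "cust-9"));
        Map<String, Map<String, String>> customers = Map.of("cust-7", Map.of("info:name", "Ann"));
        join(orders, customers).forEach(System.out::println);
    }
}
```

Each order costs one get() on the other table, so this pattern suits cases where the scanned side is small or already filtered.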
Altering a Table

Logging Operations

HBase Deployment

(Figure: a master node and several slave nodes.)

HBase vs. HDFS

HBase vs. RDBMS

When to use HBase
