Unit-5 BDA
BigData Analytics
(20CSE361)
Introduction to HBase
• HBase is a distributed, column-oriented
database built on top of the Hadoop file
system (HDFS).
• HBase is an open-source, non-relational,
distributed DB modeled after Google’s Bigtable
and written in Java.
• It is developed as part of the Apache Software
Foundation.
• HBase is a data model that provides quick
random access to huge amounts of
structured data.
• It is part of the Hadoop ecosystem and provides
random, real-time read/write access to data in
HDFS. Data can be stored in HDFS either directly
or through HBase.
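As a rough picture of the column-oriented model, a table can be thought of as a map from row key to column family to column qualifier to value. The sketch below is a plain in-memory Python model, not the HBase client API; the helper names `put` and `get` and the sample rows are assumptions made only for illustration.

```python
from collections import defaultdict

# Toy model (assumption): row key -> column family -> qualifier -> value.
table = defaultdict(lambda: defaultdict(dict))

def put(row, family, qualifier, value):
    """Store one cell; other columns of the same row are untouched."""
    table[row][family][qualifier] = value

def get(row, family, qualifier):
    """Random access to a single cell by its full coordinates."""
    return table[row][family][qualifier]

# Hypothetical sample data.
put("row1", "personal", "name", "Asha")
put("row1", "personal", "city", "Mysuru")
put("row2", "personal", "name", "Ravi")   # row2 has no 'city' column at all

print(get("row1", "personal", "city"))    # Mysuru
```

Rows do not need to share the same set of columns, which is why `row2` can omit `city` without storing an empty value.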
ZooKeeper Service
1. Persistent znode:
( exists until explicitly deleted )
• This type of znode stays alive even after the
client that created it has disconnected.
• By default, all znodes in ZooKeeper are
persistent unless specified otherwise.
Cont…
2. Ephemeral znode:
( exists as long as the session is alive and can’t
have children )
• This type of znode stays alive only while the
client that created it is alive.
• When the client disconnects from ZooKeeper,
the znode is deleted automatically. Moreover,
ephemeral znodes are not allowed to have
children.
Cont…
3. Sequential znode:
• Sequential znodes can be either ephemeral or
persistent.
• When a new znode is created as a sequential
znode, ZooKeeper sets its path by appending a
10-digit sequence number to the original name.
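The naming rule for sequential znodes can be sketched in a few lines of Python. This is only a toy counter, not ZooKeeper code: the class name `SequenceCounter`, the path `/app/task-`, and the starting value of 0 are assumptions for illustration.

```python
class SequenceCounter:
    """Toy stand-in for the counter kept for the children of one parent znode."""

    def __init__(self, start=0):
        self.next_value = start  # starting value is an assumption

    def sequential_path(self, path):
        # Append the counter as a zero-padded 10-digit suffix, then advance it.
        full_path = f"{path}{self.next_value:010d}"
        self.next_value += 1
        return full_path

counter = SequenceCounter()
first = counter.sequential_path("/app/task-")
second = counter.sequential_path("/app/task-")
print(first)   # /app/task-0000000000
print(second)  # /app/task-0000000001
```

Because the suffix has a fixed width, sorting the names as strings gives the same order as sorting by creation sequence.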
Building Applications with ZooKeeper
1. Configuration Management: ZooKeeper can
be used to store and manage configuration
information for applications. Rather than
hardcoding configuration values, applications
can retrieve them from ZooKeeper, allowing
for dynamic updates without the need for
application restarts. ZooKeeper's watches can
be used to receive notifications when
configuration values change.
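The idea of reading a value and being notified when it changes can be sketched with a small in-memory model. This is not the ZooKeeper client API: `ConfigStore`, the path `/config/batch_size`, and the values are hypothetical. The model follows ZooKeeper's behaviour in that a watch fires once and the client then re-reads the value.

```python
class ConfigStore:
    """In-memory stand-in for znodes that hold configuration values."""

    def __init__(self):
        self._data = {}
        self._watches = {}   # path -> callbacks waiting for the next change

    def get(self, path, watch=None):
        # Reading a value can also register a one-time watch on it.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)

    def set(self, path, value):
        self._data[path] = value
        # Fire and clear the watches: each one triggers only once.
        for callback in self._watches.pop(path, []):
            callback(path)

store = ConfigStore()
store.set("/config/batch_size", "100")

seen = []

def on_change(path):
    # The notification carries no data, so the application re-reads the value.
    seen.append(store.get(path))

current = store.get("/config/batch_size", watch=on_change)
store.set("/config/batch_size", "200")   # triggers the watch
store.set("/config/batch_size", "300")   # no watch registered any more
print(current, seen)                     # 100 ['200']
```

An application that wants continuous updates would register the watch again inside `on_change`.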
2. Distributed Locking: ZooKeeper provides a
distributed locking mechanism that can be used
to coordinate access to shared resources among
multiple processes or nodes. Applications can
use ZooKeeper's lock implementation to ensure
that only one process holds the lock at a time,
preventing conflicts and ensuring data
consistency.
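One common way to build such a lock is for every contender to create an ephemeral sequential znode under a lock path; the contender with the lowest sequence number holds the lock and each of the others waits on the znode just ahead of it. The sketch below shows only that selection logic on a list of child names. The function names and the `lock-` prefix are assumptions, and no ZooKeeper server is involved.

```python
def sequence_of(name):
    # The last 10 characters are the sequence number appended to the name.
    return int(name[-10:])

def lock_holder(children):
    """The contender with the lowest sequence number holds the lock."""
    return min(children, key=sequence_of)

def node_to_watch(children, me):
    """Each waiter watches only the contender directly ahead of it."""
    ordered = sorted(children, key=sequence_of)
    position = ordered.index(me)
    return None if position == 0 else ordered[position - 1]

# Hypothetical children of a lock znode, in arbitrary order.
contenders = ["lock-0000000012", "lock-0000000010", "lock-0000000011"]
print(lock_holder(contenders))                       # lock-0000000010
print(node_to_watch(contenders, "lock-0000000012"))  # lock-0000000011
```

Watching only the predecessor means that releasing the lock wakes up one waiter instead of all of them.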
3. Leader Election: ZooKeeper can be used for
leader election in distributed systems.
Applications can build an election on ZooKeeper's
ordering guarantees, typically with ephemeral
sequential znodes, to elect a leader among a
group of nodes. The leader can then take on
specific responsibilities or tasks while the
other nodes act as followers.
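A simple election can be pictured as follows: every candidate joins with an ephemeral sequential entry, the lowest sequence number is the leader, and when the leader's session ends the next candidate takes over. The class below is an in-memory toy under those assumptions; `Election` and the node names are hypothetical and this is not ZooKeeper's API.

```python
class Election:
    """Toy election: candidates are ordered by the sequence number they got on joining."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}   # sequence number -> node name

    def join(self, node):
        seq = self._next_seq
        self._candidates[seq] = node
        self._next_seq += 1
        return seq

    def leave(self, seq):
        # Models an ephemeral znode vanishing when its session ends.
        self._candidates.pop(seq, None)

    def leader(self):
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]

election = Election()
seq_a = election.join("node-a")
election.join("node-b")
election.join("node-c")
print(election.leader())   # node-a
election.leave(seq_a)      # the leader's session ends
print(election.leader())   # node-b
```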
4. Service Discovery: ZooKeeper can serve as a
service registry and discovery mechanism.
Applications can register themselves as znodes
in ZooKeeper, providing information about their
availability and endpoints. Other applications
can then discover and interact with these
services by querying ZooKeeper.
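Registration and lookup can be sketched with an in-memory registry in which each entry belongs to a session and disappears when that session closes, in the way an ephemeral znode would. `Registry`, the `/services/...` path layout, the session names, and the endpoints are all assumptions for illustration.

```python
class Registry:
    """Toy service registry keyed by znode-like paths."""

    def __init__(self):
        self._entries = {}   # path -> (session id, endpoint)

    def register(self, session, service, instance, endpoint):
        self._entries[f"/services/{service}/{instance}"] = (session, endpoint)

    def close_session(self, session):
        # Registrations made by a session vanish when the session ends.
        self._entries = {
            path: entry for path, entry in self._entries.items()
            if entry[0] != session
        }

    def discover(self, service):
        prefix = f"/services/{service}/"
        return sorted(
            endpoint for path, (_, endpoint) in self._entries.items()
            if path.startswith(prefix)
        )

registry = Registry()
registry.register("session-1", "orders", "a", "10.0.0.1:8080")
registry.register("session-2", "orders", "b", "10.0.0.2:8080")
print(registry.discover("orders"))   # ['10.0.0.1:8080', '10.0.0.2:8080']
registry.close_session("session-1")
print(registry.discover("orders"))   # ['10.0.0.2:8080']
```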
5. Distributed Queues: ZooKeeper can be used
to implement distributed queues, where
multiple processes or nodes can enqueue and
dequeue elements. This is useful for scenarios
where you need to distribute work among
multiple workers or ensure ordered processing
of tasks across multiple nodes.
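A queue of this kind is usually modelled with sequential children: producers add items whose names carry a sequence number, and a consumer always takes the item with the lowest number. The class below imitates that ordering in memory only; `SequentialQueue` and the `item-` prefix are assumptions, not a ZooKeeper API.

```python
class SequentialQueue:
    """Toy FIFO queue whose items are named like sequential znodes."""

    def __init__(self):
        self._next_seq = 0
        self._items = {}   # child name -> payload

    def enqueue(self, payload):
        name = f"item-{self._next_seq:010d}"
        self._items[name] = payload
        self._next_seq += 1
        return name

    def dequeue(self):
        if not self._items:
            return None
        # Zero padding makes string order match numeric order.
        name = min(self._items)
        return self._items.pop(name)

queue = SequentialQueue()
for task in ["parse", "transform", "load"]:
    queue.enqueue(task)
print(queue.dequeue())   # parse
print(queue.dequeue())   # transform
```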
6. Name Services: ZooKeeper can act as a central
name service, maintaining consistent naming
information for distributed systems. Applications
can use ZooKeeper to store and retrieve naming
information such as node addresses or
endpoints, enabling dynamic discovery and
interaction with resources.
7. Fault-tolerant Systems: ZooKeeper's fault-
tolerant and highly available nature makes it
well-suited for building fault-tolerant systems.
Applications can rely on ZooKeeper to provide
consistency and resilience, allowing them to
recover from failures and continue functioning
smoothly.
Installing and Running of Zookeeper
•Before installing ZooKeeper, make sure your system runs one of the following
operating systems −
Linux (any distribution) − Supports development and deployment. It is preferred for demo
applications.
Windows − Supports only development.
Mac OS − Supports only development.
•The ZooKeeper server is written in Java and runs on the JVM. You need JDK 6 or
later.
•Now, follow the steps given below to install ZooKeeper framework on your machine.
•Step 1: Verifying Java Installation
•We believe you already have a Java environment installed on your system. Just verify
it using the following command.
•$ java -version
•If Java is installed on your machine, you will see the installed version.
Otherwise, follow the steps given below to install the latest version of
Java.
•Step 1.1: Download JDK
•Download the latest version of the JDK from the Java download page.
•The latest version (while writing this tutorial) is JDK 8u60 and the file is
“jdk-8u60-linux-x64.tar.gz”. Please download the file to your machine.
•Step 1.2: Extract the files
•Generally, files are downloaded to the downloads folder. Verify it and extract the tar
setup using the following commands.
•$ cd /go/to/download/path
•$ tar -zxf jdk-8u60-linux-x64.tar.gz
•Step 1.3: Move to opt directory
•To make Java available to all users, move the extracted Java content to the
“/opt/jdk” folder.
•$ su
•password: (type password of root user)
•$ mkdir /opt/jdk
•$ mv jdk1.8.0_60 /opt/jdk/
•Step 1.4: Set path
•To set path and JAVA_HOME variables, add the following commands to ~/.bashrc file.
•export JAVA_HOME=/opt/jdk/jdk1.8.0_60
•export PATH=$PATH:$JAVA_HOME/bin
•Now, apply all the changes into the current running system.
•$ source ~/.bashrc
•Step 1.5: Java alternatives
•Use the following command to change Java alternatives.
•update-alternatives --install /usr/bin/java java /opt/jdk/jdk1.8.0_60/bin/java 100
•Step 1.6
•Verify the Java installation using the verification command (java -version) explained in
Step 1.
•Step 2: ZooKeeper Framework Installation
•Step 2.1: Download ZooKeeper
•To install ZooKeeper framework on your machine, visit the following link and
download the latest version of ZooKeeper. http://zookeeper.apache.org/releases.html
•As of now, the latest version of ZooKeeper is 3.4.6 (ZooKeeper-3.4.6.tar.gz).
•Step 2.2: Extract the tar file
•Extract the tar file using the following commands −
•$ cd opt/
•$ tar -zxf zookeeper-3.4.6.tar.gz
•$ cd zookeeper-3.4.6
•$ mkdir data
•
•Step 2.3: Create configuration file
•Open the configuration file named conf/zoo.cfg using the command vi
conf/zoo.cfg and set all the following parameters as a starting point.
•$ vi conf/zoo.cfg
•
•tickTime = 2000
•dataDir = /path/to/zookeeper/data
•clientPort = 2181
•initLimit = 5
•syncLimit = 2
•Once the configuration file has been saved successfully, return to the terminal again.
You can now start the zookeeper server.
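In this file, tickTime is given in milliseconds, while initLimit and syncLimit are counted in ticks, so the effective time limits are products of the two. The sketch below parses the sample settings and works out those limits; `parse_cfg` is a hypothetical helper written for this example, not part of ZooKeeper.

```python
def parse_cfg(text):
    """Parse simple 'key = value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

# Same values as the sample configuration above.
cfg_text = """
tickTime = 2000
dataDir = /path/to/zookeeper/data
clientPort = 2181
initLimit = 5
syncLimit = 2
"""

cfg = parse_cfg(cfg_text)
tick_ms = int(cfg["tickTime"])
init_timeout_ms = tick_ms * int(cfg["initLimit"])   # time allowed for initial sync
sync_timeout_ms = tick_ms * int(cfg["syncLimit"])   # time allowed to stay in sync
print(init_timeout_ms, sync_timeout_ms)             # 10000 4000
```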
•Step 2.4: Start ZooKeeper server
•Execute the following command −
•$ bin/zkServer.sh start
•After executing this command, you will get a response as follows −
•$ JMX enabled by default
•$ Using config: /Users/../zookeeper-3.4.6/bin/../conf/zoo.cfg
•$ Starting zookeeper ... STARTED
•Step 2.5: Start CLI
•Type the following command −
•$ bin/zkCli.sh
•After typing the above command, you will be connected to the ZooKeeper server and
you should get the following response.
•Connecting to localhost:2181
•................
•................
•................
•Welcome to ZooKeeper!
•................
•................
•WATCHER::
•WatchedEvent state:SyncConnected type: None path:null
•[zk: localhost:2181(CONNECTED) 0]
•Stop ZooKeeper Server
•After connecting the server and performing all the operations, you can stop the
zookeeper server by using the following command.
•$ bin/zkServer.sh stop
•The ZooKeeper Command Line Interface (CLI) is used to interact with the ZooKeeper
ensemble for development purposes. It is useful for debugging and for experimenting
with different options.
•To perform ZooKeeper CLI operations, first start the ZooKeeper server
(“bin/zkServer.sh start”) and then the ZooKeeper client (“bin/zkCli.sh”). Once the client
starts, you can perform the following operations −
Create znodes
Get data
Watch znode for changes
Set data
Create children of a znode
List children of a znode
Check Status
Remove / Delete a znode
•Now let us look at each of the above commands with an example.
•Create Znodes
•Create a znode with the given path. The flag argument specifies whether the created
znode will be ephemeral, persistent, or sequential. By default, all znodes are
persistent.
Ephemeral znodes (flag: e) are automatically deleted when a session expires or
when the client disconnects.
Sequential znodes (flag: s) guarantee that the znode path will be unique.
The ZooKeeper ensemble appends a sequence number, zero-padded to 10 digits, to the
znode path. For example, the znode path /myapp is converted to
/myapp0000000001, and the next one becomes /myapp0000000002.
•Syntax
•create /path data
•Sample
•create /FirstZnode "Myfirstzookeeper-app"
•Output
•[zk: localhost:2181(CONNECTED) 0] create /FirstZnode "Myfirstzookeeper-app"
•Created /FirstZnode
•Installing Apache ZooKeeper
•Steps for downloading and installing ZooKeeper 3.4.6 and configuring a 3-node
ZooKeeper ensemble:
1. Download and install JDK from
http://www.oracle.com/technetwork/java/javase/downloads/index.html or from
http://www.guru99.com/install-java.html - if not already installed.
Apache ZooKeeper server runs on JVM so this is an important prerequisite.
2. Go to http://zookeeper.apache.org/ and download ZooKeeper from the releases
page.
3. Choose to download from mirrors and select the first mirror.
4. Go to the stable folder and download zookeeper-3.4.6.tar.gz
5. Unpack the tarball with tar -zxvf zookeeper-3.4.6.tar.gz
6. Make a directory using mkdir /usr/local/zookeeper/data. You can create this
directory as root and then change its owner to whichever user needs it.
7. Create a ZooKeeper configuration file using sudo vi
/usr/local/zookeeper/conf/zoo.cfg and place the following settings in it:
•tickTime = 2000
•syncLimit = 5
•dataDir = /usr/local/zookeeper/data
•clientPort = 2181
•server.1 = Master:2888:3888
•server.2 = Slave1:2888:3888
•server.3 = Slave2:2888:3888
8. Create a file called myid in the data folder using sudo vi
/usr/local/zookeeper/data/myid, write “1” in this file (without the quotes), and save
it.
9. Repeat steps 1 to 8 on the other 2 servers, but set the myid content to 2 on
server 2 and 3 on server 3.
10. Run zkServer.sh start on all servers to start ZooKeeper.
11. To confirm that ZooKeeper has started, type jps and check for
QuorumPeerMain.
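The link between the server.N lines, the myid files, and the size of the ensemble can be sketched as follows. The helper names are hypothetical; the host names and ports are the ones used in the steps above. The majority rule shown is the usual one for a ZooKeeper ensemble: more than half of the servers must be running.

```python
def ensemble_lines(hosts, peer_port=2888, election_port=3888):
    """Build one server.N line per host; N must match that host's myid file."""
    return [
        f"server.{i}={host}:{peer_port}:{election_port}"
        for i, host in enumerate(hosts, start=1)
    ]

def quorum_size(n_servers):
    """Smallest number of servers that forms a majority."""
    return n_servers // 2 + 1

hosts = ["Master", "Slave1", "Slave2"]
lines = ensemble_lines(hosts)
myid = {host: i for i, host in enumerate(hosts, start=1)}

print("\n".join(lines))
print(myid)                                   # {'Master': 1, 'Slave1': 2, 'Slave2': 3}
print(quorum_size(len(hosts)))                # 2
print(len(hosts) - quorum_size(len(hosts)))   # 1 server may fail
```

With 3 servers the ensemble keeps working as long as any 2 of them are up.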
Introduction of SQOOP
• Sqoop − “SQL to Hadoop and Hadoop to SQL”
• Sqoop is a data transfer tool
• Sqoop transfers data between Hadoop and
relational DB servers
• Sqoop is used to import data from relational DBs
such as MySQL and Oracle into HDFS
• Sqoop is used to export data from HDFS to
relational DBs
• Tools -> Sqoop Import / Export
Cont…
o Full Load.
o Incremental Load.
o Parallel import/export.
o Import results of SQL query.
o Compression.
o Connectors for all major RDBMS
Databases.
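A parallel import can be pictured as dividing the rows of a table among several tasks by ranges of a key column, with each task importing its own range. The function below is a simplified illustration of that idea, not Sqoop's actual splitting code; the name `split_ranges`, the id range 1 to 100, and the choice of 4 tasks are assumptions.

```python
def split_ranges(min_id, max_id, num_tasks):
    """Divide the inclusive range [min_id, max_id] into contiguous slices."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_tasks)
    ranges = []
    start = min_id
    for i in range(num_tasks):
        # The first 'extra' slices take one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

print(split_ranges(1, 100, 4))   # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each pair corresponds to a condition such as `id >= 1 AND id <= 25` that one task could use to read only its share of the rows.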
ADVANTAGES OF SQOOP