Install Hadoop


Installing OpenJDK 8 (Java)

sudo apt update


sudo apt install openjdk-8-jdk

Setting the JAVA_HOME Environment Variable:


sudo nano /etc/environment
Add the following lines:
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
PATH="/usr/lib/jvm/java-8-openjdk-amd64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"

source /etc/environment

echo $JAVA_HOME
java -version

Install SSH:
sudo apt install openssh-server openssh-client
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
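
To confirm that passwordless SSH works (Hadoop's start scripts log in to localhost over SSH), the following should open a shell without asking for a password:

# quick check: should log in without a password prompt, then return
ssh localhost
exit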

Install Hadoop:
wget https://dlcdn.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
tar xzf hadoop-2.10.1.tar.gz
mv hadoop-2.10.1 hadoop

Configure Hadoop in the configuration directory:

cd hadoop/etc/hadoop

nano core-site.xml
----- core-site.xml ---------
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

nano hdfs-site.xml
---- hdfs-site.xml ----
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>
<value>file:///home/ubuntu/hadoop/hdfs/namenode</value>
</property>

<property>
<name>dfs.data.dir</name>
<value>file:///home/ubuntu/hadoop/hdfs/datanode</value>
</property>
</configuration>
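
The dfs.name.dir and dfs.data.dir values above point to local directories that do not exist yet. A minimal sketch of creating them, assuming the home directory is /home/ubuntu as used in the config:

# create the NameNode and DataNode storage directories referenced in hdfs-site.xml
mkdir -p ~/hadoop/hdfs/namenode ~/hadoop/hdfs/datanode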

nano yarn-site.xml
--- yarn-site.xml ----
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Create mapred-site.xml from the template file:


cp mapred-site.xml.template mapred-site.xml
nano mapred-site.xml
----- mapred-site.xml -----
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

---------------
-- Environment Setup --
cp ~/.bashrc ~/.bashrc0
nano ~/.bashrc

Add the following lines to the end of the file:


export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=~/hadoop
export PATH=$PATH:${JAVA_HOME}/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar

source ~/.bashrc
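
As a quick sanity check that HADOOP_HOME and PATH took effect, the hadoop command should now resolve from any directory:

# should print the Hadoop 2.10.1 version banner
hadoop version
which hadoop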

nano ~/hadoop/etc/hadoop/hadoop-env.sh
----- hadoop-env.sh ------
Change the JAVA_HOME line to: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

-------------
- Name Node Setup
hdfs namenode -format
- Verifying Hadoop dfs
start-dfs.sh
- Verifying Yarn Script
start-yarn.sh
- Or start all:
start-all.sh

- Check the running daemons (expected output is sketched below):
jps
- Access the Hadoop NameNode UI in the browser:
http://localhost:50070/
- Verify all applications for the cluster (YARN ResourceManager UI):
http://localhost:8088/
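
If the single-node setup started cleanly, jps typically lists the daemons below (process IDs will differ):

# illustrative jps output for a healthy single-node cluster
12101 NameNode
12245 DataNode
12410 SecondaryNameNode
12603 ResourceManager
12748 NodeManager
12902 Jps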

--- Compile WordCount.java ---


$ hadoop com.sun.tools.javac.Main WordCount.java
$ jar cf wc.jar WordCount*.class
$ hadoop jar wc.jar WordCount /data/input /data/output1
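
Before running the jar, the /data/input path used above must exist in HDFS and contain at least one text file. A minimal sketch, where words.txt is a placeholder name for a local input file:

# create the HDFS input directory and upload a sample file
hdfs dfs -mkdir -p /data/input
hdfs dfs -put words.txt /data/input
# after the job finishes, the word counts are typically in part-r-00000
hdfs dfs -cat /data/output1/part-r-00000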

Eclipse:
Add a Hadoop user library: add the jar files from the following directories:

hadoop/share/hadoop/common/
hadoop/share/hadoop/common/lib
hadoop/share/hadoop/hdfs
hadoop/share/hadoop/yarn
hadoop/share/hadoop/mapreduce
Installing a Hadoop cluster (multi-node):
Apache Hadoop Installation on Multi Node Tutorial | CloudDuggu
Tutorial Hadoop multi node installation - intellitech.pro

Apache Hive:
Installation and configuration:
>> https://sparkbyexamples.com/apache-hive/apache-hive-installation-on-hadoop/

Then additionally configure hive.server2 and Beeline:


>> http://www.mtitek.com/tutorials/bigdata/hive/install.php
>> When configuring Beeline, Step 17: add the following properties to
${HADOOP_HOME}/etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.ubuntu.groups</name>
<value>*</value>
</property>

<property>
<name>hadoop.proxyuser.ubuntu.hosts</name>
<value>*</value>
</property>

ubuntu/cntt@2021 >> the Ubuntu OS account under which Hive was installed


>>> Restart HDFS: stop-dfs.sh ; start-dfs.sh

beeline> !connect jdbc:hive2://localhost:10000


>> ubuntu/cntt@2021

>> Run from the command line:


$HIVE_HOME/bin/beeline -u jdbc:hive2://
hive> SET hive.exec.mode.local.auto=true;
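
As a quick check that HiveServer2 accepts remote connections, Beeline can also run a single statement non-interactively; this assumes the ubuntu OS account configured as the proxy user above:

# -u = JDBC URL, -n = user name, -e = statement to run, then exit
$ $HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n ubuntu -e "SHOW DATABASES;"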

Start HiveServer
$ mkdir ~/hiveserver2log
$ cd ~/hiveserver2log
$ nohup hiveserver2 &
(or)
$ nohup hive --service hiveserver2 &
(or, with an explicit Thrift port and console logging)
$ nohup hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console &
$ tail -f ~/hiveserver2log/nohup.out

HiveServer web UI: http://localhost:10002
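
To verify that HiveServer2 is actually listening on the Thrift port before connecting with Beeline, one option (assuming the ss utility is installed):

# HiveServer2 should appear as a TCP listener on port 10000
$ ss -ltn | grep 10000
# the service typically shows up as a RunJar process in jps
$ jps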

Start Hive MetaStore


$ mkdir ~/hivemetastorelog
$ cd ~/hivemetastorelog
$ nohup hive --service metastore &
$ tail -f ~/hivemetastorelog/nohup.out
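
To confirm the metastore service is up, its default Thrift port (9083) can be checked the same way:

# the Hive metastore listens on port 9083 by default
$ ss -ltn | grep 9083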

Hive Tutorial:
https://www.guru99.com/hive-tutorials.html

https://sparkbyexamples.com/apache-hive-tutorial/
