Install a Multi-Node Hadoop Cluster on Alibaba Cloud

Prerequisites

  • The first instance, hadoop-master, is configured as the master node and has the IP address 172.31.129.196.
  • The second instance, hadoop-slave, is configured as the slave node and has the IP address 172.31.129.197.

Procedure

Install Java

root@hadoop-master:~# sudo add-apt-repository ppa:openjdk-r/ppa
root@hadoop-master:~# sudo apt-get update
root@hadoop-master:~# sudo apt-get install openjdk-8-jdk
root@hadoop-master:~# sudo update-java-alternatives --list
java-1.8.0-openjdk-amd64 1081 /usr/lib/jvm/java-1.8.0-openjdk-amd64
root@hadoop-master:~# java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)
root@hadoop-master:~# vi /etc/hosts
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
172.31.129.196 hadoop-master
172.31.129.197 hadoop-slave
127.0.0.1 localhost
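Both nodes need the same hadoop-master/hadoop-slave entries in /etc/hosts so that they can resolve each other by name. A quick way to confirm the mapping works (hostnames and IPs as listed in the prerequisites; ping output omitted):
root@hadoop-master:~# ping -c 2 hadoop-slave
root@hadoop-slave:~# ping -c 2 hadoop-master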

Install Hadoop

root@hadoop-master:~# wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
root@hadoop-master:~# tar -xzf hadoop-2.7.3.tar.gz
root@hadoop-master:~# ls
hadoop-2.7.3 hadoop-2.7.3.tar.gz
root@hadoop-master:~# ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:OawYX8MzsF8iXeF1FhIJcdqtf7bMwaoPsL9Cq0t/Pi4 root@hadoop-master
The key's randomart image is:
+---[RSA 2048]----+
| +o=o+. |
| . *.= |
| . + . . |
| * o . |
| . o S.. . |
| + = Oo .. |
| . o.o... .oo|
| . .E.o. +oo|
| oo.**=o + |
+----[SHA256]-----+
root@hadoop-master:~# cat .ssh/id_rsa.pub | ssh root@172.31.129.197 'cat >> .ssh/authorized_keys'
The authenticity of host '172.31.129.197 (172.31.129.197)' can't be established.
ECDSA key fingerprint is SHA256:XOA5/7EcNPfEu/uNU/os0EekpcFkvIhKowreKhLD2YA.
Are you sure you want to continue connecting (yes/no)? yes
root@172.31.129.197's password:
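At this point the master should be able to reach the slave over SSH without a password prompt. A quick check (the expected reply is simply the slave's hostname):
root@hadoop-master:~# ssh root@172.31.129.197 hostname
hadoop-slave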
root@hadoop-master:~# scp -r hadoop-2.7.3 root@172.31.129.197:/root/
root@hadoop-master:~# ssh root@172.31.129.197
Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Welcome to Alibaba Cloud Elastic Compute Service !
Last login: Mon Jul 29 22:16:16 2019 from 172.31.129.196
Run the ls command to check whether the Hadoop directory was copied to the slave machine:
root@hadoop-slave:~# ls
hadoop-2.7.3
root@hadoop-slave:~# exit
logout
Connection to 172.31.129.197 closed.
root@hadoop-master:~#

Configure Hadoop

root@hadoop-master:~# sudo vi .bashrc
export HADOOP_PREFIX="/root/hadoop-2.7.3"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
root@hadoop-master:~# source .bashrc
root@hadoop-master:~# java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)
root@hadoop-master:~# hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /root/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
root@hadoop-master:~#
root@hadoop-master:~# mkdir hadoop-2.7.3/hdfs
root@hadoop-master:~# mkdir hadoop-2.7.3/hdfs/namenode
root@hadoop-master:~# mkdir hadoop-2.7.3/hdfs/datanode
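The same directories can also be created in one step with mkdir -p, which creates any missing parent directories along the way:
root@hadoop-master:~# mkdir -p hadoop-2.7.3/hdfs/namenode hadoop-2.7.3/hdfs/datanode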
root@hadoop-master:~# cd hadoop-2.7.3/etc/hadoop/
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# ls
capacity-scheduler.xml kms-log4j.properties
configuration.xsl kms-site.xml
container-executor.cfg log4j.properties
core-site.xml mapred-env.cmd
hadoop-env.cmd mapred-env.sh
hadoop-env.sh mapred-queues.xml.template
hadoop-metrics2.properties mapred-site.xml
hadoop-metrics.properties mapred-site.xml.template
hadoop-policy.xml masters
hdfs-site.xml slaves
httpfs-env.sh ssl-client.xml.example
httpfs-log4j.properties ssl-server.xml.example
httpfs-signature.secret yarn-env.cmd
httpfs-site.xml yarn-env.sh
kms-acls.xml yarn-site.xml
kms-env.sh
Edit the masters file so it lists the master node:
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# vi masters
hadoop-master
Edit the slaves file on the master so it lists every node that should run the worker daemons (DataNode and NodeManager):
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# vi slaves
hadoop-master
hadoop-slave
On the slave machine, the slaves file lists only the slave itself:
root@hadoop-slave:~/hadoop-2.7.3/etc/hadoop# vi slaves
hadoop-slave
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>
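The fs.defaultFS property points every HDFS client at the NameNode's RPC endpoint, hadoop-master on port 9000. Once the environment variables from .bashrc are loaded, the value the client configuration resolves to can be double-checked with getconf:
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# hdfs getconf -confKey fs.defaultFS
hdfs://hadoop-master:9000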
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/root/hadoop-2.7.3/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/root/hadoop-2.7.3/hdfs/datanode</value>
  </property>
</configuration>
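dfs.replication is set to 2 to match the two DataNodes in this cluster, and the name/data directories point at the hdfs folders created earlier. Once the cluster is running and holds data, the replication each block actually achieved can be inspected with fsck (shown here only as a pointer for later):
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# hdfs fsck / -files -blocks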
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# cp mapred-site.xml.template mapred-site.xml
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# vi mapred-site.xml
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml. This file contains the configuration settings for YARN.
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# vi yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Set JAVA_HOME explicitly in hadoop-env.sh so that the Hadoop daemons pick up the correct JDK:
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre
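These edits only change the master's copy of the configuration; the slave holds the hadoop-2.7.3 tree copied earlier and needs the same settings. One simple way to keep both nodes consistent is to copy the edited files across (everything except slaves, which intentionally differs between the two machines):
root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# scp core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml hadoop-env.sh masters root@172.31.129.197:/root/hadoop-2.7.3/etc/hadoop/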

Start the Multi-Node Hadoop Cluster

root@hadoop-master:~/hadoop-2.7.3/etc/hadoop# cd ../..
root@hadoop-master:~/hadoop-2.7.3# bin/hadoop namenode -format
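On Hadoop 2.x the hadoop namenode entry point is deprecated in favor of the hdfs script, so the equivalent, non-deprecated form is the one below; either way, format the NameNode only once, before the first start:
root@hadoop-master:~/hadoop-2.7.3# bin/hdfs namenode -format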
root@hadoop-master:~/hadoop-2.7.3# ./sbin/start-all.sh
Starting namenodes on [hadoop-master]
hadoop-master: starting namenode, logging to /root/hadoop-2.7.3/logs/hadoop-root-namenode-hadoop-master.out
hadoop-master: starting datanode, logging to /root/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop-master.out
hadoop-slave: starting datanode, logging to /root/hadoop-2.7.3/logs/hadoop-root-datanode-hadoop-slave.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-hadoop-master.out
starting yarn daemons
starting resourcemanager, logging to /root/hadoop-2.7.3/logs/yarn-root-resourcemanager-hadoop-master.out
hadoop-slave: starting nodemanager, logging to /root/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop-slave.out
hadoop-master: starting nodemanager, logging to /root/hadoop-2.7.3/logs/yarn-root-nodemanager-hadoop-master.out
root@hadoop-master:~/hadoop-2.7.3# jps
4144 NameNode
4609 ResourceManager
4725 NodeManager
4456 SecondaryNameNode
4283 DataNode
5054 Jps
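Beyond jps, HDFS itself can confirm that both DataNodes registered with the NameNode; with the two-node setup above, the report should list two live datanodes (full output trimmed here):
root@hadoop-master:~/hadoop-2.7.3# bin/hdfs dfsadmin -report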
root@hadoop-master:~/hadoop-2.7.3# ssh root@172.31.129.197
Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Welcome to Alibaba Cloud Elastic Compute Service !
Last login: Mon Jul 29 23:12:51 2019 from 172.31.129.196
Run the jps command to check whether the DataNode is up and running:
root@hadoop-slave:~# jps
23185 DataNode
23441 Jps
23303 NodeManager
root@hadoop-slave:~#
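As a final smoke test, the example jobs that ship with the 2.7.3 tarball can be run against the cluster; the pi estimator below needs no input data. The NameNode and ResourceManager web UIs are also reachable on the Hadoop 2.x default ports, 50070 and 8088 on hadoop-master, provided the instance's security group allows them.
root@hadoop-master:~/hadoop-2.7.3# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10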
