How to Install Hadoop Cluster on Ubuntu 16.04

Alibaba Cloud
4 min read · May 27, 2019


In this tutorial, we will learn how to set up Apache Hadoop as a single-node cluster on an Alibaba Cloud Elastic Compute Service (ECS) instance running Ubuntu 16.04.

Install Hadoop

Before starting, you will need to download Hadoop from the official Apache website. This tutorial uses Hadoop 3.1.0, which you can download with the following command:

wget http://www-eu.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
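
Optionally, verify the integrity of the archive before extracting it by computing its checksum and comparing it manually against the value published on the Apache download page:

sha256sum hadoop-3.1.0.tar.gz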

Once the download is complete, extract the downloaded file with the following command:

tar -xvzf hadoop-3.1.0.tar.gz

Next, move the extracted directory to /opt with the following command (a root shell is assumed for this and the chown step below; otherwise prefix the commands with sudo):

mv hadoop-3.1.0 /opt/hadoop

Next, change the ownership of the hadoop directory using the following command:

chown -R hadoop:hadoop /opt/hadoop/
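
Note that this command assumes a dedicated hadoop user and group already exist on the system. If they do not, create them first; a minimal sketch:

sudo adduser hadoop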

Next, you will need to set environment variables for Hadoop. You can do this by editing the .bashrc file of the hadoop user.

First, log in as the hadoop user:

su - hadoop

Next, open the .bashrc file:

nano .bashrc

Add the following lines at the end of the file:

export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Save and close the file when you are finished. Then, apply the environment variables to the current session using the following command:

source .bashrc
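
At this point, the Hadoop binaries should be on your PATH. You can verify the setup with:

hadoop version

This should report version 3.1.0 along with build information.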

Next, you will also need to set the Java environment variable for Hadoop. You can do this by editing the hadoop-env.sh file.
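
This step assumes Java is already installed. If it is not, OpenJDK 8 from the standard Ubuntu repositories works with Hadoop 3.x; a minimal sketch:

sudo apt-get update
sudo apt-get install -y openjdk-8-jdk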

First, find the default Java path using the following command:

readlink -f /usr/bin/java | sed "s:bin/java::"

Output:

/usr/lib/jvm/java-8-openjdk-amd64/jre/

Now, open the hadoop-env.sh file and set JAVA_HOME to the path from the output above:

nano /opt/hadoop/etc/hadoop/hadoop-env.sh

Make the following changes:

#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/

Save and close the file when you are finished.

Configure Hadoop

Next, you will need to edit several configuration files to set up the Hadoop infrastructure. First, log in as the hadoop user and create the directories that will hold the HDFS NameNode and DataNode data (these paths are referenced again in hdfs-site.xml below):

mkdir -p /opt/hadoop/hadoopdata/hdfs/namenode
mkdir -p /opt/hadoop/hadoopdata/hdfs/datanode

Next, you will need to edit the core-site.xml file. This file holds core settings such as the default file system URI (the NameNode address and port) and the size of read/write buffers. The configuration below uses fs.defaultFS, the current name for the deprecated fs.default.name key, to point the default file system at HDFS on localhost, port 9000.

nano /opt/hadoop/etc/hadoop/core-site.xml

Make the following changes:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
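
You can confirm that Hadoop reads this setting correctly with the getconf helper:

hdfs getconf -confKey fs.defaultFS

This should print hdfs://localhost:9000.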

Save the file, then open the hdfs-site.xml file. This file sets the HDFS replication factor (1 is appropriate for a single-node cluster) and the local storage paths for the NameNode and DataNode; the configuration below uses the current dfs.namenode.name.dir and dfs.datanode.data.dir keys rather than the deprecated dfs.name.dir and dfs.data.dir aliases.

nano /opt/hadoop/etc/hadoop/hdfs-site.xml

Make the following changes:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

Save the file, then open the mapred-site.xml file. This file tells Hadoop which framework runs MapReduce jobs; here it is set to YARN.

nano /opt/hadoop/etc/hadoop/mapred-site.xml

Make the following changes:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
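
Note: on Hadoop 3.x, MapReduce jobs submitted to YARN also need to know where the MapReduce installation lives. If jobs later fail with classpath errors, a commonly used addition to mapred-site.xml is the following (these property names are standard Hadoop, but this block goes beyond the original post):

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>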

Save the file, then open the yarn-site.xml file. The setting below registers the MapReduce shuffle handler as a NodeManager auxiliary service, which MapReduce jobs need in order to move map output to the reducers:

nano /opt/hadoop/etc/hadoop/yarn-site.xml

Make the following changes:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Save and close the file when you are finished. For more details, such as how to install Java, access the Hadoop services, and test Hadoop, please refer to the full tutorial linked in the reference below.
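
As a quick preview, a typical single-node bring-up from this point looks like the following minimal sketch (standard Hadoop commands, run as the hadoop user; the start scripts assume passwordless SSH to localhost):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps

The jps output should list the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager processes, and the NameNode web UI becomes available on port 9870.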

Related Blog Posts

How to Safeguard Apache Web Server on Ubuntu

Apache is the cornerstone of modern web servers and is a powerful software solution for a large percentage of today’s internet economy.

In this guide, we will show you how to safeguard your Apache web server hosted on Alibaba Cloud Elastic Compute Service (ECS) against DDoS and brute-force attacks.

How to Install and Configure Seafile on Ubuntu 16.04

Seafile is a free, open source and cross-platform file storage system similar to Dropbox. It is used for sharing and syncing files between users and groups. It can be easily integrated with LDAP and WebDAV. It supports file versioning, snapshots and two-factor authentication. You can deploy it with MySQL, MariaDB, PostgreSQL, Apache and Nginx web server. Files are stored on Seafile server and can be synchronized with personal computers and mobile devices through apps. You can also access and manage Seafile through a web browser.

Related Market Product

Apache-Solr powered by Websoft9(Ubuntu16.04)

Websoft9 Apache Solr is a pre-configured, ready-to-run image for running Apache Solr on Alibaba Cloud. Solr is a standalone enterprise search server with a REST-like API.

Related Documentation

Deploy a Java Web project

This article describes how to deploy a Java web project on a Linux instance with the basic configuration. This method is applicable to individual users who are new to building websites using ECS, and it covers the installation of Tomcat with Apache.

Harden Hadoop environment security

Hadoop is an open-source, highly reliable, and extensible distributed computing framework developed by the Apache Software Foundation. At the core of the Hadoop framework are HDFS and MapReduce.

Related Products

Anti-DDoS Pro

Alibaba Cloud Anti-DDoS Pro is a paid service that features a set of high-defensive IPs, and acts as a protective barrier for the origin. It safeguards network servers under high volume DDoS attacks. After configuring the high defensive IPs for the network servers, all traffic passes through the Anti-DDoS Pro instance before rerouting to the origin.

Elastic Compute Service

Alibaba Cloud Elastic Compute Service (ECS) provides fast memory and the latest Intel CPUs to help you to power your cloud applications and achieve faster results with low latency. All ECS instances come with Anti-DDoS protection to safeguard your data and applications from DDoS and Trojan attacks.

Related Course

Hadoop Cluster Installation on Alibaba Cloud ECS

This course is designed to help users who want to understand Big Data technology and data processing through cloud products. Through it, users can fully understand the construction and basic use of a Hadoop cluster, SSH protocol fundamentals, the Hadoop directory structure, and ECS security group configuration. It provides a reference for enterprises building their own Hadoop clusters.

Reference: https://www.alibabacloud.com/blog/how-to-install-hadoop-cluster-on-ubuntu-16-04_594845?spm=a2c41.12910455.0.0
