How to Install Hadoop Cluster on Ubuntu 16.04

In this tutorial, we will learn how to setup an Apache Hadoop on a single node cluster in an Alibaba Cloud Elastic Compute Service (ECS) instance with Ubuntu 16.04.

Install Hadoop

Before starting, you will need to download the latest version of the Hadoop from their official website. You can download it with the following command:

wget http://www-eu.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
tar -xvzf hadoop-3.1.0.tar.gz
mv hadoop-3.1.0 /opt/hadoop
chown -R hadoop:hadoop /opt/hadoop/
su - hadoop
nano .bashrc
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
source .bashrc
readlink -f /usr/bin/java | sed "s:bin/java::"
/usr/lib/jvm/java-8-openjdk-amd64/jre/
nano /opt/hadoop/etc/hadoop/hadoop-env.sh
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/

Configure Hadoop

Next, you will need to configure multiple configuration files to setup Hadoop infrastructure. First, log in with hadoop user and create a directory for hadoop file system storage:

mkdir -p /opt/hadoop/hadoopdata/hdfs/namenode
mkdir -p /opt/hadoop/hadoopdata/hdfs/datanode
nano /opt/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
nano /opt/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///opt/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///opt/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
nano /opt/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
nano /opt/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Related Blog Posts

How to Safeguard Apache Web Server on Ubuntu

Apache is the cornerstone of modern web servers and is a powerful software solution for a large percentage of today’s internet economy.

How to Install and Configure Seafile on Ubuntu 16.04

Seafile is a free, open source and cross-platform file storage system similar to Dropbox. It is used for sharing and syncing files between users and groups. It can be easily integrated with LDAP and WebDAV. It supports file versioning, snapshots and two-factor authentication. You can deploy it with MySQL, MariaDB, PostgreSQL, Apache and Nginx web server. Files are stored on Seafile server and can be synchronized with personal computers and mobile devices through apps. You can also access and manage Seafile through a web browser.

Related Market Product

Apache-Solr powered by Websoft9(Ubuntu16.04)

Websoft9 Apache Solr is a pre-configured, ready to run image for running Apache Solr on Alibaba Cloud.Solr is a standalone enterprise search server with a REST-like API.

Related Documentation

Deploy a Java Web project

This article describes how to deploy a Java Web project on a Linux instance with the basic configuration. This method is applicable to individual users who are new to website construction by using ECS, including installation of Tomcat with Apache.

Harden Hadoop environment security

Hadoop is an open-source, highly reliable and extensible distributed computing framework developed by the Apache Software Foundation. The core design of the Hadoop framework is HDFS and MapReduce.

Related Products

Anti-DDoS Pro

Alibaba Cloud Anti-DDoS Pro is a paid service that features a set of high-defensive IPs, and acts as a protective barrier for the origin. It safeguards network servers under high volume DDoS attacks. After configuring the high defensive IPs for the network servers, all traffic passes through the Anti-DDoS Pro instance before rerouting to the origin.

Elastic Compute Service

Alibaba Cloud Elastic Compute Service (ECS) provides fast memory and the latest Intel CPUs to help you to power your cloud applications and achieve faster results with low latency. All ECS instances come with Anti-DDoS protection to safeguard your data and applications from DDoS and Trojan attacks.

Hadoop Cluster Installation on Alibaba Cloud ECS

This course is designed to help users who want to understand Big Data technology, as well as data processing through cloud products. Through learning, users can fully understand the construction and basic use of Hadoop cluster, SSH protocol foundation, Hadoop directory structure, ECS security group configuration. It provides a reference for enterprises to build Hadoop cluster by themselves.

Follow me to keep abreast with the latest technology news, industry insights, and developer trends.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store