How to Install Hadoop Cluster on Ubuntu 16.04
In this tutorial, we will learn how to set up Apache Hadoop as a single-node cluster on an Alibaba Cloud Elastic Compute Service (ECS) instance running Ubuntu 16.04.
Install Hadoop
Before starting, you will need to download Hadoop from the official Apache website. This tutorial uses Hadoop 3.1.0. If the mirror below no longer hosts this release, look for it on archive.apache.org. Download it with the following command:
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
Once the download completes, extract the archive with the following command:
tar -xvzf hadoop-3.1.0.tar.gz
Next, move the extracted directory to /opt/hadoop. Writing to /opt requires root privileges:
sudo mv hadoop-3.1.0 /opt/hadoop
Next, give the hadoop user ownership of the Hadoop directory (this assumes the hadoop user and group already exist):
sudo chown -R hadoop:hadoop /opt/hadoop/
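If you script the installation, the extract and move steps above can be wrapped in a small function. This is a minimal sketch, not part of the original tutorial: the name `install_hadoop_tarball` is hypothetical, and it assumes the archive unpacks to a single top-level directory (hadoop-3.1.0 for the release used here). It refuses to run if the destination already exists, because `mv` would otherwise place the directory inside it.

```shell
# Hypothetical helper: unpack a Hadoop release tarball and move its single
# top-level directory to a destination such as /opt/hadoop.
install_hadoop_tarball() {
  local tarball="$1" dest="$2" workdir
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  workdir="$(mktemp -d)" || return 1
  tar -xzf "$tarball" -C "$workdir" || return 1
  # Assumes exactly one top-level directory, e.g. hadoop-3.1.0
  set -- "$workdir"/*
  if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
    echo "unexpected archive layout in $tarball" >&2
    return 1
  fi
  mv "$1" "$dest" && rmdir "$workdir"
}
```

For example, `install_hadoop_tarball hadoop-3.1.0.tar.gz /opt/hadoop` run as root, followed by the same `chown` command as above.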
Next, you will need to set the Hadoop environment variables in the hadoop user's .bashrc file.
First, switch to the hadoop user:
su - hadoop
Then open the .bashrc file in the hadoop user's home directory:
nano .bashrc
Add the following lines at the end of the file:
export HADOOP_HOME=/opt/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Save and close the file when you are finished. Then load the new variables into the current shell with the following command:
source .bashrc
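To confirm that the variables are active in the current shell, a check along these lines can help. It is only a sketch: the name `check_hadoop_env` is hypothetical, and it tests just two things, that `HADOOP_HOME` is set and that `$HADOOP_HOME/bin` is on `PATH`.

```shell
# Hypothetical check: is HADOOP_HOME set, and is its bin directory on PATH?
check_hadoop_env() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set; did you run 'source .bashrc'?" >&2
    return 1
  fi
  # Wrap PATH in colons so the first and last entries also match
  case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) ;;
    *)
      echo "$HADOOP_HOME/bin is not on PATH" >&2
      return 1
      ;;
  esac
  echo "HADOOP_HOME=$HADOOP_HOME"
}
```

Once JAVA_HOME is also configured in hadoop-env.sh, running `hadoop version` is a more direct test that the `hadoop` command is found and starts.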
Next, you will also need to set the JAVA_HOME variable for Hadoop. You can do this by editing the hadoop-env.sh file.
First, find the default Java path using the following command:
readlink -f /usr/bin/java | sed "s:bin/java::"
Output:
/usr/lib/jvm/java-8-openjdk-amd64/jre/
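The pipeline works because /usr/bin/java is a chain of symbolic links managed by update-alternatives on Ubuntu: `readlink -f` resolves it to the real binary, and `sed` strips `bin/java` from the result. The same idea can be written with shell parameter expansion, as in this sketch; the name `java_home_from` is hypothetical. Unlike the `sed` expression, which removes the first `bin/java` anywhere in the path, it only strips a trailing one.

```shell
# Hypothetical helper: derive a JAVA_HOME-style directory from a java binary.
java_home_from() {
  local bin real
  bin="${1:-/usr/bin/java}"
  # Follow every symlink to the real executable
  real="$(readlink -f "$bin")" || return 1
  # Drop a trailing "bin/java", keeping the final slash like the sed command
  printf '%s\n' "${real%bin/java}"
}
```

On the system described above, `java_home_from` with no argument should print the same path as the pipeline.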
Now open the hadoop-env.sh file and set JAVA_HOME to the path printed above:
nano /opt/hadoop/etc/hadoop/hadoop-env.sh
Make the following changes:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
Save and close the file when you are finished.
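If you prefer to script this edit instead of using nano, appending an `export JAVA_HOME=...` line is enough, since the default JAVA_HOME line in hadoop-env.sh is commented out. A minimal sketch follows; `set_java_home` is a hypothetical name, and it appends the line only if an identical line is not already present, so running it twice is harmless.

```shell
# Hypothetical helper: add an "export JAVA_HOME=..." line to hadoop-env.sh once.
set_java_home() {
  local env_file java_home line
  env_file="$1"
  java_home="$2"
  line="export JAVA_HOME=$java_home"
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  if ! grep -qxF "$line" "$env_file"; then
    printf '%s\n' "$line" >> "$env_file"
  fi
}
```

For example: `set_java_home /opt/hadoop/etc/hadoop/hadoop-env.sh "$(readlink -f /usr/bin/java | sed "s:bin/java::")"`.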
Configure Hadoop
Next, you will need to edit several configuration files to set up the Hadoop infrastructure. First, as the hadoop user, create the NameNode and DataNode directories for HDFS storage:
mkdir -p /opt/hadoop/hadoopdata/hdfs/namenode
mkdir -p /opt/hadoop/hadoopdata/hdfs/datanode
Next, edit the core-site.xml file. This file holds Hadoop's core settings, such as the default file system URI (here, HDFS on port 9000 of localhost) and I/O settings such as the size of read/write buffers.
nano /opt/hadoop/etc/hadoop/core-site.xml
Make the following changes:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
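The same file can also be written non-interactively with a here-document, which is convenient when provisioning several ECS instances. This is a sketch under assumptions: `write_core_site` is a hypothetical name, the configuration directory is passed as an argument (normally /opt/hadoop/etc/hadoop), and it uses `fs.defaultFS`, the current name of the deprecated `fs.default.name` property, which Hadoop still accepts.

```shell
# Hypothetical helper: write a minimal core-site.xml for a single-node setup.
# Note: this overwrites any existing core-site.xml in the given directory.
write_core_site() {
  local conf_dir="$1"
  local fs_uri="${2:-hdfs://localhost:9000}"
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$fs_uri</value>
  </property>
</configuration>
EOF
}
```

For example, `write_core_site /opt/hadoop/etc/hadoop` run as the hadoop user.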
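Save the file, then open the hdfs-site.xml file. This file contains the replication factor and the NameNode and DataNode storage paths on the local file system. The replication factor is set to 1 because a single-node cluster has only one DataNode to store each block.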
nano /opt/hadoop/etc/hadoop/hdfs-site.xml
Make the following changes:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
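Because the paths in hdfs-site.xml must match the directories created earlier, a script can derive both from one variable so they cannot drift apart. A minimal sketch, assuming the storage root /opt/hadoop/hadoopdata/hdfs used in this tutorial and the hypothetical name `write_hdfs_site`; it uses `dfs.namenode.name.dir` and `dfs.datanode.data.dir`, the current names of the deprecated `dfs.name.dir` and `dfs.data.dir` properties.

```shell
# Hypothetical helper: create the HDFS storage directories and write a
# matching hdfs-site.xml, both derived from one storage root.
write_hdfs_site() {
  local conf_dir="$1"
  local data_root="${2:-/opt/hadoop/hadoopdata/hdfs}"
  mkdir -p "$data_root/namenode" "$data_root/datanode" || return 1
  # Replication is 1 because a single-node cluster has only one DataNode
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$data_root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$data_root/datanode</value>
  </property>
</configuration>
EOF
}
```

The storage root must be an absolute path so that `file://` plus the path yields a URI such as `file:///opt/hadoop/hadoopdata/hdfs/namenode`.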
Save the file, then open the mapred-site.xml file. Setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN:
nano /opt/hadoop/etc/hadoop/mapred-site.xml
Make the following changes:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
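Save the file, then open the yarn-site.xml file. The mapreduce_shuffle auxiliary service lets the NodeManager serve map outputs to reducers during the shuffle phase of MapReduce jobs: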
nano /opt/hadoop/etc/hadoop/yarn-site.xml
Make the following changes:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Save and close the file when you are finished. For more information, such as how to install Java, access the Hadoop services and test Hadoop, please see this tutorial.
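After editing the configuration files, it is worth reading the values back to catch typos. Once Hadoop is on the PATH, `hdfs getconf -confKey <name>` reports the value Hadoop actually resolves, for example `hdfs getconf -confKey dfs.replication`. Before that, a rough check such as the following sketch can help. `get_property` is a hypothetical name, and it assumes the layout used in this tutorial, with `<name>` and `<value>` on separate lines; it is not a general XML parser.

```shell
# Hypothetical helper: print the <value> of a named property from a Hadoop
# *-site.xml file. Line-based, so it relies on one tag per line.
get_property() {
  local file="$1" name="$2"
  awk -v want="$name" '
    /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ {
      v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
      if (n == want) { print v; found = 1; exit }
    }
    END { exit !found }
  ' "$file"
}
```

For example, `get_property /opt/hadoop/etc/hadoop/mapred-site.xml mapreduce.framework.name` should print `yarn`; the function returns a non-zero status when the property is missing.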