Setup a Single-Node Hadoop Cluster Using Docker

Background Information

Main Tutorial

root@alibaba-docker:~# docker -v
Docker version 18.09.6, build 481bc77
root@alibaba-docker:~# docker container run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
1b930d010525: Pull complete
Digest: sha256:41a65640635299bab090f783209c1e3a3f11934cf7756b09cb2f1e02147c6ed8
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
$ docker run -it ubuntu bash
root@alibaba-docker:~# docker pull sequenceiq/hadoop-docker:2.7.0
2.7.0: Pulling from sequenceiq/hadoop-docker
b253335dcf03: Pulling fs layer
a3ed95caeb02: Pulling fs layer
69623ef05416: Pulling fs layer
63aebddf4bce: Pulling fs layer
46305a4cda1d: Pulling fs layer
70ff65ec2366: Pulling fs layer
72accdc282f3: Pulling fs layer
5298ddb3b339: Pulling fs layer
ec461d25c2ea: Pulling fs layer
315b476b23a4: Pulling fs layer
6e6acc31f8b1: Pulling fs layer
38a227158d97: Pulling fs layer
319a3b8afa25: Pulling fs layer
11e1e16af8f3: Pulling fs layer
834533551a37: Pulling fs layer
c24255b6d9f4: Pulling fs layer
8b4ea3c67dc2: Pulling fs layer
40ba2c2cdf73: Pulling fs layer
5424a04bc240: Pulling fs layer
7df43f09096d: Pulling fs layer
b34787ee2fde: Pulling fs layer
4eaa47927d15: Pulling fs layer
cb95b9da9646: Pulling fs layer
e495e287a108: Pulling fs layer
3158ca49a54c: Pulling fs layer
33b5a5de9544: Pulling fs layer
d6f46cf55f0f: Pulling fs layer
40c19fb76cfd: Pull complete
018a1f3d7249: Pull complete
40f52c973507: Pull complete
49dca4de47eb: Pull complete
d26082bd2aa9: Pull complete
c4f97d87af86: Pull complete
fb839f93fc0f: Pull complete
43661864505e: Pull complete
d8908a83648e: Pull complete
af8b686deb23: Pull complete
c1214abd7b96: Pull complete
9d00f27ba8d2: Pull complete
09f787a7573b: Pull complete
4e86267d5247: Pull complete
3876cba35aed: Pull complete
23df48ffdb39: Pull complete
646aedbc2bb6: Pull complete
60a65f8179cf: Pull complete
046b321f8081: Pull complete
Digest: sha256:a40761746eca036fee6aafdf9fdbd6878ac3dd9a7cd83c0f3f5d8a0e6350c76a
Status: Downloaded newer image for sequenceiq/hadoop-docker:2.7.0
root@alibaba-docker:~#:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest fce289e99eb9 5 months ago 1.84kB
sequenceiq/hadoop-docker 2.7.0 789fa0a3b911 4 years ago 1.76GB
root@alibaba-docker:~# docker run -it sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash
/
Starting sshd: [ OK ]
Starting namenodes on [9f397feb3a46]
9f397feb3a46: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-9f397feb3a46.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-9f397feb3a46.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-9f397feb3a46.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-9f397feb3a46.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-9f397feb3a46.out
bash-4.1#
bash-4.1# jps
942 Jps
546 ResourceManager
216 DataNode
371 SecondaryNameNode
126 NameNode
639 NodeManager
bash-4.1#
root@alibaba-docker:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f397feb3a46 sequenceiq/hadoop-docker:2.7.0 "/etc/bootstrap.sh -…" 5 minutes ago Up 5 minutes 2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 19888/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp determined_ritchie
bash-4.1# ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:AC:11:00:02
inet addr:172.17.0.2 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:93 errors:0 dropped:0 overruns:0 frame:0
TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9760 (9.5 KiB) TX bytes:1528 (1.4 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:3160 errors:0 dropped:0 overruns:0 frame:0
TX packets:3160 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:455659 (444.9 KiB) TX bytes:455659 (444.9 KiB)
bash-4.1# cd $HADOOP_PREFIX
bash-4.1# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'
19/06/25 11:28:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/06/25 11:28:58 INFO input.FileInputFormat: Total input paths to process : 31
19/06/25 11:28:59 INFO mapreduce.JobSubmitter: number of splits:31
19/06/25 11:28:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1561475564487_0001
19/06/25 11:29:00 INFO impl.YarnClientImpl: Submitted application application_1561475564487_0001
19/06/25 11:29:01 INFO mapreduce.Job: The url to track the job: http://9f397feb3a46:8088/proxy/application_1561475564487_0001/
19/06/25 11:29:01 INFO mapreduce.Job: Running job: job_1561475564487_0001
19/06/25 11:29:22 INFO mapreduce.Job: Job job_1561475564487_0001 running in uber mode : false
19/06/25 11:29:22 INFO mapreduce.Job: map 0% reduce 0%
19/06/25 11:30:22 INFO mapreduce.Job: map 13% reduce 0%
19/06/25 11:30:23 INFO mapreduce.Job: map 19% reduce 0%
19/06/25 11:31:19 INFO mapreduce.Job: map 23% reduce 0%
19/06/25 11:31:20 INFO mapreduce.Job: map 26% reduce 0%
19/06/25 11:31:21 INFO mapreduce.Job: map 39% reduce 0%
19/06/25 11:32:11 INFO mapreduce.Job: map 39% reduce 13%
19/06/25 11:32:13 INFO mapreduce.Job: map 42% reduce 13%
19/06/25 11:32:14 INFO mapreduce.Job: map 55% reduce 15%
19/06/25 11:32:18 INFO mapreduce.Job: map 55% reduce 18%
19/06/25 11:32:59 INFO mapreduce.Job: map 58% reduce 18%
19/06/25 11:33:00 INFO mapreduce.Job: map 61% reduce 18%
19/06/25 11:33:02 INFO mapreduce.Job: map 71% reduce 19%
19/06/25 11:33:05 INFO mapreduce.Job: map 71% reduce 24%
19/06/25 11:33:45 INFO mapreduce.Job: map 74% reduce 24%
19/06/25 11:33:46 INFO mapreduce.Job: map 81% reduce 24%
19/06/25 11:33:47 INFO mapreduce.Job: map 84% reduce 26%
19/06/25 11:33:48 INFO mapreduce.Job: map 87% reduce 26%
19/06/25 11:33:50 INFO mapreduce.Job: map 87% reduce 29%
19/06/25 11:34:28 INFO mapreduce.Job: map 90% reduce 29%
19/06/25 11:34:29 INFO mapreduce.Job: map 97% reduce 29%
19/06/25 11:34:30 INFO mapreduce.Job: map 100% reduce 32%
19/06/25 11:34:32 INFO mapreduce.Job: map 100% reduce 100%
19/06/25 11:34:32 INFO mapreduce.Job: Job job_1561475564487_0001 completed successfully
19/06/25 11:34:32 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=345
FILE: Number of bytes written=3697508
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=80529
HDFS: Number of bytes written=437
HDFS: Number of read operations=96
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=32
Launched reduce tasks=1
Data-local map tasks=32
Total time spent by all maps in occupied slots (ms)=1580786
Total time spent by all reduces in occupied slots (ms)=191081
Total time spent by all map tasks (ms)=1580786
Total time spent by all reduce tasks (ms)=191081
Total vcore-seconds taken by all map tasks=1580786
Total vcore-seconds taken by all reduce tasks=191081
Total megabyte-seconds taken by all map tasks=1618724864
Total megabyte-seconds taken by all reduce tasks=195666944
Map-Reduce Framework
Map input records=2060
Map output records=24
Map output bytes=590
Map output materialized bytes=525
Input split bytes=3812
Combine input records=24
Combine output records=13
Reduce input groups=11
Reduce shuffle bytes=525
Reduce input records=13
Reduce output records=11
Spilled Records=26
Shuffled Maps =31
Failed Shuffles=0
Merged Map outputs=31
GC time elapsed (ms)=32401
CPU time spent (ms)=19550
Physical memory (bytes) snapshot=7076614144
Virtual memory (bytes) snapshot=22172876800
Total committed heap usage (bytes)=5196480512
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=76717
File Output Format Counters
Bytes Written=437
19/06/25 11:34:32 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/06/25 11:34:33 INFO input.FileInputFormat: Total input paths to process : 1
19/06/25 11:34:33 INFO mapreduce.JobSubmitter: number of splits:1
19/06/25 11:34:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1561475564487_0002
19/06/25 11:34:33 INFO impl.YarnClientImpl: Submitted application application_1561475564487_0002
19/06/25 11:34:33 INFO mapreduce.Job: The url to track the job: http://9f397feb3a46:8088/proxy/application_1561475564487_0002/
19/06/25 11:34:33 INFO mapreduce.Job: Running job: job_1561475564487_0002
19/06/25 11:34:50 INFO mapreduce.Job: Job job_1561475564487_0002 running in uber mode : false
19/06/25 11:34:50 INFO mapreduce.Job: map 0% reduce 0%
19/06/25 11:35:04 INFO mapreduce.Job: map 100% reduce 0%
19/06/25 11:35:18 INFO mapreduce.Job: map 100% reduce 100%
19/06/25 11:35:19 INFO mapreduce.Job: Job job_1561475564487_0002 completed successfully
19/06/25 11:35:19 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=291
FILE: Number of bytes written=230543
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=570
HDFS: Number of bytes written=197
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=10489
Total time spent by all reduces in occupied slots (ms)=12436
Total time spent by all map tasks (ms)=10489
Total time spent by all reduce tasks (ms)=12436
Total vcore-seconds taken by all map tasks=10489
Total vcore-seconds taken by all reduce tasks=12436
Total megabyte-seconds taken by all map tasks=10740736
Total megabyte-seconds taken by all reduce tasks=12734464
Map-Reduce Framework
Map input records=11
Map output records=11
Map output bytes=263
Map output materialized bytes=291
Input split bytes=133
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=291
Reduce input records=11
Reduce output records=11
Spilled Records=22
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=297
CPU time spent (ms)=1610
Physical memory (bytes) snapshot=346603520
Virtual memory (bytes) snapshot=1391702016
Total committed heap usage (bytes)=245891072
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=437
File Output Format Counters
Bytes Written=197
bash-4.1# bin/hdfs dfs -cat output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.replication
1 dfs.file
bash-4.1#

Original Source

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store