Monday, March 1, 2021

Install & Setup Hadoop in Standalone Mode

Environment:
OS: CentOS 7.9
Kernel: 3.10.0-1160.6.1.el7.x86_64
Java version:
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

1) Set up the hadoop user on the CentOS machine (follow the blog link below; a sketch of the commands follows the link)

https://oracledbaplanner.blogspot.com/2021/02/adding-linux-usergroup-and-modifying.html
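
For reference, a minimal sketch of what that setup looks like, run as root, assuming the bigdata group and hadoop user names that appear in the listings below (the linked post has the full detail):

groupadd bigdata
useradd -g bigdata -m hadoop
passwd hadoop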

2) Set up a directory for the Apache Hadoop install

[root@localhost ~]# mkdir /opt/hadoop

[root@localhost ~]# chown hadoop:bigdata /opt/hadoop

[root@localhost ~]# ls -altr /opt/hadoop
total 0
drwxr-xr-x. 4 root   root    53 Mar  1 02:14 ..
drwxr-xr-x. 2 hadoop bigdata  6 Mar  1 02:14 .
[root@localhost ~]#

3) Download the Hadoop binary from https://downloads.apache.org/hadoop/common/, specifically:


wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
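
Optionally, verify the download against the SHA-512 checksum that Apache publishes alongside the tarball (this assumes GNU coreutils' sha512sum, which understands the BSD-style lines in the .sha512 file):

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz.sha512
sha512sum -c hadoop-3.3.0.tar.gz.sha512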

4) Once the download finishes, decompress and untar the archive

[hadoop@localhost hadoop]$ ls -altr
total 489016
-rw-r--r--. 1 hadoop bigdata 500749234 Jul 15  2020 hadoop-3.3.0.tar.gz
drwxr-xr-x. 4 root   root           53 Mar  1 02:14 ..
drwxr-xr-x. 2 hadoop bigdata        33 Mar  1 02:32 .
[hadoop@localhost hadoop]$ gzip -d hadoop-3.3.0.tar.gz

[hadoop@localhost hadoop]$ ls -altr
total 1034752
-rw-r--r--. 1 hadoop bigdata 1059584000 Jul 15  2020 hadoop-3.3.0.tar
drwxr-xr-x. 4 root   root            53 Mar  1 02:14 ..
drwxr-xr-x. 2 hadoop bigdata         30 Mar  1 23:13 .

[hadoop@localhost hadoop]$ pwd
/opt/hadoop

[hadoop@localhost hadoop]$ tar -tvf hadoop-3.3.0.tar|head
drwxr-xr-x brahma/brahma     0 2020-07-06 15:50 hadoop-3.3.0/
-rw-rw-r-- brahma/brahma   175 2020-03-24 13:23 hadoop-3.3.0/README.txt
..
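
The extraction step itself isn't captured in this transcript; given the nohup.out file that shows up in the next listing, it was presumably run in the background along these lines (an assumption, not the recorded command):

nohup tar -xvf hadoop-3.3.0.tar &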

I checked nohup.out for any errors reported during the untar operation. The tar file is nearly 1 GB in size, and after untarring the total usage of the directory is about 2 GB:

[hadoop@localhost hadoop]$ ls -altr
total 1038160
drwxr-xr-x. 10 hadoop bigdata        215 Jul  6  2020 hadoop-3.3.0
-rw-r--r--.  1 hadoop bigdata 1059584000 Jul 15  2020 hadoop-3.3.0.tar
drwxr-xr-x.  4 root   root            53 Mar  1 02:14 ..
drwxr-xr-x.  3 hadoop bigdata         67 Mar  1 23:13 .
-rw-------.  1 hadoop bigdata    3486147 Mar  1 23:14 nohup.out

[hadoop@localhost hadoop]$ du -sh .
2.0G    .

[hadoop@localhost hadoop]$ du -sk .
2092064 .
[hadoop@localhost hadoop]$

5) Detect and set the Java home. Run the command below on the CentOS machine, look for java.home in the output, and use that value to set JAVA_HOME in the hadoop user's .bash_profile.

java -XshowSettings:properties -version
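
Note that -XshowSettings writes its output to stderr, so to pull out just the java.home line:

java -XshowSettings:properties -version 2>&1 | grep 'java.home'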

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre
export JAVA_HOME
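
To make this persist across sessions, append the two lines to the hadoop user's .bash_profile and re-source it (the JDK path above is the java.home value from this machine and will differ on yours):

echo 'JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.el7_9.x86_64/jre' >> ~/.bash_profile
echo 'export JAVA_HOME' >> ~/.bash_profile
source ~/.bash_profile
echo $JAVA_HOME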

6) Now launch a new session as the hadoop user and follow the steps below [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html]

    a) Go to the untarred hadoop-3.3.0 directory

[hadoop@localhost hadoop-3.3.0]$ ls -altr
total 84
-rw-r--r--.  1 hadoop bigdata   175 Mar 24  2020 README.txt
-rw-r--r--.  1 hadoop bigdata  1541 Mar 24  2020 NOTICE.txt
-rw-r--r--.  1 hadoop bigdata 27570 Mar 24  2020 NOTICE-binary
-rw-r--r--.  1 hadoop bigdata 15697 Mar 24  2020 LICENSE.txt
-rw-r--r--.  1 hadoop bigdata 22976 Jul  4  2020 LICENSE-binary
drwxr-xr-x.  3 hadoop bigdata  4096 Jul  6  2020 sbin
drwxr-xr-x.  3 hadoop bigdata    20 Jul  6  2020 etc
drwxr-xr-x.  2 hadoop bigdata  4096 Jul  6  2020 licenses-binary
drwxr-xr-x.  3 hadoop bigdata    20 Jul  6  2020 lib
drwxr-xr-x. 10 hadoop bigdata   215 Jul  6  2020 .
drwxr-xr-x.  2 hadoop bigdata   203 Jul  6  2020 bin
drwxr-xr-x.  2 hadoop bigdata   106 Jul  6  2020 include
drwxr-xr-x.  4 hadoop bigdata   288 Jul  6  2020 libexec
drwxr-xr-x.  4 hadoop bigdata    31 Jul  6  2020 share
drwxr-xr-x.  3 hadoop bigdata    67 Mar  1 23:13 ..
[hadoop@localhost hadoop-3.3.0]$

[hadoop@localhost hadoop-3.3.0]$ pwd
/opt/hadoop/hadoop-3.3.0
[hadoop@localhost hadoop-3.3.0]$

    b) Create an input directory and copy the *.xml files from the etc/hadoop directory

[hadoop@localhost hadoop-3.3.0]$ mkdir input

[hadoop@localhost hadoop-3.3.0]$ ls -ld input
drwxr-xr-x. 2 hadoop bigdata 6 Mar  2 01:13 input
[hadoop@localhost hadoop-3.3.0]$ cp etc/hadoop/*.xml input/
[hadoop@localhost hadoop-3.3.0]$
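
A quick check that the configuration XMLs landed in input:

ls input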

    c) Run the command below. The example uses the copied configuration files as input, then finds and displays every match of the given regular expression; output is written to the given output directory. (The grep example actually runs two MapReduce jobs back to back, a search job and then a sort job, which is why the log below shows job _0001 being submitted but job _0002 completing.)

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output 'dfs[a-z.]+'

Output: 

2021-03-02 01:24:00,805 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-03-02 01:24:00,932 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-03-02 01:24:00,932 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2021-03-02 01:24:01,129 INFO input.FileInputFormat: Total input files to process : 10
2021-03-02 01:24:01,172 INFO mapreduce.JobSubmitter: number of splits:10
2021-03-02 01:24:01,460 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local440722701_0001
2021-03-02 01:24:01,460 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-03-02 01:24:01,680 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2021-03-02 01:24:01,680 INFO mapreduce.Job: Running job: job_local440722701_0001
2021-03-02 01:24:01,686 INFO mapred.LocalJobRunner: OutputCommitter set in config null
...
2021-03-02 01:24:04,925 INFO mapreduce.Job:  map 100% reduce 100%
2021-03-02 01:24:04,926 INFO mapreduce.Job: Job job_local2096128567_0002 completed successfully
2021-03-02 01:24:04,931 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=1203532
                FILE: Number of bytes written=3576646
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1
                Map output records=1
                Map output bytes=17
                Map output materialized bytes=25
                Input split bytes=127
                Combine input records=0
                Combine output records=0
                Reduce input groups=1
                Reduce shuffle bytes=25
                Reduce input records=1
                Reduce output records=1
                Spilled Records=2
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=41
                Total committed heap usage (bytes)=273997824
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=123
        File Output Format Counters
                Bytes Written=23
[hadoop@localhost hadoop-3.3.0]$

    d) Verify the output

[hadoop@localhost hadoop-3.3.0]$ cat output/*
1       dfsadmin
[hadoop@localhost hadoop-3.3.0]$

[hadoop@localhost hadoop-3.3.0]$ ls -altr output/*
-rw-r--r--. 1 hadoop bigdata 11 Mar  2 01:24 output/part-r-00000
-rw-r--r--. 1 hadoop bigdata  0 Mar  2 01:24 output/_SUCCESS
[hadoop@localhost hadoop-3.3.0]$ cat output/part-r-00000
1       dfsadmin
[hadoop@localhost hadoop-3.3.0]$
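
One caveat if you re-run the example: MapReduce refuses to write to an output directory that already exists, so remove it first (in standalone mode it is just a local directory):

rm -r output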

# We are done with the Standalone mode of operation. We will cover Pseudo-Distributed Operation in a separate blog post.
