Hadoop Installation

By | November 5, 2014

Hadoop version used: 2.2, downloaded from the Apache website.

Platform: RHEL 6.4 64bit

Hostname : host1.test.local

Prerequisites:

  1. Installing java
  2. Adding dedicated Hadoop system user.
  3. Configuring SSH access.
  1. Installing java

[root@host1 ~]# rpm -qa|grep java

tzdata-java-2012j-1.el6.noarch

java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64

 

We have openjdk, but not openjdk-devel. We also need openjdk-devel because it provides the jps tool, which we'll use later. The RHEL 6.4 64-bit distribution ships the openjdk-devel package as well. Install it.

After installing :

[root@host1 ~]# rpm -qa|grep java

tzdata-java-2012j-1.el6.noarch

java-1.7.0-openjdk-devel-1.7.0.9-2.3.4.1.el6_3.x86_64

java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64


Now, issue

[root@host1 ~]# java -version

java version "1.7.0_09-icedtea"

OpenJDK Runtime Environment (rhel-2.3.4.1.el6_3-x86_64)

OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)

If that returns an error, troubleshoot.

Find the location where  openjdk is installed:

[root@host1 bin]#  find / -name java

On my machine, OpenJDK is installed at /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64. Take note of this location; we will use it as our JAVA_HOME environment variable.
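As an alternative to scanning the whole filesystem with find, the JAVA_HOME directory can usually be derived from the java binary on the PATH. The following is a minimal sketch, assuming GNU readlink is available and that the resolved path ends in bin/java or jre/bin/java (as it typically does for OpenJDK 7 packages). The name java_home_from_binary is a hypothetical helper, not part of the JDK or Hadoop.

```shell
#!/bin/bash
# Derive a JAVA_HOME candidate from the full path of a java binary.
# java_home_from_binary is a hypothetical helper name (an assumption).
java_home_from_binary() {
    local bin_path="$1"
    local home="${bin_path%/bin/java}"   # strip the trailing /bin/java
    home="${home%/jre}"                  # strip a trailing /jre, if present
    printf '%s\n' "$home"
}

# Resolve the symlink chain behind the java on PATH, if there is one.
if command -v java >/dev/null 2>&1; then
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

Compare the printed directory with the find output before using it as JAVA_HOME.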

  2. Adding hadoop user

[root@host1 ~]# groupadd hadoop

[root@host1 ~]# useradd -g hadoop hadoop

[root@host1 ~]# passwd hadoop

 

  3. Configuring SSH access

Hadoop needs SSH key-based authentication so that the master node can log in to the slave nodes (and the secondary node) to start and stop them, and also to the local machine if you want to run Hadoop on it. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hadoop user we created in the previous section.

[root@host1 ~]# su - hadoop

[hadoop@host1 ~]$  ssh-keygen

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hadoop/.ssh/id_rsa.

Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.

The key fingerprint is:

4f:31:a6:83:95:09:6b:c4:46:68:3a:93:39:da:a7:62 hadoop@host1.test.local

The key's randomart image is:

+--[ RSA 2048]----+

|     ++          |

|    o.oo o       |

|   = .o + +      |

|  B  . o o o     |

| o +  . S .      |

|. . .    +       |

|   o      .      |

|.E.              |

|..               |

+-----------------+

[hadoop@host1 ~]$ cat $HOME/.ssh/id_rsa.pub > $HOME/.ssh/authorized_keys

Now, test ssh connectivity by issuing:

[hadoop@host1 ~]$ ssh host1

If it fails or still prompts for a password, check the permissions of $HOME/.ssh/authorized_keys. They must be 600; correct them if they are not.
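The permission fix can be sketched as a small function. This is a minimal sketch, assuming the conventional modes of 700 for the .ssh directory and 600 for authorized_keys; fix_ssh_perms is a hypothetical helper name, not an OpenSSH command.

```shell
#!/bin/bash
# Tighten permissions on an SSH directory so that key-based login is accepted.
# fix_ssh_perms is a hypothetical helper name (an assumption).
fix_ssh_perms() {
    local ssh_dir="${1:-$HOME/.ssh}"     # default to the current user's ~/.ssh
    if [ ! -d "$ssh_dir" ]; then
        echo "no such directory: $ssh_dir" >&2
        return 1
    fi
    chmod 700 "$ssh_dir"
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
    fi
}
```

Run fix_ssh_perms as the hadoop user, then retry ssh host1.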

 

Installing  and Configuring Hadoop

The software resides in /stage. As root, go to the /stage directory, uncompress the Hadoop archive, move it to /usr/local/hadoop, and give ownership to the hadoop user:


[root@host1 ~]# cd /stage

[root@host1 stage]# tar -xvzf hadoop-2.2.0.tar.gz

[root@host1 stage]# mv hadoop-2.2.0 /usr/local/hadoop

[root@host1 stage]# cd /usr/local

[root@host1 local]# chown -R hadoop:hadoop hadoop

Now, log in as the hadoop user and set the following environment variables in .bash_profile.

Edit .bash_profile so that it looks like the following:

JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64

export JAVA_HOME

export HADOOP_INSTALL=/usr/local/hadoop

PATH=$PATH:$HOME/bin:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin:$JAVA_HOME/bin

export PATH

export HADOOP_MAPRED_HOME=$HADOOP_INSTALL

export HADOOP_COMMON_HOME=$HADOOP_INSTALL

export HADOOP_HDFS_HOME=$HADOOP_INSTALL

export YARN_HOME=$HADOOP_INSTALL

You also need to set JAVA_HOME in the $HADOOP_INSTALL/etc/hadoop/hadoop-env.sh file.

Open this file and add the following:

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64

Log in again as the hadoop user. Now, try issuing

[hadoop@host1 ~]$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.2.0.jar

If that returns an error, troubleshoot.
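A mistyped or missing variable in .bash_profile is a common cause of errors at this point. The following is a minimal sketch of a check that each variable is set and points at an existing directory; check_dir_vars is a hypothetical helper name, and the variable names passed to it are the ones set in .bash_profile above.

```shell
#!/bin/bash
# Report which of the named variables are unset or point at a missing directory.
# check_dir_vars is a hypothetical helper name (an assumption).
check_dir_vars() {
    local rc=0 name value
    for name in "$@"; do
        eval "value=\${$name:-}"         # look up the variable by its name
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            rc=1
        fi
    done
    return $rc
}
```

For example, check_dir_vars JAVA_HOME HADOOP_INSTALL YARN_HOME prints nothing and returns 0 when all three are set to existing directories.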

The configuration files that need editing for a single-node Hadoop cluster are located in /usr/local/hadoop/etc/hadoop. Go to that directory and list its contents:

[hadoop@host1 ~]$ cd /usr/local/hadoop/etc/hadoop
[hadoop@host1 hadoop]$ ls


Now, open yarn-site.xml. The file has no properties set yet. Enter:



<configuration>

<!-- Site specific YARN configuration properties -->

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

</configuration>

 

Next, open core-site.xml. Enter:

<configuration>

<property>

<name>fs.default.name</name>

<value>hdfs://host1.test.local:9000</value>

</property>

</configuration>

 

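Replace host1.test.local in core-site.xml with your own hostname. The fs.default.name property still works in Hadoop 2.x, but it is deprecated in favor of fs.defaultFS.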

 

Then, open mapred-site.xml. If this file does not exist, copy mapred-site.xml.template to mapred-site.xml. Enter:

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

Now, create two directories to be used by the NameNode and the DataNode.

[hadoop@host1 hadoop]$ cd ~

[hadoop@host1 ~]$ mkdir -p mydata/hdfs/datanode

[hadoop@host1 ~]$ mkdir -p mydata/hdfs/namenode

Finally, edit the hdfs-site.xml file.

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/mydata/hdfs/namenode</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/hadoop/mydata/hdfs/datanode</value>

</property>

</configuration>

 

Replace the directory paths in dfs.namenode.name.dir and dfs.datanode.data.dir with the directories that you created above.
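To double-check what was entered in these files, a value can be read back from the command line. The following is a minimal sketch, assuming each name and value element sits on a single line as in the listings above. It is a plain text match with awk, not a real XML parser, and get_hadoop_prop is a hypothetical helper name.

```shell
#!/bin/bash
# Print the value of one property from a Hadoop *-site.xml file.
# get_hadoop_prop is a hypothetical helper name (an assumption).
# Usage: get_hadoop_prop FILE PROPERTY_NAME
get_hadoop_prop() {
    awk -v want="$2" '
        # Remember the most recent property name seen.
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        # Print the value that follows the wanted name, then stop.
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}
```

For example, get_hadoop_prop /usr/local/hadoop/etc/hadoop/hdfs-site.xml dfs.namenode.name.dir should print the NameNode directory entered above. Nothing is printed when the property is absent.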

Formatting the HDFS filesystem via the NameNode

The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your cluster. You need to do this only the first time you set up a Hadoop cluster. Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS).

Issue:

[hadoop@host1 hadoop]$ hadoop namenode -format

**********/output truncated/*************

14/11/03 12:13:29 INFO util.ExitUtil: Exiting with status 0

14/11/03 12:13:29 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at host1.test.local/192.168.64.14
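Because formatting destroys existing HDFS data, it can help to guard the format command. The following is a minimal sketch, assuming that a formatted NameNode directory contains a current/VERSION file; both function names are hypothetical helpers, and the directory passed in is the local path configured as dfs.namenode.name.dir, without the file: prefix.

```shell
#!/bin/bash
# Succeeds if the given NameNode directory looks as if it was already formatted.
# Assumption: a formatted directory contains current/VERSION.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Format the NameNode only when the directory is not formatted yet.
format_namenode_once() {
    local name_dir="$1"
    if namenode_is_formatted "$name_dir"; then
        echo "NameNode directory $name_dir is already formatted; skipping."
        return 0
    fi
    hadoop namenode -format
}
```

For example, format_namenode_once /home/hadoop/mydata/hdfs/namenode formats on the first run and only prints the skip message on later runs.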

Starting Hadoop services

Start Hadoop Daemons by running the following commands:

[hadoop@host1 ~]$ start-dfs.sh

14/11/04 14:03:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [host1.test.local]

host1.test.local: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-host1.test.local.out

localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-host1.test.local.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-host1.test.local.out

14/11/04 14:04:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Now, issue jps, which shows which services are running.

[hadoop@host1 ~]$ jps

5179 Jps

5072 SecondaryNameNode

4805 NameNode

4982 DataNode

Next, issue
[hadoop@host1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-host1.test.local.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-host1.test.local.out

Again issue
[hadoop@host1 ~]$ jps
5874 NodeManager
6165 Jps
5772 ResourceManager
5072 SecondaryNameNode
4805 NameNode
4982 DataNode
You will find that some more services have been brought up.
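The jps check can also be scripted. The following is a minimal sketch that reads jps output on standard input and prints the name of every expected daemon that is not listed. The expected names are the five daemons shown in the jps output above, and missing_daemons is a hypothetical helper name.

```shell
#!/bin/bash
# Read jps output on stdin and print each expected Hadoop daemon that is absent.
# missing_daemons is a hypothetical helper name (an assumption).
missing_daemons() {
    local jps_output daemon
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>", so compare against the whole second field.
        if ! printf '%s\n' "$jps_output" |
             awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Run it as jps | missing_daemons. No output means all five daemons are up.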

Alternatively, you can start each of the processes shown in the jps output separately, as follows:

[hadoop@host1 ~]$  hadoop-daemon.sh start namenode

[hadoop@host1 ~]$   hadoop-daemon.sh start datanode

[hadoop@host1 ~]$  yarn-daemon.sh start resourcemanager

[hadoop@host1 ~]$  yarn-daemon.sh start nodemanager

 

Then, issue jps to view services running.

 

Stop Hadoop by running the following commands:

stop-dfs.sh

stop-yarn.sh

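Hadoop comes with several web interfaces. With the default ports in Hadoop 2.2, the NameNode interface is at http://host1.test.local:50070/ and the ResourceManager interface is at http://host1.test.local:8088/.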

 

 

 

 
