Hadoop Installation

Environment required for Hadoop: Hadoop's production environment is UNIX, but it can also run on Windows using Cygwin. Java 1.6 or above is needed to run MapReduce programs. Installing Hadoop from a tarball on UNIX involves three parts:

Java Installation

SSH installation

Hadoop Installation and File Configuration

1) Java Installation

Step 1. Type "java -version" at the prompt to check whether Java is installed. If it is not, download Java from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html . The tar file jdk-7u71-linux-x64.tar.gz will be downloaded to your system.

Step 2. Extract the file using the command below

# tar zxf jdk-7u71-linux-x64.tar.gz

Step 3. To make Java available to all users, move the extracted directory to /usr/lib and set the path. At the prompt, switch to the root user and then type the command below to move the JDK to /usr/lib.

# mv jdk1.7.0_71 /usr/lib/  

Now add the following lines to the ~/.bashrc file to set up the path.

export JAVA_HOME=/usr/lib/jdk1.7.0_71

export PATH=$PATH:$JAVA_HOME/bin

After reloading the file (for example with "source ~/.bashrc"), you can check the installation by typing "java -version" at the prompt.
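If "java -version" still fails at this point, a small check like the one below can help narrow it down. It is only a sketch: check_java_home is a name made up for this answer, and all it does is test that an executable bin/java exists under the given directory (JAVA_HOME by default).

```shell
# check_java_home [DIR]: succeed only if DIR/bin/java exists and is executable.
# DIR defaults to the current value of JAVA_HOME. (Hypothetical helper.)
check_java_home() {
    dir="${1:-${JAVA_HOME:-}}"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable java found under $dir/bin" >&2
        return 1
    fi
    echo "java found at $dir/bin/java"
}
```

Call it as "check_java_home" after opening a new shell, or pass a directory explicitly, e.g. "check_java_home /usr/lib/jdk1.7.0_71".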

2) SSH Installation

SSH is used to interact with the master and slave computers without any password prompt. First, create a hadoop user on the master and slave systems:

# useradd hadoop  

# passwd hadoop

To map the nodes, open the hosts file in the /etc/ folder on all the machines and add each IP address along with its host name.

# vi /etc/hosts  

Enter the lines below

190.12.1.114    hadoop-master  

190.12.1.121    hadoop-slave-one

190.12.1.143   hadoop-slave-two  
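Since the same three lines have to go into /etc/hosts on every machine, it can be less error-prone to script the edit. The function below is a sketch, not part of any Hadoop tooling: add_host_entry is a hypothetical helper that appends an "IP name" line only when the name is not already mapped, so running it twice does not duplicate entries.

```shell
# add_host_entry FILE IP NAME: append "IP NAME" to FILE unless NAME is
# already mapped on a non-comment line. (Hypothetical helper.)
add_host_entry() {
    file="$1"; ip="$2"; name="$3"
    # Compare NAME against every host-name field (field 2 onward) exactly,
    # so "hadoop-slave" does not match "hadoop-slave-one".
    if awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

As root on each machine this would be used as, for example, "add_host_entry /etc/hosts 190.12.1.114 hadoop-master", repeated for the two slaves.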

Set up an SSH key on every node so that the nodes can communicate with each other without a password. The commands are:

# su hadoop   

$ ssh-keygen -t rsa   

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-one

$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-two

$ chmod 0600 ~/.ssh/authorized_keys   

$ exit  
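Before moving on, it is worth confirming that key-based login really works, because the Hadoop start scripts will stall on a password prompt otherwise. A rough sketch follows; the function names and the SSH_CMD variable are made up for this answer (SSH_CMD only exists so the check can be stubbed out), while BatchMode and ConnectTimeout are standard ssh options.

```shell
# can_ssh_without_password HOST: succeed if HOST accepts key-based login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH_CMD is a hypothetical override, not an ssh feature.
can_ssh_without_password() {
    "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# report_ssh_access HOST...: print one status line per host.
report_ssh_access() {
    for host in "$@"; do
        if can_ssh_without_password "$host" >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: password still required or host unreachable"
        fi
    done
}
```

Run as the hadoop user, for example "report_ssh_access hadoop-master hadoop-slave-one hadoop-slave-two".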

3) Hadoop Installation

Hadoop can be downloaded from the Apache Hadoop website (hadoop.apache.org).

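Now extract Hadoop into its install location. The --strip-components=1 option drops the top-level hadoop-2.2.0 directory from the archive so that the files land directly in /usr/hadoop.

$ sudo mkdir /usr/hadoop

$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/hadoop --strip-components=1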

Change the ownership of the Hadoop folder

$ sudo chown -R hadoop /usr/hadoop

Change the Hadoop configuration files:

All the files are present in /usr/hadoop/etc/hadoop

1) In the hadoop-env.sh file add

export JAVA_HOME=/usr/lib/jdk1.7.0_71

2) In core-site.xml add the following between the configuration tags,

<configuration>  

<property>  

<name>fs.default.name</name>  

<value>hdfs://hadoop-master:9000</value>  

</property>  

<property>  

<name>dfs.permissions</name>  

<value>false</value>  

</property>  

</configuration>  

3) In hdfs-site.xml add the following between the configuration tags,

<configuration>  

<property>  

<name>dfs.data.dir</name>  

<value>/usr/hadoop/dfs/name/data</value>

<final>true</final>  

</property>  

<property>  

<name>dfs.name.dir</name>  

<value>/usr/hadoop/dfs/name</value>

<final>true</final>  

</property>  

<property>  

<name>dfs.replication</name>  

<value>1</value>  

</property>  

</configuration>  

4) Open mapred-site.xml and make the change shown below (in Hadoop 2.x, if the file does not exist yet, copy mapred-site.xml.template to mapred-site.xml first)

<configuration>  

<property>  

<name>mapred.job.tracker</name>  

<value>hadoop-master:9001</value>  

</property>  

</configuration>  

5) Finally, update your $HOME/.bashrc

cd $HOME  

vi .bashrc  

Append the following lines at the end, then save and exit

#Hadoop variables   

export JAVA_HOME=/usr/lib/jdk1.7.0_71

export HADOOP_INSTALL=/usr/hadoop  

export PATH=$PATH:$HADOOP_INSTALL/bin   

export PATH=$PATH:$HADOOP_INSTALL/sbin  

export HADOOP_MAPRED_HOME=$HADOOP_INSTALL   

export HADOOP_COMMON_HOME=$HADOOP_INSTALL  

export HADOOP_HDFS_HOME=$HADOOP_INSTALL   

export YARN_HOME=$HADOOP_INSTALL  
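A quick way to catch a mistyped path in the configuration steps is to check the environment after opening a new shell. The function below is a minimal sketch under the layout assumed in this answer (configuration files under $HADOOP_INSTALL/etc/hadoop); check_hadoop_env is a hypothetical name, not a Hadoop command.

```shell
# check_hadoop_env: print one line per problem found with the variables and
# files set up in the configuration steps; return non-zero if there are any.
# (Hypothetical helper.)
check_hadoop_env() {
    problems=0
    install="${HADOOP_INSTALL:-}"
    if [ -z "$install" ]; then
        echo "HADOOP_INSTALL is not set"
        return 1
    fi
    # Each configuration file edited in steps 1 to 4 should exist.
    for f in hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml; do
        if [ ! -f "$install/etc/hadoop/$f" ]; then
            echo "missing $install/etc/hadoop/$f"
            problems=$((problems + 1))
        fi
    done
    # bin and sbin should both be on PATH so hadoop and start-all.sh resolve.
    for d in bin sbin; do
        case ":$PATH:" in
            *":$install/$d:"*) ;;
            *) echo "$install/$d is not on PATH"; problems=$((problems + 1)) ;;
        esac
    done
    [ "$problems" -eq 0 ]
}
```

It only checks that the files exist and the paths are set; it does not validate the XML contents.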

From the master, copy the Hadoop installation to each slave machine using the commands below (the hadoop user needs write permission on the target location)

# su hadoop

$ cd /usr

$ scp -r hadoop hadoop-slave-one:/usr/hadoop

$ scp -r hadoop hadoop-slave-two:/usr/hadoop

Configure the master and slave nodes (the paths below are relative to /usr/hadoop)

$ vi etc/hadoop/masters  

hadoop-master  

  

$ vi etc/hadoop/slaves  

hadoop-slave-one   

hadoop-slave-two  
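Every name in the slaves file has to resolve on the master, so a typo in either file shows up only when the daemons fail to start. The sketch below cross-checks the two files; unmapped_hosts is a name made up for this answer, and it only compares text, it does not query DNS.

```shell
# unmapped_hosts LIST_FILE HOSTS_FILE: print every host name in LIST_FILE
# (one per line, as in etc/hadoop/slaves) that has no entry in HOSTS_FILE.
# (Hypothetical helper.)
unmapped_hosts() {
    list_file="$1"; hosts_file="$2"
    # read with the default IFS trims leading and trailing whitespace.
    while read -r name || [ -n "$name" ]; do
        # Skip blank lines and comments.
        case "$name" in ''|'#'*) continue ;; esac
        if ! awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done < "$list_file"
}
```

For example, "unmapped_hosts /usr/hadoop/etc/hadoop/slaves /etc/hosts" should print nothing when both files agree.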

After this, format the name node and start all the daemons

# su hadoop   

$ cd /usr/hadoop   

$ bin/hadoop namenode -format  

  

$ cd $HADOOP_INSTALL/sbin

$ start-all.sh  
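Once start-all.sh returns, the JDK's jps tool lists the running Java processes, which is the usual way to see whether the daemons came up. The helper below is a sketch (missing_daemons is a made-up name) that reads jps output on stdin and prints the expected daemons that are absent. The daemon names in the usage line are an assumption based on a typical Hadoop 2.x layout with YARN; adjust them to your setup.

```shell
# missing_daemons NAME...: read `jps` output ("PID ClassName" per line) on
# stdin and print each NAME that is not running. (Hypothetical helper.)
missing_daemons() {
    # Keep only the process-name column.
    running=$(awk '{ print $2 }')
    for daemon in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qxF -- "$daemon"; then
            echo "$daemon"
        fi
    done
}
```

On the master this might be run as "jps | missing_daemons NameNode SecondaryNameNode ResourceManager", and on a slave as "jps | missing_daemons DataNode NodeManager". Empty output means everything listed was found.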

The easiest option is to use the Cloudera training VM, which comes with everything pre-installed and can be downloaded from http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip
...