Monday, May 14, 2012

Hadoop pseudo-cluster installation

Install Java and the Cloudera yum repo
yum install java-1.6.0-openjdk.x86_64
curl -O
mv cloudera-cdh3.repo /etc/yum.repos.d/

Ensure that /etc/hosts has entries for both localhost and this machine's hostname.
Comment out the IPv6 entry.
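
A quick way to confirm the /etc/hosts entries is a small shell check. This is only a sketch: the function name has_hosts_entry is made up for illustration, and it simply looks for a name on any line that is not commented out.

```shell
# has_hosts_entry FILE NAME  (hypothetical helper, not part of Hadoop)
# Succeeds if NAME appears as a hostname or alias on a non-comment line of FILE.
has_hosts_entry() {
    awk -v name="$2" '
        { sub(/#.*/, "") }   # strip comments, so commented-out IPv6 lines are ignored
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example:
#   has_hosts_entry /etc/hosts localhost     || echo "no localhost entry"
#   has_hosts_entry /etc/hosts "$(hostname)" || echo "no entry for $(hostname)"
```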

Create the hadoop group and users manually
Create the "hdfs" and "mapred" users as members of the group "hadoop"
groupadd hadoop
useradd -G hadoop hdfs
useradd -G hadoop mapred
passwd hdfs 
passwd mapred 

Install hadoop packages
yum install hadoop-0.20
yum install hadoop-0.20-conf-pseudo

Create directories for hdfs files and mapred temporary files as root
mkdir -p /data/hadoop
chown -R hdfs:hadoop /data/hadoop

As hdfs:
chmod -R 755 /data/hadoop
mkdir -p /data/hadoop/cache
chmod 777 /data/hadoop/cache
chmod +t /data/hadoop/cache

mkdir -p /data/hadoop/tmp
chown hdfs:hadoop /data/hadoop/tmp
chmod 777 /data/hadoop/tmp

mkdir -p /data/hadoop/nn
chown hdfs:hadoop /data/hadoop/nn

mkdir -p /data/hadoop/dn
chown hdfs:hadoop /data/hadoop/dn

mkdir -p /data/hadoop/snn
chown hdfs:hadoop /data/hadoop/snn

As mapred:
mkdir /data/hadoop/cache/mapred-tmp
chown mapred:hadoop /data/hadoop/cache/mapred-tmp

mkdir /data/hadoop/cache/mapred-local
chown mapred:hadoop /data/hadoop/cache/mapred-local

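As root (mapred cannot create a directory directly under /data/hadoop, which is mode 755 and owned by hdfs):
mkdir -p /data/hadoop/mapred-system
chmod 777 /data/hadoop/mapred-system
chown -R mapred:hadoop /data/hadoop/mapred-system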
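
Before moving on it can help to review the modes and owners in one pass. The helper below is a sketch; show_perms is a made-up name, and it relies on GNU stat (the -c option), which is what a yum-based system ships.

```shell
# show_perms DIR...  (hypothetical helper)
# Prints "<octal mode> <owner>:<group> <path>" for each directory, using GNU stat.
show_perms() {
    for dir in "$@"; do
        stat -c '%a %U:%G %n' "$dir" || return 1
    done
}

# Example:
#   show_perms /data/hadoop /data/hadoop/cache /data/hadoop/tmp \
#              /data/hadoop/nn /data/hadoop/dn /data/hadoop/snn \
#              /data/hadoop/cache/mapred-tmp /data/hadoop/cache/mapred-local \
#              /data/hadoop/mapred-system
```

Given the commands above, /data/hadoop/cache should show mode 1777 (777 plus the sticky bit), and /data/hadoop/tmp and /data/hadoop/mapred-system should show 777.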

Copy the default configuration to a backup directory as root.
mkdir -p /etc/hadoop/conf.pseudo.copy
cp /etc/hadoop/conf.pseudo/* /etc/hadoop/conf.pseudo.copy/
cd /etc/hadoop/conf.pseudo/

Edit various configuration files with entries for the directories made above
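
The exact entries are not preserved here, so the following is only a sketch of what they might look like. The property names are the standard 0.20-era ones, but the pairing of each property with a directory is an assumption, as are the helper names hadoop_property and print_dir_properties. The script only prints XML fragments; it does not modify any file.

```shell
# hadoop_property NAME VALUE  (hypothetical helper)
# Prints one <property> element in the layout used by the Hadoop *-site.xml files.
hadoop_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Assumed mapping of properties to the directories created earlier.
# The trailing comment names the file each fragment would be pasted into.
print_dir_properties() {
    hadoop_property hadoop.tmp.dir    /data/hadoop/tmp                  # core-site.xml
    hadoop_property dfs.name.dir      /data/hadoop/nn                   # hdfs-site.xml
    hadoop_property dfs.data.dir      /data/hadoop/dn                   # hdfs-site.xml
    hadoop_property fs.checkpoint.dir /data/hadoop/snn                  # hdfs-site.xml
    hadoop_property mapred.local.dir  /data/hadoop/cache/mapred-local   # mapred-site.xml
    hadoop_property mapred.temp.dir   /data/hadoop/cache/mapred-tmp     # mapred-site.xml
    hadoop_property mapred.system.dir /data/hadoop/mapred-system        # mapred-site.xml
}

# Example:
#   print_dir_properties
```

Each printed fragment belongs inside the <configuration> element of the file named in its comment. Note that mapred.system.dir is normally resolved against the default file system, so check how your setup treats that path.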



You may sometimes have to set JAVA_HOME yourself, although the scripts can usually work it out on their own.
Edit /etc/profile and /usr/bin/hadoop with vi and add:
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64

You can also add it under /usr/lib/hadoop-0.20/bin/ if "sudo service hadoop-0.20-namenode start" complains about JAVA_HOME.
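
If you are unsure which directory to use, one option is to derive it from the java binary on the PATH. This is a sketch, assuming GNU readlink (the -f option); java_home_for is a made-up helper name, and the directory it prints may differ from the path shown above because symlinks are resolved.

```shell
# java_home_for JAVA_BIN  (hypothetical helper)
# Resolves symlinks on the given java binary and strips the trailing /bin/java,
# printing the directory that contains bin/java as a candidate JAVA_HOME.
java_home_for() {
    real=$(readlink -f "$1") || return 1
    home=${real%/bin/java}
    [ "$home" != "$real" ] || return 1   # resolved path did not end in /bin/java
    printf '%s\n' "$home"
}

# Example:
#   export JAVA_HOME=$(java_home_for "$(command -v java)")
```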

Format Hadoop file system with command:
sudo -u hdfs hadoop namenode -format

Now you can start services with the following commands:
/etc/init.d/hadoop-0.20-namenode start 
/etc/init.d/hadoop-0.20-secondarynamenode start
/etc/init.d/hadoop-0.20-datanode start
/etc/init.d/hadoop-0.20-jobtracker start
/etc/init.d/hadoop-0.20-tasktracker start
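
The same five commands can be wrapped in a loop. This is a sketch that keeps the order used above; the function name hadoop_services and its optional second argument are made up for illustration. Passing "echo" as the second argument prints the commands instead of running them.

```shell
# Daemons in the order used above: the HDFS daemons first, then MapReduce.
HADOOP_SERVICES="namenode secondarynamenode datanode jobtracker tasktracker"

# hadoop_services ACTION [RUNNER]  (hypothetical helper)
# Calls each init script with ACTION (for example start or stop).
# RUNNER is an optional command prefix; "echo" gives a dry run.
hadoop_services() {
    runner=${2:-}
    for svc in $HADOOP_SERVICES; do
        $runner "/etc/init.d/hadoop-0.20-$svc" "$1" || return 1
    done
}

# Example:
#   hadoop_services start echo   # print what would be run
#   hadoop_services start        # actually start the daemons (as root)
```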

You can also configure the services to start at boot time:
sudo chkconfig hadoop-0.20-namenode on
sudo chkconfig hadoop-0.20-jobtracker on
sudo chkconfig hadoop-0.20-secondarynamenode on
sudo chkconfig hadoop-0.20-tasktracker on
sudo chkconfig hadoop-0.20-datanode on

That's it. Try testing your installation with a few simple hadoop commands.
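
As one possible set of simple commands, the sketch below writes a small file into HDFS, reads it back, and removes it again. The function name hdfs_smoke_test and the scratch path under /tmp are made up for illustration; the hadoop fs subcommands themselves (-ls, -mkdir, -put, -cat, -rmr) are the standard ones in this release.

```shell
# hdfs_smoke_test [HADOOP_CMD]  (hypothetical helper)
# Lists the HDFS root, writes a small file, reads it back, and cleans up.
# HADOOP_CMD defaults to "hadoop"; it is a parameter so the steps can be dry-run.
hdfs_smoke_test() {
    hadoop_cmd=${1:-hadoop}
    test_dir=/tmp/smoke-test-$$            # made-up scratch path inside HDFS
    local_file=$(mktemp) || return 1
    echo "hello hadoop" > "$local_file"

    "$hadoop_cmd" fs -ls / &&
    "$hadoop_cmd" fs -mkdir "$test_dir" &&
    "$hadoop_cmd" fs -put "$local_file" "$test_dir/hello.txt" &&
    "$hadoop_cmd" fs -cat "$test_dir/hello.txt" &&
    "$hadoop_cmd" fs -rmr "$test_dir"
    status=$?

    rm -f "$local_file"
    return $status
}

# Example:
#   hdfs_smoke_test
```

On a freshly formatted file system the HDFS root belongs to hdfs, so this probably needs to run from a shell started as the hdfs user.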


  1. Hi,

    For an Apache hadoop-2.0.0-alpha installation on two Linux machines, what should the values of fs.defaultFS and the related properties be on both name nodes?

    One machine's hostname is rsi-nod-nsn1 and the other is rsi-nod-nsn2.

    I want to make both of them federated namenodes, and both should be used as datanodes too.

    I want to configure both federation and YARN.

    What configuration changes are needed for this? I cannot find masters, mapred-site.xml, and other files in the hadoopHome/etc/hadoop folder. How do I make changes to these files?

