Tuesday, May 15, 2012

HBase pseudo-cluster installation

I have been preparing a VM with HBase installed in pseudo-cluster mode for experimental purposes. There are quite a few useful blogs on installing HBase; I settled on the following minimal installation procedure.

I am blogging it for future reference. Hopefully it will help others too.

Before proceeding to install HBase in pseudo-cluster mode, you may want to check out the procedure for installing Hadoop in pseudo-cluster mode.

A few tweaks are required in the OS configuration. Add the following to /etc/security/limits.conf:
  • hdfs  -       nofile  32768
  • hbase  -       nofile  32768
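The edit above can be sketched in shell first on a scratch copy, so you can review the lines before touching the real file (the scratch path is only for illustration):

```shell
# Sketch: stage the two limits.conf lines in a scratch file for review
# before appending them to /etc/security/limits.conf.
scratch=$(mktemp)
cat >> "$scratch" <<'EOF'
hdfs   -   nofile  32768
hbase  -   nofile  32768
EOF
cat "$scratch"
```

Note that the new nofile limit only takes effect for fresh login sessions; after re-logging in, you can verify it with `ulimit -n`.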

A few changes are also required to the Hadoop configuration I mentioned earlier. Add the following to hdfs-site.xml:

   <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
   </property>


Add the following to mapred-site.xml:

   <property>
      <name>mapred.child.ulimit</name>
      <value>1835008</value>
   </property>
   <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
   </property>
   <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
   </property>
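For reference, mapred.child.ulimit is specified in kilobytes, so the value above caps each child task's virtual memory at 1792 MB:

```shell
# mapred.child.ulimit is in KB; convert the configured value to MB.
echo $((1835008 / 1024))   # → 1792
```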


Recreate the Hadoop file system if required, and format the NameNode to start afresh:
sudo -u hdfs hadoop namenode -format

Now it is time to install ZooKeeper:
yum install hadoop-zookeeper-server

and make the following changes to /etc/zookeeper/zoo.cfg:
  • Change localhost to 127.0.0.1
  • Add: maxClientCnxns=0
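Those two zoo.cfg edits can be sketched in shell; the snippet below demonstrates them on a scratch copy whose sample contents are an assumption based on a typical default zoo.cfg:

```shell
# Demonstrate the two zoo.cfg edits on a scratch copy
# (sample contents below are an assumed typical default).
cfg=$(mktemp)
printf '%s\n' 'tickTime=2000' 'dataDir=/var/zookeeper' 'clientPort=2181' \
  'server.0=localhost:2888:3888' > "$cfg"
sed -i 's/localhost/127.0.0.1/' "$cfg"   # change localhost to 127.0.0.1
echo 'maxClientCnxns=0' >> "$cfg"        # add the connection-limit setting
cat "$cfg"
```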
Now it is time to start the Hadoop services:
  • service hadoop-0.20-namenode start
  • service hadoop-0.20-secondarynamenode start
  • service hadoop-0.20-datanode start
  • service hadoop-0.20-jobtracker start
  • service hadoop-0.20-tasktracker start
Make sure that the Hadoop services are running with a simple command-line check:
  • sudo -u hdfs hadoop fs -lsr /

Start or restart the ZooKeeper service with the following command:
  • service hadoop-zookeeper-server restart
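You can check that ZooKeeper is answering with its "ruok" four-letter command; a healthy server replies "imok". This assumes nc is available and ZooKeeper is listening on the default port 2181, and falls back to a message otherwise:

```shell
# Ask ZooKeeper "are you ok?"; a running server replies "imok".
# Prints a fallback message if nothing is listening (or nc is missing).
(echo ruok | nc -w 2 localhost 2181) 2>/dev/null || echo "ZooKeeper not reachable"
```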

Having started Hadoop and ZooKeeper, we should be able to create a path for HBase in HDFS:
sudo -u hdfs hadoop fs -mkdir /hbase
sudo -u hdfs hadoop fs -chown hbase /hbase

It is now time to install HBase:
yum install hadoop-hbase
yum install hadoop-hbase-master

Edit the HBase configuration file "/etc/hbase/conf/hbase-site.xml" and add the following:

   <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
   </property>
   <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost/hbase</value>
   </property>

I have sometimes run into issues with JAVA_HOME not being set; you can add it directly to "/etc/hbase/conf/hbase-env.sh", along with HBASE_MANAGES_ZK=false so that HBase uses the external ZooKeeper we just installed:
  • export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64
  • export HBASE_MANAGES_ZK=false
Now it is time to start the HBase master service:
service hadoop-hbase-master restart

Finally, install the region server and start it:
yum install hadoop-hbase-regionserver
service hadoop-hbase-regionserver start

Now you should have HBase running in pseudo-cluster mode for your little Hadoop experiments.
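As a quick smoke test, you can ask the HBase shell for cluster status. This assumes the hbase CLI installed by the packages above is on the PATH; the snippet prints a note instead if it is not:

```shell
# Query cluster status via the HBase shell; print a note if the
# hbase CLI is not installed on this machine.
if command -v hbase >/dev/null 2>&1; then
  echo "status" | hbase shell
else
  echo "hbase CLI not on PATH"
fi
```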
