Setup HBase Indexer (Part 1)


Hadoop/HBase setup is outside the scope of this post. I assume that you have a running HBase environment with a Master (HMaster) and two region servers (rs1 and rs2).

I’ll be using the HDP 2.5 release from Hortonworks, set up on CentOS 7.2.

1 – Setup Solr

I don’t want Ambari to manage my Solr instance, because we have some specific configurations to add and we don’t want to alter the default ambari-agent behaviour.

sudo rpm --import
cd /etc/yum.repos.d/
sudo wget
sudo yum install lucidworks-hdpsearch

2- Start Solr Server in cloud mode:

sudo /opt/lucidworks-hdpsearch/solr/bin/solr start -c -z,,

3- Edit “hbase-indexer-site.xml” in the HBase Indexer configuration (default: /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml)

<?xml version='1.0'?>

4- In the Ambari GUI, go to HBase > Configs > Custom hbase-site.xml and add the following custom properties:


5- Copy the HBase Indexer specific jars to the lib directories of hmaster, rs1 and rs2 (hadoop is my default Linux user for the HBase environment):

scp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep*
scp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep*
scp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep*

Restart the HBase Master and the two region servers.

6- Copy your hbase-site.xml from hmaster to the hbase-indexer conf directory:

sudo scp /opt/lucidworks-hdpsearch/hbase-indexer/conf/

7 – Create a new Solr collection for our HBase data:

/opt/lucidworks-hdpsearch/solr/bin/solr create -c a_new_collection -d data_driven_schema_configs -n hbase_config -s 2 -rf 2

8 – Create a hbase-indexer mapping file :

vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase_config.xml

There are a lot of details here about this file’s content.

<?xml version='1.0'?>
<indexer table='t01'>
  <field value='f0:a01' name='book_isbn' type='string'/>
  <field value='f0:a02' name='publish_date_dt'/>
  <field value='f0:a03' name='main_author_alz'/>
</indexer>

HBase stores every value as raw bytes, so Solr gets no type information from it. To let Solr guess the correct field type, add a suffix to the field name. By default, in Solr’s schemaless configuration, the suffix “_dt” means that the field is a date. Beware: not all date formats are accepted by default.
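One way to sidestep the date-format problem is to normalize dates to Solr’s canonical form (ISO-8601 in UTC, e.g. 2011-09-01T00:00:00Z) before writing them to HBase. The sketch below is only an illustration: the function name to_solr_date is hypothetical, and it assumes GNU date (the one shipped with CentOS).

```shell
# Convert a date string to Solr's canonical UTC timestamp format.
# Assumption: GNU date is available (the -d option is not portable to BSD date).
# $1 = any date string that GNU date can parse, e.g. 2011-09-01
to_solr_date() {
  date -u -d "$1" +%Y-%m-%dT%H:%M:%SZ
}

# Example: a bare day is expanded to midnight UTC.
to_solr_date '2011-09-01'   # prints 2011-09-01T00:00:00Z
```

The resulting string can then be used as the value of the “_dt” column.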
If we want to add search capabilities (a custom analyzer, …), we can declare a new dynamic field with the suffix “*_alz”. In Solr, use the “add-dynamic-field” command to add a dynamic field rule and handle the HBase column however you like!
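As a rough sketch of that dynamic-field rule, the Solr Schema API accepts a JSON body containing an “add-dynamic-field” command, posted to the collection’s schema endpoint. The base URL http://solr-host:8983/solr and the field type text_general are assumptions for illustration, not values from my setup; replace text_general with the field type that carries your analyzer.

```shell
# Build the JSON body for the Schema API "add-dynamic-field" command.
# $1 = dynamic field pattern (e.g. *_alz)
# $2 = Solr field type (text_general is an assumed placeholder)
dynamic_field_payload() {
  printf '{"add-dynamic-field":{"name":"%s","type":"%s","indexed":true,"stored":true}}' "$1" "$2"
}

# Post the rule to a collection's schema endpoint.
# $1 = Solr base URL (hypothetical, e.g. http://solr-host:8983/solr)
# $2 = collection name
add_dynamic_field() {
  curl -s -X POST -H 'Content-type: application/json' \
    --data-binary "$(dynamic_field_payload '*_alz' 'text_general')" \
    "$1/$2/schema"
}

# Only print the payload here; nothing is sent until add_dynamic_field is called.
dynamic_field_payload '*_alz' 'text_general'; echo
```

To apply it, you would call something like add_dynamic_field 'http://solr-host:8983/solr' 'a_new_collection' with your own Solr host.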

9 – Start HBase-Indexer

/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer -zookeeper,,

10 – Add the indexer to HBase-Indexer configuration

/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer -n hbaseindexer -c /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase_config.xml  -cp,, -cp solr.collection=a_new_collection --zookeeper,,


11- Now, open the HBase shell (sudo hbase shell) and create a new table with a custom replication scope:

t = create 't01', {NAME => 'f0', REPLICATION_SCOPE => '1'}

12- Let’s put some data:

put 't01', 'aRowKey', 'f0:a01', '1449396100'
put 't01', 'aRowKey', 'f0:a02', '2011-09-01'
put 't01', 'aRowKey', 'f0:a03', 'Lars George'
put 't01', 'aRowKey', 'f0:a04', 'Just Another Field'


13- Refresh your Solr indexes, and check that the book we just added to HBase was indexed in Solr.

sudo curl /update?commit=true
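To check the result after the commit, you can look the book up by its ISBN through Solr’s select handler. This is a minimal sketch: the helper name solr_select_url and the base URL http://solr-host:8983/solr are hypothetical, so substitute your own Solr host.

```shell
# Build a Solr select URL that looks up documents by a single field value.
# $1 = Solr base URL (hypothetical), $2 = collection, $3 = field, $4 = value
solr_select_url() {
  printf '%s/%s/select?q=%s:%s&wt=json' "$1" "$2" "$3" "$4"
}

# Print the query URL for the ISBN stored in the mapped column.
solr_select_url 'http://solr-host:8983/solr' 'a_new_collection' 'book_isbn' '1449396100'; echo

# To actually run the query against your own host:
#   curl -s "$(solr_select_url 'http://solr-host:8983/solr' 'a_new_collection' 'book_isbn' '1449396100')"
```

If the indexer is working, the response should contain one document with the book_isbn, publish_date_dt and main_author_alz fields from the mapping file.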

This post is quite long and full of small details. I’ll add some explanations in a separate post.


About Salem Ben Afia

Big Data & Java developer

Posted on July 24, 2017, in BigData, Solr.