Setup HBase Indexer (Part 1)

Pre-requisites:

Hadoop/HBase setup is outside the scope of this post. I assume that you have a running HBase environment with a Master (HMaster) and two region servers (rs1 and rs2).

I’ll be using the HDP 2.5 release from Hortonworks, set up on CentOS 7.2.

1 – Set up Solr

I don’t want Ambari to manage my Solr instance, because we have some specific configurations to add and we don’t want to alter the default behaviour of ambari-agent.

sudo rpm --import http://public-repo-1.hortonworks.com/HDP-SOLR-2.5-100/repos/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
sudo wget -P /etc/yum.repos.d/ http://public-repo-1.hortonworks.com/HDP-SOLR-2.5-100/repos/centos7/hdp-solr.repo
sudo yum install lucidworks-hdpsearch

2 – Start the Solr server in cloud mode:

sudo /opt/lucidworks-hdpsearch/solr/bin/solr start -c -z hmaster.dev.fr:2181,rs1.dev.fr:2181,rs2.dev.fr:2181
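The same ZooKeeper connect string comes back in several later steps, so it can help to build it once and reuse it. This is only a minimal sketch: the function name zk_connect_string and the variable ZK_HOSTS are my own, and the host names are the ones used in this post.

```shell
#!/bin/sh
# Build "host1:port,host2:port,..." from a port and a list of host names.
zk_connect_string() {
    port="$1"; shift
    result=""
    for host in "$@"; do
        # Add a comma before every entry except the first one.
        result="${result:+$result,}${host}:${port}"
    done
    printf '%s\n' "$result"
}

# Host names from this post; adjust them to your own cluster.
ZK_HOSTS=$(zk_connect_string 2181 hmaster.dev.fr rs1.dev.fr rs2.dev.fr)
echo "$ZK_HOSTS"

# The start command of this step then becomes:
#   sudo /opt/lucidworks-hdpsearch/solr/bin/solr start -c -z "$ZK_HOSTS"
```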

3 – Edit “hbase-indexer-site.xml” in the HBase Indexer configuration directory (default: /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml):


<?xml version='1.0'?>
<configuration>
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <value>hmaster.dev.fr:2181,rs1.dev.fr:2181,rs2.dev.fr:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hmaster.dev.fr,rs1.dev.fr,rs2.dev.fr</value>
  </property>
</configuration>
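Note that the two properties use different formats: the connect string carries ports, while the quorum is a plain host list. As a small sketch (zk_quorum_hosts is a hypothetical helper name, not part of HBase Indexer), the second value can be derived from the first:

```shell
#!/bin/sh
# Strip every ":port" suffix from a ZooKeeper connect string, which gives
# the host-only list used for hbase.zookeeper.quorum.
zk_quorum_hosts() {
    printf '%s\n' "$1" | sed 's/:[0-9][0-9]*//g'
}

zk_quorum_hosts "hmaster.dev.fr:2181,rs1.dev.fr:2181,rs2.dev.fr:2181"
```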

4 – In the Ambari GUI, go to HBase > Configs > Custom hbase-site and add the following custom properties:


hbase.replication=true
replication.source.ratio=1.0
replication.source.nb.capacity=1000
replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource

5 – Copy the HBase Indexer specific jars to the lib directories of hmaster, rs1 and rs2 (hadoop is my default Linux user for the HBase environment):

scp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* hadoop@hmaster.dev.fr:/usr/hdp/current/hbase-master/lib/
scp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* hadoop@rs1.dev.fr:/usr/hdp/current/hbase-master/lib/
scp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* hadoop@rs2.dev.fr:/usr/hdp/current/hbase-master/lib/

Then restart the HBase Master and the two region servers.
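With more nodes, the copy commands of this step are easier to generate than to type. The sketch below only prints the commands (a dry run); the function name sep_copy_commands is hypothetical, and the paths and the hadoop user are the ones used in this post.

```shell
#!/bin/sh
# Print one scp command per HBase node, without running anything.
sep_copy_commands() {
    src='/opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep*'
    dest='/usr/hdp/current/hbase-master/lib/'
    for host in "$@"; do
        printf 'scp %s hadoop@%s:%s\n' "$src" "$host" "$dest"
    done
}

sep_copy_commands hmaster.dev.fr rs1.dev.fr rs2.dev.fr

# Once the printed commands look right, they can be executed with:
#   sep_copy_commands hmaster.dev.fr rs1.dev.fr rs2.dev.fr | sh
```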

6 – Copy your hbase-site.xml from hmaster to the hbase-indexer conf directory:

sudo scp hadoop@hmaster.dev.fr:/etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
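Before going further, it can be worth checking that the replication properties from step 4 made it into the copied file. This is a minimal sketch, assuming each property name sits in its own name element on a single line, as in the XML shown earlier; has_property and check_replication_props are hypothetical helper names.

```shell
#!/bin/sh
# Succeed if the given hbase-site.xml style file declares the property name.
has_property() {
    grep -qF "<name>$2</name>" "$1"
}

# List the step 4 replication properties that are missing from a file.
# Returns 0 when all four are present, 1 otherwise.
check_replication_props() {
    missing=0
    for prop in hbase.replication \
                replication.source.ratio \
                replication.source.nb.capacity \
                replication.replicationsource.implementation; do
        if ! has_property "$1" "$prop"; then
            echo "missing: $prop"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the path from this step:
#   check_replication_props /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-site.xml
```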

7 – Create a new Solr collection for our HBase data:

/opt/lucidworks-hdpsearch/solr/bin/solr create -c a_new_collection -d data_driven_schema_configs -n hbase_config -s 2 -rf 2

8 – Create an hbase-indexer mapping file:

vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase_config.xml

There are a lot of details about the content of this file; a minimal example:

<?xml version='1.0'?>
<indexer table='t01'>
  <field value='f0:a01' name='book_isbn' type='string'/>
  <field value='f0:a02' name='publish_date_dt'/>
  <field value='f0:a03' name='main_author_alz'/>
</indexer>

HBase has a single data type: bytes. To let Solr guess the correct field type, add a suffix to the field name. By default, in Solr’s schemaless configuration, the suffix “_dt” means that the field is a date. Beware: not all date formats are accepted by default.
To add some search capabilities (a custom analyzer, for example), we can declare a new dynamic field with the suffix “*_alz”: in Solr, use the Schema API command “add-dynamic-field” to add a dynamic field rule and handle the HBase column as you like!
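As an illustration, the request body for “add-dynamic-field” could be built like this. It is a sketch under assumptions: dynamic_field_payload is a hypothetical helper, and text_general is only a placeholder for whatever analyzed field type you define; the Solr host is the one used in step 13.

```shell
#!/bin/sh
# Print the Schema API JSON body that declares a dynamic field rule.
# $1 = field name pattern (for example "*_alz")
# $2 = name of a field type that already exists in the collection schema
dynamic_field_payload() {
    printf '{"add-dynamic-field":{"name":"%s","type":"%s","indexed":true,"stored":true}}\n' "$1" "$2"
}

dynamic_field_payload '*_alz' 'text_general'

# Sending it to the collection created in step 7:
#   curl -X POST -H 'Content-type:application/json' \
#        --data-binary "$(dynamic_field_payload '*_alz' 'text_general')" \
#        'http://solr.dev.fr:8983/solr/a_new_collection/schema'
```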

9 – Start HBase-Indexer

/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer -zookeeper hmaster.dev.fr:2181,rs1.dev.fr:2181,rs2.dev.fr:2181

10 – Add the indexer to HBase-Indexer configuration


/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer \
  -n hbaseindexer \
  -c /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase_config.xml \
  -cp solr.zk=hmaster.dev.fr:2181,rs1.dev.fr:2181,rs2.dev.fr:2181 \
  -cp solr.collection=a_new_collection \
  --zookeeper hmaster.dev.fr:2181,rs1.dev.fr:2181,rs2.dev.fr:2181

 

11 – Now open the HBase shell (sudo hbase shell) and create a new table with a custom replication scope:


t = create 't01', {NAME => 'f0', REPLICATION_SCOPE => '1'}

12 – Let’s put some data:


put 't01', 'aRowKey', 'f0:a01', '1449396100'
put 't01', 'aRowKey', 'f0:a02', '2011-09-01'
put 't01', 'aRowKey', 'f0:a03', 'Lars George'
put 't01', 'aRowKey', 'f0:a04', 'Just Another Field'

 

13 – Refresh your Solr index and check that the book we just added to HBase was indexed in Solr:


curl 'http://solr.dev.fr:8983/solr/a_new_collection/update?commit=true'
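To actually see the document after the commit, one option is a query on the ISBN field declared in the mapping file. A minimal sketch: solr_select_url is a hypothetical helper, and the query value is passed as-is, so it must already be URL-safe.

```shell
#!/bin/sh
# Build a Solr select URL.
# $1 = Solr base URL, $2 = collection name, $3 = query (already URL-safe)
solr_select_url() {
    printf '%s/%s/select?q=%s&wt=json\n' "$1" "$2" "$3"
}

solr_select_url 'http://solr.dev.fr:8983/solr' 'a_new_collection' 'book_isbn:1449396100'

# Running the query:
#   curl "$(solr_select_url 'http://solr.dev.fr:8983/solr' 'a_new_collection' 'book_isbn:1449396100')"
```

If the indexer picked up the row, the response should list one document carrying the fields mapped in step 8.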

This post is quite long and full of small details. I’ll add some explanations in a separate post.


About Salem Ben Afia

Big Data & Java developer Search Engine Architect, Lucene Expert

Posted on July 24, 2017, in BigData, Solr.
