Ever wondered if Logstash can pump comma-separated data into ElasticSearch? Yes, the whiskers man can do more!
This configuration was tested against Logstash 2.4.0 and ElasticSearch 2.4.1 on Windows 7.
Our input data are tab-separated values describing academic articles, formatted as:
CODE TITLE YEAR
An excerpt from our input:
W09-2307	Discriminative Reordering with Chinese Grammatical Relations Features	2009
W04-2607	Non-Classical Lexical Semantic Relations	2004
W01-1314	A System For Extraction Of... And Semantic Constraints	2001
W04-1910	Bootstrapping Parallel Treebanks	2004
W09-3306	Evaluating a Statistical CCG Parser on Wikipedia	2009
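To see what Logstash's csv filter will do with each of these records, here is a rough Python sketch of the per-line splitting (an illustration only, not Logstash internals; the column names match the configuration below):

```python
# Each input line is split on the TAB character and zipped
# with the configured column names to form an event.
line = "W09-2307\tDiscriminative Reordering with Chinese Grammatical Relations Features\t2009"

columns = ["code", "title", "year"]
event = dict(zip(columns, line.split("\t")))

print(event["code"])   # W09-2307
print(event["year"])   # 2009
```

Each resulting event then carries the fields code, title, and year, which ElasticSearch indexes as regular document fields.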
I created a file named tab-articles.conf to process this input:
input {
  file {
    path => ["D:/csv/*.txt"]
    type => "core2"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["code","title","year"]
    separator => "	"
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["localhost"]
    index => "papers-%{+YYYY.MM.dd}"
    workers => 1
  }
}
Note that the filter's separator value is not the escape sequence “\t” (backslash followed by t), but a raw TAB character typed directly into the configuration file, exactly as it appears in the input.
Note also that the elasticsearch output attribute is [hosts] (plural), not [host].
Now, check that your ElasticSearch server is running, then run the following command:
logstash.bat -f tab-articles.conf
If you already have a file called articles.txt under d:\csv, it won’t be injected into ES, because Logstash is mainly intended for log parsing and thus acts by default like a “tail -f” reader: it only picks up lines appended after it starts watching a file.
So, after starting Logstash, copy your file into the configured input directory.
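If you would rather have Logstash read files that already exist when it starts, you can also stop it from remembering read positions between runs by pointing its sincedb bookmark file at the null device. A minimal sketch of the tweaked input block ("NUL" is the Windows null device, /dev/null on Linux; this variation is an assumption and was not part of the tested configuration above):

```
input {
  file {
    path => ["D:/csv/*.txt"]
    type => "core2"
    start_position => "beginning"
    # Discard read-position bookmarks so existing files
    # are re-read from the start on every run (Windows):
    sincedb_path => "NUL"
  }
}
```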