Sunday, August 26, 2018

Elasticsearch Tutorial

Elasticsearch is a distributed, scalable search and analytics engine.

It is similar to Apache Solr, with the difference that it was built to be scalable from the ground up.

Like Solr, Elasticsearch is built on top of Apache Lucene, which is a full-text search library.

What is the difference between a database and a search engine? Read this blog.

1.0 Key features


Based on the very successful search library Apache Lucene.
Provides the ability to store and search documents.
Supports full-text search.
Schema free.
Ability to analyze data: count, summarize, aggregate, etc.
Horizontally scalable and distributed architecture.
REST API support.
Easy to install and operate.
API support for several languages.

2.0 Concepts

A node is a single Elasticsearch server process, that is, one instance of a Java process.

A key differentiator for Elasticsearch is that it was built to be horizontally scalable from the ground up.

In a production environment, you generally run multiple nodes. A cluster is a collection of nodes that store your data.

A document is a unit of data that can be stored in Elasticsearch. Documents are expressed in JSON.

An index is a collection of documents of a particular type. For example, you might have one index for customer documents and another for product information. The index is also the data structure that helps the search engine find a document fast. Each document being stored is analyzed and broken into tokens based on rules. Each token is indexed, meaning that given the token, there is a pointer back to the document, just like the index at the back of a book. Full-text search, or the ability to search on any token or partial token in the document, is what differentiates a search engine from a more traditional database.

The Elasticsearch documentation sometimes uses the term inverted index to refer to its indexes. This author believes that the term "inverted index" is just confusing and that this is nothing but an index.
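The token-to-document mapping described above can be sketched in a few lines of Python. This is a toy illustration only: the tokenize and build_index helpers are hypothetical names, and lowercasing plus splitting on letters and digits is an assumed stand-in for the analysis rules that Elasticsearch (through Lucene) actually applies.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


# Sample documents: id -> text of one field.
docs = {
    1: "Hiking Cooking Reading",
    2: "Dancing Cooking Painting",
    3: "Sports Reading Painting",
}

index = build_index(docs)
print(sorted(index["cooking"]))  # [1, 2]
print(sorted(index["reading"]))  # [1, 3]
```

Given a token, the lookup goes straight to the matching document ids without scanning every document, which is the "index at the back of the book" idea.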

In the real world, you never use just one node. You will use an Elasticsearch cluster with multiple nodes. To scale horizontally, Elasticsearch partitions the index into shards that get assigned to nodes. For redundancy, the shards are also replicated, so that they are available on multiple nodes.
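As a rough sketch of the idea, the Python below routes a document id to a shard and places each shard's primary and replica copies on different nodes. Everything here is an assumption made for illustration: the function names and node names are hypothetical, crc32 is just a convenient stand-in hash, and round-robin placement is a simplification. Elasticsearch's real routing and allocation logic is more involved.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a shard from a hash of the document id.

    crc32 is a stand-in chosen for this sketch, not the hash Elasticsearch uses.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_primary_shards


def assign_shards(num_primary_shards, nodes):
    """Place each primary on one node and its single replica on the next node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to hold a replica elsewhere")
    placement = {}
    for shard in range(num_primary_shards):
        placement[shard] = {
            "primary": nodes[shard % len(nodes)],
            "replica": nodes[(shard + 1) % len(nodes)],
        }
    return placement


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
placement = assign_shards(5, nodes)
for shard, copies in placement.items():
    print(shard, copies["primary"], copies["replica"])

# The same id always routes to the same shard.
print(shard_for("1", 5) == shard_for("1", 5))  # True
```

Because a replica never shares a node with its primary in this sketch, losing any single node still leaves one copy of every shard available.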

3.0 Install Elasticsearch

Go to https://www.elastic.co/downloads/elasticsearch and download the latest version of Elasticsearch. The file is named elasticsearch-version.tar.gz.

Untar it to a directory of your choice.

4.0 Start Elasticsearch


For this tutorial we will use just a single node. The rest of the tutorial uses curl to send HTTP requests to an Elasticsearch node to demonstrate basic functions. Most of it is self-explanatory.

To start Elasticsearch, type:

install_dir/bin/elasticsearch

To confirm it is running:

curl -X GET "localhost:9200/_cat/health?v"

5.0 Create an index


Let us create an index named person to store person information such as name, sex, age, interests, etc.

curl -X PUT "localhost:9200/person"

{"acknowledged":true,"shards_acknowledged":true,"index":"person"}

List the indexes created so far:

curl -X GET "localhost:9200/_cat/indices?v"

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   person   AJCSCg0gTXaX6N5g6malnA   5   1          0            0      1.1kb          1.1kb

6.0 Add Documents


Let us add a few documents to the person index.
In the URL, _doc is the type of the document. It is a way to group documents of a particular type.
In /person/_doc/1, the number 1 is the id we provided for the document. If we do not provide an id (a POST to /person/_doc), Elasticsearch will generate one.
You will notice that the data Elasticsearch accepts is JSON.

curl -X PUT "localhost:9200/person/_doc/1" -H 'Content-Type: application/json' -d'
{
  "name": "Big Stalk",
  "sex":"male",
  "age":41,
  "interests":"Hiking Cooking Reading"
}
'
curl -X PUT "localhost:9200/person/_doc/2" -H 'Content-Type: application/json' -d'
{
  "name": "Kelly Kidney",
  "sex":"female",
  "age":35,
  "interests":"Dancing Cooking Painting"
}
'

curl -X PUT "localhost:9200/person/_doc/3" -H 'Content-Type: application/json' -d'
{
  "name": "Marco Dill",
  "sex":"male",
  "age":26,
  "interests":"Sports Reading Painting"
}
'

curl -X PUT "localhost:9200/person/_doc/4" -H 'Content-Type: application/json' -d'
{
  "name": "Missy Ketchat",
  "sex":"female",
  "age":22,
  "interests":"Singing Cooking Dancing"
}
'

curl -X PUT "localhost:9200/person/_doc/5" -H 'Content-Type: application/json' -d'
{
  "name": "Hal Spito",
  "sex":"male",
  "age":31,
  "interests":"Sports Singing Hiking"
}
'

7.0 Search or Query

The query can be provided either as a query parameter or in the body of a GET. Yes, Elasticsearch accepts query data in the body of a GET request. 
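To make the two request shapes concrete, here is a small Python sketch that builds both forms with the standard library but does not send them. The helper names query_string_request and body_request are hypothetical, and the host and index are simply the ones used in this tutorial.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "http://localhost:9200/person/_search"


def query_string_request(q):
    """Build a GET request that carries the query in the q URL parameter."""
    return Request(SEARCH_URL + "?" + urlencode({"q": q}), method="GET")


def body_request(query):
    """Build a GET request that carries the query as a JSON body."""
    data = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(SEARCH_URL, data=data, headers=headers, method="GET")


by_param = query_string_request("sex:female")
by_body = body_request({"match": {"sex": "female"}})

print(by_param.get_method(), by_param.full_url)
print(by_body.get_method(), by_body.data.decode("utf-8"))
```

Both objects use the GET method; the only difference is whether the query travels in the URL or in the request body.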


7.1 Query string example


To retrieve all documents:

curl -X GET "localhost:9200/person/_search?q=*"

Response is not shown to save space.

Exact match search as query string:

curl -X GET "localhost:9200/person/_search?q=sex:female"

{"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.18232156,"hits":[{"_index":"person","_type":"_doc","_id":"2","_score":0.18232156,"_source":
{
  "name": "Kelly Kidney",
  "sex":"female",
  "age":35,
  "interests":"Dancing Cooking Painting"
}
},{"_index":"person","_type":"_doc","_id":"4","_score":0.18232156,"_source":
{
  "name": "Missy Ketchat",
  "sex":"female",
  "age":22,
  "interests":"Singing Cooking Dancing"
}
}]}}


7.2 GET body examples


The query syntax, when sent in the body, is much more expressive and rich. It merits a blog of its own.
This query finds persons with singing or dancing in the interests field. This is full-text search on a field.

curl -X GET "localhost:9200/person/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "interests": "singing" } },
        { "match": { "interests": "dancing" } }
      ]
    }
  }
}'

{"took":15,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":3,"max_score":0.87546873,"hits":[{"_index":"person","_type":"_doc","_id":"4","_score":0.87546873,"_source":
{
  "name": "Missy Ketchat",
  "sex":"female",
  "age":22,
  "interests":"Singing Cooking Dancing"
}
},{"_index":"person","_type":"_doc","_id":"5","_score":0.2876821,"_source":
{
  "name": "Hal Spito",
  "sex":"male",
  "age":31,
  "interests":"Sports Singing Hiking"
}
},{"_index":"person","_type":"_doc","_id":"2","_score":0.18232156,"_source":
{
  "name": "Kelly Kidney",
  "sex":"female",
  "age":35,
  "interests":"Dancing Cooking Painting"
}
}]}}
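The behavior of the should clauses can be imitated with a toy Python sketch: a document qualifies if it matches at least one clause, and matching more clauses is better. The function names are hypothetical, the lowercasing tokenizer is an assumed simplification of the analyzer, and the clause count is only a crude stand-in for the relevance score Elasticsearch computes.

```python
import re

# The interests field of the five sample documents, keyed by id.
interests = {
    "1": "Hiking Cooking Reading",
    "2": "Dancing Cooking Painting",
    "3": "Sports Reading Painting",
    "4": "Singing Cooking Dancing",
    "5": "Sports Singing Hiking",
}


def tokens(text):
    """Lowercase and split, so that "Singing" matches the query term "singing"."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def should_match(docs, terms):
    """Return {id: number of matched terms} for docs matching at least one term."""
    result = {}
    for doc_id, text in docs.items():
        matched = sum(1 for term in terms if term.lower() in tokens(text))
        if matched:
            result[doc_id] = matched
    return result


print(should_match(interests, ["singing", "dancing"]))
# {'2': 1, '4': 2, '5': 1}
```

Document 4 matches both clauses, which is consistent with it being ranked first in the response, while documents 2 and 5 match one clause each.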

Below is a range query on a field.

curl -X GET "localhost:9200/person/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "age": { "gte": 30, "lte": 40 }
    }
  }
}'

{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"person","_type":"_doc","_id":"5","_score":1.0,"_source":
{
  "name": "Hal Spito",
  "sex":"male",
  "age":31,
  "interests":"Sports Singing Hiking"
}
},{"_index":"person","_type":"_doc","_id":"2","_score":1.0,"_source":
{
  "name": "Kelly Kidney",
  "sex":"female",
  "age":35,
  "interests":"Dancing Cooking Painting"
}
}]}}
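The meaning of gte and lte (both bounds inclusive) can be checked against the sample data with a short Python sketch. The age_between helper is a hypothetical name, and this in-memory filter only mirrors the result of the query, not how Elasticsearch evaluates it.

```python
# Ages of the five sample documents.
people = [
    {"id": "1", "name": "Big Stalk", "age": 41},
    {"id": "2", "name": "Kelly Kidney", "age": 35},
    {"id": "3", "name": "Marco Dill", "age": 26},
    {"id": "4", "name": "Missy Ketchat", "age": 22},
    {"id": "5", "name": "Hal Spito", "age": 31},
]


def age_between(docs, gte, lte):
    """Return ids of docs whose age satisfies gte <= age <= lte."""
    return [doc["id"] for doc in docs if gte <= doc["age"] <= lte]


print(age_between(people, 30, 40))  # ['2', '5']
```

The same two ids, 2 and 5, appear in the response above.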

8.0 Update a document



curl -X POST "localhost:9200/person/_doc/5/_update" -H 'Content-Type: application/json' -d'
{
  "doc": { "name": "Hal Spito Jr" }
}
'

After executing the above update, do a search for "Jr". The above document will be returned.
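Conceptually, the fields under "doc" are merged into the stored document and the other fields are left alone. The Python below is a minimal sketch of that idea; apply_partial_update is a hypothetical helper, and the shallow merge and simple tokenizer are simplifying assumptions rather than Elasticsearch's actual implementation.

```python
import re

stored = {
    "name": "Hal Spito",
    "sex": "male",
    "age": 31,
    "interests": "Sports Singing Hiking",
}


def apply_partial_update(document, doc_fields):
    """Return a copy of the document with the given fields overwritten."""
    merged = dict(document)
    merged.update(doc_fields)
    return merged


updated = apply_partial_update(stored, {"name": "Hal Spito Jr"})
name_tokens = re.findall(r"[a-z0-9]+", updated["name"].lower())

print(updated["name"])      # Hal Spito Jr
print(updated["age"])       # 31
print("jr" in name_tokens)  # True
```

Since the updated name now contains the token "jr", a search for "Jr" finds the document.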


9.0 Delete a document



curl -X DELETE "localhost:9200/person/_doc/1"

This deletes the document with id 1. Searches will no longer return this document.

10.0 Delete Index

curl -X DELETE "localhost:9200/person"
{"acknowledged":true}

That deletes the index we created.


11.0 Conclusion


This has been a brief introduction to Elasticsearch, just enough to get you started. There are a lot more details in each category of APIs. We will explore them in subsequent blogs.