
How to explore data using Elasticsearch and Kibana? Part 1

Tushar Raut I Full Stack Developer, Ashnik
Mumbai, 18 Jun 2018

Introduction:

Exploratory Data Analysis (EDA) helps uncover the underlying structure of data and its dynamics, which lets us get the most insight out of it. EDA is also critical for extracting important variables and for detecting outliers and anomalies. Even with the many algorithms available in machine learning, EDA remains one of the most critical steps for understanding the data and driving business decisions.

In this part of the article, I am going to talk about the installation and configuration of Elasticsearch and Kibana with an X-Pack basic license, indexing data using Python, and some use cases of the Elastic Stack's Graph feature with sample dashboards.

Elasticsearch:

Elasticsearch is a highly scalable, open-source, full-text search and analytics engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. As a RESTful search and analytics engine, it is capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data, so you can discover the expected and uncover the unexpected.

Some use cases:
1. Movie Recommendation system
2. Loan predictions system
3. An online web store for various products, with a design recommendation system based on purchase history.
4. Price alerting for various products. For example, I am interested in buying a mobile phone and I want to be notified if its price falls below $X at any provider within the next month.

Installation of Elasticsearch

1. Install Java. Elasticsearch requires Java to run.
2. Download the Elasticsearch 6.3.0 tar archive as follows:
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.0.tar.gz
3. Then extract it as follows:
tar -xvf elasticsearch-6.3.0.tar.gz
4. This creates a set of files and folders in your current directory. Go into the bin directory as follows:
cd elasticsearch-6.3.0/bin
Now we are ready to start a single-node cluster using the command:
./elasticsearch

If you run with the default configuration, the Elasticsearch instance should now be reachable at http://localhost:9200 in your browser.
Keep the terminal where the elasticsearch command is running open so that the instance keeps running. You could also use nohup to run the instance in the background:
nohup ./elasticsearch &
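
Instead of opening the browser, the same check can be scripted. The following is a minimal sketch, assuming the default URL http://localhost:9200 and that the root endpoint returns the usual JSON with name, cluster_name and version.number fields. The function names describe_node and check_elasticsearch are my own and are not part of any library.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def describe_node(info):
    """Summarise the JSON document that Elasticsearch returns at its root URL."""
    version = info.get("version", {}).get("number", "unknown")
    return "cluster=%s node=%s version=%s" % (
        info.get("cluster_name", "unknown"),
        info.get("name", "unknown"),
        version,
    )


def check_elasticsearch(url="http://localhost:9200", timeout=5):
    """Return a one-line summary of the node, or a message if it is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return describe_node(json.loads(response.read().decode("utf-8")))
    except (URLError, OSError, ValueError) as exc:
        # URLError/OSError: nothing is listening; ValueError: reply was not JSON.
        return "Elasticsearch not reachable at %s: %s" % (url, exc)
```

Calling print(check_elasticsearch()) while the instance is running should print the cluster name, node name and version. If the instance is down, it prints the connection error instead of raising.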

Kibana:

Kibana is an open-source data exploration and visualization tool built on Elasticsearch to help you understand data better. It provides visualization capabilities on top of the content indexed in an Elasticsearch cluster. Users can create bar, line and scatter plots, or pie charts and maps, on top of large volumes of data.

Installation of Kibana:

1. Download the latest Kibana for Windows or for Linux.
2. Unzip or untar the file and open config/kibana.yml in an editor. Set elasticsearch.url to point at your Elasticsearch instance; in our case it should be http://localhost:9200.
3. Run bin/kibana (or bin\kibana.bat on Windows).
4. Open http://localhost:5601, which will show you the Kibana UI.
Note: If you are using X-Pack with Kibana, the default username is elastic and the password is changeme.

If you are not using X-Pack, the Kibana URL (http://localhost:5601) will take you to the main page. If you have installed X-Pack, the first page will look like this (in this article, I am using X-Pack):
image-1

Creating Dashboards:

A Kibana dashboard displays a collection of saved visualizations.

image-2

The first page of the Kibana UI

image-3

Sample Dashboard:

Indexing data

Elasticsearch indexes data into its internal format and stores each record as a JSON document. Below is the Python code to insert data into Elasticsearch. First install the elasticsearch library, as shown, to index through Python.

pip install elasticsearch

Note: The code assumes that Elasticsearch is running on localhost with the default configuration.

Creating Simple Index using Python:

1. Create a Python file (for example, index_test.py) and copy the following code.
from datetime import datetime
from elasticsearch import Elasticsearch

# Connect to Elasticsearch on localhost:9200 (the default).
# This line will change for x-pack: you need to add a user name and password.
es = Elasticsearch()

doc = {
    'author': 'tushar raut',
    'text': 'Elasticsearch: ELK stack is cool.',
    'timestamp': datetime.now(),
}

# Index the document with id 1. 'result' is 'created' on the first run
# and 'updated' on later runs.
res = es.index(index="test-index", doc_type='tweet', id=1, body=doc)
print(res['result'])

# Fetch the document back by id.
res = es.get(index="test-index", doc_type='tweet', id=1)
print(res['_source'])

# Refresh the index so the new document is visible to search.
es.indices.refresh(index="test-index")

# Search for all documents in the index and print each hit.
res = es.search(index="test-index", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])

2. Execute the above Python script: python index_test.py
3. Go to Kibana → Management → Index Patterns; there you will see the index that was created in Elasticsearch.
4. Create an index pattern and click on the Discover menu to see the data within that index.
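
The comment in the script notes that the Elasticsearch() line changes when X-Pack security is in use. Here is a minimal sketch of how that could look, assuming the 6.x Python client, whose constructor accepts hosts and http_auth arguments. The helper names connection_settings and connect are hypothetical, and the credentials are only the defaults mentioned in the note above, so replace them with your own.

```python
def connection_settings(host="localhost", port=9200, username=None, password=None):
    """Build keyword arguments for the Elasticsearch client constructor."""
    settings = {"hosts": [{"host": host, "port": port}]}
    if username is not None:
        # Basic authentication, needed when X-Pack security is enabled.
        settings["http_auth"] = (username, password)
    return settings


def connect(**kwargs):
    """Create a client; the import is done here so the helper above has no dependencies."""
    from elasticsearch import Elasticsearch
    return Elasticsearch(**connection_settings(**kwargs))
```

With this in place, es = connect(username="elastic", password="changeme") would stand in for es = Elasticsearch() in the script, and the rest of the code stays the same.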

Creating Graph:

There are potential relationships living among the documents in your Elastic Stack; linkages between people, places, preferences, products, you name it. Graph offers a relationship-oriented approach that lets you explore the connections in your data using the relevant capabilities of Elasticsearch.

Graph is an API- and UI-driven tool that helps you surface relevant relationships in your data while leveraging Elasticsearch features like distributed query execution, real-time data availability, and indexing at any scale.

Use cases:

1. Fraud: Discover which vendor is responsible for a group of compromised credit cards by exploring the shops where purchases were made.
2. Recommendations: Suggest the next best song for a listener who digs Mozart based on their preferences and keep them engaged and happy.
3. Security: Identify potential bad actors and other unexpected associates by looking at external IPs that machines on your network are talking to.

Example:

We have a very good example of a movie recommendation system. The source and data are available here:

The simple graph recommends movies based on parameters such as the number of likes a movie received in a given year.

1. Click on Graph, select an index, then click on the + icon to select fields.
2. Enter a movie name in the search bar and click on the search icon.
3. The graph will be shown as follows:

image-4
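
Since Graph is also API-driven, the same exploration can be requested without the UI. Below is a rough sketch of the request body for the Graph explore API, which as far as I know is exposed in Elasticsearch 6.x as POST <index>/_xpack/graph/_explore. The field names title and liked are hypothetical and depend on how the movie data was indexed; build_explore_request is my own helper, not a library function.

```python
import json


def build_explore_request(search_text, search_field="title",
                          vertex_field="title", connection_field="liked"):
    """Build a Graph explore body: seed query, start vertices, connected vertices."""
    return {
        # Seed query: the documents that the exploration starts from.
        "query": {"match": {search_field: search_text}},
        # Terms from this field become the first set of vertices.
        "vertices": [{"field": vertex_field}],
        # Terms from this field are the vertices connected to the first set.
        "connections": {"vertices": [{"field": connection_field}]},
    }


body = build_explore_request("Rocky")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Kibana Dev Tools console against the index that holds the movie data, after adjusting the field names to match your mapping.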

Another example of a graph, based on security analysis:

image-5

So, from the movie graph above, it is easy to see which movies people like together: the movie "Rocky" was liked by people who also liked Rocky-II, Jaws, and some others. In this way, the Graph feature of the Elastic Stack makes it easier to understand the insights in data by plotting and visualizing it.
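
To make the idea of "people who liked Rocky also liked ..." concrete, here is a small plain-Python sketch that counts co-liked movies. It is a simple co-occurrence count, not the relevance-based scoring that Graph itself uses, and the users and likes below are made-up sample data for illustration only.

```python
from collections import Counter

# Hypothetical sample data: the set of movies each user liked.
likes_by_user = {
    "user_a": {"Rocky", "Rocky-II", "Jaws"},
    "user_b": {"Rocky", "Rocky-II"},
    "user_c": {"Rocky", "Jaws", "Alien"},
    "user_d": {"Alien", "Blade Runner"},
}


def also_liked(movie, likes):
    """Count how often other movies were liked by users who liked `movie`."""
    counts = Counter()
    for liked in likes.values():
        if movie in liked:
            # Count every other movie this user liked.
            counts.update(liked - {movie})
    return counts.most_common()


print(also_liked("Rocky", likes_by_user))
```

Each pair in the result is a movie and the number of "Rocky" fans who also liked it, which corresponds to the edges drawn from the "Rocky" vertex in the graph.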

So, in this article, I covered the first step of data exploration: the installation and configuration of Elasticsearch and Kibana, and indexing data using the Python module for Elasticsearch. In the next part, I'll go through how to deal with a large dataset using Python, load that data into Elasticsearch for real-time search and analytics, and explore it with the Elastic Stack's Machine Learning feature. Watch this space!



Tushar is a Full Stack Developer at Ashnik. One of his key responsibilities is to help design, develop and test integration services for the Elastic Stack. He works closely with Solution Architect teams on the integration and implementation aspects of customer solutions.
He is experienced in working with various technologies like Python, Java, C, Bash Scripting and has also developed tools using Django and Flask Web frameworks.

