Sunday, 14 July 2019

Upgrading the Existing Architecture to Elasticsearch | ELK Stack

In this article, we will walk through the detailed procedure of upgrading your existing architecture to Elasticsearch and the ELK Stack. In a previous article, we discussed the Introduction to Elasticsearch and the ELK stack in great depth.

Existing Architecture:

Before upgrading our architecture to Elasticsearch/ELK Stack, let us take an overview of what a very simple and common architecture may look like:

Common Client-Server-Database Architecture

Suppose that we have an e-commerce application running on a web server. The data, such as the product categories and the products themselves, is stored in a database. So when a product page is requested, the web application looks up the product in the database, renders the page, and sends it back to the visitor’s browser. This is how a common Client-Server-Database architecture looks.

Now, you want to improve the search functionality on the website. So far, you have been using the database for this purpose, but databases are not an optimal solution for full-text searches, so you decide to look for a different solution and come across Elasticsearch. As a matter of fact, Elasticsearch is an excellent choice for this, but how would we integrate it into our existing architecture? Proceed to the next section to find out.

Step-wise Approach for Upgrading to Elasticsearch/ELK Stack:

[Step 1] - Adding Elasticsearch:

The easiest way to integrate Elasticsearch with our current architecture is to communicate with Elasticsearch from our application. So when someone enters a search query on our website, a request is sent to the web application, which then sends a search query to Elasticsearch. When the application receives the response, it can process it and send the results back to the browser. It is as simple as that.
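To make this concrete, here is a minimal sketch of the search request the web application would send. The index name `products` and the `name`/`description` fields are assumptions for illustration; Elasticsearch's `_search` endpoint accepts a JSON body like the one built below.

```python
import json

# Build a full-text search query for a hypothetical "products" index.
# A multi_match query searches the user's text across several fields.
def build_search_body(user_query):
    return {
        "query": {
            "multi_match": {
                "query": user_query,
                "fields": ["name", "description"],
            }
        }
    }

body = build_search_body("wireless headphones")
# The web application would POST this body to, e.g.,
#   http://localhost:9200/products/_search
# and relay the returned hits back to the browser.
print(json.dumps(body))
```

The same request can be made with plain HTTP or with an official client library, which is essentially a convenience wrapper around these JSON calls.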

This communication can easily be done with plain HTTP requests or with the client libraries available for many programming languages. This is how our architecture looks now:

Integrate Elasticsearch

But now, how do we load data into Elasticsearch in the first place, and how do we keep it updated? In other words, how do we keep Elasticsearch in sync with the database? To explain this, let us take a particular use case where we are using Elasticsearch for searches performed on products. Whenever a product is added or updated in the database, you simply perform the same action against Elasticsearch for that product.
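A rough sketch of that mirrored write, assuming a `products` index and a simple product record: an HTTP PUT to `/products/_doc/<id>` creates or replaces the document, so issuing it alongside the database write keeps both stores in sync.

```python
import json

# Sketch: whenever a product is created or updated in the database, the
# application issues the equivalent write against Elasticsearch. The index
# name and document shape here are assumptions for illustration.
def index_request_for(product):
    # PUT /products/_doc/<id> with this body creates or replaces the
    # document, mirroring the database change in Elasticsearch.
    path = f"/products/_doc/{product['id']}"
    body = json.dumps({k: v for k, v in product.items() if k != "id"})
    return path, body

path, body = index_request_for({"id": 42, "name": "USB Cable", "price": 4.99})
```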

You must be thinking that this will lead to duplication of some of the data. Yes, you are right, but it is actually the best approach, as you will realize as you continue reading. Another question may arise: what if you already have lots of data in your database? You can simply write a script that indexes all the existing data from your database into Elasticsearch.
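Such a one-off backfill script can use Elasticsearch's `_bulk` API, which takes newline-delimited JSON. A minimal sketch, where the rows and the `products` index name are assumptions standing in for your database query results:

```python
import json

# Sketch of a one-off script that indexes existing database rows into
# Elasticsearch via the _bulk API.
def to_bulk_payload(rows, index="products"):
    lines = []
    for row in rows:
        # Each document is preceded by an action line naming index and id.
        lines.append(json.dumps({"index": {"_index": index, "_id": row["id"]}}))
        lines.append(json.dumps(row))
    # The _bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"

rows = [
    {"id": 1, "name": "Keyboard", "category": "Accessories"},
    {"id": 2, "name": "Monitor", "category": "Displays"},
]
payload = to_bulk_payload(rows)
# POST payload to http://localhost:9200/_bulk
# with the header Content-Type: application/x-ndjson
```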

[Step 2] - Adding Kibana:

Adding Elasticsearch to the existing architecture is the perfect use case and needs no mandatory modification whatsoever. But let us assume that you want a dashboard where you can visualize the revenue, the number of orders per month, etc. We could build an interface from scratch for that, but fortunately, we have another solution provided under the ELK Stack, i.e., Kibana.

As we have already discussed in a previous article, Introduction to Elasticsearch and the ELK stack, Kibana is a visualization platform that interacts with data in Elasticsearch via HTTP requests, using the APIs provided by Elasticsearch itself. This is how our architecture looks now:

Integrate Kibana

[Step 3] - Adding Metricbeat and Filebeat:

As time passes, suppose your web application really starts getting a lot of traffic. We want to be able to handle traffic spikes and to know when to add another server. Here, Metricbeat comes to the rescue, which is just another tool under the ELK Stack. Metricbeat enables us to monitor system-level metrics such as CPU usage, memory usage, and so on. Similarly, Filebeat enables us to ship log files, such as our web server's access and error logs. This is how our architecture looks now:

Integrate Metricbeat and Filebeat


[Step 4] - Adding Logstash:

Now, as more time passes, we want to do some advanced event processing and perform advanced analysis. So far, we have been doing simple data transformations using Elasticsearch’s ingest nodes. But now, we want to do advanced data enrichment. As a matter of fact, we could easily do this within our web application itself, but that would lead to some disadvantages:
  • Our business logic would be cluttered with event processing, even though that event processing is not required for the business logic to execute.
  • Event processing would be spread across our code and hence decentralized, making things harder to maintain.

Hence, it would be better if we could centralize all this data enrichment/transformation. We can accomplish exactly that using Logstash, which is just another gem under the ELK Stack. After adding Logstash, this is how our architecture looks:

Integrate Logstash

We’ll be sending events from our web servers to Logstash over HTTP. We can configure Logstash to process the events as per our needs and send them off to Elasticsearch. Hence, as you can see, we have decoupled the event processing from our business logic. Also, the chances are very small that we would need custom parsing of system-level metrics, so we can continue sending that data directly to Elasticsearch from Metricbeat.
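As a sketch of that decoupling, the web application only needs to serialize a small JSON event and POST it to Logstash; all enrichment then lives in the Logstash pipeline. The event shape, field names, and the Logstash endpoint below are assumptions for illustration.

```python
import json
from datetime import datetime, timezone

# Sketch: serialize a business event that the web application would ship
# to Logstash's HTTP input (listening on an assumed host and port).
def make_event(event_type, **fields):
    event = {
        "type": event_type,
        "@timestamp": datetime.now(timezone.utc).isoformat(),
    }
    event.update(fields)
    return json.dumps(event)

payload = make_event("order_placed", order_id=1234, total=59.90)
# POST payload to http://logstash-host:8080/ with
# Content-Type: application/json; a Logstash pipeline can then enrich
# the event and forward it to Elasticsearch.
```

The business logic stays a one-liner; whatever enrichment the event needs happens centrally in Logstash, not in the application code.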

But in the case of access and server logs, the chances are high that we will need custom parsing logic for these logs. Hence, we can configure Filebeat to send its data to Logstash instead of directly to Elasticsearch; this way, we can write the custom log-parsing logic in Logstash itself.
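To illustrate the kind of parsing that would live in Logstash (typically via a grok filter), here is the same idea sketched in Python: extracting structured fields from a Common Log Format access-log line with a regular expression. The log line is a made-up example.

```python
import re

# Regex equivalent of a grok pattern for the Common Log Format:
# client, timestamp, method, path, status code, and response size.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

def parse_access_line(line):
    # Returns a dict of named fields, or None if the line does not match.
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

line = '203.0.113.7 - - [14/Jul/2019:10:05:03 +0000] "GET /products/42 HTTP/1.1" 200 2326'
fields = parse_access_line(line)
```

Once parsed into fields like `status` and `path`, the events become searchable and aggregatable in Elasticsearch instead of being opaque text lines.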


As a matter of fact, we could go ahead with this architecture and have our web application update the data in Elasticsearch directly. That is the easiest approach for sure, but it is also more error-prone, as code errors in the application might break event processing. Ideally, our web application should only be querying Elasticsearch for data, not modifying any data, i.e., it should have read-only access to Elasticsearch. We should centralize everything through Logstash pipelines, so that all log processing and data processing happens in Logstash, and the web application only queries Elasticsearch data rather than modifying it directly.

Hence, this was a discussion of migrating an existing architecture to the ELK Stack. Obviously, there may be many variations of this architecture depending on your cases and conditions.

