ELK powers the Search for Bank Service Monitoring and Anomaly Detection

Written by Ashnik Team | Jun 19, 2019

Our client is one of the global leaders in financial services technology, with a focus on retail and institutional banking, payments, asset and wealth management, risk & compliance, consulting and outsourcing solutions.
The client’s financial services support user transactions made in convenience stores, with payment cards, through internet banking, at ATMs and via various other channels. It provides infrastructure and support to small and mid-size banks along with payment switches, reducing their development and processing costs.
Previously, any failure in one of the payment systems was reported by email or ServiceNow to the technical staff responsible for monitoring. The client’s IT support team then had to report the failure to the relevant infrastructure department, which in turn passed the incident details to the operations team where the failure had occurred. The IT team was solely responsible for communicating failures, tracing their cause, fixing them, and keeping the bank informed. Analysing the huge volume of logs manually was tedious and time consuming, and it prevented the IT, infrastructure and operations departments from getting immediate and complete oversight of the relevant failure data.
Hence, the client approached Ashnik for a solution that would make their system more efficient and allow it to integrate with various sources for data ingestion. The client was also using multiple monitoring tools to track different issues and wanted to consolidate the error information from all these sources into a single dashboard. This would reduce the delay in sharing incidents with the team.
After multiple discussions with the client on improving performance and integrating their monitoring systems, we proposed a three-node ELK cluster. This would provide real-time analysis of the infrastructure and of customer transaction data:

  • Kibana, with its easy-to-use dashboards, was identified as the ideal product for real-time monitoring and analysis, and for sharing that information across the client’s other departments and business units through a single monitoring platform.
  • Logstash, together with Beats, was used to regularly update the data in Elasticsearch with transaction logs and cluster performance data.

Ashnik also helped the client design a robust, highly available architecture to reduce the risk of data loss at any level. Coordinating nodes were added to route search requests to the data nodes so that information is served more effectively. This not only provided a real-time monitoring solution but also opened the door to predictive analysis and machine learning.
With this in place, the client’s monitoring team created dashboards that show applications that are ‘up’ in green and services that are ‘down’ in red. The team watching these dashboards can quickly catch changes in the status of running and failed services and act on them swiftly.
The team can now also work more intuitively in the following areas while quickly pinpointing transaction-level problems:
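The colour logic behind such a status panel can be sketched in a few lines of Python. This is a minimal illustration and not the client’s implementation: the record fields (`service`, `timestamp`, `up`) and the function name `latest_status_colours` are assumptions made for the example, standing in for whatever heartbeat-style documents the dashboards actually read.

```python
from typing import Dict, Iterable, Mapping


def latest_status_colours(events: Iterable[Mapping]) -> Dict[str, str]:
    """Map each service to 'green' (up) or 'red' (down) using its newest event.

    Each event is assumed (hypothetically) to look like:
    {"service": "atm-switch", "timestamp": 160, "up": False}
    """
    latest: Dict[str, Mapping] = {}
    for event in events:
        name = event["service"]
        # Keep only the most recent event seen so far for each service.
        if name not in latest or event["timestamp"] > latest[name]["timestamp"]:
            latest[name] = event
    return {name: "green" if ev["up"] else "red" for name, ev in latest.items()}


# Hypothetical sample data: the ATM switch goes down after an earlier 'up' report.
sample_events = [
    {"service": "internet-banking", "timestamp": 100, "up": True},
    {"service": "atm-switch", "timestamp": 100, "up": True},
    {"service": "atm-switch", "timestamp": 160, "up": False},
]
status = latest_status_colours(sample_events)
```

Only the newest record per service decides the colour, so a service that recovers turns green again as soon as a fresh ‘up’ event arrives.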

  • Monitoring daily customer transactions that are in pending or queued status, and acting on the pending queue in real time
  • Monitoring performance with a comparison chart of successful and failed transactions
  • Monitoring infrastructure related insights like:
    • Top Hosts by memory
    • CPU Usage Gauge
    • Performance
  • Monitoring the number of unique users who are active at a particular time and visiting a particular customer site. (This information is used to measure an online business’s general health and to assess the efficacy of its marketing campaigns. It also helps in gauging the experience of both present and potential customers.)

    (Monitoring Dashboard)

  • If any changes are noted in the above, the team can quickly drill down into transactions at both the payment method level and the application level to find the origin of the issue and fix it in near real time.
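A drill-down of this kind maps naturally onto nested `terms` aggregations in the Elasticsearch query DSL. The sketch below builds such a search body in Python. The field names (`status`, `payment_method`, `application`, `@timestamp`), the `failed` status value and the 15-minute window are assumptions for illustration, since the case study does not describe the client’s actual index mapping.

```python
import json
from typing import Any, Dict


def failed_transactions_drilldown(since: str = "now-15m") -> Dict[str, Any]:
    """Build a search body that counts failed transactions per payment method,
    then per application inside each payment method.

    All field names are hypothetical and assumed to be keyword-mapped.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": "failed"}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "by_payment_method": {
                "terms": {"field": "payment_method", "size": 10},
                "aggs": {
                    "by_application": {
                        "terms": {"field": "application", "size": 10}
                    }
                },
            }
        },
    }


body = failed_transactions_drilldown()
print(json.dumps(body, indent=2))
```

The outer bucket answers “which payment method is failing?” and the inner bucket answers “in which application?”, which mirrors the two drill-down levels described above.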

The client also used the alerting mechanism to capture errors and immediately send email notifications to the respective team. Watchers are triggered every 1 to 5 minutes and notify the engineer based on defined parameters and thresholds, such as:

  • Watcher to send a warning or critical message based on the number of pending requests. For example, if the number of daily pending requests falls below or rises above a certain threshold, the watcher sends a warning or critical alert.
  • Watcher to create an alert based on response codes that imply an error because a service is down or an application URL is not reachable.
  • Watcher to create an alert on latency spikes based on processing time. Any time lag signifies network-related issues.
  • Watcher to create an alert if there is any request timeout error.

And many more.
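As a rough sketch of what the pending-requests watcher could look like, the Python below assembles a watch definition with the four standard Watcher sections: `trigger`, `input`, `condition` and `actions`. The index pattern `transactions-*`, the field names, the thresholds and the e-mail address are all hypothetical. The example covers only the upper-threshold case, and the `ctx.payload.hits.total` path may need adjusting for the Elasticsearch version in use.

```python
import json
from typing import Any, Dict, List


def pending_requests_watch(severity: str, threshold: int,
                           interval: str, recipients: List[str]) -> Dict[str, Any]:
    """Build a watch that e-mails the team when today's pending requests
    reach `threshold`. Index and field names are assumptions."""
    return {
        # Run on a fixed schedule, e.g. every 1 to 5 minutes.
        "trigger": {"schedule": {"interval": interval}},
        "input": {
            "search": {
                "request": {
                    "indices": ["transactions-*"],  # hypothetical index pattern
                    "body": {
                        "size": 0,
                        "query": {
                            "bool": {
                                "filter": [
                                    {"term": {"status": "pending"}},
                                    {"range": {"@timestamp": {"gte": "now/d"}}},
                                ]
                            }
                        },
                    },
                }
            }
        },
        # Fire only when the number of matching documents reaches the threshold.
        "condition": {
            "compare": {"ctx.payload.hits.total": {"gte": threshold}}
        },
        "actions": {
            "notify_team": {
                "email": {
                    "to": recipients,
                    "subject": f"[{severity.upper()}] pending requests >= {threshold}",
                    "body": "Pending requests today: {{ctx.payload.hits.total}}",
                }
            }
        },
    }


# Made-up thresholds: one watch per severity level.
warning = pending_requests_watch("warning", 500, "5m", ["ops@example.com"])
critical = pending_requests_watch("critical", 1000, "1m", ["ops@example.com"])
print(json.dumps(critical, indent=2))
```

Each dictionary would be sent as the JSON body of a put-watch request to register it. Using one watch per severity keeps each condition simple; a single watch with per-action conditions would be an alternative design.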
To conclude, Ashnik, together with the client, built a robust system that helps the client deliver services to their customers more proactively and efficiently. This use case will keep evolving into more scalable solutions that integrate more sources. The client has recently expanded its use of the Elastic Stack to data visualization for new applications and plans to roll it out across regions and geographies as well.
Want to design a custom solution for your applications? Get in touch with us at success@ashnik.com.

