Log Aggregation Platform Using Elastic Stack

Storage Optimization and Search Optimization for a BFSI Enterprise

Industry: BFSI / Banking
Technology: Elasticsearch, Azure Blob Storage
Engagement: Log Aggregation + Regulatory Compliance Archival

THE CUSTOMER

A BFSI enterprise running a net banking and payments platform that generates 700–800 GB of application logs daily across 35+ microservices.

THE CHALLENGE

A 700–800 GB daily log volume, an 11-year regulatory retention mandate, storage constraints, and a zero-downtime requirement – all at once.

THE SOLUTION

A fully architected log management platform with microservice-level indexing, dual-node rolling availability, and automated Azure Blob archival.

THE RESULT

1-second query response across 35+ indexes, 11-year regulatory compliance achieved, zero downtime during maintenance, and a fully automated archival pipeline.

700–800 GB

Application logs per day

1 sec

Query response per microservice

11 Years

Regulatory retention mandate met

0

Minutes of downtime

CUSTOMER OVERVIEW

A BFSI enterprise operating a high-volume net banking and payments platform engaged Ashnik to design and deliver a centralised log management infrastructure capable of handling the scale, compliance, and availability demands of a live payment environment. The environment comprised 35+ microservices running across 10 Tomcat servers on a Kubernetes platform – collectively generating 700–800 GB of application logs every single day.

The platform needed to address three problems at once: ingest and structure logs at this scale, meet the Reserve Bank of India’s 11-year retention mandate through a verifiable archival strategy, and maintain availability through patching and upgrade cycles. Ashnik designed and delivered the architecture as a single integrated platform addressing all three.

THE CHALLENGE

Storage Constraints

On-premise storage could not absorb years of accumulation at this rate. A tiered approach – hot on-premise, cold on cloud – was the only viable path.
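
To make the scale concrete, the rough arithmetic below multiplies the stated daily volume by the 11-year retention window. It is a back-of-the-envelope sketch only: it assumes a constant ingest rate, 365-day years, decimal units (1 TB = 1,000 GB), and no compression or index overhead, none of which are figures from the engagement.

```python
# Back-of-the-envelope estimate of raw log volume over the retention window.
# Assumptions (not from the case study): constant daily volume, 365-day years,
# decimal units, no compression, no replica or index overhead.

DAYS_PER_YEAR = 365
RETENTION_YEARS = 11


def raw_volume_tb(daily_gb: int, years: int = RETENTION_YEARS) -> float:
    """Return the uncompressed volume in TB accumulated over `years`."""
    total_gb = daily_gb * DAYS_PER_YEAR * years
    return total_gb / 1000


low_tb = raw_volume_tb(700)
high_tb = raw_volume_tb(800)
print(f"Raw volume over {RETENTION_YEARS} years: {low_tb:,.1f} to {high_tb:,.1f} TB")
```

Under these assumptions the raw total lands at roughly 2.8 to 3.2 PB, which illustrates why on-premise storage alone could not carry the retention window.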

Log Volume at Scale

700–800 GB generated daily across 35+ microservices on 10 Tomcat servers. Without structure, search and incident investigation become unworkable.

Regulatory Compliance

The Reserve Bank of India mandates 11-year retention of application logs. The solution needed to be verifiable and audit-ready at any point – not just stored.

Zero Downtime Mandate

A live payment environment cannot tolerate gaps in log visibility. Patching, upgrades, and maintenance had to happen without taking the platform offline.

Safe Archival

Automation was necessary, but deletion without confirmation was not acceptable. In a compliance environment, unrecoverable data loss is a regulatory failure.

ASHNIK’S APPROACH

Distributed Log Collection via Filebeat

Five separate Filebeat instances were deployed – one per logical server grouping – rather than a single centralised agent. Each instance manages its own state independently, isolating failure risk at source. A single agent at this ingest volume would create a shared point of backpressure failure. Logs are forwarded securely over TLS on port 5044, with the required firewall whitelisting in place.
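
The isolation argument can be illustrated with a toy model in which each shipper owns its read offsets. This is a conceptual sketch, not Filebeat's API: the `Shipper` class, the group names, and the file paths are all hypothetical.

```python
# Toy model of independent shippers: each one owns its read offsets, so a
# failure in one grouping leaves the others' state untouched.
# `Shipper` and the group names are hypothetical, not part of Filebeat.
from dataclasses import dataclass, field


@dataclass
class Shipper:
    group: str
    offsets: dict[str, int] = field(default_factory=dict)  # file path -> bytes shipped
    healthy: bool = True

    def ship(self, path: str, nbytes: int) -> None:
        """Advance this shipper's own offset for `path` by `nbytes`."""
        if not self.healthy:
            raise RuntimeError(f"shipper for {self.group} is down")
        self.offsets[path] = self.offsets.get(path, 0) + nbytes


shippers = {name: Shipper(name) for name in ("group-a", "group-b")}
shippers["group-a"].ship("/var/log/app/a.log", 4096)
shippers["group-b"].ship("/var/log/app/b.log", 2048)

# Simulate one instance failing: only its own grouping stops shipping.
shippers["group-a"].healthy = False
shippers["group-b"].ship("/var/log/app/b.log", 1024)
print(shippers["group-b"].offsets)
```

Because no state is shared, the second shipper keeps advancing while the first is down, and the first can resume from its own recorded offset once it recovers.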

Microservice-Level Index Routing in Logstash

Logstash pipelines route logs into dedicated Elasticsearch indexes for each of the 35+ microservices. Filters – regex, grok, mutate – structure and enrich every event at ingest. UTC-to-IST timestamp normalisation is applied at this layer, ensuring the operations team sees correctly localised timestamps across every dashboard and query – without manual conversion during incident investigation.
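
The routing and timestamp step might look roughly like the Python sketch below, which stands in for the Logstash filter logic. The log line layout, the `logs-<service>-<date>` index naming, and the event field names are assumptions for illustration; the case study does not specify them.

```python
# Sketch of per-microservice index routing with UTC-to-IST normalisation.
# The log line layout, the `logs-<service>-<date>` index naming, and the field
# names are assumptions for illustration; the case study does not specify them.
import re
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), name="IST")  # India has no DST

LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<service>[\w-]+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def route(line: str) -> tuple[str, dict]:
    """Parse one log line and return (target index, structured event)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unparseable log line: {line!r}")
    utc_ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    ist_ts = utc_ts.astimezone(IST)
    event = {
        "@timestamp": ist_ts.isoformat(),
        "service": match["service"],
        "level": match["level"],
        "message": match["message"],
    }
    index = f"logs-{match['service'].lower()}-{ist_ts:%Y.%m.%d}"
    return index, event


index, event = route("2024-01-31T20:15:00Z payments-api ERROR card auth timed out")
print(index, event["@timestamp"])
```

In this sketch an event logged at 20:15 UTC on 31 January is stamped 01:45 IST on 1 February and routed to that day's index for its service, which is the kind of manual conversion the normalisation step spares the operations team.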

4-Node Elasticsearch Cluster – Sized for Peak Ingest

The cluster runs four Elasticsearch nodes with deliberately separated roles – master, data, and ingest – following Elastic’s production best practice of keeping heavy indexing operations from destabilising cluster management. Mixing roles at this ingest rate risks GC pressure on the master node, which affects the entire cluster. Each node carries 5.8 TB of storage, sized to sustain peak ingest while delivering 1-second query response times across all microservice indexes.
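
A quick capacity check using only the figures quoted above gives a feel for how long the hot tier can hold data. The simplifications are assumptions, not details from the engagement: every node's disk is treated as available for index data, and replicas, compression, and index overhead are ignored.

```python
# Rough hot-tier capacity check from the figures in the text.
# Assumptions (not from the case study): every node's disk holds index data,
# no replicas, no compression, decimal units (1 TB = 1,000 GB).

NODES = 4
NODE_CAPACITY_GB = 5_800  # 5.8 TB per node


def days_of_headroom(daily_gb: int) -> float:
    """Days of raw ingest the cluster's combined disk could hold."""
    return NODES * NODE_CAPACITY_GB / daily_gb


print(f"Total capacity: {NODES * NODE_CAPACITY_GB / 1000:.1f} TB")
print(f"Headroom at 800 GB/day: {days_of_headroom(800):.0f} days")
print(f"Headroom at 700 GB/day: {days_of_headroom(700):.0f} days")
```

Under these simplifications the cluster holds on the order of a month of raw logs, which fits its role as the hot tier with older data moved to Azure Blob.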

Rolling Dual-Node Architecture for Zero Downtime

Two Kibana nodes and two Logstash nodes are deployed in a rolling configuration. One node of each pair stays active during any maintenance window, giving continuous log visibility with no interruption to operations.
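
The rolling pattern can be sketched as a loop that refuses to stop a node unless its peer is serving. The node names and the `patch` callback are hypothetical, and the boolean health flags stand in for real health checks against Kibana or Logstash.

```python
# Sketch of a rolling maintenance loop over a two-node tier: a node is only
# taken down while its peer is serving. Node names and the `patch` callback
# are hypothetical; real health checks would query Kibana or Logstash.
from typing import Callable


def rolling_maintenance(nodes: dict[str, bool], patch: Callable[[str], None]) -> list[str]:
    """Patch each node in turn; `nodes` maps node name -> currently serving."""
    patched = []
    for name in list(nodes):
        peers_up = [n for n, up in nodes.items() if n != name and up]
        if not peers_up:
            raise RuntimeError(f"refusing to stop {name}: no healthy peer")
        nodes[name] = False   # drain and stop this node
        patch(name)
        nodes[name] = True    # bring it back before touching the next one
        patched.append(name)
    return patched


kibana = {"kibana-1": True, "kibana-2": True}
print(rolling_maintenance(kibana, patch=lambda name: None))
```

The guard at the top of the loop is the important part: if the peer is already down, maintenance stops rather than leaving the tier with no active node.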

Azure Blob Archival for 11-Year Regulatory Compliance

Weekly snapshots are staged on NFS, compressed, and uploaded to Azure Blob Storage. Daily snapshots were considered but ruled out – at this ingest rate, snapshots compete with live indexing for disk I/O, and daily frequency would degrade cluster performance without proportional compliance benefit. Crucially, Elasticsearch snapshots after the first are incremental – only segments not already in the repository are written – so weekly frequency does not mean weekly full re-snapshots of the entire dataset. Credentials are configured and access is whitelisted end-to-end.
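
The incremental behaviour can be illustrated with a toy repository that stores each segment once. This models the idea only; it is not the Elasticsearch snapshot API, and the segment names are made up.

```python
# Toy model of incremental snapshots: the repository stores each segment once,
# and a new snapshot copies only segments it does not already hold.
# Segment names are made up; this is not the Elasticsearch snapshot API.


class SnapshotRepo:
    def __init__(self) -> None:
        self.segments: set[str] = set()          # segment files already stored
        self.snapshots: dict[str, frozenset[str]] = {}

    def snapshot(self, name: str, live_segments: set[str]) -> set[str]:
        """Record a snapshot and return the segments that had to be copied."""
        new_segments = live_segments - self.segments
        self.segments |= new_segments
        self.snapshots[name] = frozenset(live_segments)
        return new_segments


repo = SnapshotRepo()
week1 = repo.snapshot("week-1", {"seg-a", "seg-b", "seg-c"})
week2 = repo.snapshot("week-2", {"seg-a", "seg-b", "seg-c", "seg-d"})
print(sorted(week1), sorted(week2))
```

The first snapshot copies everything, while the second copies only the one segment that appeared since, even though it still references the full set.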

Fully Automated Archival Pipeline

Snapshot, compression, and cloud upload are automated via shell scripts and cron jobs – no manual effort in the regular cycle.
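
The engagement used shell scripts and cron; the Python sketch below only illustrates what the compress step of such a run might do, with a checksum added so a later check has something to compare against. The paths are hypothetical, and the upload to Azure Blob is left out because the tooling used is not described.

```python
# Sketch of the compress step of an archival run, plus a checksum that a
# later verification can compare against. Paths are hypothetical, and the
# upload to Azure Blob is left out because the tooling used is not described.
import hashlib
import tarfile
import tempfile
from pathlib import Path


def compress_snapshot(snapshot_dir: Path, out_dir: Path) -> tuple[Path, str]:
    """Create <out_dir>/<snapshot name>.tar.gz and return (path, sha256 hex)."""
    archive = out_dir / f"{snapshot_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(snapshot_dir, arcname=snapshot_dir.name)
    digest = hashlib.sha256()
    with archive.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return archive, digest.hexdigest()


# Demo on a throwaway directory standing in for the NFS staging area.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "snapshot-week-01"
    staging.mkdir()
    (staging / "segment.dat").write_bytes(b"example segment data")
    archive_path, sha256_hex = compress_snapshot(staging, Path(tmp))
    archive_name = archive_path.name
    with tarfile.open(archive_path) as tar:
        members = tar.getnames()
print(archive_name, members)
```

A scheduled job could call a function like this once the weekly snapshot has finished writing to the staging area, then hand the archive and its checksum to the upload step.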

Manual Verification Gate Before Deletion

No local file is deleted until its presence in Azure Blob is manually confirmed. In a compliance environment where data recovery is not an option, this checkpoint is a design requirement – not an operational inconvenience.
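
The gate described here is a manual one. The sketch below only shows how a cleanup helper could enforce it, refusing to delete unless the remote copy matches and an operator has explicitly confirmed. The `remote_listing` mapping (blob name to size in bytes) and the `operator_confirmed` flag are assumptions for illustration.

```python
# Sketch of a deletion gate: a local archive is removed only when the remote
# listing shows the same name and size AND an operator has confirmed it.
# `remote_listing` stands in for whatever the operator reads from Azure Blob;
# its shape (name -> size in bytes) is an assumption.
import tempfile
from pathlib import Path


def delete_if_verified(local: Path, remote_listing: dict[str, int], operator_confirmed: bool) -> bool:
    """Delete `local` and return True only if every check passes."""
    remote_size = remote_listing.get(local.name)
    if remote_size is None or remote_size != local.stat().st_size:
        return False          # missing or size mismatch: keep the local copy
    if not operator_confirmed:
        return False          # the manual gate has not been passed
    local.unlink()
    return True


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "snapshot-week-01.tar.gz"
    archive.write_bytes(b"x" * 128)
    kept_unconfirmed = not delete_if_verified(archive, {archive.name: 128}, operator_confirmed=False)
    kept_mismatch = not delete_if_verified(archive, {archive.name: 64}, operator_confirmed=True)
    deleted = delete_if_verified(archive, {archive.name: 128}, operator_confirmed=True)
    still_exists = archive.exists()
print(kept_unconfirmed, kept_mismatch, deleted, still_exists)
```

The default outcome is to keep the local file; deletion happens only on the single path where both the remote match and the human confirmation are present.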

BEFORE & AFTER

METRIC	BEFORE	AFTER
Log Retention	No structured long-term archival	11-year regulatory-compliant retention on Azure Blob
Storage Approach	On-premise only, hitting capacity limits	Tiered – hot on-premise, cold on Azure Blob
Query Response Time	Unstructured search across full dataset	1 second per microservice index
Microservice Visibility	No index-level separation	35+ dedicated indexes, full traceability
Archival Process	Manual, ad hoc	Fully automated – snapshot, compress, upload
Downtime During Maintenance	At risk	Zero – rolling dual-node architecture
Data Deletion Safety	No verification step	Manual confirmation gate before any deletion

OUTCOME

Query Performance

1-second response times sustained across 35+ microservice indexes at 700–800 GB daily ingest.

Data Integrity Preserved

Manual verification gate ensures no local file is deleted before confirmed cloud archival.

Full Traceability

Structured Kibana dashboards across all microservices enable fast incident investigation and transaction-level log search.

Storage Constraint Resolved

Active logs tiered on-premise, archival data moved to cloud – without burdening existing infrastructure.

Regulatory Compliance Met

11-year transaction log retention achieved through scalable, cloud-backed archival on Azure Blob Storage.

Operational Efficiency

End-to-end archival pipeline automated – ongoing operational overhead kept minimal despite the scale.

Zero Downtime

Rolling dual-node architecture for Kibana and Logstash ensured continuous availability across all maintenance cycles.

CONCLUSION

Ashnik’s approach to this engagement was not to deploy a stack – it was to architect a platform that could carry the weight of a BFSI compliance obligation across an 11-year window, at 700–800 GB of daily ingest, without a single point of failure. Every decision – role-separated Elasticsearch nodes, independent Filebeat instances, a manual verification gate before deletion – reflects a deliberate engineering choice, not a default configuration.

