Zero-Downtime Server Replacement at Scale — Upgrading a High-Volume Payments Log Infrastructure
- Industry : Payments
- Technology : Elastic Stack, Kubernetes
- Engagement : Infrastructure Migration
- 100,000 events per second post-migration
- 30 days of log retention achieved
- 30 load balancers as log sources, up from 2
- 0 minutes of downtime
CUSTOMER OVERVIEW
This engagement builds on an earlier transformation where Ashnik helped the same payment solution company overhaul log management using the Elastic Stack. At the close of that phase, the platform ingested ~20,000 events per second from 2 load balancers with a 10-day retention window.
Since then, ingest grew to 100,000 events per second across 30 load balancers, and the business required 30-day log retention. The existing infrastructure (6 physical servers running 21 Elasticsearch pods, with uneven disk, CPU, and RAM configurations) could no longer sustain this load. A full hardware replacement was unavoidable. The constraint: the platform could not go down.
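To put the growth in perspective, the short calculation below estimates how many events the cluster has to hold under the old and new requirements. It is a rough sketch that assumes ingest runs at the quoted rate for the entire retention window; the case study does not say whether the figures are peak or average rates, so treat the totals as upper-bound illustrations. The function name `retained_events` is introduced here for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def retained_events(events_per_second: int, retention_days: int) -> int:
    """Events held in the cluster, assuming a constant ingest rate for the whole window."""
    return events_per_second * SECONDS_PER_DAY * retention_days


# Figures taken from the case study text.
before = retained_events(20_000, 10)    # earlier phase: 20,000 eps, 10-day retention
after = retained_events(100_000, 30)    # new requirement: 100,000 eps, 30-day retention

print(f"before: {before:,} events retained")   # before: 17,280,000,000 events retained
print(f"after:  {after:,} events retained")    # after:  259,200,000,000 events retained
print(f"growth: {after // before}x")           # growth: 15x
```

Under that assumption, a 5x increase in ingest combined with a 3x longer retention window means roughly 15x as many events to keep searchable, which is consistent with the conclusion that the existing hardware could not simply be tuned.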
THE CHALLENGE
ASHNIK’S APPROACH
Rather than treating this as a hardware swap, Ashnik designed an application-aware migration strategy that worked within Kubernetes and Elasticsearch constraints, not around them.
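The case study does not list the exact steps used, so the following is only a minimal sketch of one common pattern for retiring a server that hosts Elasticsearch pods: exclude those Elasticsearch nodes from shard allocation, wait until no shards remain on them and the cluster is green, and only then drain the Kubernetes node. The setting `cluster.routing.allocation.exclude._name`, the `_cat/shards` API, and `kubectl drain` are standard; the helper names, the pod names (`es-data-3`, `es-data-4`), and the sample rows are hypothetical, and the sketch assumes Elasticsearch node names match pod names.

```python
import json


def exclusion_body(es_node_names):
    """Body for PUT _cluster/settings: move shards off the named Elasticsearch nodes."""
    return {
        "persistent": {
            "cluster.routing.allocation.exclude._name": ",".join(es_node_names)
        }
    }


def clear_exclusion_body():
    """Body that resets the exclusion (a null value restores the default)."""
    return {"persistent": {"cluster.routing.allocation.exclude._name": None}}


def shards_remaining(cat_shards_rows, es_node_names):
    """Count shards still located on the excluded nodes.

    Rows are assumed to come from GET _cat/shards?format=json. A relocating
    shard is assumed to report "source -> target details" in its node column,
    so only the text before the arrow is compared. Unassigned shards have no
    node and are skipped.
    """
    excluded = set(es_node_names)
    count = 0
    for row in cat_shards_rows:
        node = (row.get("node") or "").split("->")[0].strip()
        if node in excluded:
            count += 1
    return count


def safe_to_drain(cluster_status, cat_shards_rows, es_node_names):
    """True only when the cluster is green and the excluded nodes hold no shards."""
    return (
        cluster_status == "green"
        and shards_remaining(cat_shards_rows, es_node_names) == 0
    )


def drain_command(k8s_node):
    """kubectl invocation to evict pods from a Kubernetes node once it is safe."""
    return ["kubectl", "drain", k8s_node,
            "--ignore-daemonsets", "--delete-emptydir-data"]


# Hypothetical example: two Elasticsearch pods live on the server being retired.
pods = ["es-data-3", "es-data-4"]
print(json.dumps(exclusion_body(pods)))

rows = [
    {"index": "logs-a", "shard": "0", "state": "STARTED", "node": "es-data-1"},
    {"index": "logs-a", "shard": "1", "state": "RELOCATING",
     "node": "es-data-3 -> 10.0.0.7 x1 es-data-9"},
]
print(safe_to_drain("green", rows, pods))  # False: one shard is still moving off es-data-3
```

Gating the drain on shard counts rather than on a fixed timer is what makes the approach application-aware: Kubernetes only evicts a pod after Elasticsearch has already moved its data elsewhere.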
BEFORE & AFTER
OUTCOME
CONCLUSION
By treating a physical server replacement as an application-aware migration, Ashnik demonstrated that even the most constrained stateful infrastructure can be upgraded without disruption. The key was working with Elasticsearch’s shard allocation controls and Kubernetes’ pod lifecycle — not around them.
For a payments platform where downtime is not an option, this methodical, phased approach delivered a complete infrastructure modernization while the system continued processing transactions at full capacity. Large systems change safely not through disruption, but through controlled, phased, application-aware execution.