The client is a leading stock exchange that handles billions in daily equity and derivatives transactions. It operates in an extremely high-frequency trading environment where system performance, scalability, and stability are mission-critical, especially during peak trading windows.
With increasing trading volumes and heightened volatility driven by algorithmic trading, the client needed a robust solution to:
Existing benchmarking tools could not simulate realistic high-load scenarios or provide fine-grained analysis of system behaviour under stress. Legacy systems were also not equipped to handle modern market dynamics, especially the bursty nature of algo-driven trading and the concurrent flow of equity and derivatives orders.
Additional challenges included:
Oneture partnered with the client to design and build a Custom Benchmarking Simulation Platform — combining our expertise in Capital Markets, high-performance computing, and real-time systems.
Oneture designed and deployed a fully air-gapped Kubernetes cluster optimized for concurrent, low-latency trade ingestion and multi-asset class (Equity + Derivatives) support. We built all services in Golang for performance and deployed a scalable, pod-based TCP ingestion system inside the cluster.
Key features of our engagement included:
Time-Warping of Historical Order Data
Massive Load Generation
Cloud-Native Architecture
Granular Observability
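As a rough illustration of the time-warping idea, the sketch below compresses the gaps between recorded orders by a constant factor, so that a quiet historical session replays as a much denser burst. The `Order` type, `WarpOffsets`, `sampleOrders`, and the factor of 4 are hypothetical names and values chosen for this example; the platform's actual warping logic is not described in this case study and may be more elaborate (for instance, warping only selected time windows).

```go
package main

import (
	"fmt"
	"time"
)

// Order is a hypothetical, minimal view of one recorded order event.
type Order struct {
	ID        string
	Timestamp time.Time // original timestamp from the historical feed
}

// WarpOffsets returns, for each order, the delay after replay start at which
// it should be sent. A factor above 1 compresses time (more orders per
// second); a factor below 1 stretches it. Orders are assumed to be sorted by
// timestamp.
func WarpOffsets(orders []Order, factor float64) []time.Duration {
	if len(orders) == 0 || factor <= 0 {
		return nil
	}
	base := orders[0].Timestamp
	offsets := make([]time.Duration, len(orders))
	for i, o := range orders {
		gap := o.Timestamp.Sub(base)
		offsets[i] = time.Duration(float64(gap) / factor)
	}
	return offsets
}

// sampleOrders builds a tiny, made-up order tape for the demonstration.
func sampleOrders() []Order {
	start := time.Date(2024, 1, 2, 9, 15, 0, 0, time.UTC)
	return []Order{
		{ID: "A", Timestamp: start},
		{ID: "B", Timestamp: start.Add(400 * time.Millisecond)},
		{ID: "C", Timestamp: start.Add(2 * time.Second)},
	}
}

func main() {
	orders := sampleOrders()
	for i, d := range WarpOffsets(orders, 4) {
		fmt.Printf("order %s: send at +%v\n", orders[i].ID, d)
	}
}
```

A replay loop would then sleep until each offset before writing the order to the ingestion socket. Keeping the warp as a pure function makes it straightforward to test separately from the network code.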
The architecture centers around a highly available Kubernetes cluster (RKE2) running on RHEL 9, with one control-plane node and 10 worker nodes. Key technical components:
a. Golang-based TCP Ingestion System
b. Kubernetes-Native Pod Scaling
c. Real-Time Monitoring and Visualization
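To make the ingestion component more concrete, here is a minimal sketch of a goroutine-per-connection TCP server in Go that counts incoming order messages. It is not the platform's code: the `Ingestor` type, the newline-delimited framing, the sample order strings, and the loopback demo in `runDemo` are all assumptions made for illustration, using only the Go standard library.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"sync"
	"sync/atomic"
)

// Ingestor accepts TCP connections and counts newline-delimited messages.
// The framing is an assumption; a real exchange protocol would differ.
type Ingestor struct {
	ln    net.Listener
	wg    sync.WaitGroup
	count atomic.Int64
}

// Listen starts accepting connections on addr in a background goroutine.
func Listen(addr string) (*Ingestor, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	in := &Ingestor{ln: ln}
	in.wg.Add(1)
	go in.acceptLoop()
	return in, nil
}

func (in *Ingestor) acceptLoop() {
	defer in.wg.Done()
	for {
		conn, err := in.ln.Accept()
		if err != nil {
			return // listener was closed
		}
		in.wg.Add(1)
		go in.handle(conn)
	}
}

// handle reads one connection until the client stops sending.
func (in *Ingestor) handle(conn net.Conn) {
	defer in.wg.Done()
	defer conn.Close()
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		in.count.Add(1) // a real system would parse and forward the order
	}
}

// Close stops accepting, waits for open connections, and returns the count.
func (in *Ingestor) Close() int64 {
	in.ln.Close()
	in.wg.Wait()
	return in.count.Load()
}

// sendOrders writes each order on its own line, half-closes the connection,
// and blocks until the server has finished reading and closed its side.
func sendOrders(addr string, orders []string) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	for _, o := range orders {
		if _, err := fmt.Fprintln(conn, o); err != nil {
			return err
		}
	}
	if tc, ok := conn.(*net.TCPConn); ok {
		if err := tc.CloseWrite(); err != nil {
			return err
		}
	}
	_, err = io.ReadAll(conn)
	return err
}

// runDemo sends three made-up orders from each of two concurrent clients.
func runDemo() (int64, error) {
	in, err := Listen("127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	addr := in.ln.Addr().String()
	errs := make(chan error, 2)
	for c := 0; c < 2; c++ {
		go func() {
			errs <- sendOrders(addr, []string{"BUY,EQ,100", "SELL,FUT,5", "BUY,OPT,1"})
		}()
	}
	for c := 0; c < 2; c++ {
		if err := <-errs; err != nil {
			in.Close()
			return 0, err
		}
	}
	return in.Close(), nil
}

func main() {
	n, err := runDemo()
	fmt.Println("orders ingested:", n, "error:", err)
}
```

One goroutine per connection keeps the handler code simple and lets the Go scheduler spread connections across cores. In a pod-based deployment, each replica would run a listener like this behind a Kubernetes Service.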
Enabled the client to simulate high-stress scenarios and validate system stability before production rollout.
Helped identify performance bottlenecks early — allowing targeted infrastructure upgrades.
Reduced business risk during volatile trading days by ensuring system resilience.
Provided a reusable platform for ongoing capacity planning and future scalability studies.
Delivered full knowledge transfer and platform ownership to the client under the Build-Operate-Transfer (BOT) model.
Achieved sub-millisecond trade ingestion latency even at a peak of XXX orders/sec.
Seamless horizontal scaling of TCP pods eliminated throughput bottlenecks.
Provided full observability into market flows without external tools or SaaS dependencies.
Enabled client teams to correlate Equity and Derivative order behaviour in near real time, which is essential for regulatory alerts and market integrity.
Enabled full visibility into TCP streams and real-time alerting via Grafana.
Kubernetes (RKE2), Golang, TCP, Prometheus, Grafana, Loki, Angular, Redis
Golang’s direct control over TCP sockets and efficient memory handling made it well suited to building low-latency TCP clients.
Kubernetes' native autoscaling and self-healing features were critical for supporting highly volatile trading volumes.
Observability (often overlooked) proved essential for gaining trust from client IT teams and for audit compliance.
Custom Kubernetes controllers (built using controller-runtime) gave us flexibility to manage pod lifecycle based on trading hours and trader load patterns.
TCP in Kubernetes is viable at scale, but it requires careful tuning of pod networking, socket reuse, and client-server handshake design.
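As one small example of the socket-level tuning referred to above, the sketch below applies a few per-connection options through Go's standard `net.TCPConn` methods. The `TuneOptions` struct, the `Tune` and `demoTune` helpers, and every value shown (30-second keep-alive, 1 MiB buffers) are assumptions for illustration, not the settings used in this engagement. Kernel and pod-network tuning are outside its scope.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// TuneOptions holds illustrative per-connection settings.
type TuneOptions struct {
	NoDelay   bool          // disable Nagle's algorithm (already Go's default)
	KeepAlive time.Duration // keep-alive probe period; 0 leaves it unchanged
	ReadBuf   int           // requested receive buffer in bytes; 0 = unchanged
	WriteBuf  int           // requested send buffer in bytes; 0 = unchanged
}

// Tune applies the options to conn, which must be a *net.TCPConn.
func Tune(conn net.Conn, o TuneOptions) error {
	tc, ok := conn.(*net.TCPConn)
	if !ok {
		return fmt.Errorf("not a TCP connection: %T", conn)
	}
	if err := tc.SetNoDelay(o.NoDelay); err != nil {
		return err
	}
	if o.KeepAlive > 0 {
		if err := tc.SetKeepAlive(true); err != nil {
			return err
		}
		if err := tc.SetKeepAlivePeriod(o.KeepAlive); err != nil {
			return err
		}
	}
	if o.ReadBuf > 0 {
		// The operating system may clamp the requested size.
		if err := tc.SetReadBuffer(o.ReadBuf); err != nil {
			return err
		}
	}
	if o.WriteBuf > 0 {
		if err := tc.SetWriteBuffer(o.WriteBuf); err != nil {
			return err
		}
	}
	return nil
}

// demoTune opens a loopback connection and applies example settings to it.
func demoTune() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	defer conn.Close()
	return Tune(conn, TuneOptions{
		NoDelay:   true,
		KeepAlive: 30 * time.Second,
		ReadBuf:   1 << 20,
		WriteBuf:  1 << 20,
	})
}

func main() {
	fmt.Println("tune error:", demoTune())
}
```

Suitable values depend on message size, burst profile, and the cluster's network plugin, so settings like these would normally be validated under load rather than fixed up front.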
Time-Warping Provides Realistic Scenarios: Simple replay of historical data is not enough — time-warping techniques were essential to simulate extreme volatility and stress conditions.
Scaling TCP at High Concurrency Needs Deep Tuning: Achieving a stable load of XXX million orders per second required careful design of TCP socket handling, buffer management, and connection pooling.
Network Throughput Is a Key Constraint: Many assumed system limits came from CPU or storage, but network bandwidth was often the first saturation point; precise throughput modelling and testing were necessary.
Observability Drives Confidence: Building in full telemetry — including metrics, logs, and visual dashboards — not only accelerated debugging but also helped client stakeholders build trust in the platform.
Cloud as a Proving Ground: Using cloud infrastructure for early-stage feasibility allowed faster iteration without disrupting production environments while still validating hardware sizing for on-premise rollout.
BOT Model Ensures Long-Term Client Ownership: Structuring the engagement with Build-Operate-Transfer allowed for full client ownership and IP transfer, ensuring sustainability beyond initial delivery.
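The throughput modelling mentioned in the network lesson above can start from simple arithmetic: required bandwidth is roughly orders per second, times bytes per order, times a protocol overhead multiplier, times 8 bits. The sketch below works through this with entirely hypothetical inputs (1,000,000 orders/sec, 200 bytes per order, 10% overhead, a 10 Gbps link). These are not the client's figures, and the `Load` type and `sampleLoad` helper are names invented for this example.

```go
package main

import "fmt"

// Load describes a hypothetical order flow; every field is an assumed input.
type Load struct {
	OrdersPerSec  float64
	BytesPerOrder float64
	Overhead      float64 // multiplier for protocol headers, e.g. 1.1 = +10%
}

// Gbps returns the bandwidth the load needs, in gigabits per second.
func (l Load) Gbps() float64 {
	return l.OrdersPerSec * l.BytesPerOrder * l.Overhead * 8 / 1e9
}

// Utilization returns the fraction of a link of linkGbps that the load uses.
func (l Load) Utilization(linkGbps float64) float64 {
	return l.Gbps() / linkGbps
}

// sampleLoad returns made-up numbers used only for the demonstration.
func sampleLoad() Load {
	return Load{OrdersPerSec: 1_000_000, BytesPerOrder: 200, Overhead: 1.1}
}

func main() {
	l := sampleLoad()
	fmt.Printf("required: %.2f Gbps, share of a 10 Gbps link: %.1f%%\n",
		l.Gbps(), 100*l.Utilization(10))
}
```

With these assumed inputs the load needs about 1.76 Gbps, or roughly 18% of a 10 Gbps link. A back-of-envelope model like this only shows where to look; measured throughput under test remains the deciding evidence.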