Core Trading System Benchmarking and Simulation Platform for Capacity Planning at a Stock Exchange
Industry
Capital Markets
Technologies
Angular, Redis, Golang, Kubernetes
About Client

A leading stock exchange responsible for managing billions in daily equity and derivatives transactions. The client operates in an extremely high-frequency trading environment where system performance, scalability, and stability are mission-critical — especially during peak trading windows.

Problem Statement

With increasing trading volumes and heightened volatility driven by algorithmic trading, the client needed a robust solution to:

  • Accurately simulate and replay real-world trading loads.
  • Perform capacity planning and identify potential bottlenecks in the trading infrastructure.
  • Stress test their trading system components, from order management to network throughput.
  • Analyze scalability and latency across end-to-end trade flows.

Legacy benchmarking tools could neither simulate realistic high-load scenarios nor provide fine-grained analysis of system behaviour under stress. They were not equipped for modern market dynamics, especially the bursty nature of algo-driven trading and the concurrent flow of equity and derivatives orders.

Additional challenges included:

  • Receiving and processing hundreds of thousands of trade messages per second over raw TCP.
  • Ensuring real-time visualization and rule-based alerting for the client team.
  • Enforcing strict network isolation with zero reliance on public cloud or internet connectivity.
Oneture's Role

Oneture partnered with the client to design and build a Custom Benchmarking Simulation Platform — combining our expertise in Capital Markets, high-performance computing, and real-time systems.

Oneture designed and deployed a fully air-gapped Kubernetes cluster optimized for concurrent, low-latency trade ingestion and multi-asset class (Equity + Derivatives) support. We built all services in Golang for performance and deployed a scalable, pod-based TCP ingestion system inside the cluster. 

Key features of our engagement included:

  • Co-designing the platform architecture in close collaboration with the client’s technology and infrastructure teams.
  • Building a future-proof platform that allows the client full code ownership under a Build-Operate-Transfer (BOT) model.
  • Leveraging cloud-native capabilities for faster time-to-market while validating hardware feasibility for eventual production deployment.
  • Developing both load generation and time-warped historical order replay capabilities to simulate real-world trading peaks.
Solution

Time-Warping of Historical Order Data

  • Collected historical trade order data and applied "time-warping" algorithms to compress or expand timestamps.
  • Enabled realistic simulation of trading days under various volatility conditions.
  • Supported as-is, time-compressed (to stress test system limits), and time-expanded (to simulate prolonged high-load conditions) modes.
  • Maintained data integrity while handling edge cases like zero time deltas.

Massive Load Generation

  • Successfully simulated loads of up to XXX million order entries per second.
  • Developed custom code to generate controlled, incremental load scenarios, enabling soak testing and stepwise capacity testing.
  • Conducted detailed network throughput feasibility analysis:
    • Validated machine configurations (e.g., 15–20 Gbps network capacity) to achieve target loads.
    • Optimized packet sizes and TCP configuration for maximal throughput.
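The stepwise capacity testing described above boils down to a controlled ramp: hold a target rate for a soak interval, then step up. A hedged sketch in Go (function names, rates, and hold durations are hypothetical; the production harness drove the exchange gateway directly):

```go
package main

import (
	"fmt"
	"time"
)

// StepLoad ramps the target order rate from startRate to maxRate in fixed
// increments, holding each step for hold so the system under test can be
// soaked at that level. emit is invoked once per step with the target rate.
func StepLoad(startRate, step, maxRate int, hold time.Duration, emit func(rate int)) {
	for rate := startRate; rate <= maxRate; rate += step {
		emit(rate)
		time.Sleep(hold) // soak at this rate before stepping up
	}
}

func main() {
	// Illustrative ramp: 100k -> 500k orders/sec in 100k steps.
	StepLoad(100_000, 100_000, 500_000, 10*time.Millisecond, func(rate int) {
		fmt.Printf("soak step: target %d orders/sec\n", rate)
	})
}
```

Separating the ramp schedule from the emitter keeps the same driver usable for both short stepwise capacity runs and long soak tests at a single rate.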

Cloud-Native Architecture

  • Built a base version on cloud infrastructure for rapid prototyping and validation.
  • Designed the platform to integrate with client’s on-premise systems for production deployments.
  • Ensured full compliance with client’s data security, confidentiality, and regulatory requirements.

Granular Observability

  • Incorporated real-time metrics, dashboards, and logs using Prometheus, Grafana, and Loki.
  • Provided full visibility into system performance, transaction latencies, and pod-level resource utilization.
  • Enabled historical playback and forensic analysis of any test run.

The architecture centers around a highly available Kubernetes cluster (RKE2) running on RHEL 9, with one control-plane node and 10 worker nodes. Key technical components:

a. Golang-based TCP Ingestion System

  • Developed a fleet of TCP client pods using Golang’s net package and goroutines to handle thousands of concurrent TCP connections.
  • Each pod runs an event-driven listener for trade orders from external Processing Engines over Layer 4.
  • Used Goroutine pools, channel buffering, and context-based cancellation for fault-tolerant message processing.
  • Orders are classified in real time into Equity and Derivatives, parsed, and pushed to the Order Processing Engine.

b. Kubernetes-Native Pod Scaling

  • Deployed TCP clients as StatefulSets for sticky sessions and horizontal scalability.
  • Configured custom resource limits and affinity rules to isolate workloads by trading segment.
  • Used native Kubernetes autoscaling (HPA + custom metrics) to spin up new client pods during trading peaks.
  • The cluster enabled segregation of traffic flows per trader ID, preventing noisy-neighbor problems during high volatility.

c. Real-Time Monitoring and Visualization

  • Prometheus scrapes pod-level metrics and node-level resource usage metrics.
  • Grafana dashboards give the client team immediate visibility into order volumes, message latency, and pod saturation.
  • Loki aggregates structured logs for every incoming connection and parsed order — useful for historical playback and debugging.