Scaling Accurate Demand Forecasting for a Leading Grocery Retailer: 85% Faster DeepAR Insights on AWS
Industry
Retail
Technologies
AWS SageMaker
About Client

A leading online grocery company aimed to improve forecast accuracy and reduce out-of-stock rates to under 2% by upgrading its existing demand forecasting solution. The company was already using DeepAR at the fulfillment center (FC) level and engaged our team to implement the same ML-based approach at the distribution center (DC) level and to reduce model training and inference time. By migrating from traditional statistical models and refining the DeepAR pipeline with AWS services such as SageMaker, S3, and EC2, the solution enabled faster execution and laid the foundation for scalable, accurate forecasting across the supply chain.

Problem Statement

A leading online grocery company aimed to improve demand forecasting accuracy and operational efficiency across its large-scale supply chain network. While the DeepAR model was already implemented at the Fulfillment Center (FC) level (~600 FCs), the statistical models used at the Distribution Center (DC) level (~150 DCs, 26,000 SKUs) struggled to meet the company’s key target of 98% availability, achieving only ~50% availability in practice. The forecasting system faced several technical challenges:

  • Accuracy Gaps at DC Level: Statistical models failed to deliver the required accuracy for DC-level forecasts, directly impacting availability and planning decisions.
  • High Training & Inference Time: Training the models for a single DC with ~26K SKUs took over 3 days and 7 hours before optimization; with dynamic batch sizing, this dropped to 15 hours and 10 minutes. Similarly, FC 312, with ~8K SKUs, saw training time fall from 7 hours 10 minutes to 3 hours 40 minutes.
  • Inefficient Serial Forecasting: Inference was originally run serially, generating 60-day forecasts one day at a time; on FC 312, this process took over 3 hours. After implementing batch-wise parallel processing (batch size = 20), inference time for the same FC dropped to just 24 minutes, significantly improving execution efficiency (a minimal sketch of this batching pattern follows this list).
  • Scalability & Automation Needs: With model retraining scheduled monthly and forecasting across thousands of SKU-FC/DC combinations, the system needed a scalable, reliable, and maintainable pipeline architecture that could handle growing business demand.
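
A minimal sketch of the batching pattern referenced above, assuming a hypothetical forecast_batch helper that sends one group of SKU time series to the trained DeepAR model and returns the full 60-day forecast in a single call; the concurrency mechanism and helper names are illustrative, not the client's production code.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 20      # SKUs grouped per forecast request (value from the case study)
HORIZON_DAYS = 60    # full horizon requested at once, instead of one day at a time

def forecast_batch(batch, horizon=HORIZON_DAYS):
    """Hypothetical helper: request a `horizon`-day forecast for one batch of
    SKU series from the deployed DeepAR model and return the predictions."""
    ...

def run_inference(sku_series):
    # Split all SKU series for a site into fixed-size batches.
    batches = [sku_series[i:i + BATCH_SIZE]
               for i in range(0, len(sku_series), BATCH_SIZE)]
    # Issue batch requests concurrently instead of serially.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(forecast_batch, batches))
```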

The company partnered with our team to re-engineer the existing pipeline, apply performance profiling, optimize batch-wise parallel forecasting, and implement DeepAR-based forecasting at the DC level using AWS infrastructure.

Solution

Oneture Technologies implemented a scalable and cost-effective demand forecasting solution on AWS, supporting both Fulfillment Center (FC) and Distribution Center (DC) level forecasting using machine learning models. The system was designed to support high-volume data ingestion, parallelized model training, and automated evaluation pipelines with full infrastructure monitoring and cost optimization.

Key Solution Features:

  • Modular ML Training Pipelines: Designed reusable, parameterized training pipelines for different business units (e.g., BBNow FC, DC, Integrated) using SageMaker Pipelines, supporting 30+ sites concurrently.
  • Custom Dockerized Environment: Built and deployed custom Docker images for consistent, dependency-managed training environments across SageMaker.
  • Parallel Training Architecture: Leveraged ProcessPoolExecutor to run forecasting models across multiple FCs and DCs simultaneously, drastically reducing end-to-end training time (a sketch of this pattern, combined with the dynamic vCPU sizing below, follows this list).
  • Dynamic Resource Utilization: Dynamically scaled compute usage to 75% of available vCPUs, using m5.4xlarge instances for FC-level training and m5.12xlarge instances for DC-level training, ensuring optimal parallel execution based on workload size.
  • Data Versioning & S3 Integration: Integrated S3-based input/output versioning and used metadata tagging for traceable model artifacts, metrics, and logs.
  • Automated Forecast Evaluation: Implemented custom evaluation logic for RMSE, p50/p90 quantiles, and item-level performance to select best models per item-bucket combination.
  • Elastic Scalability & Monitoring: Enabled scalability across training jobs using SageMaker's distributed compute and integrated CloudWatch for logging and failure monitoring.
  • Cost Optimization & Control: Separated compute for FC and DC training to balance cost and performance, using spot instances where appropriate and shutting down idle resources post-training.
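
A minimal sketch of the parallel training pattern and 75% vCPU sizing described above; train_one_site is a hypothetical wrapper name for the per-FC/DC DeepAR training step, not the actual pipeline code.

```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed

def train_one_site(site_id):
    """Hypothetical wrapper: prepare data, train the DeepAR model, and upload
    artifacts to S3 for a single FC or DC."""
    ...

def train_all_sites(site_ids):
    # Use roughly 75% of the available vCPUs so the host keeps headroom.
    max_workers = max(1, int(os.cpu_count() * 0.75))
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(train_one_site, s): s for s in site_ids}
        for future in as_completed(futures):
            site_id = futures[future]
            try:
                future.result()
            except Exception as exc:
                # Surface the failure without blocking the remaining sites.
                print(f"Training failed for {site_id}: {exc}")

if __name__ == "__main__":
    train_all_sites(["FC_312", "DC_001"])   # illustrative site identifiers
```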

Value Delivered

85% Reduction in Forecasting Latency: Reduced end-to-end model training and inference time from 2.5 hours to under 20 minutes per FC/DC.

Optimized Resource Utilization: Achieved dynamic scaling of vCPUs (75% usage) with parallel processing, enabling faster training with efficient cost control.

Improved Forecast Accuracy: Implemented quantile-based DeepAR evaluation logic and item-level RMSE selection, resulting in more accurate and reliable forecasts.
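
As an illustration of the evaluation logic mentioned above, the standard RMSE and pinball (quantile) loss metrics can be computed per item roughly as below; the combined scoring and the candidates structure are simplifying assumptions, not the exact production selection rules.

```python
import numpy as np

def rmse(actual, point_forecast):
    actual, point_forecast = np.asarray(actual), np.asarray(point_forecast)
    return float(np.sqrt(np.mean((actual - point_forecast) ** 2)))

def quantile_loss(actual, quantile_forecast, q):
    """Pinball loss for a quantile forecast (q = 0.5 for p50, 0.9 for p90)."""
    actual, quantile_forecast = np.asarray(actual), np.asarray(quantile_forecast)
    diff = actual - quantile_forecast
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

def pick_best_model(actual, candidates):
    """candidates: {model_name: {"p50": [...], "p90": [...]}} -- assumed shape.
    Returns the model with the lowest combined score for this item."""
    scores = {
        name: rmse(actual, fc["p50"])
        + quantile_loss(actual, fc["p50"], 0.5)
        + quantile_loss(actual, fc["p90"], 0.9)
        for name, fc in candidates.items()
    }
    return min(scores, key=scores.get)
```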

Scalable Architecture for Multi-FC Support: Designed to support 100+ Fulfillment Centers and Dark Stores with modular SageMaker Pipelines.

Automated & Reproducible Pipelines: Fully automated training, evaluation, and model upload workflows using Dockerized SageMaker environments.

Seamless Integration with BB Infrastructure: Integrated tightly with existing S3 storage, CloudWatch monitoring, and internal scheduling systems for production readiness.

Cost-Efficient Compute Strategy: Leveraged instance separation (m5.4xlarge for FC, m5.12xlarge for DC) and spot instances for ~30–40% infrastructure cost savings.
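
For the spot-instance piece specifically, managed spot training on SageMaker is typically enabled on the Estimator as sketched below; the image URI, role, bucket, and time limits are placeholders rather than the production configuration (SageMaker training instance types carry an ml. prefix, e.g. ml.m5.4xlarge).

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<custom-training-image-uri>",   # placeholder for the project's Docker image
    role="<sagemaker-execution-role-arn>",     # placeholder IAM role
    instance_count=1,
    instance_type="ml.m5.4xlarge",             # FC-level training; DC-level used the larger ml.m5.12xlarge
    use_spot_instances=True,                   # managed spot training for cost savings
    max_run=6 * 60 * 60,                       # max training time in seconds (illustrative)
    max_wait=8 * 60 * 60,                      # must be >= max_run when spot is enabled
    checkpoint_s3_uri="s3://<bucket>/checkpoints/",  # resume training if spot capacity is reclaimed
)
```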

Lessons Learned

Right-Sizing Matters: Choosing appropriate EC2 instance types (e.g., m5.4xlarge for FCs and m5.12xlarge for DCs) was critical to balancing cost and performance. Overprovisioning leads to wasted cost; underprovisioning causes timeouts and delays.

Parallelization Drives Efficiency: Migrating from serial to parallel model training significantly improved execution speed (up to 80% faster). However, it required careful tuning of batch sizes and memory utilization.

Dynamic Resource Utilization Is Key: Dynamically allocating 75% of available vCPUs allowed optimal performance while avoiding system overload. Static configurations were either too expensive or too slow.

Monitoring Prevents Failures: Adding memory monitoring (e.g., via /proc/self/status) and CPU load tracking helped detect bottlenecks early, preventing job crashes.
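
A minimal sketch of this kind of lightweight monitoring, reading resident memory from /proc/self/status and the 1-minute load average via os.getloadavg() (Linux-only); the thresholds and the reaction to pressure are illustrative assumptions.

```python
import os

def memory_rss_mb():
    """Resident set size of the current process, parsed from /proc/self/status (Linux)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0   # value is reported in kB
    return 0.0

def check_pressure(rss_limit_mb=50_000, load_limit=None):
    """Warn when memory or CPU load crosses an (illustrative) threshold."""
    load_limit = load_limit if load_limit is not None else os.cpu_count()
    rss_mb = memory_rss_mb()
    load_1m, _, _ = os.getloadavg()
    if rss_mb > rss_limit_mb or load_1m > load_limit:
        # In the pipeline this could shrink the batch size or pause new submissions.
        print(f"Resource pressure: rss={rss_mb:.0f} MB, load(1m)={load_1m:.1f}")
```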

One Size Doesn’t Fit All: FCs and DCs required different scaling strategies due to their data volume and complexity. Custom logic per workload was more effective than a generalized solution.

Automation Enhances Reliability: Automating the end-to-end training pipeline, including Docker builds, data prep, and SageMaker training, reduced manual errors and ensured repeatability.

Cost Awareness Should Be Proactive: Forecasting and budgeting for SageMaker and EC2 costs upfront helps avoid surprises. Retrospective cost breakdowns are useful, but cost planning should ideally happen early in the project.