Migrating Historical Data From Snowflake to Redshift for One of the Largest Logistics and Supply Chain Companies
Industry
Logistics and Supply Chain
Technologies
AWS SageMaker, Amazon Redshift
Problem Statement

One of India’s largest logistics and supply chain companies generates massive volumes of operational and transactional data daily. Over time, this resulted in the accumulation of over XX TB of historical data across ~XXX Snowflake tables. While Snowflake initially served the company's analytical needs well, it began to pose challenges:

  • High recurring storage and compute costs, especially for infrequently accessed historical data
  • Limited integration with emerging AWS-native tools for machine learning and analytics
  • A growing need for audit-ready, long-term archival, driven by compliance and regulatory mandates
  • Desire to standardize analytics on a single cloud platform — AWS

To address these needs, the client undertook a strategic migration of its historical data from Snowflake to Amazon Redshift and Amazon S3, with a future-ready architecture designed to deliver both cost-efficiency and advanced analytics.

Why Migrate? (Business Drivers)

  • Cost Optimization
    Challenge with Snowflake: The compute/storage model was proving expensive for infrequently queried data.
    Redshift/S3 benefit: Redshift + S3 lifecycle storage significantly cuts costs for cold data.
  • Audit & Regulatory Compliance
    Challenge with Snowflake: Regulatory mandates required historical data availability and traceability.
    Redshift/S3 benefit: S3-based versioned backups + Redshift Spectrum provide a reliable audit trail.
  • Unified Analytics Platform
    Challenge with Snowflake: A fragmented setup made cross-system queries harder.
    Redshift/S3 benefit: Redshift + Spectrum allow integrated querying across recent and historical data.
  • ML & GenAI Readiness
    Challenge with Snowflake: Snowflake limited flexibility in ML pipeline integration.
    Redshift/S3 benefit: An S3-based data lake supports downstream ML/GenAI workloads on AWS.
  • Cloud Strategy Alignment
    Challenge with Snowflake: Snowflake was a silo outside the AWS-native stack.
    Redshift/S3 benefit: Redshift, EC2, and S3 bring tighter integration with the existing AWS ecosystem.
Solution

A two-level S3 staging approach was implemented to ensure flexibility, data integrity, and recoverability; illustrative sketches of the key steps follow the table:

1. Schema Conversion: Used AWS Schema Conversion Tool (SCT) to translate the Snowflake schema into Redshift-compatible DDL.
2. Data Extraction: Exported data from Snowflake to Amazon S3 as STRING-type Parquet files to avoid format/type issues.
3. S3 Level 1 (Raw Staging): Raw backup of all exported data (STRING Parquet), version-controlled for audit and rollback.
4. S3 Level 2 (Transformed Staging): Converted STRING columns to typed formats using the SCT mapping plus EC2 batch scripts (Python).
5. Load into Redshift: Redshift COPY commands used to batch-load 20–30 tables per day.
6. Retention & Archival: S3 Lifecycle Policies applied; older data moved to Glacier or Intelligent-Tiering.
7. Query Optimization: Redshift Spectrum used to query archived data directly from S3.
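For illustration, here is a minimal sketch of the export step (step 2), assuming a snowflake-connector-python script run from EC2. The account details, storage integration, bucket, and table names are all hypothetical stand-ins rather than the client's actual objects.

```python
# Sketch of step 2: unload one Snowflake table to the Level 1 S3 staging
# area as all-STRING Parquet. The account details, storage integration,
# bucket, and table names below are hypothetical stand-ins.
import snowflake.connector

TABLE = "ORDERS"                                       # hypothetical table
RAW_PREFIX = "s3://logistics-raw-staging/raw/orders/"  # Level 1 staging

conn = snowflake.connector.connect(
    account="xxxx",
    user="migration_user",
    password="...",
    warehouse="MIGRATION_WH",
    database="PROD",
    schema="PUBLIC",
)
cur = conn.cursor()

# Cast every column to VARCHAR so the exported Parquet carries only STRING
# columns, sidestepping type/format mismatches during extraction.
cur.execute(
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_schema = 'PUBLIC' AND table_name = %s "
    "ORDER BY ordinal_position",
    (TABLE,),
)
cols = ", ".join(f'TO_VARCHAR("{c}") AS "{c}"' for (c,) in cur.fetchall())

# HEADER = TRUE preserves the original column names in the Parquet files.
cur.execute(f"""
    COPY INTO '{RAW_PREFIX}'
    FROM (SELECT {cols} FROM PUBLIC.{TABLE})
    STORAGE_INTEGRATION = S3_INT
    FILE_FORMAT = (TYPE = PARQUET)
    HEADER = TRUE
""")
conn.close()
```

In practice a driver loop would iterate this over the ~XXX tables, recording per-table row counts for the later validation step.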
 
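A similar sketch for steps 4 and 5, assuming pyarrow for the STRING-to-typed conversion and redshift_connector for the COPY load. The TARGET_SCHEMA below stands in for the SCT-derived type mapping, and every bucket, host, and ARN is illustrative.

```python
# Sketch of steps 4-5: cast STRING Parquet to typed Parquet (Level 2), then
# load it into Redshift via COPY. Names, host, and ARN are hypothetical.
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs
import redshift_connector

s3 = fs.S3FileSystem(region="ap-south-1")

# Stand-in for the SCT-derived type mapping of a hypothetical ORDERS table.
TARGET_SCHEMA = pa.schema([
    ("order_id", pa.int64()),
    ("amount", pa.float64()),
    ("created_at", pa.timestamp("us")),
    ("is_delivered", pa.bool_()),
])

raw = pq.read_table("logistics-raw-staging/raw/orders/part-0.parquet",
                    filesystem=s3)
# Arrow casts ISO-8601 strings to timestamps and "true"/"false" to booleans;
# messier values (custom date formats, embedded JSON) need explicit handling.
typed = raw.select(TARGET_SCHEMA.names).cast(TARGET_SCHEMA)
pq.write_table(typed, "logistics-typed-staging/stage/orders/part-0.parquet",
               filesystem=s3)

# Batch-load the typed files into Redshift (20-30 tables/day in practice).
conn = redshift_connector.connect(
    host="redshift-cluster.xxxx.ap-south-1.redshift.amazonaws.com",
    database="analytics", user="admin", password="...",
)
cur = conn.cursor()
cur.execute("""
    COPY analytics.orders
    FROM 's3://logistics-typed-staging/stage/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET
""")
conn.commit()
```

Exporting as STRING first and typing later keeps the raw Level 1 backup format-agnostic, so any casting failure can be retried without re-exporting from Snowflake.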
Validation, Cutover, and Rollback
  • Validation: Row count checks, NULL handling, timestamp/boolean verification, and JSON parsing (a row-count sketch follows this list)
  • Rollback Plan: Reuse S3 backup, restart EC2 scripts from checkpoints, reload Redshift as needed
  • Cutover Strategy: Final delta export, dashboard redirection to Redshift, write-freeze on Snowflake, final S3 backup, Snowflake sunset
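As a rough illustration of the row-count check, the sketch below assumes per-table counts were captured at export time into a manifest file (a hypothetical export_row_counts.json) and compares them against post-load counts in Redshift.

```python
# Sketch of the row-count validation: compare counts recorded at Snowflake
# export time against post-load counts in Redshift. The manifest format and
# all connection details are hypothetical.
import json
import redshift_connector

# e.g. {"analytics.orders": 182340411, "analytics.shipments": 95121077}
with open("export_row_counts.json") as f:
    expected = json.load(f)

conn = redshift_connector.connect(
    host="redshift-cluster.xxxx.ap-south-1.redshift.amazonaws.com",
    database="analytics", user="admin", password="...",
)
cur = conn.cursor()

mismatches = []
for table, source_count in expected.items():
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    (loaded_count,) = cur.fetchone()
    if loaded_count != source_count:
        mismatches.append((table, source_count, loaded_count))

# Any mismatch triggers the rollback path: replay the EC2 scripts from their
# checkpoints against the Level 1 S3 backup and reload the affected tables.
for table, src, dst in mismatches:
    print(f"MISMATCH {table}: source={src} redshift={dst}")
```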

Proposed Solution Approach

Fig: Historical Data Migration from Snowflake to Redshift

Fig: Overall Solution Architecture for Incremental Data

Value Delivered
  • All dashboards and BI tools now connect directly to Redshift, streamlining reporting.
  • No dependency on Snowflake going forward—licenses and compute warehouses are being decommissioned.
  • Ability to scale analytics with pay-as-you-go S3 querying using Redshift Spectrum (see the sketch after this list).
  • Future-ready architecture—historical data is now usable in ML pipelines and GenAI models for predictive decision-making.
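As a sketch of what this looks like in practice, the snippet below registers the archived S3 data as an external schema through the AWS Glue Data Catalog and queries live and archived data together. The schema, database, and role names are illustrative, not the client's.

```python
# Sketch: query archived S3 data through Redshift Spectrum alongside live
# Redshift tables. Assumes the archived Parquet is registered in the AWS
# Glue Data Catalog; every name and ARN here is illustrative.
import redshift_connector

conn = redshift_connector.connect(
    host="redshift-cluster.xxxx.ap-south-1.redshift.amazonaws.com",
    database="analytics", user="admin", password="...",
)
cur = conn.cursor()

# One-time setup: expose the Glue database that catalogues the archive.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS history
    FROM DATA CATALOG DATABASE 'logistics_archive'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role'
""")

# Recent (local) and historical (S3) data in one query; Spectrum charges
# only for the bytes it scans in S3.
cur.execute("""
    SELECT order_month, COUNT(*) AS shipments
    FROM (
        SELECT DATE_TRUNC('month', created_at) AS order_month
        FROM analytics.shipments
        UNION ALL
        SELECT DATE_TRUNC('month', created_at)
        FROM history.shipments
    ) t
    GROUP BY order_month
    ORDER BY order_month
""")
for row in cur.fetchall():
    print(row)
```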

This migration was not just a technical re-platforming. It is expected to deliver:

  • Cost Reduction of 60–70% on historical data storage
  • Unified AWS Analytics Stack simplifying architecture using Redshift, S3 and Spectrum to power future AI/ML initiatives (route optimization, delivery time prediction, etc.)
  • Operational Efficiency by aligning dashboards, MIS reports, and analytics with a single platform
  • Improved Data Governance with better traceability, validation, and rollback capabilities
  • Compliance & Audit Readiness with version-controlled backups and audit-friendly S3 storage (a configuration sketch follows)
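A minimal boto3 sketch of the retention configuration behind the compliance bullet above, assuming a hypothetical Level 1 bucket; the prefixes and day thresholds are illustrative, not the client's actual policy.

```python
# Sketch: enable versioning on the raw staging bucket and tier objects to
# cheaper storage over time. Bucket name, prefix, and day thresholds are
# all illustrative.
import boto3

s3 = boto3.client("s3")
BUCKET = "logistics-raw-staging"  # hypothetical Level 1 bucket

# Versioned backups provide the audit trail and the rollback point.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle rules: move cold raw data to Intelligent-Tiering, then Glacier.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-staging",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```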
Technologies
  • AWS SCT: Schema conversion from Snowflake to Redshift
  • EC2 (On-Demand): Hosts the Python scripts for the Snowflake-to-S3 export and the S3-to-S3 transformation
  • Amazon S3: Serves as both staging area and long-term storage (Levels 1 & 2)
  • Amazon Redshift: Data warehousing solution
  • Aurora Zero-ETL: Zero-ETL integration for ingesting incremental data
  • Step Functions: Orchestration of the ETL pipeline
  • CloudTrail/CloudWatch: Logging and monitoring
  • IAM/KMS/Secrets Manager: Security services
  • Amazon SageMaker: ML model deployment