One of India’s largest logistics and supply chain companies generates massive volumes of operational and transactional data daily. Over time, this resulted in the accumulation of over XX TB of historical data across ~XXX Snowflake tables. While Snowflake had served the company's analytical needs well initially, it began to pose challenges.
To address these challenges, the client undertook a strategic migration of its historical data from Snowflake to Amazon Redshift and Amazon S3, with a future-ready architecture designed to enable both cost-efficiency and advanced analytics.
Why Migrate? (Business Drivers)
| Business Need | Challenge with Snowflake | Redshift/S3 Benefit |
| --- | --- | --- |
| Cost Optimization | Snowflake's compute/storage model was proving expensive for data that is infrequently queried | Redshift plus S3 lifecycle storage significantly cuts costs for cold data |
| Audit & Regulatory Compliance | Regulatory mandates required historical data availability and traceability | S3-based versioned backups plus Redshift Spectrum provide a reliable audit trail |
| Unified Analytics Platform | A fragmented setup made cross-system queries harder | Redshift and Spectrum allow integrated querying across recent and historical data |
| ML & GenAI Readiness | Snowflake limited flexibility in ML pipeline integration | An S3-based data lake supports downstream ML/GenAI workloads on AWS |
| Cloud Strategy Alignment | Snowflake was a silo outside the AWS-native stack | Redshift, EC2, and S3 bring tighter integration with the existing AWS ecosystem |
A two-level S3 staging approach was implemented to ensure flexibility, data integrity, and recoverability. The steps are summarized below; illustrative code sketches for the key steps follow the table:
| Step | Description |
| --- | --- |
| 1. Schema Conversion | Used AWS Schema Conversion Tool (SCT) to translate the Snowflake schema into Redshift-compatible DDL |
| 2. Data Extraction | Exported data from Snowflake to Amazon S3 as STRING-typed Parquet files to avoid format/type issues |
| 3. S3 Level 1 - Raw Staging | Raw backup of all exported data (STRING Parquet), version-controlled for audit and rollback |
| 4. S3 Level 2 - Transformed Staging | Converted STRING columns to their target types using the SCT mapping plus EC2 batch scripts (Python) |
| 5. Load into Redshift | Redshift COPY commands used to batch-load 20–30 tables per day |
| 6. Retention & Archival | S3 Lifecycle Policies applied: older data moved to Glacier or Intelligent-Tiering |
| 7. Query Optimization | Redshift Spectrum used to query archived data directly from S3 |
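For Step 2, the unload from Snowflake can be driven by a small Python script using the snowflake-connector-python driver and a `COPY INTO <location>` statement that casts every column to VARCHAR before writing Parquet. This is a minimal sketch: the connection parameters, storage integration name (`s3_migration_int`), bucket, and table names are assumptions, not the client's actual configuration.

```python
import snowflake.connector

# Assumed connection details -- replace with the real account, warehouse, and credentials.
conn = snowflake.connector.connect(
    account="EXAMPLE_ACCOUNT",
    user="MIGRATION_USER",
    password="***",
    warehouse="MIGRATION_WH",
    database="PROD_DB",
    schema="PUBLIC",
)

def export_table_as_string_parquet(table: str, columns: list[str]) -> None:
    """Unload one table to the Level-1 raw staging prefix, casting every
    column to VARCHAR so the Parquet files carry only STRING columns."""
    select_list = ", ".join(f"{col}::VARCHAR AS {col}" for col in columns)
    copy_sql = f"""
        COPY INTO 's3://example-raw-staging/snowflake/{table}/'
        FROM (SELECT {select_list} FROM {table})
        STORAGE_INTEGRATION = s3_migration_int   -- assumed integration name
        FILE_FORMAT = (TYPE = PARQUET)
        HEADER = TRUE
        OVERWRITE = TRUE
    """
    conn.cursor().execute(copy_sql)

export_table_as_string_parquet("ORDERS", ["ORDER_ID", "ORDER_DATE", "AMOUNT", "STATUS"])
```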
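Step 4's STRING-to-typed conversion on EC2 can be sketched with pyarrow: read a Level-1 file, cast each column according to the SCT-derived mapping, and write the typed copy to the Level-2 prefix. The column names and target types below are hypothetical placeholders for whatever the SCT mapping actually produced.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical excerpt of an SCT-derived type mapping for one table.
# A decimal target could be used for monetary columns; float64 keeps the sketch simple.
TYPE_MAP = {
    "order_id": pa.int64(),
    "order_date": pa.timestamp("us"),
    "amount": pa.float64(),
    "status": pa.string(),
}

def cast_string_parquet(src_path: str, dst_path: str) -> None:
    """Read a Level-1 STRING Parquet file and write the typed Level-2 copy."""
    table = pq.read_table(src_path)
    typed_columns = [
        table.column(name).cast(TYPE_MAP.get(name, pa.string()))
        for name in table.column_names
    ]
    pq.write_table(pa.table(typed_columns, names=table.column_names), dst_path)

cast_string_parquet(
    "raw/orders/part-0001.parquet",     # Level-1 (STRING) input
    "typed/orders/part-0001.parquet",   # Level-2 (typed) output
)
```

In practice such a script would be batched over every object under a table's prefix, but the per-column cast is the core of the Level-1 to Level-2 transformation.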
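Step 5's daily batch of COPY commands can likewise be orchestrated from Python, for example with the redshift_connector driver. The cluster endpoint, credentials, IAM role ARN, and table names below are placeholders only.

```python
import redshift_connector

# Assumed cluster endpoint and credentials -- placeholders only.
conn = redshift_connector.connect(
    host="example-cluster.abc123.ap-south-1.redshift.amazonaws.com",
    database="analytics",
    user="migration_user",
    password="***",
)
conn.autocommit = True

def copy_table(table: str) -> None:
    """Bulk-load one table's typed Parquet prefix from the Level-2 staging area."""
    copy_sql = f"""
        COPY {table}
        FROM 's3://example-typed-staging/snowflake/{table}/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
        FORMAT AS PARQUET
    """
    cursor = conn.cursor()
    cursor.execute(copy_sql)

# A day's batch (20-30 tables in the actual migration) is a loop over a table list.
for table_name in ["orders", "shipments", "invoices"]:
    copy_table(table_name)
```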
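The bucket-level controls behind Steps 3 and 6 (versioning on the raw staging bucket for rollback, lifecycle transitions for archival) can be applied with boto3. The bucket name, prefix, and transition windows are assumptions; the actual retention policy would drive the real values.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-raw-staging"  # assumed bucket name

# Step 3: keep object versions so any export can be rolled back or audited.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Step 6: transition older exports to cheaper storage classes over time.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-snowflake-exports",
                "Filter": {"Prefix": "snowflake/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```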
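For Step 7, archived data stays queryable through Redshift Spectrum by mapping the S3 Parquet into an external schema. The statements below (run through the same redshift_connector connection as the COPY sketch) use placeholder names for the external schema, Glue catalog database, IAM role, bucket, and table columns.

```python
# Reuses the `conn` object from the Redshift COPY sketch above.
spectrum_setup = """
    CREATE EXTERNAL SCHEMA IF NOT EXISTS snowflake_archive
    FROM DATA CATALOG
    DATABASE 'snowflake_archive'
    IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

external_table = """
    CREATE EXTERNAL TABLE snowflake_archive.orders_2019 (
        order_id   BIGINT,
        order_date TIMESTAMP,
        amount     DOUBLE PRECISION,
        status     VARCHAR(32)
    )
    STORED AS PARQUET
    LOCATION 's3://example-typed-staging/snowflake/orders_2019/';
"""

# Recent data in Redshift and archived data in S3 can then be queried together.
federated_query = """
    SELECT status, COUNT(*) AS order_count
    FROM (
        SELECT status FROM public.orders                   -- recent data in Redshift
        UNION ALL
        SELECT status FROM snowflake_archive.orders_2019   -- archived data in S3
    ) AS combined
    GROUP BY status;
"""

cursor = conn.cursor()
cursor.execute(spectrum_setup)
cursor.execute(external_table)
cursor.execute(federated_query)
print(cursor.fetchall())
```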
Fig: Historical Data Migration from Snowflake to Redshift
Fig: Overall Solution Architecture for Incremental Data
This migration was not just a technical re-platforming. It delivers: