If you’re on AWS for your cloud AND using Snowflake for data warehousing AND need a way to ingest events into Snowflake, then this is a bulletproof way to go.
It’s this simple.
Given that Snowflake is usually about analytics, and that often means order doesn’t matter (and if your events are schema’d right, there are timestamps on both the event and the ingestion anyway), this setup works beautifully:
SNS (non-fifo) -> Data Firehose -> Snowflake
That’s it.
You don’t need ordered events; they’re going to a data warehouse to get aggregated.
Non-FIFO SNS means you can connect it directly to Data Firehose, which can natively write to Snowflake.
And that’s it.
I have seen it at $DAYJOB scale, from 1 event a second (sometimes intermittent) to constant, bursty workloads ingesting hundreds of events per second, while also getting the safety of storing everything, or errors only, on S3. Redrive policies with dead letter queues and all that jazz make this a fully autonomous setup.
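For the curious, a minimal Terraform sketch of that SNS side looks roughly like this: a non-FIFO topic, an SQS dead letter queue, and a firehose-protocol subscription with a redrive policy. The names are placeholders, and the IAM role SNS assumes (plus the SQS queue policy that lets SNS write to the DLQ) is left out.

```hcl
# Non-FIFO SNS topic that fans events out to Firehose.
resource "aws_sns_topic" "events" {
  name = "app-events" # no .fifo suffix, so ordering isn't guaranteed (and doesn't matter here)
}

# DLQ for anything SNS can't deliver to Firehose.
resource "aws_sqs_queue" "events_dlq" {
  name                      = "app-events-dlq"
  message_retention_seconds = 1209600 # hold failed deliveries for 14 days
}

# Firehose-protocol subscriptions need a role SNS can assume to call
# firehose:PutRecord; that role and the DLQ's queue policy are defined elsewhere.
resource "aws_sns_topic_subscription" "to_firehose" {
  topic_arn             = aws_sns_topic.events.arn
  protocol              = "firehose"
  endpoint              = aws_kinesis_firehose_delivery_stream.to_snowflake.arn
  subscription_role_arn = aws_iam_role.sns_to_firehose.arn
  raw_message_delivery  = true

  # Failed deliveries land in the DLQ instead of disappearing.
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.events_dlq.arn
  })
}
```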
The real flow is more or less like this:
SNS (non-fifo)
  -> SQS DLQ (for any events that fail to deliver to Firehose)
  -> Data Firehose
       -> S3 (records that can't be delivered to Snowflake, or everything, with optional compression)
       -> Snowflake -> Dynamic Table -> profit!
All of this is very much doable in straightforward Terraform.
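Here’s a rough sketch of the Firehose leg, assuming a recent AWS provider (the Snowflake destination needs roughly v5.40+). The account URL, credentials, roles, and backup bucket are placeholders, and it’s worth double-checking argument names against your provider version:

```hcl
# Firehose delivery stream writing directly into a Snowflake table,
# backing up failed records (or optionally everything) to S3.
resource "aws_kinesis_firehose_delivery_stream" "to_snowflake" {
  name        = "events-to-snowflake"
  destination = "snowflake"

  snowflake_configuration {
    account_url = "https://example-account.snowflakecomputing.com"
    database    = "ANALYTICS"
    schema      = "RAW"
    table       = "EVENTS"
    user        = "FIREHOSE_INGEST"
    private_key = var.snowflake_private_key # key-pair auth; variable declared elsewhere
    role_arn    = aws_iam_role.firehose_to_snowflake.arn

    # "FailedDataOnly" keeps only rejects on S3; "AllData" keeps everything.
    s3_backup_mode = "FailedDataOnly"

    s3_configuration {
      role_arn           = aws_iam_role.firehose_to_s3.arn
      bucket_arn         = aws_s3_bucket.firehose_backup.arn
      compression_format = "GZIP"
    }
  }
}
```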
It works very well with schema-on-read workflows, where the idea is ELT not ETL.
Extraction (E) here is the SNS-to-Firehose stuff, the Load (L) is the events landing in a table of raw JSON, and the Transform (T) happens in a Dynamic Table.
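A sketch of that Transform step, here via the Snowflake Terraform provider’s snowflake_dynamic_table resource. The raw table layout (a single VARIANT column called record), the field names, and the warehouse are all assumptions, and argument names vary by provider version; the same thing is just a CREATE DYNAMIC TABLE statement if you’d rather run it as plain SQL:

```hcl
# Dynamic Table that flattens the raw JSON events Firehose lands in RAW.EVENTS.
resource "snowflake_dynamic_table" "events_flat" {
  name      = "EVENTS_FLAT"
  database  = "ANALYTICS"
  schema    = "MARTS"
  warehouse = "TRANSFORM_WH"

  target_lag {
    maximum_duration = "5 minutes" # Snowflake keeps the table at most this stale
  }

  # Schema-on-read: pull typed columns out of the raw VARIANT payload.
  # The event_timestamp / ingested_at fields are the two timestamps mentioned above.
  query = <<-SQL
    select
      record:event_id::string                as event_id,
      record:event_timestamp::timestamp_ntz  as event_timestamp,
      record:ingested_at::timestamp_ntz      as ingested_at,
      record                                 as payload
    from ANALYTICS.RAW.EVENTS
  SQL
}
```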
My favorite part about all of this is that it’s fully managed, it scales automatically, and it’s stupendously cheap. I can’t disclose exact numbers, but it’s pennies an hour and about 100x cheaper than what Fivetran would cost.
Copyright Alex Narayan - 2025