This is where serverless Ruby really shines.
Imagine a third party uploads a large CSV file with hundreds of thousands of rows to your S3 bucket. If you try to process it in your Rails app, especially synchronously, you might block request cycles, chew through Sidekiq workers, or simply crash the server.
Instead, we'll build a pipeline where S3, SNS, SQS, and Lambda handle the heavy lifting outside your app. You'll be able to process large files for pennies and with zero infrastructure stress.
If you read Chapter 2, you'll already have a good overview of these services, but let's briefly cover them again for anyone jumping ahead or who wants a quick refresher.
Let's take a moment to demystify what we're doing. In simple terms, we're wiring up a file drop → a notification → a queue → a Lambda function. All of this will happen automatically the moment a file lands in S3.
And don't worry, if you don't fully understand each component yet, that's totally normal. Just follow along step-by-step and you'll have a working pipeline by the end.
[S3 Upload]
↓
[SNS Topic] → (fanout potential)
↓
[SQS Queue] → (retry/delay logic)
↓
[AWS Lambda (Dockerized Ruby)]
↓
[Process CSV]
This chapter builds on the Terraform foundation from Chapter 5. How you proceed depends on your starting point:
📖 If you completed Chapter 5: You already have an ECR repository and Lambda function set up. Perfect! You can keep your existing main.tf exactly as-is and just add the new CSV pipeline resources we'll cover in this chapter.
🚀 If you're jumping straight to this chapter: No worries! We'll show you the baseline setup and guide you through the deployment order to avoid common pitfalls.
Here's what your baseline main.tf should contain at minimum before we start adding the CSV pipeline resources:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.4.0"
}

provider "aws" {
  region = "us-east-1"
}

# ECR Repository (required for all scenarios)
resource "aws_ecr_repository" "ruby_lambda" {
  name = "my-ruby-lambda"
}

# Basic IAM role (we'll extend this with more permissions later)
resource "aws_iam_role" "lambda_exec" {
  name = "lambda-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_basic" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Lambda function (if you completed Chapter 5, you already have this)
# If you're starting fresh, we'll add this later after pushing a Docker image
resource "aws_lambda_function" "ruby_lambda" {
  function_name = "my-ruby-lambda"
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.ruby_lambda.repository_url}:latest"
  role          = aws_iam_role.lambda_exec.arn
  timeout       = 10
  memory_size   = 512
}
💡 New to this chapter? If you don't have a Lambda function yet, comment out the aws_lambda_function resource for now; we'll add it back after we build and push our Docker image, which avoids deployment errors.
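If you'd rather not comment code in and out, another option is to guard the resource with a count flag. This is only a sketch (it's not the approach we follow in this chapter), assuming a hypothetical var.create_lambda variable that you flip to true once the image has been pushed:

variable "create_lambda" {
  description = "Set to true once the Docker image exists in ECR"
  type        = bool
  default     = false
}

resource "aws_lambda_function" "ruby_lambda" {
  # Creates zero copies of the function until the flag is flipped
  count         = var.create_lambda ? 1 : 0
  function_name = "my-ruby-lambda"
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.ruby_lambda.repository_url}:latest"
  role          = aws_iam_role.lambda_exec.arn
  timeout       = 10
  memory_size   = 512
}

The catch is that count turns the resource into a list, so later references become aws_lambda_function.ruby_lambda[0]. The plain comment/uncomment approach keeps the rest of the chapter's snippets working unchanged.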
Our pipeline starts with an S3 bucket that will receive CSV uploads. When a file lands here, it will trigger our entire processing chain.
Add this to your main.tf:
resource "aws_s3_bucket" "csv_uploads" {
bucket = "test-july-2025-csv-uploads-bucket"
force_destroy = true
}
The force_destroy = true setting lets Terraform delete the bucket even if it contains files, which is useful for development. In production, you'd want to remove this for safety.
⚠️ Important: S3 bucket names must be globally unique across all AWS accounts. Change "test-july-2025-csv-uploads-bucket" to something unique for your project.
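If you'd rather not hand-pick a unique name, one option is to append a random suffix. Here's a minimal sketch, assuming you're happy to pull in the hashicorp/random provider (the csv-uploads prefix is just illustrative):

# Short random hex suffix so the bucket name is globally unique
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

resource "aws_s3_bucket" "csv_uploads" {
  # Produces something like "csv-uploads-1a2b3c4d"
  bucket        = "csv-uploads-${random_id.bucket_suffix.hex}"
  force_destroy = true
}

Just remember that if the random_id is ever recreated, the bucket name changes with it, so an explicit, stable name is still the safer choice for production.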
SNS (Simple Notification Service) acts as our message broadcaster. When S3 receives a file, it will publish a message to this topic. Think of it as a megaphone that announces "A new CSV file has arrived!"
Add this to your main.tf:
resource "aws_sns_topic" "csv_uploaded" {
name = "csv-uploaded-topic"
}
resource "aws_sns_topic_policy" "allow_s3_publish" {
arn = aws_sns_topic.csv_uploaded.arn
policy = jsonencode({
Version = "2012-10-17",
Statement = [{
Sid = "AllowS3Publish",
Effect = "Allow",
Principal = { Service = "s3.amazonaws.com" },
Action = "SNS:Publish",
Resource = aws_sns_topic.csv_uploaded.arn,
Condition = {
ArnLike = { "aws:SourceArn" = aws_s3_bucket.csv_uploads.arn }
}
}]
})
}
The key parts here:
- aws_sns_topic creates our notification topic
- aws_sns_topic_policy gives S3 permission to publish messages to this topic
- The Condition block ensures only our S3 bucket can publish to this topic (a security best practice)
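As an optional hardening step (this is a suggestion on my part, not something the rest of the pipeline depends on), you can also pin the publishing account with an aws:SourceAccount condition alongside the ARN check. This is the same policy resource as above, with a caller-identity lookup and one extra condition:

# Look up the current AWS account ID
data "aws_caller_identity" "current" {}

resource "aws_sns_topic_policy" "allow_s3_publish" {
  arn = aws_sns_topic.csv_uploaded.arn

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Sid       = "AllowS3Publish",
      Effect    = "Allow",
      Principal = { Service = "s3.amazonaws.com" },
      Action    = "SNS:Publish",
      Resource  = aws_sns_topic.csv_uploaded.arn,
      Condition = {
        ArnLike      = { "aws:SourceArn" = aws_s3_bucket.csv_uploads.arn },
        StringEquals = { "aws:SourceAccount" = data.aws_caller_identity.current.account_id }
      }
    }]
  })
}

The SourceArn check already restricts publishing to our bucket; adding SourceAccount guards against the edge case where the bucket is deleted and its name is later claimed by a different AWS account.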