This is where serverless Ruby really shines.
Imagine a third party uploads a large CSV file with hundreds of thousands of rows to your S3 bucket. If you try to process it in your Rails app, especially synchronously, you might block request cycles, chew through Sidekiq workers, or simply crash the server.
Instead, we'll build a pipeline where S3, SNS, SQS, and Lambda handle the heavy lifting outside your app. You'll be able to process large files for pennies and with zero infrastructure stress.
If you read Chapter 2, you'll already have a good overview of these services, but let's briefly cover them again for anyone jumping ahead or who wants a quick refresher.
Let's take a moment to demystify what we're doing. In simple terms, we're wiring up a file drop → a notification → a queue → a Lambda function. All of this will happen automatically the moment a file lands in S3.
And don't worry, if you don't fully understand each component yet, that's totally normal. Just follow along step-by-step and you'll have a working pipeline by the end.
[S3 Upload]
↓
[SNS Topic] → (fanout potential)
↓
[SQS Queue] → (retry/delay logic)
↓
[AWS Lambda (Dockerized Ruby)]
↓
[Process CSV]
This chapter builds on the Terraform foundation from Chapter 5. How you proceed depends on your starting point:
📖 If you completed Chapter 5: You already have an ECR repository and Lambda function set up. Perfect! You can keep your existing main.tf exactly as-is and just add the new CSV pipeline resources we'll cover in this chapter.
🚀 If you're jumping straight to this chapter: No worries! We'll show you the baseline setup and guide you through the deployment order to avoid common pitfalls.
Here's what your baseline main.tf should contain at minimum before we start adding the CSV pipeline resources:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.4.0"
}

provider "aws" {
  region = "us-east-1"
}

# ECR Repository (required for all scenarios)
resource "aws_ecr_repository" "ruby_lambda" {
  name = "my-ruby-lambda"
}

# Basic IAM role (we'll extend this with more permissions later)
resource "aws_iam_role" "lambda_exec" {
  name = "lambda-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_basic" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Lambda function (if you completed Chapter 5, you already have this)
# If you're starting fresh, we'll add this later after pushing a Docker image
resource "aws_lambda_function" "ruby_lambda" {
  function_name = "my-ruby-lambda"
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.ruby_lambda.repository_url}:latest"
  role          = aws_iam_role.lambda_exec.arn
  timeout       = 10
  memory_size   = 512
}
💡 New to this chapter? If you don't have a Lambda function yet, comment out the aws_lambda_function resource for now; we'll add it back after we build and push our Docker image, which avoids deployment errors.
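If you'd rather not comment code in and out, another option is to guard the resource with a count flag. This is only a sketch (it's not the approach we follow in this chapter), assuming a hypothetical var.create_lambda variable that you flip to true once the image has been pushed:

variable "create_lambda" {
  description = "Set to true once the Docker image exists in ECR"
  type        = bool
  default     = false
}

resource "aws_lambda_function" "ruby_lambda" {
  # Creates zero copies of the function until the flag is flipped
  count         = var.create_lambda ? 1 : 0
  function_name = "my-ruby-lambda"
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.ruby_lambda.repository_url}:latest"
  role          = aws_iam_role.lambda_exec.arn
  timeout       = 10
  memory_size   = 512
}

The catch is that count turns the resource into a list, so later references become aws_lambda_function.ruby_lambda[0]. The plain comment/uncomment approach keeps the rest of the chapter's snippets working unchanged.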
Our pipeline starts with an S3 bucket that will receive CSV uploads. When a file lands here, it will trigger our entire processing chain.
Add this to your main.tf:
resource "aws_s3_bucket" "csv_uploads" {
bucket = "test-july-2025-csv-uploads-bucket"
force_destroy = true
}
The force_destroy = true setting lets Terraform delete the bucket even if it contains files, which is useful for development. In production, you'd want to remove this for safety.
⚠️ Important: S3 bucket names must be globally unique across all AWS accounts. Change "test-july-2025-csv-uploads-bucket" to something unique for your project.
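If you'd rather not hand-pick a unique name, one option is to append a random suffix. Here's a minimal sketch, assuming you're happy to pull in the hashicorp/random provider (the csv-uploads prefix is just illustrative):

# Short random hex suffix so the bucket name is globally unique
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

resource "aws_s3_bucket" "csv_uploads" {
  # Produces something like "csv-uploads-1a2b3c4d"
  bucket        = "csv-uploads-${random_id.bucket_suffix.hex}"
  force_destroy = true
}

Just remember that if the random_id is ever recreated, the bucket name changes with it, so an explicit, stable name is still the safer choice for production.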
SNS (Simple Notification Service) acts as our message broadcaster. When S3 receives a file, it will publish a message to this topic. Think of it as a megaphone that announces "A new CSV file has arrived!"
Add this to your main.tf:
resource "aws_sns_topic" "csv_uploaded" {
name = "csv-uploaded-topic"
}
resource "aws_sns_topic_policy" "allow_s3_publish" {
arn = aws_sns_topic.csv_uploaded.arn
policy = jsonencode({
Version = "2012-10-17",
Statement = [{
Sid = "AllowS3Publish",
Effect = "Allow",
Principal = { Service = "s3.amazonaws.com" },
Action = "SNS:Publish",
Resource = aws_sns_topic.csv_uploaded.arn,
Condition = {
ArnLike = { "aws:SourceArn" = aws_s3_bucket.csv_uploads.arn }
}
}]
})
}
The key parts here:
- aws_sns_topic creates our notification topic
- aws_sns_topic_policy gives S3 permission to publish messages to this topic
- The Condition block ensures only our S3 bucket can publish to this topic (a security best practice)
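As an optional hardening step (this is a suggestion on my part, not something the rest of the pipeline depends on), you can also pin the publishing account with an aws:SourceAccount condition alongside the ARN check. This is the same policy resource as above, with a caller-identity lookup and one extra condition:

# Look up the current AWS account ID
data "aws_caller_identity" "current" {}

resource "aws_sns_topic_policy" "allow_s3_publish" {
  arn = aws_sns_topic.csv_uploaded.arn

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Sid       = "AllowS3Publish",
      Effect    = "Allow",
      Principal = { Service = "s3.amazonaws.com" },
      Action    = "SNS:Publish",
      Resource  = aws_sns_topic.csv_uploaded.arn,
      Condition = {
        ArnLike      = { "aws:SourceArn" = aws_s3_bucket.csv_uploads.arn },
        StringEquals = { "aws:SourceAccount" = data.aws_caller_identity.current.account_id }
      }
    }]
  })
}

The SourceArn check already restricts publishing to our bucket; adding SourceAccount guards against the edge case where the bucket is deleted and its name is later claimed by a different AWS account.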